## Multinomial model for categorical data

The binomial distribution that was emphasized in Chapter 2 can be generalized to allow more than two possible outcomes. The multinomial sampling distribution is used to describe data for which each observation is one of $k$ possible outcomes. If $y$ is the vector of counts of the number of observations of each outcome, then
$$p(y \mid \theta) \propto \prod_{j=1}^k \theta_j^{y_j},$$
where the sum of the probabilities, $\sum_{j=1}^k \theta_j$, is 1. The distribution is typically thought of as implicitly conditioning on the number of observations, $\sum_{j=1}^k y_j=n$. The conjugate prior distribution is a multivariate generalization of the beta distribution known as the Dirichlet,
$$p(\theta \mid \alpha) \propto \prod_{j=1}^k \theta_j^{\alpha_j-1},$$
where the distribution is restricted to nonnegative $\theta_j$'s with $\sum_{j=1}^k \theta_j=1$; see Appendix A for details. The resulting posterior distribution for the $\theta_j$'s is Dirichlet with parameters $\alpha_j+y_j$.
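The conjugate update amounts to adding the observed counts to the prior parameters. The following is a minimal sketch, assuming NumPy is available; the counts `y`, the prior `alpha`, and the helper name `dirichlet_posterior` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def dirichlet_posterior(alpha, y):
    """Return the Dirichlet posterior parameters alpha_j + y_j (hypothetical helper)."""
    alpha = np.asarray(alpha, dtype=float)
    y = np.asarray(y, dtype=float)
    if alpha.shape != y.shape:
        raise ValueError("alpha and y must both have length k")
    return alpha + y

# Made-up data: n = 20 observations spread over k = 3 outcomes.
y = np.array([12, 5, 3])
alpha = np.ones(3)  # uniform Dirichlet prior, alpha_j = 1 for all j

post = dirichlet_posterior(alpha, y)
post_mean = post / post.sum()  # posterior mean of theta

# Draw from the posterior; each row is a probability vector summing to 1.
rng = np.random.default_rng(0)
draws = rng.dirichlet(post, size=1000)
```

Posterior summaries such as interval estimates for a difference $\theta_1-\theta_2$ can then be read off the rows of `draws`.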

The prior distribution is mathematically equivalent to a likelihood resulting from $\sum_{j=1}^k \alpha_j$ observations with $\alpha_j$ observations of the $j$th outcome category. As in the binomial case, there are several plausible noninformative Dirichlet prior distributions. A uniform density is obtained by setting $\alpha_j=1$ for all $j$; this distribution assigns equal density to any vector $\theta$ satisfying $\sum_{j=1}^k \theta_j=1$. Setting $\alpha_j=0$ for all $j$ results in an improper prior distribution that is uniform in the $\log \left(\theta_j\right)$'s. The resulting posterior distribution is proper if there is at least one observation in each of the $k$ categories, so that each component of $y$ is positive. The bibliographic note at the end of this chapter points to other suggested noninformative prior distributions for the multinomial model.
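The pseudo-count reading of the prior and the properness condition can be checked numerically. This is a rough sketch under stated assumptions: the counts are made up, and `posterior_is_proper` and `posterior_mean` are hypothetical helper names introduced only for illustration. It uses the fact that a Dirichlet distribution is proper when all of its parameters are strictly positive.

```python
import numpy as np

def posterior_is_proper(alpha, y):
    """True when every posterior parameter alpha_j + y_j is strictly positive."""
    post = np.asarray(alpha, dtype=float) + np.asarray(y, dtype=float)
    return bool(np.all(post > 0))

def posterior_mean(alpha, y):
    """Posterior mean (alpha_j + y_j) / (sum(alpha) + n)."""
    post = np.asarray(alpha, dtype=float) + np.asarray(y, dtype=float)
    return post / post.sum()

y_full = np.array([12, 5, 3])    # every category observed at least once
y_sparse = np.array([12, 5, 0])  # third category never observed

uniform = np.ones(3)    # alpha_j = 1: uniform density on the simplex
improper = np.zeros(3)  # alpha_j = 0: improper prior

proper_full = posterior_is_proper(improper, y_full)      # all counts positive
proper_sparse = posterior_is_proper(improper, y_sparse)  # one parameter is 0

# With alpha_j = 0 the posterior mean equals the sample proportions y_j / n;
# with alpha_j = 1 it is pulled toward the uniform vector (1/k, ..., 1/k).
mean_improper = posterior_mean(improper, y_full)
mean_uniform = posterior_mean(uniform, y_full)
```

With the uniform prior the posterior stays proper even for `y_sparse`, because each prior parameter already contributes one pseudo-observation.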

## Multivariate normal model with known variance

The basic model to be discussed concerns an observable vector $y$ of $d$ components, with the multivariate normal distribution,
$$y \mid \mu, \Sigma \sim \mathrm{N}(\mu, \Sigma)$$
where $\mu$ is a (column) vector of length $d$ and $\Sigma$ is a $d \times d$ variance matrix, which is symmetric and positive definite. The likelihood function for a single observation is
$$p(y \mid \mu, \Sigma) \propto|\Sigma|^{-1 / 2} \exp \left(-\frac{1}{2}(y-\mu)^T \Sigma^{-1}(y-\mu)\right),$$
and for a sample of $n$ independent and identically distributed observations, $y_1, \ldots, y_n$, is
$$\begin{aligned} p\left(y_1, \ldots, y_n \mid \mu, \Sigma\right) & \propto|\Sigma|^{-n / 2} \exp \left(-\frac{1}{2} \sum_{i=1}^n\left(y_i-\mu\right)^T \Sigma^{-1}\left(y_i-\mu\right)\right) \\ & =|\Sigma|^{-n / 2} \exp \left(-\frac{1}{2} \operatorname{tr}\left(\Sigma^{-1} S_0\right)\right), \end{aligned}$$
where $S_0$ is the matrix of ‘sums of squares’ relative to $\mu$,
$$S_0=\sum_{i=1}^n\left(y_i-\mu\right)\left(y_i-\mu\right)^T$$
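The step from the sum of quadratic forms to the trace rests on the identity $\sum_{i=1}^n (y_i-\mu)^T \Sigma^{-1} (y_i-\mu) = \operatorname{tr}(\Sigma^{-1} S_0)$. The sketch below checks it numerically on simulated data; it assumes NumPy, and the dimension, sample size, mean vector, and covariance matrix are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 50

# Arbitrary illustrative parameters; Sigma is built to be symmetric positive definite.
mu = np.array([0.5, -1.0, 2.0])
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)

y = rng.multivariate_normal(mu, Sigma, size=n)  # shape (n, d), one row per observation

resid = y - mu           # rows are (y_i - mu)^T
S0 = resid.T @ resid     # sum_i (y_i - mu)(y_i - mu)^T, a d x d matrix

Sigma_inv = np.linalg.inv(Sigma)

# Left-hand side: sum over i of the quadratic forms (y_i - mu)^T Sigma^{-1} (y_i - mu).
quad_sum = np.einsum("ij,jk,ik->", resid, Sigma_inv, resid)

# Right-hand side: trace form used in the likelihood.
trace_form = np.trace(Sigma_inv @ S0)

# Log-likelihood up to an additive constant, using the trace form.
sign, logdet = np.linalg.slogdet(Sigma)
log_lik = -0.5 * n * logdet - 0.5 * trace_form
```

The two quantities `quad_sum` and `trace_form` agree up to floating-point error, which is why the likelihood depends on the data only through $S_0$ once $\mu$ is fixed.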

