## Incomplete PPCA by Expectation Maximization

In this section, we derive an EM algorithm (see Appendix B.2.1) for solving the PPCA problem with missing data. Recall from Section 2.2 that in the PPCA model, each data point is drawn as $x \sim \mathcal{N}\left(\mu_{x}, \Sigma_{x}\right)$ with $\mu_{x}=\mu$ and $\Sigma_{x}=B B^{\top}+\sigma^{2} I_{D}$, where $\mu \in \mathbb{R}^{D}$, $B \in \mathbb{R}^{D \times d}$, and $\sigma>0$. Recall also from (2.56) that the log-likelihood of the PPCA model is given by
$$\mathscr{L}=-\frac{N D}{2} \log (2 \pi)-\frac{N}{2} \log \operatorname{det}\left(\Sigma_{x}\right)-\frac{1}{2} \sum_{j=1}^{N} \operatorname{trace}\left(\Sigma_{x}^{-1}\left(x_{j}-\mu\right)\left(x_{j}-\mu\right)^{\top}\right),$$
where $\left\{x_{j}\right\}_{j=1}^{N}$ are $N$ i.i.d. samples of $\boldsymbol{x}$. Since the samples are incomplete, we can partition each point $x$ and the parameters $\mu_{x}$ and $\Sigma_{x}$ as
$$\left[\begin{array}{l} x_{U} \\ x_{O} \end{array}\right]=P x, \quad\left[\begin{array}{l} \mu_{U} \\ \mu_{O} \end{array}\right]=P \mu, \quad \text { and } \quad \left[\begin{array}{cc} \Sigma_{U U} & \Sigma_{U O} \\ \Sigma_{O U} & \Sigma_{O O} \end{array}\right]=P \Sigma_{x} P^{\top} .$$
Here $\boldsymbol{x}_{O}$ is the observed part of $\boldsymbol{x}$, $\boldsymbol{x}_{U}$ is the unobserved part of $\boldsymbol{x}$, and $P$ is any permutation matrix that reorders the entries of $\boldsymbol{x}$ so that the unobserved entries appear first. Notice that $P$ is not unique, but we can use any such $P$. Notice also that the above partition of $\boldsymbol{x}$, $\boldsymbol{\mu}_{x}$, and $\Sigma_{x}$ could be different for each data point, because the missing entries could be different for different data points. When strictly necessary, we will use $x_{j U}$ and $x_{j O}$ to denote the unobserved and observed parts of point $x_{j}$, respectively, and $P_{j}$ to denote the permutation matrix. Otherwise, we will avoid using the index $j$ in referring to a generic point.
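The partition above can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the helper `permutation_for_mask`, the boolean `missing` mask, and the numerical values of $\mu$, $B$, and $\sigma$ are hypothetical and do not come from the text.

```python
import numpy as np

def permutation_for_mask(missing):
    """Return one permutation matrix P that moves the unobserved entries first.

    `missing` is a boolean vector of length D; True marks an unobserved entry.
    """
    missing = np.asarray(missing, dtype=bool)
    # Indices of the unobserved entries, followed by those of the observed ones.
    order = np.concatenate([np.flatnonzero(missing), np.flatnonzero(~missing)])
    # Row i of P selects entry order[i], so (P @ x)[i] == x[order[i]].
    return np.eye(missing.size)[order]

# Arbitrary PPCA parameters with D = 4 and d = 2 (illustrative values only).
rng = np.random.default_rng(0)
D, d, sigma = 4, 2, 0.5
mu = rng.normal(size=D)
B = rng.normal(size=(D, d))
Sigma_x = B @ B.T + sigma**2 * np.eye(D)

missing = np.array([False, True, False, True])  # entries 1 and 3 are unobserved
n_U = int(missing.sum())
P = permutation_for_mask(missing)

# Partition the mean and the covariance as in the displayed equation.
mu_perm = P @ mu
Sigma_perm = P @ Sigma_x @ P.T
mu_U, mu_O = mu_perm[:n_U], mu_perm[n_U:]
Sigma_UU = Sigma_perm[:n_U, :n_U]
Sigma_UO = Sigma_perm[:n_U, n_U:]
Sigma_OU = Sigma_perm[n_U:, :n_U]
Sigma_OO = Sigma_perm[n_U:, n_U:]
```

In practice one rarely forms $P$ explicitly. Indexing with the mask, for example `Sigma_x[np.ix_(missing, ~missing)]` for $\Sigma_{UO}$, gives the same blocks.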

In what follows, we derive two variants of the EM algorithm for learning the parameters $\theta=(\mu, B, \sigma)$ of the PPCA model from incomplete samples $\left\{x_{j}\right\}_{j=1}^{N}$. The first variant, called Maximum a Posteriori Expectation Maximization (MAP-EM), is an approximate EM method whereby the unobserved variables are given by their MAP estimates (see Appendix B.2.2). The second variant is the exact EM algorithm (see Appendix B.2.1), where we take the conditional expectation of $\mathscr{L}$ over the incomplete entries. Interestingly, both variants lead to the same estimate for $\mu_{x}$, though the estimates for $\Sigma_{x}$ are slightly different. In our derivations, we will use the fact that the conditional distribution of $\boldsymbol{x}_{U}$ given $\boldsymbol{x}_{O}$ is Gaussian. More specifically, $x_{U} \mid x_{O} \sim \mathcal{N}\left(\mu_{U \mid O}, \Sigma_{U \mid O}\right)$, where
$$\mu_{U \mid O}=\mu_{U}+\Sigma_{U O} \Sigma_{O O}^{-1}\left(x_{O}-\mu_{O}\right) \text { and } \Sigma_{U \mid O}=\Sigma_{U U}-\Sigma_{U O} \Sigma_{O O}^{-1} \Sigma_{O U} .$$
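As a minimal numerical sketch of these two formulas, the following NumPy function evaluates $\mu_{U \mid O}$ and $\Sigma_{U \mid O}$ for a single point. The function name `conditional_gaussian`, the boolean `missing` mask, and the toy parameters are assumptions made for illustration and are not part of the derivation.

```python
import numpy as np

def conditional_gaussian(x, missing, mu, Sigma_x):
    """Mean and covariance of x_U given x_O when x ~ N(mu, Sigma_x).

    Entries of `x` flagged True in `missing` are never read.
    """
    missing = np.asarray(missing, dtype=bool)
    U = np.flatnonzero(missing)
    O = np.flatnonzero(~missing)
    Sigma_UU = Sigma_x[np.ix_(U, U)]
    Sigma_UO = Sigma_x[np.ix_(U, O)]
    Sigma_OO = Sigma_x[np.ix_(O, O)]
    # gain = Sigma_UO Sigma_OO^{-1}, obtained from a linear solve instead of an
    # explicit inverse (Sigma_OO is symmetric, so the transposes are valid).
    gain = np.linalg.solve(Sigma_OO, Sigma_UO.T).T
    mu_cond = mu[U] + gain @ (x[O] - mu[O])
    Sigma_cond = Sigma_UU - gain @ Sigma_UO.T
    return mu_cond, Sigma_cond

# Toy PPCA model with D = 2, d = 1: Sigma_x = B B^T + sigma^2 I = [[2, 1], [1, 2]].
B = np.array([[1.0], [1.0]])
sigma = 1.0
mu = np.zeros(2)
Sigma_x = B @ B.T + sigma**2 * np.eye(2)

x = np.array([np.nan, 3.0])          # first entry unobserved, second observed
missing = np.array([True, False])
mu_cond, Sigma_cond = conditional_gaussian(x, missing, mu, Sigma_x)
```

For this toy model, $\Sigma_{UO} \Sigma_{OO}^{-1} = 1/2$, so the conditional mean is $0 + \tfrac{1}{2} \cdot 3 = 1.5$ and the conditional variance is $2 - \tfrac{1}{2} \cdot 1 = 1.5$.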

## Matrix Completion by Convex Optimization

The EM-based approaches to incomplete PPCA discussed in the previous section rely on (a) explicit parameterizations of the low-rank factors and (b) minimization of a nonconvex cost function in an alternating minimization fashion. Specifically, such approaches alternate between completing the missing entries given the parameters of a PPCA model for the data and estimating the parameters of the model from complete data. While simple and intuitive, such approaches suffer from two important disadvantages. First, the desired rank of the matrix needs to be known in advance. Second, due to the greedy nature of the EM algorithm, it is difficult to ensure convergence to the globally optimal solution. Therefore, a good initialization of the EM-based algorithm is critical for converging to a good solution.
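The alternation described above can be sketched in a few lines of NumPy. This is a simplified MAP-EM-style illustration, not the exact algorithm derived in the text: it assumes the standard closed-form maximum likelihood estimate of the PPCA parameters from complete data (eigendecomposition of the sample covariance), fills each missing entry with its conditional mean, and uses hypothetical names (`ppca_fit`, `complete_by_alternation`) and synthetic data.

```python
import numpy as np

def ppca_fit(X, d):
    """Estimate (mu, B, sigma^2) from complete data X of shape (N, D)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # sample covariance (1/N)
    evals, evecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]    # reorder to descending
    sigma2 = max(evals[d:].mean(), 1e-8)          # floor keeps Sigma_x invertible
    B = evecs[:, :d] * np.sqrt(np.maximum(evals[:d] - sigma2, 0.0))
    return mu, B, sigma2

def complete_by_alternation(X_obs, d, n_iter=50):
    """Fill the NaN entries of X_obs (N x D) by alternating two steps."""
    missing = np.isnan(X_obs)
    # Initialization: replace each missing entry by its column mean.
    X = np.where(missing, np.nanmean(X_obs, axis=0), X_obs)
    for _ in range(n_iter):
        # Step 1: estimate the model parameters from the completed data.
        mu, B, sigma2 = ppca_fit(X, d)
        Sigma_x = B @ B.T + sigma2 * np.eye(X.shape[1])
        # Step 2: complete each point with the conditional mean of x_U given x_O.
        for j in range(X.shape[0]):
            U, O = missing[j], ~missing[j]
            if not U.any():
                continue
            if not O.any():
                X[j] = mu
                continue
            gain = np.linalg.solve(Sigma_x[np.ix_(O, O)], Sigma_x[np.ix_(O, U)]).T
            X[j, U] = mu[U] + gain @ (X[j, O] - mu[O])
    return X

# Synthetic rank-2 data in R^8 with about 20% of the entries removed.
rng = np.random.default_rng(0)
N, D, d = 200, 8, 2
B_true = rng.normal(size=(D, d))
X_full = rng.normal(size=(N, d)) @ B_true.T + 0.05 * rng.normal(size=(N, D))
mask = rng.random((N, D)) < 0.2
X_obs = np.where(mask, np.nan, X_full)
X_hat = complete_by_alternation(X_obs, d)
```

Both disadvantages are visible in the sketch: the rank `d` must be passed in, and the result depends on the column-mean initialization, since nothing in the loop guards against a poor local solution.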

In this section, we introduce an alternative approach that solves the low-rank matrix completion problem via a convex relaxation. As we will see, this approach allows us to complete a low-rank matrix by minimizing a convex objective function, which is guaranteed to have a globally optimal minimizer. Moreover, under rather benign conditions on the missing entries, the global minimizer is guaranteed to be the correct low-rank matrix, even without knowing the rank of the matrix in advance.

A rigorous justification for the correctness of the convex relaxation approach requires a deep knowledge of high-dimensional statistics and geometry that is beyond the scope of this book. However, this does not prevent us from introducing and summarizing here the main ideas and results, as well as the basic algorithms offered by this approach. Practitioners can apply the useful algorithm to their data and problems, whereas researchers who are more interested in the advanced theory behind the algorithm may find further details in (Cai et al. 2008; Candès and Recht 2009; Candès and Tao 2010; Gross 2011; Keshavan et al. 2010a; Zhou et al. 2010a).



