## Principal Component Analysis | ICE 2022

July 20, 2022

## Robust PCA by Iteratively Reweighted Least Squares

One of the simplest algorithms for dealing with corrupted entries is the iteratively reweighted least squares (IRLS) approach proposed by De la Torre and Black (2004). In this approach, a subspace is fit to the corrupted data points using standard PCA. The corrupted entries are detected as those that have a large residual with respect to the identified subspace. A new subspace is then estimated with the detected corruptions down-weighted. This process is repeated until the estimated model stabilizes.
The first step is to apply standard PCA to the given data. Recall from Section 2.1.2 that when the data points $\left\{x_{j} \in \mathbb{R}^{D}\right\}_{j=1}^{N}$ have no gross corruptions, an optimal solution to PCA can be obtained as
$$\hat{\mu}=\frac{1}{N} \sum_{j=1}^{N} x_{j} \quad \text { and } \quad \hat{y}_{j}=\hat{U}^{\top}\left(x_{j}-\hat{\mu}\right),$$
where $\hat{U}$ is a $D \times d$ matrix whose columns are the top $d$ eigenvectors of
$$\hat{\Sigma}_{N}=\frac{1}{N} \sum_{j=1}^{N}\left(x_{j}-\hat{\mu}\right)\left(x_{j}-\hat{\mu}\right)^{\top}.$$
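
The closed-form solution above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the source: the function name `pca`, the column-per-sample layout of `X`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def pca(X, d):
    """Standard PCA on the columns of X (shape D x N).

    Returns the sample mean (D,), a basis U_hat (D x d) made of the top-d
    eigenvectors of the sample covariance, and the coordinates Y (d x N).
    """
    mu_hat = X.mean(axis=1)                        # sample mean of the x_j
    Xc = X - mu_hat[:, None]                       # centered data
    sigma_hat = (Xc @ Xc.T) / X.shape[1]           # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)   # eigenvalues in ascending order
    U_hat = eigvecs[:, ::-1][:, :d]                # top-d eigenvectors
    Y = U_hat.T @ Xc                               # y_j = U_hat^T (x_j - mu_hat)
    return mu_hat, U_hat, Y

# Hypothetical toy data: 200 points on a 2-dimensional affine subspace of R^5.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.standard_normal((5, 2)))[0]
X = basis @ rng.standard_normal((2, 200)) + rng.standard_normal((5, 1))

mu_hat, U_hat, Y = pca(X, d=2)
recon = mu_hat[:, None] + U_hat @ Y                # projection onto the fitted subspace
```

Because the toy data contain no noise or corruption, `recon` should match `X` up to floating-point error; with gross errors in some entries this is no longer the case, which motivates the reweighting described next.
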
When the data points are corrupted by gross errors, we may improve the estimation of the subspace by recomputing the model parameters after down-weighting samples that have large residuals. More specifically, let $w_{i j} \in[0,1]$ be a weight assigned to the $i$th entry of $\boldsymbol{x}_{j}$ such that $w_{i j} \approx 1$ if $x_{i j}$ is not corrupted, and $w_{i j} \approx 0$ otherwise. Then a new estimate of the subspace can be obtained by minimizing the weighted sum of the least-squares errors between a point $\boldsymbol{x}_{j}$ and its projection $\mu+U \boldsymbol{y}_{j}$ onto the subspace $S$, i.e.,
$$\sum_{i=1}^{D} \sum_{j=1}^{N} w_{i j}\left(x_{i j}-\mu_{i}-\boldsymbol{u}_{i}^{\top} \boldsymbol{y}_{j}\right)^{2},$$
where $\mu_{i}$ is the $i$th entry of $\mu$, $\boldsymbol{u}_{i}^{\top}$ is the $i$th row of $U$, and $\boldsymbol{y}_{j}$ is the vector of coordinates of the point $\boldsymbol{x}_{j}$ in the subspace $S$.
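
One way to turn the weighted objective into an algorithm is to alternate between two weighted least-squares problems: with $\mu$ and $U$ fixed, each $\boldsymbol{y}_{j}$ is a weighted regression over the entries of $\boldsymbol{x}_{j}$, and with the $\boldsymbol{y}_{j}$ fixed, each pair $(\mu_{i}, \boldsymbol{u}_{i})$ is a weighted regression over the samples. The sketch below follows that idea. The weight function $w_{ij}=\sigma^{2}/(\sigma^{2}+r_{ij}^{2})$ (a Cauchy-style weight), the scale `sigma`, the fixed iteration count, and the name `irls_pca` are assumptions chosen for illustration; the text above does not specify them.

```python
import numpy as np

def irls_pca(X, d, sigma=1.0, n_iter=30):
    """Illustrative IRLS for entrywise-robust PCA on X (D x N).

    Minimizes sum_ij w_ij (x_ij - mu_i - u_i^T y_j)^2 by alternating weighted
    least squares, recomputing the weights from the residuals each round.
    """
    D, N = X.shape
    # Step 1: initialize with standard PCA.
    mu = X.mean(axis=1)
    Xc = X - mu[:, None]
    U = np.linalg.svd(Xc, full_matrices=False)[0][:, :d].copy()
    Y = U.T @ Xc
    for _ in range(n_iter):
        # Step 2: entries with large residuals get weights close to 0.
        R = X - mu[:, None] - U @ Y
        W = sigma**2 / (sigma**2 + R**2)           # assumed weight, values in (0, 1]
        # Step 3a: update each y_j with mu and U fixed.
        for j in range(N):
            sw = np.sqrt(W[:, j])
            Y[:, j] = np.linalg.lstsq(sw[:, None] * U,
                                      sw * (X[:, j] - mu), rcond=None)[0]
        # Step 3b: update each (mu_i, u_i) with Y fixed.
        A = np.vstack([np.ones(N), Y]).T           # N x (d + 1) design matrix
        for i in range(D):
            sw = np.sqrt(W[i, :])
            sol = np.linalg.lstsq(sw[:, None] * A, sw * X[i, :], rcond=None)[0]
            mu[i], U[i, :] = sol[0], sol[1:]
    return mu, U, Y

# Hypothetical toy data: a 2-dimensional affine subspace of R^10 with about
# 5% of the entries hit by large errors.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.standard_normal((10, 2)))[0]
X_clean = basis @ rng.standard_normal((2, 100)) + rng.standard_normal((10, 1))
X = X_clean.copy()
mask = rng.random(X.shape) < 0.05
X[mask] += rng.uniform(-10.0, 10.0, size=mask.sum())

mu_rob, U_rob, Y_rob = irls_pca(X, d=2)
```

The columns of the returned `U` span the estimated subspace but are not orthonormal after the row-wise updates; orthonormalize them (for example with a QR factorization) before comparing against another basis. How well the estimate tolerates corruption depends on `sigma` and on the fraction of corrupted entries, so no particular accuracy is claimed here.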

## Robust PCA by Convex Optimization

Although the IRLS scheme for robust PCA is very simple and efficient to implement, and widely used in practice, there is no immediate guarantee that the method converges. Moreover, even if the method were to converge, there is no guarantee that the solution to which it converges corresponds to the correct low-rank matrix. As we have seen in the low-rank matrix completion problem, we should not even expect the problem to have a meaningful solution unless proper conditions are imposed on the low-rank matrix and the matrix of errors.

In this section, we will derive conditions under which the robust PCA problem is well posed and admits an efficient solution. To this end, we will formulate the robust PCA problem as a (nonconvex and nonsmooth) rank minimization problem in which we seek to decompose the data matrix $X$ as the sum of a low-rank matrix $L$ and a matrix of errors $E$. As in the matrix completion case, we will study convex relaxations of the rank minimization problem and resort to advanced tools from high-dimensional statistics to show that, under certain conditions, the convex relaxations can effectively and efficiently recover a low-rank matrix with intrasample outliers as long as the outliers are sparse enough. Although the mathematical theory that supports the correctness of these methods is far beyond the scope of this book, we will introduce the key ideas and results of this approach to PCA with intrasample outliers.

More specifically, we assume that the given data matrix $X$ is generated as the sum of two matrices
$$X=L_{0}+E_{0} .$$
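
The excerpt stops before stating the convex program. One widely used relaxation of this decomposition is Principal Component Pursuit, which minimizes $\|L\|_{*}+\lambda\|E\|_{1}$ subject to $X=L+E$, with the nuclear norm standing in for the rank and the $\ell_{1}$ norm for the number of corrupted entries. The sketch below solves it with ADMM. The solver, the default $\lambda=1/\sqrt{\max(D, N)}$, the penalty heuristic, the function names, and the toy data are all assumptions made for illustration rather than details taken from the text.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise shrinkage: proximal operator of tau * (l1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def singular_value_threshold(M, tau):
    """Shrink the singular values: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def pcp_admm(X, lam=None, penalty=None, n_iter=1000, tol=1e-7):
    """Illustrative ADMM for min ||L||_* + lam * ||E||_1  s.t.  X = L + E."""
    D, N = X.shape
    lam = 1.0 / np.sqrt(max(D, N)) if lam is None else lam
    # Common heuristic for the augmented-Lagrangian penalty (an assumption).
    penalty = D * N / (4.0 * np.abs(X).sum()) if penalty is None else penalty
    L = np.zeros_like(X)
    E = np.zeros_like(X)
    Z = np.zeros_like(X)                           # Lagrange multipliers
    for _ in range(n_iter):
        L = singular_value_threshold(X - E + Z / penalty, 1.0 / penalty)
        E = soft_threshold(X - L + Z / penalty, lam / penalty)
        residual = X - L - E
        Z = Z + penalty * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(X):
            break
    return L, E

# Hypothetical toy data: rank-2 L0 plus sparse E0 with about 5% nonzero entries.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
E0 = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
E0[mask] = rng.uniform(-10.0, 10.0, size=mask.sum())
X = L0 + E0

L_hat, E_hat = pcp_admm(X)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
```

Here `rel_err` measures how far the recovered low-rank part is from $L_{0}$. For a low rank and sparse corruptions such as these it is expected to be small, although the exact value depends on the stopping tolerance and the iteration budget.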


