Computer Science Assignment Help | Machine Learning Assignment and Exam Help | COMP4702

December 27, 2022

## Outline and Online Toolbox

The remainder of the book is divided into two parts.
Chapter 2 introduces the basics of random matrix theory needed for the machine learning applications in this book. In doing so, we first revisit the traditional approaches found in math-oriented sources: Bai and Silverstein [2010], based on the Stieltjes transform and truncation machinery; Pastur and Shcherbina [2011], based on a Gaussian-method approach; and Tao [2012] and Vershynin [2012], based on concentration inequalities and a nonasymptotic random matrix approach. We also say a few words on Mingo and Speicher [2017], which follows a free probability framework, and on Anderson et al. [2010], which is oriented more toward determinantal point processes and large deviations. Unlike most of these references (with the possible exception of Pastur and Shcherbina [2011]), our methodology is primarily centered on the statistical analysis of the resolvent of random matrices, and only secondarily on the Stieltjes transform, because the resolvent is the chief object of interest in most machine learning applications. The particular mathematical toolbox used to derive the results is of secondary importance.
In this chapter, we will successively introduce:

• the fundamental notion of the resolvent $\mathbf{Q}(z)=\left(\mathbf{X}-z \mathbf{I}_n\right)^{-1}$ of a (random) matrix $\mathbf{X}$, and its relations to the eigenvalues of $\mathbf{X}$, the limiting spectrum of $\mathbf{X}$, the eigenvectors and eigenspaces associated with some specific eigenvalues, as well as its relations to bilinear and quadratic forms often met in machine learning applications (linear or kernel regression, linear and quadratic discriminant analysis, support vector machines, as well as some simple neural networks);
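The link between the resolvent and the eigenvalues and eigenvectors of $\mathbf{X}$ can be checked numerically. The following is a minimal sketch, assuming a symmetric $\mathbf{X}$ with eigendecomposition $\mathbf{X}=\mathbf{U} \operatorname{diag}(\lambda_i) \mathbf{U}^{\top}$, so that $\mathbf{Q}(z)=\sum_i \mathbf{u}_i \mathbf{u}_i^{\top} /(\lambda_i-z)$. The Wigner-type test matrix, the point `z`, and all variable names are illustrative choices, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Symmetric random matrix X (a Wigner-type matrix, chosen only for illustration).
G = rng.standard_normal((n, n))
X = (G + G.T) / np.sqrt(2 * n)

z = 0.5 + 0.1j  # any complex z that is not an eigenvalue of X

# Resolvent Q(z) = (X - z I_n)^{-1}, computed by direct inversion.
Q = np.linalg.inv(X - z * np.eye(n))

# Spectral decomposition: X = U diag(lam) U^T gives
# Q(z) = U diag(1 / (lam - z)) U^T, so the poles of Q are the eigenvalues of X.
lam, U = np.linalg.eigh(X)
Q_spectral = (U / (lam - z)) @ U.T

decomposition_error = np.max(np.abs(Q - Q_spectral))

# A quadratic form a^T Q(z) a is a weighted sum over the eigenvalues, with
# weights given by the squared projections of a onto the eigenvectors.
a = rng.standard_normal(n)
quad_direct = a @ Q @ a
quad_spectral = np.sum((U.T @ a) ** 2 / (lam - z))
quad_error = abs(quad_direct - quad_spectral)

print(decomposition_error, quad_error)
```

Both errors should be at the level of floating-point round-off, which illustrates why quadratic forms of the resolvent carry information about both the eigenvalues and the eigenvectors of $\mathbf{X}$.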

## Spectral Measure and Stieltjes Transform

The first use of the resolvent $\mathbf{Q}_{\mathbf{M}}$ is in its relation to the empirical spectral measure $\mu_{\mathbf{M}}$ of the matrix $\mathbf{M}$ under study, through the associated Stieltjes transform $m_{\mu_{\mathbf{M}}}$, both of which we define next.

Definition 2 (Empirical spectral measure). For a symmetric matrix $\mathbf{M} \in \mathbb{R}^{n \times n}$, the spectral measure, also called the empirical spectral measure or empirical spectral distribution (e.s.d.), $\mu_{\mathbf{M}}$ of $\mathbf{M}$ is defined as the normalized counting measure of the eigenvalues $\lambda_1(\mathbf{M}), \ldots, \lambda_n(\mathbf{M})$ of $\mathbf{M}$:
$$\mu_{\mathbf{M}} \equiv \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(\mathbf{M})}$$
Since $\int \mu_{\mathbf{M}}(d x)=1$, the spectral measure $\mu_{\mathbf{M}}$ of a matrix $\mathbf{M} \in \mathbb{R}^{n \times n}$ (random or not) is a probability measure. For (probability) measures, we can define their associated Stieltjes transforms as follows.

Definition 3 (Stieltjes transform). For a real probability measure $\mu$ with support $\operatorname{supp}(\mu)$, the Stieltjes transform $m_\mu(z)$ is defined, for all $z \in \mathbb{C} \backslash \operatorname{supp}(\mu)$, as
$$m_\mu(z) \equiv \int \frac{1}{t-z} \mu(d t)$$

This definition and the Stieltjes transform framework in fact extend beyond probability measures to finite real measures (i.e., measures $\mu$ such that $\mu(\mathbb{R})<\infty$), which will occasionally be discussed in this book.
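To make Definitions 2 and 3 concrete: for the e.s.d. $\mu_{\mathbf{M}}$, the integral in Definition 3 reduces to the finite sum $\frac{1}{n} \sum_{i=1}^n \frac{1}{\lambda_i(\mathbf{M})-z}$, which equals the normalized trace $\frac{1}{n} \operatorname{tr} \mathbf{Q}_{\mathbf{M}}(z)$ of the resolvent $\mathbf{Q}_{\mathbf{M}}(z)=(\mathbf{M}-z \mathbf{I}_n)^{-1}$. The sketch below checks this identity numerically. The helper `stieltjes_esd`, the test matrix, and the point `z` are hypothetical names and choices made for illustration.

```python
import numpy as np


def stieltjes_esd(eigenvalues, z):
    """Stieltjes transform of the e.s.d.: (1/n) * sum_i 1 / (lambda_i - z)."""
    return np.mean(1.0 / (eigenvalues - z))


rng = np.random.default_rng(1)
n = 300

# Symmetric test matrix M (illustrative choice, not prescribed by the text).
G = rng.standard_normal((n, n))
M = (G + G.T) / np.sqrt(2 * n)

eigenvalues = np.linalg.eigvalsh(M)
z = -0.3 + 0.5j  # a point off the real axis, hence outside supp(mu_M)

# Route 1: integrate 1 / (t - z) against the normalized counting measure.
m_from_measure = stieltjes_esd(eigenvalues, z)

# Route 2: normalized trace of the resolvent Q_M(z) = (M - z I_n)^{-1}.
Q = np.linalg.inv(M - z * np.eye(n))
m_from_resolvent = np.trace(Q) / n

print(m_from_measure, m_from_resolvent)
```

The two values should agree up to round-off. In practice the eigenvalue route is cheaper when many values of `z` are needed, since the eigenvalues are computed only once.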

The Stieltjes transform $m_\mu$ has numerous interesting properties: it is complex analytic on its domain of definition $\mathbb{C} \backslash \operatorname{supp}(\mu)$, it is bounded $\left|m_\mu(z)\right| \leq 1 / \operatorname{dist}(z, \operatorname{supp}(\mu))$, it satisfies $\Im[z]>0 \Rightarrow \Im\left[m_\mu(z)\right]>0$, and it is an increasing function on all connected components of its restriction to $\mathbb{R} \backslash \operatorname{supp}(\mu)$ (since $m_\mu^{\prime}(x)=\int(t-x)^{-2} \mu(d t)>0$) with $\lim_{x \rightarrow \pm \infty} m_\mu(x)=0$ if $\operatorname{supp}(\mu)$ is bounded.
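These properties are easy to observe on a small example. The sketch below assumes a discrete probability measure with three atoms (the atoms, weights, and function name `m` are hypothetical, chosen only for illustration) and checks the bound, the sign of the imaginary part, the monotonicity on a gap of the support, and the decay at infinity.

```python
import numpy as np

# Discrete probability measure mu = sum_k w_k * delta_{t_k} (illustrative values).
atoms = np.array([-1.0, 0.0, 2.0])
weights = np.array([0.2, 0.5, 0.3])  # nonnegative and summing to 1


def m(z):
    """Stieltjes transform m_mu(z) = sum_k w_k / (t_k - z), for scalar or 1-D z."""
    z = np.asarray(z)
    return np.sum(weights / (atoms - z[..., None]), axis=-1)


rng = np.random.default_rng(2)

# Random points in the upper half-plane.
z = rng.uniform(-3, 3, size=1000) + 1j * rng.uniform(0.01, 2, size=1000)
dist = np.min(np.abs(atoms - z[:, None]), axis=1)  # dist(z, supp(mu))

# Bound: |m_mu(z)| <= 1 / dist(z, supp(mu)).
bound_holds = bool(np.all(np.abs(m(z)) <= 1.0 / dist + 1e-12))

# Im[z] > 0 implies Im[m_mu(z)] > 0.
imag_positive = bool(np.all(m(z).imag > 0))

# Increasing on the gap (0, 2) of R \ supp(mu).
x = np.linspace(0.1, 1.9, 50)
increasing_on_gap = bool(np.all(np.diff(m(x)) > 0))

# Decay at infinity (the support is bounded here).
decay = max(abs(m(1e6)), abs(m(-1e6)))

print(bound_holds, imag_positive, increasing_on_gap, decay)
```

Note that monotonicity holds separately on each connected component of $\mathbb{R} \backslash \operatorname{supp}(\mu)$: the function jumps from $+\infty$ to $-\infty$ when crossing an atom, so it is not increasing across the whole real line.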

As a transform, $m_\mu$ admits an inverse formula to recover $\mu$, as per the following result.
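The result itself is not reproduced in this excerpt. A standard statement of the Stieltjes inversion formula, given here as an assumption about what the omitted result contains, is that for continuity points $a<b$ of $\mu$, $\mu([a, b])=\frac{1}{\pi} \lim_{y \downarrow 0} \int_a^b \Im\left[m_\mu(x+\imath y)\right] d x$. The sketch below illustrates this numerically on a hypothetical three-atom measure, using a small fixed `y` in place of the limit and a plain trapezoidal rule.

```python
import numpy as np

# Discrete probability measure (illustrative values, not from the text).
atoms = np.array([-1.0, 0.0, 2.0])
weights = np.array([0.2, 0.5, 0.3])


def m(z):
    """Stieltjes transform m_mu(z) = sum_k w_k / (t_k - z) for a 1-D array z."""
    return np.sum(weights / (atoms - z[:, None]), axis=1)


def mass_from_stieltjes(a, b, y=1e-3, num_points=200_001):
    """Approximate mu([a, b]) by (1/pi) * integral of Im[m_mu(x + i y)] over [a, b]."""
    x = np.linspace(a, b, num_points)
    f = m(x + 1j * y).imag / np.pi
    # Trapezoidal rule; the grid is much finer than the smoothing width y.
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)


# [-0.5, 0.5] contains only the atom at 0, whose weight is 0.5.
mass_around_zero = mass_from_stieltjes(-0.5, 0.5)

# [0.5, 1.5] contains no atom, so its mass is 0.
mass_in_gap = mass_from_stieltjes(0.5, 1.5)

print(mass_around_zero, mass_in_gap)
```

With a finite `y`, each atom is smeared into a Cauchy-shaped bump of width about `y`, so the recovered masses are only approximate; they approach the exact values 0.5 and 0 as `y` decreases, provided the grid is refined accordingly.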


