Estimating the Optimal Linear Predictor
To actually estimate $\beta$ from data, we need to make some probabilistic assumptions about where the data comes from. A fairly weak but often sufficient assumption is that observations $\left(\vec{X}_i, Y_i\right)$ are independent for different values of $i$, with unchanging covariances. Then if we look at the sample covariances, they will, by the law of large numbers, converge on the true covariances:
$$
\begin{aligned}
&\frac{1}{n} \mathbf{X}^T \mathbf{Y} \rightarrow \operatorname{Cov}[\vec{X}, Y] \\
&\frac{1}{n} \mathbf{X}^T \mathbf{X} \rightarrow \mathbf{v}
\end{aligned}
$$
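where as before $\mathbf{X}$ is the data-frame matrix with one row for each data point and one column for each variable, and similarly for $\mathbf{Y}$. (These limits take all variables to be centered, so that the averaged products are sample covariances.)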
So, by continuity,
$$
\hat{\beta}=\left(\mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{Y} \rightarrow \beta
$$
and we have a consistent estimator.
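As a rough illustration of this consistency claim, the sketch below plugs sample moments into the formula for $\hat{\beta}$ and watches the error shrink as $n$ grows. The data-generating process (two centered Gaussian predictors, a coefficient vector `beta_true`, noise standard deviation 0.5) is a hypothetical choice made only for this example and does not come from the text.

```python
import numpy as np

def ols_plugin(X, y):
    """Estimate beta by plugging sample moments into v^{-1} Cov[X, Y]."""
    n = X.shape[0]
    v_hat = X.T @ X / n   # sample analogue of v
    c_hat = X.T @ y / n   # sample analogue of Cov[X, Y]
    return np.linalg.solve(v_hat, c_hat)

rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0])  # hypothetical coefficients for the demo

def simulate(n):
    """Independent draws with unchanging covariances (assumed setup)."""
    X = rng.normal(size=(n, 2))  # mean-zero predictors
    y = X @ beta_true + rng.normal(scale=0.5, size=n)
    return X, y

# Largest coordinate-wise error of the estimate at each sample size.
errors = {}
for n in (100, 10_000, 1_000_000):
    X, y = simulate(n)
    errors[n] = float(np.max(np.abs(ols_plugin(X, y) - beta_true)))
    print(n, errors[n])
```

Using `np.linalg.solve` rather than forming the inverse explicitly gives the same estimate and is the usual numerically safer choice.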
On the other hand, we could start with the residual sum of squares
$$
\operatorname{RSS}(\beta) \equiv \sum_{i=1}^n\left(y_i-\vec{x}_i \cdot \beta\right)^2
$$
and try to minimize it. The minimizer is the same $\hat{\beta}$ we got by plugging in the sample covariances. No probabilistic assumption is needed to minimize the RSS, but it doesn’t let us say anything about the convergence of $\hat{\beta}$. For that, we do need some assumptions about $\vec{X}$ and $Y$ coming from distributions with unchanging covariances.
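The claim that minimizing the RSS needs no probabilistic assumption can be checked numerically. In the sketch below, `X` and `y` are arbitrary numbers with no model relating them (an illustrative setup, not from the text). The solution of the normal equations still agrees with a generic least-squares solver, and random perturbations of it never lower the RSS.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))  # arbitrary design matrix
y = rng.normal(size=n)       # arbitrary response, unrelated to X

def rss(beta):
    """Residual sum of squares for a candidate coefficient vector."""
    r = y - X @ beta
    return float(r @ r)

# Plug-in / normal-equations solution: (X^T X)^{-1} X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Direct numerical least-squares minimizer of the RSS
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# RSS at 100 randomly perturbed coefficient vectors
perturbed = [rss(beta_normal + 0.1 * rng.normal(size=p)) for _ in range(100)]

print(np.allclose(beta_normal, beta_lstsq))
print(rss(beta_normal), min(perturbed))
```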
(One can also show that the least-squares estimate is the linear predictor with the minimax prediction risk. That is, its worst-case performance, when everything goes wrong and the data are horrible, will be better than any other linear method. This is some comfort, especially if you have a gloomy and pessimistic view of data, but other methods of estimation may work better in less-than-worst-case scenarios.)
Omitted Variables and Shifting Distributions
That the optimal regression coefficients can change with the distribution of the predictor features is annoying, but one could after all notice that the distribution has shifted, and so be cautious about relying on the old regression. More subtle is that the regression coefficients can depend on variables which you do not measure, and those can shift without your noticing anything.
Mathematically, the issue is that
$$
\mathbb{E}[Y \mid \vec{X}]=\mathbb{E}[\mathbb{E}[Y \mid Z, \vec{X}] \mid \vec{X}]
$$
Now, if $Y$ is independent of $Z$ given $\vec{X}$, then the extra conditioning in the inner expectation does nothing and changing $Z$ doesn’t alter our predictions. But in general there will be plenty of variables $Z$ which we don’t measure (so they’re not included in $\vec{X}$ ) but which have some non-redundant information about the response (so that $Y$ depends on $Z$ even conditional on $\vec{X}$ ). If the distribution of $\vec{X}$ given $Z$ changes, then the optimal regression of $Y$ on $\vec{X}$ should change too.
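To make the dependence concrete, here is a short worked case. It assumes, purely for illustration, that the full conditional expectation is linear with coefficients $\beta_X$ and $\beta_Z$; these symbols are introduced here and are not part of the original argument. Applying the identity above gives

$$
\begin{aligned}
\mathbb{E}[Y \mid Z, \vec{X}] &= \vec{X} \cdot \beta_X + \beta_Z Z \\
\mathbb{E}[Y \mid \vec{X}] &= \vec{X} \cdot \beta_X + \beta_Z \, \mathbb{E}[Z \mid \vec{X}]
\end{aligned}
$$

So even with $\beta_X$ and $\beta_Z$ held fixed, the regression of $Y$ on $\vec{X}$ alone changes whenever $\mathbb{E}[Z \mid \vec{X}]$ does, and that term is determined by the joint distribution of $\vec{X}$ and $Z$. It drops out only when $\beta_Z = 0$.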
Here’s an example. $X$ and $Z$ are both $\mathscr{N}(0,1)$, but with a positive correlation of $0.1$. In reality, $Y \sim \mathscr{N}(X+Z, 0.01)$. Figure $2.2$ shows a scatterplot of all three variables together $(n=100)$.
Now I change the correlation between $X$ and $Z$ to $-0.1$. This leaves both marginal distributions alone, and is barely detectable by eye (Figure 2.3).
Figure $2.4$ shows just the $X$ and $Y$ values from the two data sets, in black for the points with a positive correlation between $X$ and $Z$, and in blue when the correlation is negative. Looking by eye at the points and at the axis tick-marks, one sees that, as promised, there is very little change in the marginal distribution of either variable. Furthermore, the correlation between $X$ and $Y$ doesn’t change much, going only from $0.74$ to $0.63$. On the other hand, the regression lines are noticeably different. When $\operatorname{Cov}[X, Z]=0.1$, the slope of the regression line is $0.96$: high values for $X$ tend to indicate high values for $Z$, which also increases $Y$. When $\operatorname{Cov}[X, Z]=-0.1$, the slope of the regression line is $0.84$, because now extreme values of $X$ are signs that $Z$ is at the opposite extreme, bringing $Y$ closer back to its mean. But, to repeat, the difference here is due to a change in the correlation between $X$ and $Z$, not how those variables themselves relate to $Y$. If I regress $Y$ on $X$ and $Z$, I get $\hat{\beta}=(1,1)$ in the first case and $\hat{\beta}=(1,1)$ in the second.
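The example can be reproduced in outline with a small simulation. The sketch below is not the code behind the figures. It assumes the $0.01$ in $\mathscr{N}(X+Z, 0.01)$ is a variance (standard deviation $0.1$), and it uses a much larger sample than $n=100$ so that the estimates sit close to their population values. With unit variances, the population slope of $Y$ on $X$ alone is $\operatorname{Cov}[X, Y] / \operatorname{Var}[X] = 1 + \operatorname{Cov}[X, Z]$, that is, $1.1$ and $0.9$ in the two cases. The slopes quoted above come from samples of only $100$ points, so some sampling noise around those values is to be expected.

```python
import numpy as np

def simulate(rho, n, rng):
    """Draw (X, Z, Y) with Corr[X, Z] = rho and Y = X + Z + small noise."""
    cov = np.array([[1.0, rho], [rho, 1.0]])  # unit variances, so Cov = Corr
    xz = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    x, z = xz[:, 0], xz[:, 1]
    # Assumption: 0.01 is the noise variance, so the noise sd is 0.1.
    y = x + z + rng.normal(scale=0.1, size=n)
    return x, z, y

def fit(x, z, y):
    """Return (slope of Y on X alone, coefficients of Y on X and Z)."""
    design_x = np.column_stack([np.ones_like(x), x])
    design_xz = np.column_stack([np.ones_like(x), x, z])
    coef_x, *_ = np.linalg.lstsq(design_x, y, rcond=None)
    coef_xz, *_ = np.linalg.lstsq(design_xz, y, rcond=None)
    return coef_x[1], coef_xz[1:]  # drop the intercepts

rng = np.random.default_rng(42)
n = 100_000  # large sample, chosen so estimates approach population values

slope_pos, beta_pos = fit(*simulate(0.1, n, rng))
slope_neg, beta_neg = fit(*simulate(-0.1, n, rng))

print("Y on X, rho=+0.1:", slope_pos)
print("Y on X, rho=-0.1:", slope_neg)
print("Y on X and Z, rho=+0.1:", beta_pos)
print("Y on X and Z, rho=-0.1:", beta_neg)
```

The slope on $X$ alone moves with the correlation, while the coefficients from the regression on both $X$ and $Z$ stay near $(1, 1)$ in both cases.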

