## Estimating the Optimal Linear Predictor

To actually estimate $\beta$ from data, we need to make some probabilistic assumptions about where the data comes from. A fairly weak but often sufficient assumption is that observations $\left(\vec{X}_i, Y_i\right)$ are independent for different values of $i$, with unchanging covariances. Then if we look at the sample covariances, they will, by the law of large numbers, converge on the true covariances:
$$\begin{aligned} &\frac{1}{n} \mathbf{X}^T \mathbf{Y} \rightarrow \operatorname{Cov}[\vec{X}, Y] \\ &\frac{1}{n} \mathbf{X}^T \mathbf{X} \rightarrow \mathbf{v} \end{aligned}$$
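where, as before, $\mathbf{X}$ is the data-frame matrix with one row for each data point and one column for each variable, and similarly for $\mathbf{Y}$. (These averages are sample covariances only when every variable has been centered to have mean zero.)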
So, by continuity,
$$\hat{\beta}=\left(\mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{Y} \rightarrow \beta$$
and we have a consistent estimator.
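The convergence can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes mean-zero Gaussian predictors, a made-up coefficient vector `beta_true`, and Gaussian noise with standard deviation 0.5, all chosen purely for illustration. The helper names `ols` and `simulate` are hypothetical.

```python
import numpy as np

def ols(X, Y):
    """Solve the normal equations (X^T X) beta = X^T Y for beta."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def simulate(n, beta, rng):
    """Draw n independent (X_i, Y_i) pairs with mean-zero X (illustrative assumption)."""
    X = rng.normal(size=(n, len(beta)))
    Y = X @ beta + rng.normal(scale=0.5, size=n)
    return X, Y

rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0])  # hypothetical "true" coefficients

for n in (100, 10_000, 200_000):
    X, Y = simulate(n, beta_true, rng)
    beta_hat = ols(X, Y)
    # The largest coordinate-wise error should shrink as n grows.
    print(n, beta_hat, np.max(np.abs(beta_hat - beta_true)))
```

The sketch uses `np.linalg.solve` on the normal equations rather than forming the inverse explicitly; both give the same $\hat{\beta}$ when $\mathbf{X}^T \mathbf{X}$ is invertible.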

On the other hand, we could start with the residual sum of squares
$$\operatorname{RSS}(\beta) \equiv \sum_{i=1}^n\left(y_i-\vec{x}_i \cdot \beta\right)^2$$
and try to minimize it. The minimizer is the same $\hat{\beta}$ we got by plugging in the sample covariances. No probabilistic assumption is needed to minimize the RSS, but it doesn’t let us say anything about the convergence of $\hat{\beta}$. For that, we do need some assumptions about $\vec{X}$ and $Y$ coming from distributions with unchanging covariances.
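To see why the two routes agree, the minimization can be written out in matrix form. This is a sketch that assumes $\mathbf{X}^T \mathbf{X}$ is invertible:
$$\begin{aligned} \operatorname{RSS}(\beta) &= (\mathbf{Y}-\mathbf{X} \beta)^T(\mathbf{Y}-\mathbf{X} \beta) \\ \nabla_\beta \operatorname{RSS}(\beta) &= -2 \mathbf{X}^T(\mathbf{Y}-\mathbf{X} \beta) \\ \nabla_\beta \operatorname{RSS}(\hat{\beta})=0 &\implies \mathbf{X}^T \mathbf{X} \hat{\beta}=\mathbf{X}^T \mathbf{Y} \implies \hat{\beta}=\left(\mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{Y} \end{aligned}$$
The second derivative is $2 \mathbf{X}^T \mathbf{X}$, which is positive definite under the same assumption, so the stationary point is the unique minimum.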
(One can also show that the least-squares estimate is the linear predictor with the minimax prediction risk. That is, its worst-case performance, when everything goes wrong and the data are horrible, will be better than any other linear method. This is some comfort, especially if you have a gloomy and pessimistic view of data, but other methods of estimation may work better in less-than-worst-case scenarios.)

## Omitted Variables and Shifting Distributions

That the optimal regression coefficients can change with the distribution of the predictor features is annoying, but one could after all notice that the distribution has shifted, and so be cautious about relying on the old regression. More subtle is that the regression coefficients can depend on variables which you do not measure, and those can shift without your noticing anything.
Mathematically, the issue is that
$$\mathbb{E}[Y \mid \vec{X}]=\mathbb{E}[\mathbb{E}[Y \mid Z, \vec{X}] \mid \vec{X}]$$
Now, if $Y$ is independent of $Z$ given $\vec{X}$, then the extra conditioning in the inner expectation does nothing and changing $Z$ doesn’t alter our predictions. But in general there will be plenty of variables $Z$ which we don’t measure (so they’re not included in $\vec{X}$ ) but which have some non-redundant information about the response (so that $Y$ depends on $Z$ even conditional on $\vec{X}$ ). If the distribution of $\vec{X}$ given $Z$ changes, then the optimal regression of $Y$ on $\vec{X}$ should change too.
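A worked special case may make this concrete. Suppose, purely as an illustrative assumption, that the inner conditional expectation is linear, with fixed coefficients $\vec{a}$ and $b$ (symbols introduced here, not in the original):
$$\mathbb{E}[Y \mid Z, \vec{X}]=\vec{a} \cdot \vec{X}+b Z \quad \Longrightarrow \quad \mathbb{E}[Y \mid \vec{X}]=\vec{a} \cdot \vec{X}+b\, \mathbb{E}[Z \mid \vec{X}]$$
Under that assumption, $\vec{a}$ and $b$ can stay exactly the same while $\mathbb{E}[Z \mid \vec{X}]$ shifts, and the regression of $Y$ on $\vec{X}$ alone shifts with it whenever $b \neq 0$.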

Here’s an example. $X$ and $Z$ are both $\mathscr{N}(0,1)$, but with a positive correlation of $0.1$. In reality, $Y \sim \mathscr{N}(X+Z, 0.01)$. Figure $2.2$ shows a scatterplot of all three variables together $(n=100)$.

Now I change the correlation between $X$ and $Z$ to $-0.1$. This leaves both marginal distributions alone, and is barely detectable by eye (Figure 2.3).

Figure $2.4$ shows just the $X$ and $Y$ values from the two data sets, in black for the points with a positive correlation between $X$ and $Z$, and in blue when the correlation is negative. Looking by eye at the points and at the axis tick-marks, one sees that, as promised, there is very little change in the marginal distribution of either variable. Furthermore, the correlation between $X$ and $Y$ doesn’t change much, going only from $0.74$ to $0.63$. On the other hand, the regression lines are noticeably different. When $\operatorname{Cov}[X, Z]=0.1$, the slope of the regression line is $0.96$: high values for $X$ tend to indicate high values for $Z$, which also increases $Y$. When $\operatorname{Cov}[X, Z]=-0.1$, the slope of the regression line is $0.84$, because now extreme values of $X$ are signs that $Z$ is at the opposite extreme, bringing $Y$ closer back to its mean. But, to repeat, the difference here is due to a change in the correlation between $X$ and $Z$, not how those variables themselves relate to $Y$. If I regress $Y$ on both $X$ and $Z$, I get $\hat{\beta}=(1,1)$ in the first case and $\hat{\beta}=(1,1)$ in the second.
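The experiment can be sketched in a few lines of simulation. This is an illustrative reconstruction, not the code behind the figures: it reads the $0.01$ in $\mathscr{N}(X+Z, 0.01)$ as a variance (noise standard deviation $0.1$), and the function names `simulate` and `slopes` are hypothetical. Under these assumptions the population slope of $Y$ on $X$ alone is $\operatorname{Cov}[X, Y] / \operatorname{Var}[X]=1+\operatorname{Cov}[X, Z]$, so any single sample of $n=100$ will scatter around $1.1$ and $0.9$.

```python
import numpy as np

def simulate(rho, n, rng):
    """Draw X, Z ~ N(0, 1) with correlation rho, and Y = X + Z + noise."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xz = rng.multivariate_normal(np.zeros(2), cov, size=n)
    x, z = xz[:, 0], xz[:, 1]
    # Assumption: 0.01 is the noise variance, so the standard deviation is 0.1.
    y = x + z + rng.normal(scale=0.1, size=n)
    return x, z, y

def slopes(x, z, y):
    """Return (slope of Y on X alone, coefficients of Y on X and Z)."""
    only_x = np.column_stack([np.ones_like(x), x])
    b_x, *_ = np.linalg.lstsq(only_x, y, rcond=None)
    both = np.column_stack([np.ones_like(x), x, z])
    b_xz, *_ = np.linalg.lstsq(both, y, rcond=None)
    return b_x[1], b_xz[1:]

rng = np.random.default_rng(0)
for rho in (0.1, -0.1):
    x, z, y = simulate(rho, 100, rng)
    slope_x, coef_xz = slopes(x, z, y)
    print(f"rho={rho:+.1f}  Y~X slope={slope_x:.2f}  Y~X+Z coefs={coef_xz.round(2)}")
```

Raising `n` makes the one-variable slopes settle near $1+\rho$, while the two-variable coefficients stay close to $(1,1)$ for either sign of $\rho$.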
