## Adding Probabilistic Assumptions

The usual treatment of linear regression adds many more probabilistic assumptions, namely that

$$
Y \mid \vec{X} \sim \mathcal{N}(\vec{X} \cdot \beta, \sigma^2)
$$

and that $Y$ values are independent conditional on their $\vec{X}$ values. So now we are assuming that the regression function is exactly linear; we are assuming that at each $\vec{X}$ the scatter of $Y$ around the regression function is Gaussian; we are assuming that the variance of this scatter is constant; and we are assuming that there is no dependence between this scatter and anything else.
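
One way to make these assumptions concrete is to simulate data that satisfies all of them. The sketch below is only an illustration, restricted to a single predictor; the function name `simulate_gaussian_linear`, the parameter values, and the uniform distribution of the predictor are all assumptions made for the example.

```python
import random


def simulate_gaussian_linear(n, beta0, beta1, sigma, seed=0):
    """Draw n (x, y) pairs with y = beta0 + beta1 * x + Gaussian noise."""
    rng = random.Random(seed)
    # Predictor values; the uniform range is an arbitrary illustrative choice.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # The noise has mean zero and the same standard deviation sigma at
    # every x, and each draw is independent of all the others.
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate_gaussian_linear(n=2000, beta0=1.0, beta1=2.0, sigma=2.0)
print(len(xs), len(ys))
```

Breaking any one assumption here (for instance, letting `sigma` grow with `x`) gives data for which the extra probabilistic assumptions fail while the optimal linear predictor remains well defined.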

None of these assumptions was needed in deriving the optimal linear predictor. None of them is so mild that it should go without comment or without at least some attempt at testing.

Leaving that aside just for the moment, why make those assumptions? As you know from your earlier classes, they let us write down the likelihood of the observed responses $y_1, y_2, \ldots, y_n$ (conditional on the covariates $\vec{x}_1, \ldots, \vec{x}_n$), and then estimate $\beta$ and $\sigma^2$ by maximizing this likelihood. As you also know, the maximum likelihood estimate of $\beta$ is exactly the same as the $\beta$ obtained by minimizing the residual sum of squares. This coincidence would not hold in other models, with non-Gaussian noise.

We saw earlier that $\hat{\beta}$ is consistent under comparatively weak assumptions, that is, it converges to the optimal coefficients. But then there might, possibly, still be other estimators which are also consistent, but which converge faster. If we make the extra statistical assumptions, so that $\hat{\beta}$ is also the maximum likelihood estimate, we can lay that worry to rest. The MLE is generically (and certainly here!) asymptotically efficient, meaning that it converges as fast as any other consistent estimator, at least in the long run. So we are not, so to speak, wasting any of our data by using the MLE.
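
For reference, the coincidence between maximum likelihood and least squares can be read off the Gaussian log-likelihood. Assuming each $y_i$ is Gaussian with mean $\vec{x}_i \cdot \beta$ and variance $\sigma^2$, and that the $y_i$ are conditionally independent, the log-likelihood is

$$
\log L(\beta, \sigma^2) = -\frac{n}{2} \log\left(2 \pi \sigma^2\right) - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} \left(y_i - \vec{x}_i \cdot \beta\right)^2
$$

For any fixed $\sigma^2$, the first term does not involve $\beta$, so maximizing over $\beta$ is the same as minimizing the residual sum of squares $\sum_{i=1}^{n} (y_i - \vec{x}_i \cdot \beta)^2$. Setting the derivative with respect to $\sigma^2$ to zero then gives

$$
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \vec{x}_i \cdot \hat{\beta}\right)^2
$$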

A further advantage of the MLE is that, as $n \rightarrow \infty$, its sampling distribution is itself a Gaussian, centered around the true parameter values. This lets us calculate standard errors and confidence intervals quite easily.
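
As a rough illustration of how the Gaussian approximation is used, the sketch below computes a standard error and an approximate confidence interval for the slope in a one-predictor regression. The function name `slope_confidence_interval` and the simulated data are assumptions made for the example, and the interval uses the large-sample Gaussian quantile rather than any finite-sample correction.

```python
import math
import random
from statistics import NormalDist


def slope_confidence_interval(xs, ys, level=0.95):
    """Least-squares slope, its standard error, and a Gaussian-approximation CI."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    # Maximum likelihood estimate of the noise variance (divides by n);
    # the more common n - 2 divisor differs negligibly for large n.
    sigma2_hat = rss / n
    # Under the Gaussian linear model, Var(slope estimate) = sigma^2 / s_xx.
    se = math.sqrt(sigma2_hat / s_xx)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return slope, se, (slope - z * se, slope + z * se)


# Illustrative data: true intercept 1, true slope 2, noise s.d. 2.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(500)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 2.0) for x in xs]
slope, se, (lower, upper) = slope_confidence_interval(xs, ys)
print(f"slope = {slope:.3f}, se = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval is only as trustworthy as the assumptions behind it: if the noise is not homoskedastic or not independent, the formula for the standard error no longer applies.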

## Examine the Residuals

By construction, the residuals of a fitted linear regression have mean zero and are uncorrelated with the predictor variables. If the usual probabilistic assumptions hold, however, they have many other properties as well.
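
The "by construction" part can be checked numerically. The sketch below fits a least-squares line to data whose true regression function is deliberately not linear, and the residuals still average to zero and have zero covariance with the predictor. The helper `fit_line` and the simulated data are assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
# Deliberately nonlinear truth: the linear model is wrong here.
ys = [x ** 2 + rng.gauss(0.0, 3.0) for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]

n = len(xs)
mean_resid = sum(residuals) / n
x_bar = sum(xs) / n
cov_resid_x = sum((x - x_bar) * (e - mean_resid) for x, e in zip(xs, residuals)) / n

# Both quantities are zero up to floating-point rounding.
print(mean_resid, cov_resid_x)
```

Because these two properties hold even for a badly misspecified model, they say nothing about whether the probabilistic assumptions are satisfied.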

1. The residuals have a Gaussian distribution at each $\vec{x}$.
2. The residuals have the same Gaussian distribution at each $\vec{x}$, i.e., they are independent of the predictor variables. In particular, they must have the same variance (i.e., they must be homoskedastic).
3. The residuals are independent of each other. In particular, they must be uncorrelated with each other.

These properties (Gaussianity, homoskedasticity, lack of correlation) are all testable. When they all hold, we say that the residuals are white noise. One would never expect them to hold exactly in any finite sample, but if you do test for them and find them strongly violated, you should be extremely suspicious of your model. These tests are much more important than checking whether the coefficients are significantly different from zero.
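
The three properties can be probed with simple summary statistics of the residuals. The sketch below gives informal diagnostics, not formal hypothesis tests: skewness and excess kurtosis (both near zero for Gaussian data), the correlation between squared residuals and the predictor (near zero under homoskedasticity), and the lag-1 autocorrelation (near zero for uncorrelated residuals, and only meaningful when the observations have a natural order). All function names and the simulated data are assumptions made for the example.

```python
import math
import random


def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def fit_residuals(xs, ys):
    """Residuals from a one-predictor least-squares fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [y - intercept - slope * x for x, y in zip(xs, ys)]


def residual_diagnostics(xs, residuals):
    """Informal white-noise diagnostics for regression residuals."""
    n = len(residuals)
    m = sum(residuals) / n
    var = sum((e - m) ** 2 for e in residuals) / n
    skewness = sum((e - m) ** 3 for e in residuals) / n / var ** 1.5
    excess_kurtosis = sum((e - m) ** 4 for e in residuals) / n / var ** 2 - 3.0
    squared = [e * e for e in residuals]
    return {
        "skewness": skewness,                          # Gaussianity
        "excess_kurtosis": excess_kurtosis,            # Gaussianity
        "corr_sq_resid_x": correlation(xs, squared),   # homoskedasticity
        "lag1_autocorr": correlation(residuals[:-1], residuals[1:]),
    }


rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(1000)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 2.0) for x in xs]
for name, value in residual_diagnostics(xs, fit_residuals(xs, ys)).items():
    print(f"{name}: {value:.3f}")
```

Values far from zero on any of these are a signal to look more closely, for example with residual plots against each predictor or with a formal test of the corresponding property.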

Every time someone uses linear regression with the standard assumptions for inference and does not test whether the residuals are white noise, an angel loses its wings.


