Statistics | Linear Regression | MATH839

Doug I. Jones



Statistics | Linear Regression | Complements

The Least Squares Central Limit Theorem (Theorem 2.8) is often a good approximation if $n \geq 10p$ and the error distribution has "light tails," i.e. the probability of an outlier is nearly 0 and the tails go to zero at an exponential rate or faster. For error distributions with heavier tails, much larger samples are needed, and the assumption that the variance $\sigma^2$ exists is crucial, e.g. Cauchy errors are not allowed. Norman and Streiner (1986, p. 63) recommend $n \geq 5p$.
The classical MLR prediction interval does not work well and should be replaced by the Olive (2007) asymptotically optimal PI (2.20). Lei and Wasserman (2014) provide an alternative: use the Lei et al. (2013) PI $\left[\tilde{r}_L, \tilde{r}_U\right]$ on the residuals; then the PI for $Y_f$ is
$$\left[\hat{Y}_f+\tilde{r}_L, \hat{Y}_f+\tilde{r}_U\right].$$
Bootstrap PIs need more theory, and instead of using $B=1000$ samples, use $B=\max(1000, n)$. See Olive (2014, pp. 279-285).
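To make the shifted-residual form $[\hat{Y}_f+\tilde{r}_L, \hat{Y}_f+\tilde{r}_U]$ concrete, here is a minimal Python sketch. It is not the Olive (2007) PI (2.20) nor the Lei et al. (2013) interval: as a stand-in it assumes plain empirical quantiles of the OLS residuals for $\tilde{r}_L$ and $\tilde{r}_U$, with no small-sample correction. The function name `residual_quantile_pi` and the simulated data are hypothetical.

```python
import numpy as np

def residual_quantile_pi(X, y, x_f, delta=0.05):
    """Rough 100(1 - delta)% PI for a future response at x_f.

    Assumption: r_L and r_U are plain sample quantiles of the OLS
    residuals, used only to illustrate the interval's shifted form.
    """
    X1 = np.column_stack([np.ones(len(y)), X])       # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)    # OLS coefficients
    resid = y - X1 @ beta
    r_lo, r_up = np.quantile(resid, [delta / 2, 1 - delta / 2])
    y_hat_f = np.concatenate([[1.0], np.atleast_1d(x_f)]) @ beta
    return y_hat_f + r_lo, y_hat_f + r_up

# Simulated data (hypothetical): n = 200 cases, p = 3 coefficients.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p - 1))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

lo, up = residual_quantile_pi(X, y, x_f=np.array([0.5, -0.5]))
B = max(1000, n)   # bootstrap sample count suggested in the text
print(f"PI: [{lo:.2f}, {up:.2f}],  B = {B}")
```

Here $n = 200 \geq 10p = 30$, so the rule of thumb from the text is satisfied; with heavier-tailed errors the residual quantiles would need a much larger $n$ to be reliable.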

For the additive error regression model $Y=m(\boldsymbol{x})+e$, the response plot of $\hat{Y}=\hat{m}(\boldsymbol{x})$ vs. $Y$, with the identity line added as a visual aid, is used like the MLR response plot. We want $n \geq 10\,df$ where $df$ is the degrees of freedom from fitting $\hat{m}$. Olive (2013a) provides PIs for this model, including the location model. These PIs are large sample PIs provided that the sample quantiles of the residuals are consistent estimators of the population quantiles of the errors. The response plot and PIs could also be used for methods described in James et al. (2013) such as ridge regression, lasso, principal components regression, and partial least squares. See Pelawa Watagoda and Olive (2017) if $n$ is not large compared to $p$.

In addition to large sample theory, we want the PIs to work well on a single data set as future observations are gathered, but we have only the training data $\left(\boldsymbol{x}_1, Y_1\right), \ldots,\left(\boldsymbol{x}_n, Y_n\right)$. Much like $k$-fold cross validation for discriminant analysis, randomly divide the data set into $k=5$ groups of approximately equal size. Compute the model from 4 groups and use the 5th group as a validation set: compute the PI for $\boldsymbol{x}_f=\boldsymbol{x}_j$ for each $j$ in the 5th group. Repeat so that each of the 5 groups is used as a validation set. Compute the proportion of times $Y_i$ was in its PI for $i=1, \ldots, n$ as well as the average length of the $n$ PIs. If two or more models or PIs are being considered, we want the proportion to be near the nominal proportion and the average length to be short.
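The 5-fold procedure above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the PI inside each fold is a simple residual-quantile interval around the OLS fit (a stand-in for whichever PI is being evaluated), and the helper names `pi_from_residuals` and `cv_pi_coverage` are hypothetical.

```python
import numpy as np

def pi_from_residuals(X_tr, y_tr, X_val, delta=0.05):
    """Stand-in PI: OLS fit on the training folds plus residual quantiles."""
    X1 = np.column_stack([np.ones(len(y_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(X1, y_tr, rcond=None)
    resid = y_tr - X1 @ beta
    r_lo, r_up = np.quantile(resid, [delta / 2, 1 - delta / 2])
    fit = np.column_stack([np.ones(len(X_val)), X_val]) @ beta
    return fit + r_lo, fit + r_up

def cv_pi_coverage(X, y, k=5, delta=0.05, seed=0):
    """Return (proportion of Y_i inside their PI, average PI length)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)   # k groups of similar size
    covered = np.zeros(n, dtype=bool)
    length = np.zeros(n)
    for val_idx in folds:
        train = np.ones(n, dtype=bool)
        train[val_idx] = False                      # fit on the other k - 1 groups
        lo, up = pi_from_residuals(X[train], y[train], X[val_idx], delta)
        covered[val_idx] = (lo <= y[val_idx]) & (y[val_idx] <= up)
        length[val_idx] = up - lo
    return covered.mean(), length.mean()

# Simulated data (hypothetical) with iid N(0, 1) errors.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

coverage, avg_len = cv_pi_coverage(X, y, k=5, delta=0.05)
print(f"coverage = {coverage:.3f}, average length = {avg_len:.2f}")
```

When comparing competing PIs, run the same folds (same `seed`) for each one, and prefer the shortest average length among those with coverage near the nominal $1-\delta$.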
Following Chapter 11, under the regularity conditions, much of the inference that is valid for the normal MLR model is approximately valid for the unimodal MLR model when the sample size is large. For example, confidence intervals for $\beta_i$ are asymptotically correct, as are $t$ tests for $\beta_i=0$ (see Li and Duan (1989, p. 1035)), the MSE is an estimator of $\sigma^2$ by Theorems 2.6 and 2.7, and variable selection procedures perform well (see Chapter 3 and Olive and Hawkins 2005).

Statistics | Linear Regression | Lack of Fit Tests

Then $MSPE = SSPE/(n-c)$ is an unbiased estimator of $\sigma^2$ when model (2.29) holds, regardless of the form of $m$. The PE in SSPE stands for "pure error."

Now $SSLF = SSE - SSPE = \sum_{j=1}^c n_j\left(\bar{Y}_j-\hat{Y}_j\right)^2$. Notice that $\bar{Y}_j$ is an unbiased estimator of $m\left(\boldsymbol{x}_j\right)$, while $\hat{Y}_j$ is an estimator of $m\left(\boldsymbol{x}_j\right)$ only if the MLR model is appropriate, that is, if $m\left(\boldsymbol{x}_j\right)=\boldsymbol{x}_j^T \boldsymbol{\beta}$. Hence SSLF and MSLF can be very large if the MLR model is not appropriate.

The 4 step lack of fit test is
i) $H_0$: no evidence of MLR lack of fit, $H_A$: there is lack of fit for the MLR model.
ii) $F_{LF} = MSLF/MSPE$.
iii) The pval $=P\left(F_{c-p, n-c}>F_{LF}\right)$.
iv) Reject $H_0$ if pval $\leq \delta$ and state the $H_A$ claim that there is lack of fit. Otherwise, fail to reject $H_0$ and state that there is not enough evidence to conclude that there is MLR lack of fit.
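The four steps can be sketched in Python as below. The sketch assumes the data contain replicates, so that the $n$ cases fall into $c$ groups with identical predictor rows, and it takes $MSLF = SSLF/(c-p)$, which is inferred here from the $F_{c-p,\,n-c}$ reference distribution rather than stated in the text. The function name `lack_of_fit_test` and the simulated data are hypothetical; `scipy.stats.f.sf` supplies the upper tail probability.

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(X, y):
    """Return (F_LF, pval) for the MLR lack of fit test with replicates."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])            # design matrix with intercept
    p = X1.shape[1]
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sse = np.sum((y - X1 @ beta) ** 2)

    # Group cases that share the same predictor row (the replicates).
    _, group = np.unique(X1, axis=0, return_inverse=True)
    group = group.ravel()
    c = group.max() + 1
    n_j = np.bincount(group)
    ybar = np.bincount(group, weights=y) / n_j       # group means of Y

    sspe = np.sum((y - ybar[group]) ** 2)            # pure error sum of squares
    sslf = sse - sspe                                # lack of fit sum of squares
    mslf = sslf / (c - p)                            # assumed df: c - p
    mspe = sspe / (n - c)
    f_lf = mslf / mspe
    pval = stats.f.sf(f_lf, c - p, n - c)            # P(F_{c-p, n-c} > F_LF)
    return f_lf, pval

# Simulated data (hypothetical): 6 distinct x values, 5 replicates each.
rng = np.random.default_rng(2)
x = np.repeat(np.arange(1.0, 7.0), 5)
y_lin = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)   # MLR holds
y_quad = x ** 2 + rng.normal(scale=0.5, size=x.size)         # MLR fails

f1, p1 = lack_of_fit_test(x, y_lin)
f2, p2 = lack_of_fit_test(x, y_quad)
print(f"linear mean:    F_LF = {f1:.2f}, pval = {p1:.3f}")
print(f"quadratic mean: F_LF = {f2:.2f}, pval = {p2:.3g}")
```

The test needs $p < c < n$ so that both degrees of freedom are positive; with no replicates ($c = n$), $MSPE$ is undefined and the test cannot be run in this form.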

Although the lack of fit test seems clever, examining the response plot and residual plot is a much more effective method for checking whether the MLR model fits the data well, provided that $n \geq 10p$. A graphical version of the lack of fit test would compute the $\bar{Y}_j$ and see whether they scatter about the identity line in the response plot. When there are no replicates, the range of $\hat{Y}$ could be divided into several narrow nonoverlapping intervals called slices. Then the mean $\bar{Y}_j$ of each slice could be computed and a step function with step height $\bar{Y}_j$ at the $j$th slice could be plotted. If the step function follows the identity line, then there is no evidence of lack of fit. However, it is easier to check whether the $Y_i$ are scattered about the identity line. Examining the residual plot is useful because it magnifies deviations from the identity line that may be difficult to see until the linear trend is removed. The lack of fit test may be sensitive to the assumption that the errors are iid $N\left(0, \sigma^2\right)$.
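The slice idea for data without replicates can be sketched numerically. The code below assumes equal-width slices over the range of $\hat{Y}$ and compares each slice mean $\bar{Y}_j$ with the slice midpoint, which is where the identity line sits at the center of the slice. The function name `slice_means`, the choice of 5 slices, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def slice_means(y_hat, y, n_slices=5):
    """Split the range of y_hat into equal-width slices; return
    (slice midpoints, mean of Y in each slice)."""
    edges = np.linspace(y_hat.min(), y_hat.max(), n_slices + 1)
    # Interior edges map each fitted value to a slice index 0..n_slices-1.
    idx = np.digitize(y_hat, edges[1:-1])
    centers = 0.5 * (edges[:-1] + edges[1:])
    means = np.array([
        y[idx == j].mean() if np.any(idx == j) else np.nan
        for j in range(n_slices)
    ])
    return centers, means

# Simulated data (hypothetical) where the MLR model is appropriate.
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta

centers, means = slice_means(y_hat, y)
for ctr, m in zip(centers, means):
    print(f"slice center {ctr:6.2f}   mean of Y {m:6.2f}")
```

If the step heights track the slice centers, the step function follows the identity line. Large systematic gaps, such as means above the line at both ends and below it in the middle, would suggest lack of fit.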


