## Statistics | Linear Regression Analysis | The Partial F Test

April 7, 2023

Suppose that there is data on variables $Z, w_1, \ldots, w_r$ and that a useful MLR model has been made using $Y=t(Z), x_1 \equiv 1, x_2, \ldots, x_p$ where each $x_i$ is some function of $w_1, \ldots, w_r$. This useful model will be called the full model. It is important to realize that the full model does not need to use every variable $w_j$ that was collected. For example, variables with outliers or missing values may not be used. Forming a useful full model is often very difficult, and it is often not reasonable to assume that the candidate full model is good based on a single data set, especially if the model is to be used for prediction.
Even if the full model is useful, the investigator will often be interested in checking whether a model that uses fewer predictors will work just as well. For example, perhaps $x_p$ is a very expensive predictor but is not needed given that $x_1, \ldots, x_{p-1}$ are in the model. Also a model with fewer predictors tends to be easier to understand.

Definition 2.21. Let the full model use $Y, x_1 \equiv 1, x_2, \ldots, x_p$ and let the reduced model use $Y, x_1, x_{i_2}, \ldots, x_{i_q}$ where $\{i_2, \ldots, i_q\} \subset \{2, \ldots, p\}$.
The partial $F$ test is used to test whether the reduced model is good in that it can be used instead of the full model. It is crucial that the reduced model be selected before looking at the data. If the reduced model is selected after looking at output and discarding the worst variables, then the p-value for the partial $F$ test will be too high. For (ordinary) least squares, usually a constant is used, and we are assuming that both the full model and the reduced model contain a constant. The partial $F$ test has null hypothesis $H_o: \beta_{i_{q+1}}=\cdots=\beta_{i_p}=0$, where $i_{q+1}, \ldots, i_p$ index the predictors that are in the full model but not in the reduced model, and alternative hypothesis $H_A$: at least one of the $\beta_{i_j} \neq 0$ for $j>q$. The null hypothesis is equivalent to $H_o$: "the reduced model is good." Since only the full model and reduced model are being compared, the alternative hypothesis is equivalent to $H_A$: "the reduced model is not as good as the full model, so use the full model," or more simply, $H_A$: "use the full model."
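The excerpt above does not show the test statistic itself. A minimal sketch, assuming the standard form $F_R = \dfrac{[SSE(R)-SSE(F)]/(df_R-df_F)}{MSE(F)}$ with $df_F=n-p$, $df_R=n-q$ and $MSE(F)=SSE(F)/df_F$, is given below. The function name `partial_f_statistic` and the numbers in the example are hypothetical and serve only as illustration.

```python
def partial_f_statistic(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic from the error sums of squares of two nested models.

    df_full = n - p and df_reduced = n - q are the residual degrees of
    freedom of the full and reduced models (assumed notation).
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more residual df than the full model")
    if sse_reduced < sse_full:
        raise ValueError("SSE of a nested reduced model cannot be below SSE of the full model")
    mse_full = sse_full / df_full
    return ((sse_reduced - sse_full) / (df_reduced - df_full)) / mse_full


# Hypothetical output: n = 50, full model with p = 5, reduced model with q = 3.
f_stat = partial_f_statistic(sse_reduced=130.0, df_reduced=47, sse_full=120.0, df_full=45)
print(f"partial F statistic = {f_stat:.3f}")
```

Under $H_o$ and the usual MLR assumptions, the statistic is compared with an $F$ distribution with $df_R-df_F$ and $df_F$ degrees of freedom. The sketch stops at the statistic and leaves the p-value to an $F$ table or regression output.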

## Statistics | Linear Regression Analysis | The Wald t Test

Often investigators hope to examine $\beta_k$ in order to determine the importance of the predictor $x_k$ in the model; however, $\beta_k$ is the coefficient for $x_k$ given that the other predictors are in the model. Hence $\beta_k$ depends strongly on the other predictors in the model. Suppose that the model has an intercept: $x_1 \equiv 1$. The predictor $x_k$ is highly correlated with the other predictors if the OLS regression of $x_k$ on $x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_p$ has a high coefficient of determination $R_k^2$. If this is the case, then often $x_k$ is not needed in the model given that the other predictors are in the model. If at least one $R_k^2$ is high for $k \geq 2$, then there is multicollinearity among the predictors.

As an example, suppose that $Y=$ height, $x_1 \equiv 1, x_2=$ left leg length, and $x_3=$ right leg length. Then $x_2$ should not be needed given $x_3$ is in the model and $\beta_2=0$ is reasonable. Similarly $\beta_3=0$ is reasonable. On the other hand, if the model only contains $x_1$ and $x_2$, then $x_2$ is extremely important with $\beta_2$ near 2. If the model contains $x_1, x_2, x_3, x_4=$ height at shoulder, $x_5=$ right arm length, $x_6=$ head length, and $x_7=$ length of back, then $R_i^2$ may be high for each $i \geq 2$. In that case, $x_i$ is often not needed in the MLR model for $Y$ given that the other predictors are in the model.
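The leg length example can be illustrated with a small simulation. This is only a sketch on made-up data: the assumption that height is roughly the sum of the two leg lengths plus noise is chosen purely so that the slope on $x_2$ alone comes out near 2, and all names and numeric settings are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope_and_r2(x, y):
    """OLS of y on a constant and x: return the slope and R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


random.seed(0)
n = 200
# Simulated data (assumption): both legs share a common length plus tiny noise.
leg = [random.gauss(80.0, 5.0) for _ in range(n)]
left = [v + random.gauss(0.0, 0.1) for v in leg]
right = [v + random.gauss(0.0, 0.1) for v in leg]
height = [l + r + random.gauss(0.0, 1.0) for l, r in zip(left, right)]

# Model with only x_1 (constant) and x_2 (left leg): slope should be near 2.
slope_height_on_left, _ = slope_and_r2(left, height)
# R_2^2 from regressing x_2 on x_1 and x_3: close to 1 signals multicollinearity.
_, r2_left_on_right = slope_and_r2(right, left)

print(f"slope of height on left leg alone: {slope_height_on_left:.3f}")
print(f"R^2 of left leg on right leg:      {r2_left_on_right:.4f}")
```

With $R_2^2$ this close to 1, either leg length carries almost all of the information in the other, which is why $\beta_2=0$ becomes reasonable once $x_3$ is in the model.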

Definition 2.23. The $100(1-\delta) \%$ CI for $\beta_k$ is $\hat{\beta}_k \pm t_{n-p, 1-\delta / 2} \operatorname{se}\left(\hat{\beta}_k\right)$. If the degrees of freedom $d=n-p \geq 30$, the $\mathrm{N}(0,1)$ cutoff $z_{1-\delta / 2}$ may be used.
Know how to do the 4 step Wald $t$-test of hypotheses.
i) State the hypotheses $H_o: \beta_k=0$ and $H_A: \beta_k \neq 0$.
ii) Find the test statistic $t_{o, k}=\hat{\beta}_k / \operatorname{se}\left(\hat{\beta}_k\right)$ or obtain it from output.
iii) Find pval from output or use the $t$-table: pval $=$
$$2 P\left(t_{n-p}<-\left|t_{o, k}\right|\right)=2 P\left(t_{n-p}>\left|t_{o, k}\right|\right) .$$
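The confidence interval of Definition 2.23 and steps ii) and iii) can be sketched from two numbers read off regression output, $\hat{\beta}_k$ and $\operatorname{se}(\hat{\beta}_k)$. The sketch assumes $d=n-p \geq 30$, so the $\mathrm{N}(0,1)$ cutoff replaces the $t_{n-p}$ cutoff in the interval. Using the normal distribution for pval as well is an additional approximation made here, not something stated in the text. The function name `wald_summary` and the input values are hypothetical.

```python
from statistics import NormalDist


def wald_summary(beta_hat, se, delta=0.05):
    """Wald t statistic, large-sample CI and approximate two-sided p-value.

    Assumes n - p >= 30 so that N(0,1) stands in for the t_{n-p} distribution.
    """
    std_normal = NormalDist()
    z_cutoff = std_normal.inv_cdf(1 - delta / 2)
    t_stat = beta_hat / se
    ci = (beta_hat - z_cutoff * se, beta_hat + z_cutoff * se)
    # pval = 2 P(Z > |t_{o,k}|), the normal approximation to 2 P(t_{n-p} > |t_{o,k}|).
    pval = 2 * (1 - std_normal.cdf(abs(t_stat)))
    return t_stat, ci, pval


# Hypothetical values read from regression output.
t_stat, ci, pval = wald_summary(beta_hat=1.2, se=0.5)
print(f"t = {t_stat:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), pval = {pval:.4f}")
```

For smaller $d$, the cutoff and pval should come from the $t_{n-p}$ distribution, taken from a $t$-table or from software output, rather than from this approximation.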

