- Statistical Inference
- Statistical Computing
- Advanced Probability Theory
- Advanced Mathematical Statistics
- (Generalized) Linear Models
- Statistical Machine Learning
- Longitudinal Data Analysis
- Foundations of Data Science

Economics Ghostwriting | Econometrics Ghostwriting and Exam Help | Maximum Likelihood and Generalized Least Squares
Up to this point, we have assumed that the errors adhering to regression models are independently distributed with constant variance. This is a strong assumption, which is often untenable in practice. In this chapter, we consider estimation techniques that allow it to be relaxed. These are generalized least squares, or GLS, and generalized nonlinear least squares, or GNLS, on the one hand, and various applications of the method of maximum likelihood on the other. We treat GLS and ML together because, when ML is applied to regression models with normal errors, the estimators that result are very closely related to GLS estimators.
The plan of the chapter is as follows. First of all, in Section 9.2, we relax the assumption that the error terms are independently distributed with constant variance. ML estimation of regression models without those assumptions turns out to be conceptually straightforward and to be closely related to the method of GNLS. In Section 9.3, we discuss the geometry of GLS and consider an important special case in which OLS and GLS estimates are identical. In Section 9.4, we show how a version of the Gauss-Newton regression may be used with models estimated by GNLS. In Section 9.5, we show how GNLS is related to feasible GNLS and discuss a number of fundamental results about both GNLS and feasible GNLS. The relationship between GNLS and ML is then treated in Section 9.6. In Sections 9.7 through 9.9, we consider multivariate nonlinear regression models. Although such models may often seem very complicated, primarily because of the notational complexities of allowing for several jointly dependent variables, we show that they are actually quite straightforward to estimate by means of GNLS or ML. Finally, in Section 9.10, we discuss models for dealing with panel data and other data sets that combine time series and cross sections. In this chapter, we do not discuss what is probably the most commonly encountered application of GLS in applied work, namely, the estimation of regression models with serial correlation. The enormous literature on this subject will be the topic of Chapter 10.
Generalized Least Squares
In this section, we will consider the class of models
$$\boldsymbol{y}=\boldsymbol{x}(\boldsymbol{\beta})+\boldsymbol{u}, \quad \boldsymbol{u} \sim N(\mathbf{0}, \boldsymbol{\Omega}), \tag{9.01}$$
where $\boldsymbol{\Omega}$, an $n \times n$ positive definite matrix, is the covariance matrix of the vector of error terms $\boldsymbol{u}$. The normality assumption can of course be relaxed, but we retain it for now since we want to use the method of maximum likelihood. In some applications the matrix $\boldsymbol{\Omega}$ may be known. In others it may be known only up to a multiplicative constant, which implies that we can write $\boldsymbol{\Omega}=\sigma^2 \boldsymbol{\Delta}$, with $\boldsymbol{\Delta}$ a known $n \times n$ matrix and $\sigma^2$ an unknown positive scalar. In most applications, only the structure of $\boldsymbol{\Omega}$ will be known; one might know for example that it arises from a particular pattern of heteroskedasticity or serial correlation and hence depends on a certain number of parameters in a certain way. We will consider all three cases.
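To make the case $\boldsymbol{\Omega}=\sigma^2\boldsymbol{\Delta}$ concrete, the following Python sketch draws an error vector $\boldsymbol{u} \sim N(\mathbf{0}, \sigma^2\boldsymbol{\Delta})$. It assumes, purely for illustration, a hypothetical serial-correlation pattern $\Delta_{ts}=\rho^{|t-s|}$; the helper names `cholesky`, `ar1_delta`, and `draw_errors` are ours, not the text's. If $\boldsymbol{\Delta}=\boldsymbol{L}\boldsymbol{L}^{\top}$ is the Cholesky factorization and $\boldsymbol{z}$ has independent $N(0,1)$ elements, then $\boldsymbol{u}=\sigma\boldsymbol{L}\boldsymbol{z}$ has covariance matrix $\sigma^2\boldsymbol{L}\boldsymbol{L}^{\top}=\sigma^2\boldsymbol{\Delta}$.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L' = a; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def ar1_delta(n, rho):
    """Hypothetical known Delta with elements rho**|t - s| (requires |rho| < 1)."""
    return [[rho ** abs(t - s) for s in range(n)] for t in range(n)]


def draw_errors(sigma2, delta, rng):
    """Draw u ~ N(0, sigma2 * Delta) as u = sigma * L z with z iid N(0, 1)."""
    L = cholesky(delta)
    n = len(delta)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sigma = math.sqrt(sigma2)
    # L is lower triangular, so element t only uses z[0], ..., z[t].
    return [sigma * sum(L[t][k] * z[k] for k in range(t + 1)) for t in range(n)]


# Usage: sigma^2 would be unknown in practice; Delta is treated as known.
rng = random.Random(42)
delta = ar1_delta(5, 0.6)
u = draw_errors(2.0, delta, rng)
print([round(v, 3) for v in u])
```

Here only the functional form of $\boldsymbol{\Delta}$ and the value of $\rho$ are fixed. In the third case described above, $\rho$ itself would be an unknown parameter.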
The density of the vector $\boldsymbol{u}$ is the multivariate normal density
$$f(\boldsymbol{u})=(2 \pi)^{-n / 2}|\boldsymbol{\Omega}|^{-1 / 2} \exp \left(-\frac{1}{2} \boldsymbol{u}^{\top} \boldsymbol{\Omega}^{-1} \boldsymbol{u}\right). \tag{9.02}$$
In order to pass from the density of the vector of error terms $\boldsymbol{u}$ to that of the vector of dependent variables $\boldsymbol{y}$, we must first replace $\boldsymbol{u}$ by $\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta})$ in (9.02) and then multiply by the absolute value of the determinant of the Jacobian matrix associated with the transformation that expresses $\boldsymbol{u}$ in terms of $\boldsymbol{y}$. This use of a Jacobian factor is analogous to what we did in Section 8.10 with scalar random variables; for details, see Appendix B. In this case, since $\boldsymbol{u}=\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta})$ with $\boldsymbol{x}(\boldsymbol{\beta})$ treated as fixed, the Jacobian matrix $\partial \boldsymbol{u}/\partial \boldsymbol{y}^{\top}$ is the $n \times n$ identity matrix, and so its determinant is unity. Hence the likelihood function is
$$L^n(\boldsymbol{y}, \boldsymbol{\beta}, \boldsymbol{\Omega})=(2 \pi)^{-n / 2}|\boldsymbol{\Omega}|^{-1 / 2} \exp \left(-\frac{1}{2}(\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta}))^{\top} \boldsymbol{\Omega}^{-1}(\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta}))\right),$$
and the loglikelihood function is
$$\ell^n(\boldsymbol{y}, \boldsymbol{\beta}, \boldsymbol{\Omega})=-\frac{n}{2} \log (2 \pi)-\frac{1}{2} \log |\boldsymbol{\Omega}|-\frac{1}{2}(\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta}))^{\top} \boldsymbol{\Omega}^{-1}(\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta})). \tag{9.03}$$
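As a sketch of how the loglikelihood (9.03) might be evaluated numerically, the Python function below assumes a user-supplied regression function `xfun` returning the vector $\boldsymbol{x}(\boldsymbol{\beta})$; this interface, the helper names, and the exponential regression function in the usage lines are hypothetical. It avoids forming $\boldsymbol{\Omega}^{-1}$ explicitly: with the Cholesky factorization $\boldsymbol{\Omega}=\boldsymbol{L}\boldsymbol{L}^{\top}$, we have $\log|\boldsymbol{\Omega}|=2\sum_i \log L_{ii}$ and $\boldsymbol{u}^{\top}\boldsymbol{\Omega}^{-1}\boldsymbol{u}=\|\boldsymbol{L}^{-1}\boldsymbol{u}\|^2$, where $\boldsymbol{L}^{-1}\boldsymbol{u}$ is obtained from a triangular solve.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L' = a; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z, where L is lower triangular."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def loglik(y, beta, omega, xfun):
    """Loglikelihood (9.03) of y = x(beta) + u with u ~ N(0, Omega), Omega known.

    xfun(beta) is a hypothetical user-supplied function returning x(beta).
    """
    n = len(y)
    u = [yt - xt for yt, xt in zip(y, xfun(beta))]             # u = y - x(beta)
    L = cholesky(omega)                                        # Omega = L L'
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))   # log|Omega|
    w = forward_solve(L, u)                                    # w = L^{-1} u
    quad = sum(wi * wi for wi in w)                            # u' Omega^{-1} u = w'w
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad


# Usage with a hypothetical nonlinear regression function x_t(beta) = b0 * exp(b1 * t).
t_vals = [1.0, 2.0, 3.0, 4.0]


def xfun(beta):
    return [beta[0] * math.exp(beta[1] * t) for t in t_vals]


y = [1.3, 1.5, 1.9, 2.4]
omega = [[0.5 ** abs(t - s) for s in range(4)] for t in range(4)]
print(loglik(y, [1.0, 0.2], omega, xfun))
```

Solving a triangular system instead of inverting $\boldsymbol{\Omega}$ is a common design choice because it is cheaper and numerically more stable.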
If the matrix $\boldsymbol{\Omega}$ is known, only the last term of (9.03) depends on $\boldsymbol{\beta}$, so this function can be maximized with respect to $\boldsymbol{\beta}$ by minimizing the generalized sum of squared residuals
$$\operatorname{SSR}(\boldsymbol{\beta} \mid \boldsymbol{\Omega})=(\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta}))^{\top} \boldsymbol{\Omega}^{-1}(\boldsymbol{y}-\boldsymbol{x}(\boldsymbol{\beta})).$$
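When the regression function is linear, $\boldsymbol{x}(\boldsymbol{\beta})=\boldsymbol{X}\boldsymbol{\beta}$ (an assumption made here only for illustration, since the text allows $\boldsymbol{x}(\boldsymbol{\beta})$ to be nonlinear), minimizing $\operatorname{SSR}(\boldsymbol{\beta}\mid\boldsymbol{\Omega})$ has a closed-form solution. With $\boldsymbol{\Omega}=\boldsymbol{L}\boldsymbol{L}^{\top}$, we have $\boldsymbol{\Omega}^{-1}=(\boldsymbol{L}^{\top})^{-1}\boldsymbol{L}^{-1}$, so $\operatorname{SSR}(\boldsymbol{\beta}\mid\boldsymbol{\Omega})=\|\boldsymbol{L}^{-1}\boldsymbol{y}-\boldsymbol{L}^{-1}\boldsymbol{X}\boldsymbol{\beta}\|^2$. The minimizer is therefore the OLS estimate from regressing $\boldsymbol{L}^{-1}\boldsymbol{y}$ on $\boldsymbol{L}^{-1}\boldsymbol{X}$, namely $(\boldsymbol{X}^{\top}\boldsymbol{\Omega}^{-1}\boldsymbol{X})^{-1}\boldsymbol{X}^{\top}\boldsymbol{\Omega}^{-1}\boldsymbol{y}$. The following is a minimal Python sketch; the data, the known $\boldsymbol{\Omega}$, and the function names are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L' = a; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z, where L is lower triangular."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def back_solve_transpose(L, b):
    """Solve L' x = b for x, where L is lower triangular (so L' is upper triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gls_linear(y, X, omega):
    """Minimize SSR(beta | Omega) for the linear case x(beta) = X beta (X stored row-wise)."""
    n, k = len(X), len(X[0])
    L = cholesky(omega)                                      # Omega = L L'
    y_star = forward_solve(L, y)                             # L^{-1} y
    cols = [forward_solve(L, [X[t][j] for t in range(n)]) for j in range(k)]  # columns of L^{-1} X
    # OLS normal equations for the transformed data: (X*' X*) beta = X*' y*.
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(cols[i][t] * y_star[t] for t in range(n)) for i in range(k)]
    C = cholesky(xtx)                                        # X*'X* = C C'
    return back_solve_transpose(C, forward_solve(C, xty))


# Usage with hypothetical data and a hypothetical known Omega.
X = [[1.0, float(t)] for t in range(1, 6)]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
omega = [[0.5 ** abs(t - s) for s in range(5)] for t in range(5)]
beta_hat = gls_linear(y, X, omega)
print(beta_hat)
```

Multiplying `omega` by any positive constant divides $\operatorname{SSR}(\boldsymbol{\beta}\mid\boldsymbol{\Omega})$ by that constant and so leaves the estimate unchanged. This is why, in the case $\boldsymbol{\Omega}=\sigma^2\boldsymbol{\Delta}$, the estimate can be computed from $\boldsymbol{\Delta}$ alone.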

