In earlier sections, we have discussed the absence or presence of a relationship between two random variables, independence or nonindependence. But if there is a relationship, the relationship may be strong or weak. In this section we discuss two numerical measures of the strength of a relationship between two random variables, the covariance and correlation.

To illustrate what we mean by the strength of a relationship between two random variables, consider two different experiments. In the first, random variables $X$ and $Y$ are measured, where $X$ is the weight of a sample of water and $Y$ is the volume of the same sample of water. Clearly there is a strong relationship between $X$ and $Y$. If $(X, Y)$ pairs are measured on several samples and the observed data pairs are plotted, the data points should fall on a straight line because of the physical relationship between $X$ and $Y$. This will not be exactly the case because of measurement errors, impurities in the water, etc. But with careful laboratory technique, the data points will fall very nearly on a straight line. Now consider another experiment in which $X$ and $Y$ are measured, where $X$ is the body weight of a human and $Y$ is the same human’s height. Clearly there is also a relationship between $X$ and $Y$ here but the relationship is not nearly as strong. We would not expect a plot of $(X, Y)$ pairs measured on different people to form a straight line, although we might expect to see an upward trend in the plot. The covariance and correlation are two measures that quantify this difference in the strength of a relationship between two random variables.

Throughout this section we will frequently be referring to the mean and variance of $X$ and the mean and variance of $Y$. For these we will use the notation $\mathrm{E} X=\mu_X$, $\mathrm{E} Y=\mu_Y$, $\operatorname{Var} X=\sigma_X^2$, and $\operatorname{Var} Y=\sigma_Y^2$. We will assume throughout that $0<\sigma_X^2<\infty$ and $0<\sigma_Y^2<\infty$.

Definition 4.5.1 The covariance of $X$ and $Y$ is the number defined by
$$\operatorname{Cov}(X, Y)=\mathrm{E}\left(\left(X-\mu_X\right)\left(Y-\mu_Y\right)\right)$$
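
To make the definition concrete, the following minimal Python sketch estimates a covariance by replacing the expectation with an average over simulated pairs. The helper `sample_covariance`, the seed, the sample size, and both data-generating models (a hypothetical near-linear relationship $Y=2X+\text{noise}$ and a $Y$ drawn independently of $X$) are assumptions chosen for illustration; none of them come from the text.

```python
import random


def sample_covariance(xs, ys):
    """Average of (x - mean_x) * (y - mean_y), mirroring E((X - mu_X)(Y - mu_Y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


rng = random.Random(0)  # fixed seed so the run is repeatable (assumption)

# Hypothetical data: X is uniform(0, 1).
xs = [rng.random() for _ in range(100_000)]
# Y nearly a linear function of X, plus small independent noise.
ys_related = [2 * x + rng.gauss(0, 0.1) for x in xs]
# Y generated independently of X.
ys_unrelated = [rng.random() for _ in xs]

cov_related = sample_covariance(xs, ys_related)
cov_unrelated = sample_covariance(xs, ys_unrelated)
print(cov_related, cov_unrelated)
```

In the near-linear model the true covariance is $2 \operatorname{Var} X=1/6$, because a uniform $(0,1)$ variable has variance $1/12$ and the noise is independent of $X$. For the independent pair it is $0$. The two estimates should land close to those values.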

## Hence the correlation is nearer to 1 in this example

The next example illustrates that there may be a strong relationship between $X$ and $Y$, but if the relationship is not linear, the correlation may be small.
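
This claim can be checked numerically before working through the example. The sketch below is a rough simulation, not part of the original text: it borrows the construction used in the example that follows ($X$ uniform on $(-1,1)$, $Z$ uniform on $\left(0, \frac{1}{10}\right)$, $Y=X^2+Z$), while the helper `sample_correlation`, the seed, and the sample size are assumptions made for illustration.

```python
import math
import random


def sample_correlation(xs, ys):
    """Sample analogue of Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable (assumption)

xs = [rng.uniform(-1, 1) for _ in range(200_000)]
ys = [x * x + rng.uniform(0, 0.1) for x in xs]

rho_hat = sample_correlation(xs, ys)
# How far Y can stray from the parabola x**2: never more than 1/10.
max_gap = max(y - x * x for x, y in zip(xs, ys))
print(rho_hat, max_gap)
```

Every simulated point lies within $\frac{1}{10}$ of the parabola $y=x^2$, so $Y$ is almost determined by $X$, yet the estimated correlation should come out close to zero.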

Example 4.5.9 (Correlation-III) In this example, let $X$ have a uniform $(-1,1)$ distribution and let $Z$ have a uniform $\left(0, \frac{1}{10}\right)$ distribution. Let $X$ and $Z$ be independent. Let $Y=X^2+Z$ and consider the random vector $(X, Y)$. As in Example 4.5.8, given $X=x$, $Y=x^2+Z$ and the conditional distribution of $Y$ given $X=x$ is uniform $\left(x^2, x^2+\frac{1}{10}\right)$. The joint pdf of $X$ and $Y$, the product of this conditional pdf and the marginal pdf of $X$, is thus
$$f(x, y)=5, \quad -1<x<1, \quad x^2<y<x^2+\frac{1}{10}.$$

The set on which $f(x, y)>0$ is illustrated in Figure 4.5.2. There is a strong relationship between $X$ and $Y$, as indicated by the conditional distribution of $Y$ given $X=x$. But the relationship is not linear. The possible values of $(X, Y)$ cluster around a parabola rather than a straight line. The correlation does not measure this nonlinear relationship. In fact, $\rho_{X Y}=0$. Since $X$ has a uniform $(-1,1)$ distribution, $\mathrm{E} X=\mathrm{E} X^3=0$, and since $X$ and $Z$ are independent, $\mathrm{E} X Z=(\mathrm{E} X)(\mathrm{E} Z)$. Thus,

$$\begin{aligned}
\operatorname{Cov}(X, Y) & =\mathrm{E}\left(X\left(X^2+Z\right)\right)-(\mathrm{E} X)\left(\mathrm{E}\left(X^2+Z\right)\right) \\
& =\mathrm{E} X^3+\mathrm{E} X Z-0 \cdot \mathrm{E}\left(X^2+Z\right) \\
& =0+(\mathrm{E} X)(\mathrm{E} Z)=0(\mathrm{E} Z)=0
\end{aligned}$$

and $\rho_{X Y}=\operatorname{Cov}(X, Y) /\left(\sigma_X \sigma_Y\right)=0$.

## Multivariate Distributions

At the beginning of this chapter, we discussed observing more than two random variables in an experiment. In the previous sections our discussion concentrated on a bivariate random vector $(X, Y)$. In this section we discuss a multivariate random vector $\left(X_1, \ldots, X_n\right)$. In the example at the beginning of this chapter, temperature, height, weight, and blood pressure were observed on an individual. In this example, $n=4$ and the observed random vector is $\left(X_1, X_2, X_3, X_4\right)$, where $X_1$ is temperature, $X_2$ is height, etc. The concepts from the earlier sections, including marginal and conditional distributions, generalize from the bivariate to the multivariate setting. We introduce some of these generalizations in this section.

A note on notation: We will use boldface letters to denote multiple variates. Thus, we write $\mathbf{X}$ to denote the random variables $X_1, \ldots, X_n$ and $\mathbf{x}$ to denote the sample $x_1, \ldots, x_n$.

The random vector $\mathbf{X}=\left(X_1, \ldots, X_n\right)$ has a sample space that is a subset of $\Re^n$. If $\left(X_1, \ldots, X_n\right)$ is a discrete random vector (the sample space is countable), then the joint pmf of $\left(X_1, \ldots, X_n\right)$ is the function defined by $f(\mathbf{x})=f\left(x_1, \ldots, x_n\right)=P\left(X_1=x_1, \ldots, X_n=x_n\right)$ for each $\left(x_1, \ldots, x_n\right) \in \Re^n$. Then for any $A \subset \Re^n$,

$$P(\mathbf{X} \in A)=\sum_{\mathbf{x} \in A} f(\mathbf{x}).$$

If $\left(X_1, \ldots, X_n\right)$ is a continuous random vector, the joint pdf of $\left(X_1, \ldots, X_n\right)$ is a function $f\left(x_1, \ldots, x_n\right)$ that satisfies

$$P(\mathbf{X} \in A)=\int \cdots \int_A f(\mathbf{x}) d \mathbf{x}=\int \cdots \int_A f\left(x_1, \ldots, x_n\right) d x_1 \cdots d x_n .$$
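
For the discrete case, the rule $P(\mathbf{X} \in A)=\sum_{\mathbf{x} \in A} f(\mathbf{x})$ translates directly into a few lines of Python. The sketch below is only an illustration under assumed inputs: the random vector (three independent fair coin tosses coded as 0 or 1, so $n=3$), the dictionary `joint_pmf`, and the helper `prob` are hypothetical and do not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: three independent fair coin tosses, coded 0/1.
# Each of the 8 sample points x = (x1, x2, x3) has joint pmf f(x) = 1/8.
outcomes = list(product((0, 1), repeat=3))
joint_pmf = {x: Fraction(1, 8) for x in outcomes}


def prob(event, pmf):
    """P(X in A): add f(x) over the sample points x for which event(x) is true."""
    return sum((p for x, p in pmf.items() if event(x)), Fraction(0))


# The pmf sums to 1 over the whole sample space.
total = prob(lambda x: True, joint_pmf)

# A = {x : x1 + x2 + x3 >= 2}, i.e. at least two heads.
p_at_least_two = prob(lambda x: sum(x) >= 2, joint_pmf)

print(total, p_at_least_two)  # prints: 1 1/2
```

Describing the set $A$ by a predicate rather than by an explicit list of points keeps the helper usable for any event on a finite sample space. Exact `Fraction` arithmetic avoids floating-point rounding in the sum.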


