## Machine Learning Coursework Writing | Machine Learning Exam Help | COMP30027

October 13, 2022

## Genomic Breeding Values and Their Estimation

In plant and animal breeding, it is common practice to rank and select individuals (plants or animals) based on their true breeding values (TBVs), also called additive genetic values. However, since genes and breeding values cannot be observed directly, the TBV has to be estimated indirectly from observed phenotypes. The estimates are called estimated breeding values (EBVs); in other words, the TBV is a latent variable that is only approximated through an observable variable (the phenotype).

When the TBVs are used, the genetic change is expected to be larger than when the EBVs are used, but this difference is small when the EBVs are accurately estimated. EBVs reflect the true genetic potential or true genetic transmitting ability of individuals (plants or animals). Traditionally, they are estimated from an individual's own performance records and those of its parents, sibs, and progeny, after correcting for environmental factors such as management, season, and age. When parents are selected based on breeding values with high reliability, faster genetic progress is expected in the resulting population. For this reason, estimating breeding values is of paramount importance in any breeding program.
There are several methods for obtaining genomic estimated breeding values (GEBVs); we first describe the best linear unbiased predictor (BLUP) method. With BLUP, the GEBVs are obtained from the mixed model equations (2.2) described above, which yield the BLUEs and BLUPs. Depending on the form taken by the matrices $\boldsymbol{Z}$ and $\boldsymbol{\Sigma}$ in (2.2), we end up with either the GBLUP method or the SNP-BLUP method for estimating the breeding values. We start with GBLUP, in which $\boldsymbol{Z}$ is the incidence matrix of genotypes and $\boldsymbol{\Sigma}$ is built from the genomic relationship matrix (GRM), derived from allele frequencies with one of the methods of VanRaden (2008) given in Sect. 2.4. Under GBLUP, the GEBVs are obtained as the solution $\hat{\boldsymbol{u}}$ of the mixed model equations:
$$\left(\begin{array}{c} \widehat{\boldsymbol{\beta}} \\ \widehat{\boldsymbol{u}} \end{array}\right)=\left(\begin{array}{cc} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{R}^{-1} \boldsymbol{X} & \boldsymbol{X}^{\mathrm{T}} \boldsymbol{R}^{-1} \boldsymbol{I} \\ \boldsymbol{I}^{\mathrm{T}} \boldsymbol{R}^{-1} \boldsymbol{X} & \boldsymbol{I}^{\mathrm{T}} \boldsymbol{R}^{-1} \boldsymbol{I}+\sigma_g^{-2} \boldsymbol{G}^{-1} \end{array}\right)^{-1}\left(\begin{array}{c} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{R}^{-1} \boldsymbol{y} \\ \boldsymbol{I}^{\mathrm{T}} \boldsymbol{R}^{-1} \boldsymbol{y} \end{array}\right),$$
where $\boldsymbol{Z}$ was replaced by $\boldsymbol{Z}=\boldsymbol{I}$ (the identity matrix, which corresponds to one record per genotype and is what makes the lower-right block conformable with $\boldsymbol{G}^{-1}$) and $\boldsymbol{\Sigma}$ by $\sigma_g^2 \boldsymbol{G}$, with $\boldsymbol{G}$ the genomic relationship matrix calculated with one of the methods referred to above.
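As a rough illustration of how the GBLUP system can be assembled and solved, here is a minimal NumPy sketch. It rests on several assumptions that are not in the text: one record per genotype (so the incidence matrix is the identity), $\boldsymbol{R}=\sigma_e^2\boldsymbol{I}$, variance components treated as known (in practice they are usually estimated, e.g., by REML), simulated 0/1/2 marker data, and a GRM of the form $\boldsymbol{W}\boldsymbol{W}^{\mathrm{T}}/(2\sum_j p_j(1-p_j))$ with a small ridge added so that it can be inverted. The function names `vanraden_grm` and `gblup` are hypothetical.

```python
import numpy as np

def vanraden_grm(M, ridge=1e-3):
    """GRM from an n x p marker matrix coded 0/1/2 (assumed coding).

    The ridge on the diagonal is an assumption added here so that G is
    invertible; centering the markers makes the raw matrix singular.
    """
    p = M.mean(axis=0) / 2.0                 # allele frequency per marker
    W = M - 2.0 * p                          # center each marker column
    G = W @ W.T / (2.0 * np.sum(p * (1.0 - p)))
    return G + ridge * np.eye(M.shape[0])

def gblup(y, X, G, sigma2_g, sigma2_e):
    """Solve the mixed model equations with Z = I and R = sigma2_e * I."""
    n = len(y)
    R_inv = np.eye(n) / sigma2_e
    G_inv = np.linalg.inv(G)
    lhs = np.block([
        [X.T @ R_inv @ X, X.T @ R_inv],
        [R_inv @ X,       R_inv + G_inv / sigma2_g],
    ])
    rhs = np.concatenate([X.T @ R_inv @ y, R_inv @ y])
    sol = np.linalg.solve(lhs, rhs)
    k = X.shape[1]
    return sol[:k], sol[k:]                  # (BLUE of beta, BLUP of u)

# Simulated example (all values below are made up for illustration).
rng = np.random.default_rng(0)
n, n_markers = 20, 200
M = rng.binomial(2, 0.3, size=(n, n_markers)).astype(float)
G = vanraden_grm(M)
sigma2_g, sigma2_e = 1.0, 0.5
X = np.ones((n, 1))                          # intercept only
u_true = rng.multivariate_normal(np.zeros(n), sigma2_g * G)
y = 5.0 + u_true + rng.normal(0.0, np.sqrt(sigma2_e), size=n)

beta_hat, u_hat = gblup(y, X, G, sigma2_g, sigma2_e)
print("intercept estimate:", beta_hat)
print("first five GEBVs:", u_hat[:5])
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of the full coefficient matrix explicitly; only $\boldsymbol{G}$ is inverted, which is what the equation itself requires.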

## Normalization Methods

This section describes five methods for normalizing variables (inputs and outputs). Here, normalization refers to adjusting inputs or outputs that were originally measured on different scales so that they share the same scale. For most statistical machine learning algorithms it is important to normalize the inputs and outputs before fitting, because doing so improves the numerical stability of the estimation process in some algorithms; it is recommended mainly when the inputs or outputs are on different scales. Note, however, that some statistical machine learning software normalizes internally, in which case the process does not need to be carried out manually. The five normalization methods we describe next are centering, scaling, standardization, max normalization, and minimax normalization.

Centering. This normalization consists of subtracting from each variable (input or output) its mean, $\mu$; the centered values are calculated as
$$X_i^*=X_i-\mu .$$
The centered variable $X_i^*$ has a mean of zero.

Scaling. This normalization consists of dividing each variable (input or output) by its standard deviation, $\sigma$. The scaled values are calculated as
$$X_i^*=\frac{X_i}{\sigma} .$$
The scaled variable $X_i^*$ has unit variance.

Standardization. This normalization consists of calculating the mean, $\mu$, and the standard deviation, $\sigma$, of each input or output. The standardized values are then calculated as
$$X_i^*=\frac{X_i-\mu}{\sigma} .$$
This process is carried out separately for each input or output variable, taking care to use that variable's own mean and standard deviation. The standardized variable has a mean of zero and a variance of one, so for roughly normally distributed data most standardized values fall between $-3.5$ and $3.5$.
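The three transformations defined above can be sketched in NumPy as follows. This is a minimal illustration: the function names and the toy data are hypothetical, and it assumes the population standard deviation (`ddof=0`, NumPy's default), since the text does not say which estimator of $\sigma$ is intended.

```python
import numpy as np

def center(x):
    """Subtract each column's mean: X* = X - mu."""
    return x - x.mean(axis=0)

def scale(x):
    """Divide each column by its standard deviation: X* = X / sigma."""
    return x / x.std(axis=0)

def standardize(x):
    """Center, then scale: X* = (X - mu) / sigma."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Toy data: two variables measured on very different scales (made up).
data = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
    [4.0, 700.0],
])

centered = center(data)
scaled = scale(data)
standardized = standardize(data)

print("column means after centering:", centered.mean(axis=0))
print("column variances after scaling:", scaled.var(axis=0))
print("standardized data:\n", standardized)
```

Each statistic is computed per column (`axis=0`), which mirrors the requirement that every variable be normalized with its own mean and standard deviation. When a model is evaluated on new data, the mean and standard deviation estimated on the training data are normally reused rather than recomputed.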


