## Marker Depuration

First, we define markers and explain their importance. Markers are useful for establishing precise genetic relationships, for parentage determination, and for identifying and mapping quantitative trait loci (QTL). Between 1970 and 2001, most of the genetic progress in the livestock industry was achieved using pedigree and phenotypic information. However, after the first draft of the human genome was completed in 2001 (The International SNP Map Working Group 2001), the cost of genotyping with single nucleotide polymorphisms (SNPs) began to fall considerably, and it is now at least 1000 times lower. Stoneking (2001) describes SNPs as the bread and butter of DNA sequence variation, and they have become essential for determining genetic potential in livestock and plant breeding.

Other types of DNA markers have also been discovered, such as restriction fragment length polymorphisms (RFLP), simple sequence repeats (SSR), Diversity Arrays Technology (DArT), simple sequence length polymorphisms (SSLP), and amplified fragment length polymorphisms (AFLP). Nevertheless, SNPs have become the main markers used to detect DNA variation, for reasons that include the following: (a) they are abundant and found throughout the entire genome, in both intragenic and extragenic regions (Schork et al. 2000); (b) they are the most common type of genetic variant; (c) they occur in many genomic locations, including introns, exons, promoters, enhancers, and intergenic regions; (d) they are easily evaluated by automated means; (e) many of them directly affect traits of interest in plants and animals; (f) they are generally biallelic; and (g) they are now cheap and easy to genotype.

Recall that DNA (deoxyribonucleic acid) is organized in pairs of chromosomes, with one chromosome of each pair inherited from each parent. The diversity found among organisms results from variation in DNA sequences and from environmental effects. Genetic variation is substantial: with the exception of monozygotic twins, each individual of a species possesses a unique DNA sequence. DNA variations are mutations resulting from the substitution of single nucleotides (single nucleotide polymorphisms, SNPs), the insertion or deletion of DNA fragments of various lengths (from a single nucleotide to several thousand nucleotides), or the duplication or inversion of DNA fragments (Marsjan and Oldenbroek 2007). The genome is built from four different nucleotides (A, C, T, and G), and a SNP is a position at which individuals carry different nucleotides. Next, we provide two important definitions that are key to understanding how markers are used in genomic selection.

## Methods to Compute the Genomic Relationship Matrix

The three methods described here for calculating the genomic relationship matrix (GRM) are based on VanRaden's (2008) paper "Efficient methods to compute genomic predictions", where more theoretical support for each method can be found. We assume a marker matrix of order $J \times p$, where $J$ denotes the number of lines and $p$ the number of markers. This matrix contains no missing values and is coded as 0, 1, and 2, or as $-1$, 0, and 1, to denote homozygous for the major allele, heterozygous, and homozygous for the minor allele, respectively. The two codings are related by $\boldsymbol{X}_2 = \boldsymbol{X} - \mathbf{1}_J \mathbf{1}_p^{\mathrm{T}}$, where $\boldsymbol{X}_2$ is the marker matrix coded as $-1$, 0, and 1, $\boldsymbol{X}$ is the marker matrix coded as 0, 1, and 2, and $\mathbf{1}_q$ is the column vector of dimension $q$ with all entries equal to one.
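Converting between the two codings amounts to subtracting 1 from every entry of the 0/1/2 matrix, which is exactly what subtracting the all-ones $J \times p$ matrix $\mathbf{1}_J \mathbf{1}_p^{\mathrm{T}}$ does. Below is a minimal sketch in plain Python, assuming the marker matrix is stored as a list of rows with no missing values; the function name `recode_012_to_pm1` and the toy matrix are hypothetical, chosen only for illustration.

```python
def recode_012_to_pm1(X):
    """Recode a J x p marker matrix from {0, 1, 2} to {-1, 0, 1}.

    Subtracting the all-ones J x p matrix is the same as
    subtracting 1 from every entry.
    """
    for row in X:
        for x in row:
            if x not in (0, 1, 2):
                raise ValueError(f"unexpected genotype code {x!r}; expected 0, 1 or 2")
    return [[x - 1 for x in row] for row in X]


# Hypothetical toy data: J = 3 lines, p = 4 markers, no missing values.
X = [
    [0, 1, 2, 0],
    [2, 1, 0, 0],
    [1, 1, 1, 2],
]
X2 = recode_012_to_pm1(X)
print(X2)  # [[-1, 0, 1, -1], [1, 0, -1, -1], [0, 0, 0, 1]]
```

The validation loop is a design choice: it rejects codes outside 0, 1, 2 (for example, a placeholder for a missing genotype) instead of silently shifting them.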
Method 1. This method calculates the GRM as
$$\boldsymbol{G}=\frac{1}{p} \boldsymbol{X} \boldsymbol{X}^{\mathrm{T}}$$
where $\boldsymbol{X}$ is the $J \times p$ matrix of marker genotypes. When the markers are coded as $-1$, 0, and 1 as described above, the diagonal entries of $p \boldsymbol{G}$ count the number of homozygous loci for each line, and the off-diagonal entries of $p \boldsymbol{G}$ measure the number of alleles shared by two lines (VanRaden 2008).
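To make Method 1 concrete, here is a minimal sketch in plain Python (no external libraries), assuming markers coded as $-1$, 0, and 1 and stored as a list of rows. The function name `genomic_relationship_method1` and the $3 \times 4$ toy matrix are hypothetical. The code computes $\boldsymbol{G} = \boldsymbol{X}\boldsymbol{X}^{\mathrm{T}}/p$ entry by entry and checks that the diagonal of $p\boldsymbol{G}$ equals the number of homozygous loci in each line.

```python
def genomic_relationship_method1(X):
    """Method 1 GRM: G = X X^T / p for a J x p marker matrix X (list of rows)."""
    J = len(X)
    if J == 0:
        return []
    p = len(X[0])
    if any(len(row) != p for row in X):
        raise ValueError("all rows must have the same number of markers")
    # Entry (i, k) is the dot product of line i and line k, divided by p.
    return [
        [sum(a * b for a, b in zip(X[i], X[k])) / p for k in range(J)]
        for i in range(J)
    ]


# Hypothetical toy data coded as -1 (hom. major), 0 (het.), 1 (hom. minor).
X2 = [
    [-1, 0, 1, -1],
    [1, 0, -1, -1],
    [0, 0, 0, 1],
]
p = len(X2[0])
G = genomic_relationship_method1(X2)

# Diagonal of p*G counts homozygous loci per line (entries equal to -1 or 1).
for i, row in enumerate(X2):
    homozygous = sum(1 for x in row if x != 0)
    assert round(p * G[i][i]) == homozygous
print([round(p * G[i][i]) for i in range(len(X2))])  # [3, 3, 1]
```

For this toy matrix every off-diagonal entry of $\boldsymbol{G}$ is $-0.25$. With realistic numbers of lines and markers, a matrix library would normally replace the nested loops; the loops are kept here only to mirror the formula.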

