## Large item sets

To explain how association rules are generated from a given database, we first introduce some basic concepts. Denote a set of $N$ items by $I=\left\{i_1, \ldots, i_N\right\}$. An item set is a subset of $I$, and an item set containing $n^{\prime}$ items (where $n^{\prime} \in\{1, \ldots, N\}$) is also called an $n^{\prime}$ item set and is denoted by $I_{n^{\prime}}$. A database consisting of $M$ records, where each record is an item set, is denoted by $D=\left\{t_1, \ldots, t_M\right\}$. Define the event of observing a particular item set $I_{n^{\prime}}$ by $E\left(I_{n^{\prime}}\right)$, which means that all the items in $I_{n^{\prime}}$ are observed in one record. We further define $P\left(E\left(I_{n^{\prime}}\right)\right)$ as the proportion of the $M$ records that contain all the items in $I_{n^{\prime}}$, which can also be interpreted as the probability of event $E\left(I_{n^{\prime}}\right)$. Note that a record that includes all the items in $I_{n^{\prime}}$ may also include other items of $I$ that are not in $I_{n^{\prime}}$. The probability of $E\left(I_{n^{\prime}}\right)$ is also called the Support of item set $I_{n^{\prime}}$, that is, $P\left(E\left(I_{n^{\prime}}\right)\right)=\operatorname{Sup}\left(I_{n^{\prime}}\right)$, and $P\left(E\left(I_{n^{\prime}}\right)\right) \in[0,1]$. A larger value of $P\left(E\left(I_{n^{\prime}}\right)\right)$ indicates that item set $I_{n^{\prime}}$ occurs more frequently in $D$. To define a large item set, denoted by $I_{n^{\prime}}^*$, that occurs frequently in the database, we set a minimum Support threshold, min_Sup. A non-empty item set $I_{n^{\prime}}^* \subseteq I$ is a large item set if and only if $\operatorname{Sup}\left(I_{n^{\prime}}^*\right) \geq$ min_Sup.
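The definitions of Support and large item sets can be illustrated with a minimal Python sketch. The toy database `D`, the item names, and the function names `support` and `large_item_sets` are assumptions made for illustration; they are not from the source. The enumeration is a deliberately naive brute force over all subsets of $I$, not an efficient mining algorithm.

```python
from itertools import combinations

# Hypothetical toy database D: each record t_1..t_M is an item set.
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk", "jam"},
]


def support(item_set, records):
    """Sup(I_n'): proportion of records that contain every item in item_set."""
    item_set = frozenset(item_set)
    # A record may contain extra items; only inclusion of item_set matters.
    return sum(item_set <= record for record in records) / len(records)


def large_item_sets(records, min_sup):
    """Brute-force search for all non-empty item sets with Sup >= min_sup."""
    items = sorted(set().union(*records))
    result = {}
    for n in range(1, len(items) + 1):
        for candidate in combinations(items, n):
            sup = support(candidate, records)
            if sup >= min_sup:
                result[frozenset(candidate)] = sup
    return result


for item_set, sup in large_item_sets(D, min_sup=0.5).items():
    print(sorted(item_set), sup)
```

With `min_sup=0.5` on this toy data, the three single items bread, butter, and milk and their three pairs qualify as large item sets, while {bread, butter, milk} (Support 0.4) and every set containing jam do not. The brute force visits every subset of $I$, so it is only practical for very small $N$.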

A rule is generated by dividing a large $n^{\prime}$ item set $I_{n^{\prime}}^*$ with $n^{\prime} \geq 2$ into two disjoint, non-empty item sets $I_j$ and $I_k$, with $I_j \cup I_k=I_{n^{\prime}}^*$. A rule from $I_j$ to $I_k$ is written in the form $I_j \rightarrow I_k$. To determine whether rule $I_j \rightarrow I_k$ is an association rule, denoted by $I_j \Rightarrow I_k$, two indicators are further introduced: Confidence and Lift. The Confidence of $I_j \rightarrow I_k$ is calculated by
$$\operatorname{Conf}\left(I_j \rightarrow I_k\right)=\frac{P\left(E\left(I_j\right) \cap E\left(I_k\right)\right)}{P\left(E\left(I_j\right)\right)}=P\left(E\left(I_k\right) \mid E\left(I_j\right)\right),$$
and $\operatorname{Conf}\left(I_j \rightarrow I_k\right) \in[0,1]$. The larger the Confidence, the more likely the items in $I_k$ are to appear given that the items in $I_j$ appear. The Lift of $I_j \rightarrow I_k$ is calculated by
$$\operatorname{Lift}\left(I_j \rightarrow I_k\right)=\frac{P\left(E\left(I_j\right) \cap E\left(I_k\right)\right)}{P\left(E\left(I_j\right)\right) \times P\left(E\left(I_k\right)\right)}=\frac{P\left(E\left(I_k\right) \mid E\left(I_j\right)\right)}{P\left(E\left(I_k\right)\right)},$$
and $\operatorname{Lift}\left(I_j \rightarrow I_k\right) \in[0,+\infty)$. Lift represents the influence of the occurrence of event $E\left(I_j\right)$ on event $E\left(I_k\right)$: it is the ratio of the probability that $E\left(I_k\right)$ occurs given that $E\left(I_j\right)$ occurs to the probability that $E\left(I_k\right)$ occurs unconditionally in the database. It can be interpreted as how much the occurrence of $E\left(I_j\right)$ increases or decreases (i.e., lifts) the occurrence of $E\left(I_k\right)$. More specifically, if $\operatorname{Lift}\left(I_j \rightarrow I_k\right) \in[0,1)$, the occurrence of $E\left(I_j\right)$ decreases the probability of the occurrence of $E\left(I_k\right)$. If $\operatorname{Lift}\left(I_j \rightarrow I_k\right) \in(1,+\infty)$, the occurrence of $E\left(I_j\right)$ increases the probability of the occurrence of $E\left(I_k\right)$. If $\operatorname{Lift}\left(I_j \rightarrow I_k\right)=1$, the occurrence of $E\left(I_j\right)$ has no influence on the occurrence of $E\left(I_k\right)$, that is, $E\left(I_j\right)$ and $E\left(I_k\right)$ are independent. Note also that because $P\left(E\left(I_k\right)\right)$ appears in the denominator of the Lift of rule $I_j \rightarrow I_k$, a large $P\left(E\left(I_k\right)\right)$, meaning that event $E\left(I_k\right)$ occurs with high probability, reduces the value of $\operatorname{Lift}\left(I_j \rightarrow I_k\right)$. A frequently occurring event therefore contributes less to generating association rules than a rare event.
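As a hedged illustration of Confidence and Lift, the following Python sketch splits a large item set into every pair of disjoint, non-empty item sets $I_j$ and $I_k$ and scores each candidate rule. The toy database and all function names are hypothetical, and the sketch uses the fact that $P\left(E\left(I_j\right) \cap E\left(I_k\right)\right)$ equals the Support of $I_j \cup I_k$.

```python
from itertools import combinations

# Hypothetical toy database: each record is an item set.
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk", "jam"},
]


def support(item_set, records):
    """Proportion of records that contain every item in item_set."""
    item_set = frozenset(item_set)
    return sum(item_set <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Conf(I_j -> I_k) = Sup(I_j union I_k) / Sup(I_j)."""
    joint = support(set(antecedent) | set(consequent), records)
    # Assumes Sup(I_j) > 0, which holds whenever I_j is part of a large item set.
    return joint / support(antecedent, records)


def lift(antecedent, consequent, records):
    """Lift(I_j -> I_k) = Conf(I_j -> I_k) / Sup(I_k)."""
    return confidence(antecedent, consequent, records) / support(consequent, records)


def rules_from(large_item_set, records):
    """Yield (I_j, I_k, Confidence, Lift) for every split of a large item set."""
    items = sorted(large_item_set)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            yield (
                antecedent,
                consequent,
                confidence(antecedent, consequent, records),
                lift(antecedent, consequent, records),
            )


for i_j, i_k, conf, lft in rules_from({"bread", "milk"}, D):
    print(i_j, "->", i_k, round(conf, 4), round(lft, 4))
```

On this toy data, both rules derived from {bread, milk} have a Confidence of about 0.75 and a Lift of about 0.94. Since the Lift is below 1, observing one item slightly decreases the probability of observing the other. Whether a rule counts as an association rule would depend on thresholds that this sketch leaves to the reader.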

## Distance measure in clustering

The key point in cluster analysis is how to measure the “similarity” between two examples in the data set, which is usually achieved by calculating the “distance” between them. A distance measure is an objective score of the relative difference or dissimilarity between two examples in the problem of concern. The distance between two examples $\mathbf{x}_i$ and $\mathbf{x}_j$ is denoted by $\operatorname{dist}\left(\mathbf{x}_i, \mathbf{x}_j\right)$, which satisfies the following properties:

1. non-negativity: $\operatorname{dist}\left(\mathbf{x}_i, \mathbf{x}_j\right) \geq 0$;
2. identity: $\operatorname{dist}\left(\mathbf{x}_i, \mathbf{x}_j\right)=0$ if and only if $\mathbf{x}_i=\mathbf{x}_j$;
3. symmetry: $\operatorname{dist}\left(\mathbf{x}_i, \mathbf{x}_j\right)=\operatorname{dist}\left(\mathbf{x}_j, \mathbf{x}_i\right)$; and
4. triangle inequality: $\operatorname{dist}\left(\mathbf{x}_i, \mathbf{x}_j\right) \leq \operatorname{dist}\left(\mathbf{x}_i, \mathbf{x}_k\right)+\operatorname{dist}\left(\mathbf{x}_k, \mathbf{x}_j\right)$.

Features of an example can be numerical or categorical. Numerical features are ordinal, meaning that the relative feature values are comparable. For example, ship age is a numerical feature, where a ship of age 5 is younger than a ship of age 10 by 5 years. Categorical features can be either ordinal or nominal. In an ordinal categorical feature, the relative feature values are comparable, as with numerical features (e.g., low, medium, and high for ship company performance, where a company with high performance is better than a company with medium performance and much better than a company with low performance). In a nominal feature, the values only indicate categories and cannot be compared (e.g., container ship, bulk carrier, and passenger ship for the feature ship type, which cannot be ranked against each other).

As feature values are comparable in ordinal features and non-comparable in nominal features, different distance measures should be used for these two types of features. For data set $D$ with $m$ features, denote the number of its ordinal features by $m_1$ and the number of its nominal features by $m_2$, where $m=m_1+m_2$. For ordinal features, the Minkowski distance, which takes the following form, is the most popular:
$$\operatorname{dist}_{mk}\left(\mathbf{x}_i, \mathbf{x}_j\right)=\left(\sum_{m^{\prime}=1}^{m_1}\left|x_{i m^{\prime}}-x_{j m^{\prime}}\right|^p\right)^{\frac{1}{p}}, \tag{11.1}$$
where the subscript $mk$ is short for Minkowski, and $p$ should be no less than 1 so that the properties of a distance measure are satisfied. Common values of $p$ are 1 and 2. When $p=1$, the Minkowski distance in Equation (11.1) is also called the Manhattan distance, and can be written as
$$\operatorname{dist}_{\mathrm{man}}\left(\mathbf{x}_i, \mathbf{x}_j\right)=\left\|\mathbf{x}_i-\mathbf{x}_j\right\|_1=\sum_{m^{\prime}=1}^{m_1}\left|x_{i m^{\prime}}-x_{j m^{\prime}}\right| .$$
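The Minkowski and Manhattan distances can be sketched in a few lines of Python. The ship examples, the numeric encoding of company performance (low = 1, medium = 2, high = 3), and the function names are assumptions made for illustration. The sketch also checks the triangle inequality on one triple of examples to suggest why $p$ should be no less than 1.

```python
def minkowski(x_i, x_j, p=2):
    """Minkowski distance over ordinal (numerically encoded) features."""
    if len(x_i) != len(x_j):
        raise ValueError("examples must have the same number of features")
    # p is intentionally not restricted here so that p < 1 can be tried below.
    return sum(abs(a - b) ** p for a, b in zip(x_i, x_j)) ** (1 / p)


def manhattan(x_i, x_j):
    """Manhattan distance: the Minkowski distance with p = 1."""
    return minkowski(x_i, x_j, p=1)


def satisfies_triangle(x_i, x_j, x_k, p):
    """Check dist(x_i, x_j) <= dist(x_i, x_k) + dist(x_k, x_j) for one triple."""
    direct = minkowski(x_i, x_j, p)
    detour = minkowski(x_i, x_k, p) + minkowski(x_k, x_j, p)
    return direct <= detour + 1e-12  # small tolerance for floating-point error


# Hypothetical ships: (age in years, company performance encoded 1..3).
ship_a = (5, 3)
ship_b = (10, 1)
ship_c = (10, 3)

print(manhattan(ship_a, ship_b))
print(minkowski(ship_a, ship_b, p=2))
for p in (0.5, 1, 2):
    print(p, satisfies_triangle(ship_a, ship_b, ship_c, p))
```

For this triple, the triangle inequality holds for $p=1$ and $p=2$ but fails for $p=0.5$, where the direct distance between `ship_a` and `ship_b` exceeds the sum of the two legs through `ship_c`. In practice, features on very different scales (such as age in years and a three-level score) are often rescaled before distances are computed; this sketch omits that step.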

