February 6, 2023

## Hyperparameters in an ANN model

Hyperparameters in an ANN model fall into two categories: those related to the network structure and those related to the learning algorithm. The number of hidden layers and the number of neurons in each hidden layer are the two main hyperparameters that control the complexity of the network structure. Generally speaking, more hidden layers, or more neurons in a hidden layer, increase model complexity and can therefore lead to over-fitting. One technique for reducing over-fitting is dropout, a regularization approach that can be applied to input and hidden layers. During training, dropout temporarily removes a randomly chosen proportion $p$ of the neurons in an input or hidden layer, together with their connections to the preceding (if any) and following layers. At test time all neurons are kept, and each output of the layer is multiplied by $(1-p)$ so that its scale matches the expected output during training. This approach is widely used in deep neural networks (DNNs). The types of activation functions used by the neurons are also hyperparameters that must be chosen before ANN training, and they can have a large impact on model performance.
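The dropout step described above can be illustrated with a minimal NumPy sketch. It assumes the classic formulation, in which neurons are zeroed at random during training and the $(1-p)$ scaling is applied at test time; the function names `dropout_train` and `dropout_test` and the all-ones activations are hypothetical and chosen only for illustration.

```python
import numpy as np

def dropout_train(a, p, rng):
    """Training pass: zero each activation independently with probability p."""
    mask = rng.random(a.shape) >= p  # True means the neuron is kept (probability 1 - p)
    return a * mask, mask

def dropout_test(a, p):
    """Test pass: keep every neuron and scale outputs by the retention probability 1 - p."""
    return a * (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones((10000, 4))  # hypothetical layer outputs, all equal to 1 for easy checking
p = 0.25

dropped, mask = dropout_train(a, p, rng)
print("mean output during training:", dropped.mean())      # close to 1 - p
print("mean output at test time:", dropout_test(a, p).mean())
```

Many libraries instead use "inverted" dropout, which divides the kept outputs by $(1-p)$ during training and leaves the test-time pass unchanged; both variants keep the expected output of the layer consistent between training and testing.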

Hyperparameters of the learning algorithm mainly include the learning rate $\eta$, the number of training rounds (epochs), and the batch size. The learning rate $\eta$ controls the size of each weight update. If $\eta$ is too small, learning is slow and more epochs may be needed. If $\eta$ is too large, the updates may overshoot the optimal values of the weights. If the number of epochs is too large, the ANN model may fit the training data too closely, leading to over-fitting; if it is too small, under-fitting may occur. To find a proper number of epochs, a validation set that is independent of the training set should be used to evaluate the intermediate ANN model during training: if the cost function on the validation set decreases only marginally or even increases, training should be stopped, because over-fitting is highly likely to occur. This technique is called "early stopping." Finally, the batch size depends heavily on the size of the whole data set and the network structure. Common batch sizes are 1, 2, 4, 16, 32, 64, 128, and 256.
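To make the roles of these hyperparameters concrete, the following sketch trains a model with mini-batch gradient descent and early stopping. Several things here are assumptions made for illustration: a linear model stands in for an ANN to keep the code short, the data are synthetic, and the names `train_linear`, `eta`, `max_epochs`, `batch_size`, and `patience` are hypothetical. The "patience" rule (stop after several epochs without a meaningful drop in validation loss) is one common way to implement the stopping criterion described above, not the only one.

```python
import numpy as np

def train_linear(X_tr, y_tr, X_val, y_val, eta=0.05, max_epochs=200,
                 batch_size=16, patience=5, seed=0):
    """Mini-batch gradient descent on the MSE cost with early stopping."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X_tr.shape[1])
    best_w, best_val, bad_epochs = w.copy(), np.inf, 0
    history = []
    for epoch in range(max_epochs):
        order = rng.permutation(len(X_tr))  # reshuffle the training set every epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            err = X_tr[idx] @ w - y_tr[idx]
            grad = 2.0 * X_tr[idx].T @ err / len(idx)  # gradient of the batch MSE
            w = w - eta * grad                          # eta sets the update size
        val_loss = float(np.mean((X_val @ w - y_val) ** 2))
        history.append(val_loss)
        if val_loss < best_val - 1e-6:  # meaningful improvement on the validation set
            best_w, best_val, bad_epochs = w.copy(), val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # validation loss has stalled or risen
                break
    return best_w, best_val, len(history)

# Synthetic data: 150 training examples and 50 validation examples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w, val_loss, epochs_run = train_linear(X[:150], y[:150], X[150:], y[150:])
print("epochs run:", epochs_run)
print("best validation MSE:", round(val_loss, 4))
print("weights:", np.round(w, 2))
```

The function returns the weights from the epoch with the lowest validation loss rather than the last epoch, so the extra epochs spent waiting for `patience` to run out do not affect the final model.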

## Node splitting in regression trees

When DTs are applied to regression tasks, CART is the most popular algorithm for tree construction, and it is the only one covered in this section. Given a training data set $D=\left\{\left(\mathbf{x}_i, y_i\right), i=1, \ldots, n\right\}$ where $y_i$ is continuous, CART uses MSE as the criterion for splitting a node in a regression tree. Starting from the root node, the tree is constructed in a greedy manner: to split a node containing data set $D^{\prime}$, all features and their values are enumerated to form feature-value pairs, denoted by (feature, value) or $\left(x, x^j\right)$. Examples whose value of the feature is less than or equal to the threshold value are sent to the left child node, and the other examples are sent to the right child node. That is, the example set of the left child node is $D_1^{\prime}\left(x, x^j\right)=\left\{i=1, \ldots, n \mid x_{ij} \leq x^j\right\}$, and the example set of the right child node is $D_2^{\prime}\left(x, x^j\right)=\left\{i=1, \ldots, n \mid x_{ij}>x^j\right\}$. The output of a child node is the average target of the examples contained in that node. That is, the output of $D_1^{\prime}\left(x, x^j\right)$ is $c_1=\frac{1}{\left|D_1^{\prime}\left(x, x^j\right)\right|} \sum_{i \in D_1^{\prime}\left(x, x^j\right)} y_i$, and the output of $D_2^{\prime}\left(x, x^j\right)$ is $c_2=\frac{1}{\left|D_2^{\prime}\left(x, x^j\right)\right|} \sum_{i \in D_2^{\prime}\left(x, x^j\right)} y_i$. The best split pair is the one that leads to the minimum sum of the MSEs of the two child nodes, that is,

$$
\left(x, x^j\right)^{*}=\underset{\left(x, x^j\right)}{\arg \min }\left[\frac{1}{\left|D_1^{\prime}\left(x, x^j\right)\right|} \sum_{i \in D_1^{\prime}\left(x, x^j\right)}\left(y_i-c_1\right)^2+\frac{1}{\left|D_2^{\prime}\left(x, x^j\right)\right|} \sum_{i \in D_2^{\prime}\left(x, x^j\right)}\left(y_i-c_2\right)^2\right].
$$
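The split search described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than a reference implementation: the function name `best_split` and the toy data are hypothetical, and the score is the unweighted sum of the two child MSEs, as in the text. Many CART implementations instead weight each child's MSE by the number of examples it contains.

```python
import numpy as np

def best_split(X, y):
    """Enumerate all (feature, value) pairs and return (score, feature, threshold)
    for the pair that minimizes MSE(left child) + MSE(right child)."""
    best = None
    for j in range(X.shape[1]):
        # Skip the largest value of each feature: it would leave the right child empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            y_l, y_r = y[left], y[~left]
            c1, c2 = y_l.mean(), y_r.mean()  # child outputs: average targets
            score = np.mean((y_l - c1) ** 2) + np.mean((y_r - c2) ** 2)
            if best is None or score < best[0]:
                best = (float(score), j, float(t))
    return best

# Toy node: feature 0 separates low and high targets, feature 1 does not.
X = np.array([[1.0, 7.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0]])
y = np.array([1.0, 1.2, 5.0, 5.2])

score, feature, threshold = best_split(X, y)
print("best feature:", feature, "threshold:", threshold, "score:", round(score, 4))
```

On this toy node the search selects feature 0 with threshold 2.0, which puts the two low targets in the left child and the two high targets in the right child.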

After a CART regression tree is constructed, it can be pruned in the same way as a CART classification tree to reduce over-fitting. The overall procedure for constructing a regression tree using CART is shown in Algorithm 2.
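As a rough illustration of the overall procedure, the sketch below grows a regression tree recursively with the greedy split rule and then predicts with it. It is not a transcription of Algorithm 2: the names `build_tree` and `predict_one`, the dictionary-based node layout, and the stopping parameters `max_depth` and `min_samples` are all assumptions made for this example. Pruning is not implemented; the depth limit only stops growth early and is not a substitute for pruning a fully grown tree.

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursively grow a regression tree; every node stores its mean target."""
    node = {"value": float(y.mean())}
    if depth >= max_depth or len(y) < min_samples or np.all(y == y[0]):
        return node  # leaf node
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # keep both children non-empty
            left = X[:, j] <= t
            # ndarray.var() is the mean squared deviation from the mean, i.e. the child MSE.
            score = y[left].var() + y[~left].var()
            if best is None or score < best[0]:
                best = (score, j, t, left)
    if best is None:  # no feature has two distinct values, so no split is possible
        return node
    _, j, t, left = best
    node.update(
        feature=j,
        threshold=float(t),
        left=build_tree(X[left], y[left], depth + 1, max_depth, min_samples),
        right=build_tree(X[~left], y[~left], depth + 1, max_depth, min_samples),
    )
    return node

def predict_one(node, x):
    """Follow the splits from the root until a leaf is reached."""
    while "feature" in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])

tree = build_tree(X, y, max_depth=1)
print("root split:", tree["feature"], tree["threshold"])
print("prediction for x = 2.0:", predict_one(tree, np.array([2.0])))
print("prediction for x = 5.5:", predict_one(tree, np.array([5.5])))
```

With `max_depth=1` the tree has a single split, so each prediction is simply the average target of the examples on the same side of the threshold.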


