Statistical and Machine Learning | Decision Tree

Decision trees are statistical models designed for supervised prediction problems. Supervised prediction encompasses predictive modeling, pattern recognition, discriminant analysis, multivariate function estimation, and supervised machine learning. A decision tree includes the following components:

• An internal node is a test on an attribute.
• A branch represents an outcome of the test, such as color=purple.
• A leaf node represents a class label or class label distribution.
• At each node, one attribute is chosen to split the training data into distinct classes as much as possible.
• A new instance is classified by following a matching path to a leaf node.

The model is called a decision tree because it can be represented as a tree-like structure. A decision tree is read from the top down, starting at the root node. Each internal node represents a split based on the values of one of the inputs, and the same input can appear in any number of splits throughout the tree. Each case moves down the branch that contains its input value. In a binary tree with interval inputs, each internal node is a simple inequality: a case moves left if the inequality is true and right otherwise. The terminal nodes of the tree are called leaves, and they represent the predicted target. All cases reaching a leaf are given the same predicted value. For a categorical target, the leaves give the predicted class as well as the probability of class membership.
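The top-down reading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Split` and `Leaf`, the function `score`, the inputs `x1` and `x2`, and every threshold and probability are hypothetical and are not taken from any fitted model.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    # Class-membership probabilities shared by every case that reaches this leaf.
    probabilities: Dict[str, float]

    @property
    def predicted_class(self) -> str:
        # The predicted class is the one with the highest probability.
        return max(self.probabilities, key=self.probabilities.get)


@dataclass
class Split:
    # Internal node holding the inequality "case[input_name] < threshold".
    input_name: str
    threshold: float
    left: Union["Split", Leaf]   # followed when the inequality is true
    right: Union["Split", Leaf]  # followed otherwise


def score(node: Union[Split, Leaf], case: Dict[str, float]) -> Leaf:
    """Walk from the root to a leaf, moving left when the inequality holds."""
    while isinstance(node, Split):
        if case[node.input_name] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node


# Hypothetical tree; note that x1 appears in two different splits.
tree = Split(
    "x1", 5.0,
    left=Leaf({"yes": 0.9, "no": 0.1}),
    right=Split(
        "x2", 2.0,
        left=Leaf({"yes": 0.4, "no": 0.6}),
        right=Split(
            "x1", 8.0,
            left=Leaf({"yes": 0.7, "no": 0.3}),
            right=Leaf({"yes": 0.2, "no": 0.8}),
        ),
    ),
)

leaf = score(tree, {"x1": 6.0, "x2": 1.0})
print(leaf.predicted_class, leaf.probabilities)
```

Every case that lands in the same leaf receives the same class and the same probabilities, which is why the prediction surface of a tree is a step function.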

Decision trees can also have multi-way splits, where the values of an input are partitioned into disjoint ranges.

When the target is categorical, the model is called a classification tree. A classification tree can be thought of as defining several multivariate step functions, each corresponding to the posterior probability of a target class. When the target is continuous, the model is called a regression tree. The leaves give the predicted value of the target, and all cases that reach a leaf are assigned the same predicted value. Cases are scored using prediction rules. These prediction rules define the regions of the input space in which the predictions are made. Each prediction rule tries to make its region of the input space purer with regard to the target response value.
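One way to make "purer" concrete is to measure the impurity of a region and pick the split that lowers it the most. The sketch below assumes Gini impurity as the measure, which is a common choice but is not named in the text above; the function names `gini` and `best_threshold` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure region, larger for more mixed regions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value < threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit region, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = ((low + high) / 2, weighted)
    return best


# Made-up toy data: one interval input and a binary target.
values = [1, 2, 3, 10, 11, 12]
labels = ["no", "no", "no", "yes", "yes", "yes"]

print(gini(labels))                    # impurity before splitting
print(best_threshold(values, labels))  # chosen split and impurity after it
```

In this toy case a single threshold separates the two classes completely, so both resulting regions are pure. Real data rarely splits this cleanly, and a tree keeps splitting the resulting regions until some stopping rule applies.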
To illustrate decision trees using business data, a generic data set containing information about payment is used with a binary target of default. For simplicity, the input variables are:

• Previous delay: the number of previous payment delays over the period analyzed.
• Over billing: the billing amount difference, or the billing amount divided by the average billing amount.
• Aging: the time since the customer first started consuming products or services from the company.

Statistical and Machine Learning | Subscription Fraud

A business problem where decision tree models can be useful is subscription fraud. In telecommunications, subscription fraud occurs when a fraudster uses a stolen or synthetic identity to acquire mobile devices and services with no intention to pay. In many countries, telecommunications regulations allow customers to remain insolvent for a period without having their services blocked, which causes major financial damage to the companies. Subscription fraud in telecommunications can be even worse because the proceeds and services are sometimes used by organized crime and terrorist networks. The main goal of the model is to detect subscription fraud and to prevent intentional bad debt. Fraud analysts need to be careful when assessing cases to avoid adversely impacting the customer journey for genuine customers. Blocking a genuine customer's communication services by mistake is a serious problem.

As shown in Figure 3.3, a typical framework for handling fraud (either subscription fraud or usage fraud) starts with a customer relationship management (CRM) system that receives customers' orders. These orders are evaluated by a credit system, which can rely on a credit bureau. In parallel, the orders can also be analyzed by a subscription fraud system, which normally receives information about customers' past transactions. For example, in telecommunications, all raw transactions (calls or even attempted calls) are fetched by the collection systems. These systems send all transactions to a mediation system, which aggregates the information and filters the billable transactions. The billable transactions are sent to the billing systems, which process the bills and charge the customers. All this information, at different levels, is used to evaluate and detect subscription and usage fraud. Historical customer information and transaction information are gathered in the data warehouse, which provides the data needed by the data mining tool, environment, or system to train, evaluate, and deploy the predictive models.

When a service order is placed in the call center (CRM), the service representative must decide in a matter of seconds whether the request is a fraudulent event or a genuine request. This can be accomplished using decision tree models that recognize patterns associated with subscription fraud. The information used could include stolen identities, fake addresses, specific payment methods, and known blocked lists. The models compute the probability of subscription fraud, and these scores are relayed to the service representative. With this information, the service representative can decide whether the order is subscription fraud or comes from a genuine customer.

Because a fitted decision tree is a set of rules based on thresholds, the rules most strongly associated with subscription fraud can be communicated to the representative when evaluating the case. The list of rules generated by the decision tree is especially useful to the service representative during the customer call, but also to the team of fraud analysts when analyzing the cases afterward. Some high-probability subscription fraud cases might go through during the customer service call, but fraud analysts can evaluate them afterward and decide what actions to take on the orders.
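As a rough illustration of how threshold rules and a fraud score could be relayed together, the following Python sketch lists every root-to-leaf rule of a small tree and reports the rule that a single order matches. Everything in it is an assumption made for the example: the input names `on_blocked_list`, `devices_requested`, and `address_age_days`, the thresholds, and the leaf probabilities are invented and do not come from a fitted model or from the framework in Figure 3.3.

```python
# Hypothetical tree. An internal node is (input_name, threshold, left, right),
# where left is followed when case[input_name] < threshold. A leaf is a float
# holding a made-up probability of subscription fraud.
TREE = (
    "on_blocked_list", 0.5,
    ("devices_requested", 3,
     0.02,
     ("address_age_days", 30, 0.65, 0.10)),
    0.90,
)


def extract_rules(node, conditions=()):
    """Yield (conditions, probability) for every root-to-leaf path."""
    if not isinstance(node, tuple):
        yield list(conditions), node
        return
    name, threshold, left, right = node
    yield from extract_rules(left, conditions + (f"{name} < {threshold}",))
    yield from extract_rules(right, conditions + (f"{name} >= {threshold}",))


def explain(node, case):
    """Return (probability, conditions) for the single path a case follows."""
    path = []
    while isinstance(node, tuple):
        name, threshold, left, right = node
        if case[name] < threshold:
            path.append(f"{name} < {threshold}")
            node = left
        else:
            path.append(f"{name} >= {threshold}")
            node = right
    return node, path


# Rules ordered from the highest to the lowest fraud probability.
rules = sorted(extract_rules(TREE), key=lambda rule: rule[1], reverse=True)
for conditions, probability in rules:
    print(f"{probability:.2f}  IF " + " AND ".join(conditions))

# A hypothetical incoming order and the rule it matches.
order = {"on_blocked_list": 0, "devices_requested": 5, "address_age_days": 7}
probability, path = explain(TREE, order)
print(f"score={probability:.2f} because " + " AND ".join(path))
```

The sorted list is the kind of summary fraud analysts could review after the call, while the matched path is short enough to show to a service representative alongside the score.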

