Statistical and Machine Learning | Logistic Regression

Logistic regression is closely related to linear regression. In logistic regression, a link function connects the expected value of the target, which lies in the unit interval, to a linear combination of the inputs, so model predictions can be read as probabilities of the primary outcome between 0 and 1. Whereas linear regression estimates the value of the target directly, the linear combination of the inputs in logistic regression produces a logit score: the log of the odds of the primary outcome. Logit scores range from negative infinity to positive infinity. For binary prediction, any monotonic function that maps the unit interval onto the real number line can serve as a link. The logit link is one of the most common, in part because it makes the model easy to interpret.
For example, to use logistic regression to classify a binary target, the output must be restricted to the range between 0 and 1. The logit link maps a probability to a continuous logit score, and its inverse, the logistic function, maps the score back to a probability between 0 and 1. The logit of $\hat{p}$ is the log of the odds, that is, the log of the probability of the event divided by the probability of the non-event: $\operatorname{logit}(\hat{p}) = \log\left(\hat{p} / (1-\hat{p})\right)$. This transformation takes the probability scale to the whole real line, from negative infinity to positive infinity. The logit can therefore be modeled with a linear combination of the inputs, because a linear combination can take any real value.
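The relationship between probabilities and logit scores can be sketched in a few lines of Python. This is only an illustration: the function names `logit`, `inverse_logit`, and `predict_probability` are chosen here and do not come from the text or from any particular library.

```python
import math


def logit(p: float) -> float:
    """Log of the odds: maps a probability in (0, 1) to the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(score: float) -> float:
    """Logistic function: maps a real-valued logit score back to a probability."""
    # Branch on the sign so that math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def predict_probability(intercept: float, coefficients, inputs) -> float:
    """Linear combination of the inputs (the logit score), converted to a probability."""
    score = intercept + sum(b * x for b, x in zip(coefficients, inputs))
    return inverse_logit(score)


print(logit(0.5))                          # 0.0: even odds give a logit score of zero
print(inverse_logit(logit(0.8)))           # round trip recovers roughly 0.8
print(predict_probability(0.0, [1.0], [0.0]))  # 0.5: a zero score is a 50% probability
```

A logit score of zero corresponds to even odds, positive scores to probabilities above 0.5, and negative scores to probabilities below 0.5.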
The logistic model is particularly easy to interpret because each predictor variable affects the logit linearly, with the coefficients as the slopes. Exponentiating a parameter estimate gives an odds ratio, which compares the odds of the event in one group to the odds of the event in another group.

The odds ratio shows the strength of the association between the predictor variable and the target variable. If the odds ratio is 1, there is no association between the predictor variable and the target. If the odds ratio is greater than 1, the group in the numerator has higher odds of having the event. If the odds ratio is between 0 and 1, the group in the denominator has higher odds of having the event. For example, an odds ratio of 3 indicates that the odds of the event for the group in the numerator are three times the odds for the group in the denominator. Which group is in the numerator and which is in the denominator depends on the coding of the input variable. For example, if the parameter estimate for the input variable age is $0.97$, then its exponential is $e^{0.97} \approx 2.64$. That means that a one-unit increase in age increases the odds by about $164\%$, that is, $(2.64-1) \times 100$.
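The conversion from a parameter estimate to an odds ratio and a percent change in odds can be checked with a short Python sketch. The helper names `odds_ratio` and `percent_change_in_odds` are illustrative, and the optional `delta` argument (the size of the change in the input) is an assumption added here for convenience.

```python
import math


def odds_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Odds ratio for a change of `delta` units in the input: exp(coefficient * delta)."""
    return math.exp(coefficient * delta)


def percent_change_in_odds(coefficient: float, delta: float = 1.0) -> float:
    """Percent change in the odds for a change of `delta` units in the input."""
    return (odds_ratio(coefficient, delta) - 1.0) * 100.0


# A coefficient of zero means no association: the odds ratio is exactly 1.
print(odds_ratio(0.0))  # 1.0

# An odds ratio of 3 corresponds to a coefficient of log(3).
print(odds_ratio(math.log(3.0)))

# The age example from the text: parameter estimate of 0.97 for a one-unit increase.
age_coefficient = 0.97
print(round(odds_ratio(age_coefficient), 2))
print(round(percent_change_in_odds(age_coefficient)))
```

A negative coefficient gives an odds ratio between 0 and 1, which shows up here as a negative percent change in the odds.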

Collecting Predictive Model

One business problem that logistic regression can address is estimating the probability of debt repayment. The main goal of the model is to rank insolvent customers by their probability of paying off their unpaid bills, so that the company can target the insolvent customers who are more likely to pay. The input variables in the data are:

• Demographic information about the customers
• Payment type, day, frequency of payment, amount paid, and so on
• Payment delay information
• Customer tenure with the company, product, or service
• Credit history for the customer
• Past delinquent bills for the customer
• Debt to income ratio, debt to bill ratio, total debt to total bill ratio, and so on
• Others

The target variable is:

• Whether the customer has paid off the unpaid bill

Telecommunications companies usually rank insolvent customers by how much they owe or by the age of the unpaid bills, and then target the customers with the largest or oldest debt. However, if the company instead ranked customers by their probability of payment and targeted those with the highest probabilities, revenue might increase. Insolvency can cause even more damage in countries with unstable economies and high inflation rates. If customers do not pay their bills, the company has to borrow from the financial market to maintain its cash flow, and that money costs much more than the fees and interest the company charges its insolvent customers. The longer customers remain insolvent, the more money the company has to raise from the financial market. The main goal of this model is therefore to let companies bring cash in sooner by first contacting the customers who are most likely to pay their unpaid bills.
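The difference between the two ranking strategies can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Customer` fields, the three example customers, and their payment probabilities are made up for illustration, and in practice the probabilities would come from a fitted logistic regression model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    amount_owed: float          # size of the unpaid bill
    days_overdue: int           # age of the unpaid bill
    payment_probability: float  # hypothetical score from a fitted logistic model


def rank_by_debt(customers):
    """Conventional strategy: largest debt first."""
    return sorted(customers, key=lambda c: c.amount_owed, reverse=True)


def rank_by_payment_probability(customers):
    """Model-based strategy: customers most likely to pay first."""
    return sorted(customers, key=lambda c: c.payment_probability, reverse=True)


def contact_list(customers, capacity):
    """Customers to contact first, given how many contacts can be made."""
    return rank_by_payment_probability(customers)[:capacity]


# Made-up customers, for illustration only.
customers = [
    Customer("A", amount_owed=900.0, days_overdue=120, payment_probability=0.10),
    Customer("B", amount_owed=150.0, days_overdue=30, payment_probability=0.85),
    Customer("C", amount_owed=400.0, days_overdue=60, payment_probability=0.55),
]

print([c.customer_id for c in rank_by_debt(customers)])              # ['A', 'C', 'B']
print([c.customer_id for c in contact_list(customers, capacity=2)])  # ['B', 'C']
```

In this made-up data, the customer with the largest debt is also the least likely to pay, so the two strategies produce opposite contact orders.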

