Telecom Churn Prediction
Churn prediction is the practice of anticipating customer defection by predicting which customers are likely to cancel their subscription to a service.
Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, and insurance firms.
To analyse customer behaviour, the following factors should be considered:
- Customer demographic data (age, marital status, etc.)
- Sentiment analysis of social media
- Customer usage patterns and geographical usage trends
- Calling-circle data
DATASET ATTRIBUTES
1. State
2. Account length
3. Area code
4. International plan
5. Voice mail plan
6. Number vmail messages
7. Total day minutes
8. Total day calls
9. Total day charge
10. Total eve minutes
11. Total eve calls
12. Total eve charge
13. Total night minutes
14. Total night calls
15. Total night charge
16. Total intl minutes
17. Total intl calls
18. Total intl charge
19. Customer service calls
20. Churn
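A minimal sketch of a schema for these attributes, written in plain Python with only the standard library rather than any particular Spark or pandas API. The column types are assumptions inferred from the attribute names (counts as integers, minutes and charges as floats, the two plan flags and Churn kept as strings), and the names SCHEMA, parse_row and read_csv are hypothetical.

```python
import csv
import io

# Hypothetical schema: (column name, Python type) in file order.
# Types are assumptions based on the attribute names, not on the real file.
SCHEMA = [
    ("State", str),
    ("Account length", int),
    ("Area code", int),
    ("International plan", str),
    ("Voice mail plan", str),
    ("Number vmail messages", int),
    ("Total day minutes", float),
    ("Total day calls", int),
    ("Total day charge", float),
    ("Total eve minutes", float),
    ("Total eve calls", int),
    ("Total eve charge", float),
    ("Total night minutes", float),
    ("Total night calls", int),
    ("Total night charge", float),
    ("Total intl minutes", float),
    ("Total intl calls", int),
    ("Total intl charge", float),
    ("Customer service calls", int),
    ("Churn", str),
]
COLUMN_NAMES = [name for name, _ in SCHEMA]


def parse_row(fields):
    """Cast one row of raw CSV strings to typed values using SCHEMA."""
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return {name: cast(value.strip()) for (name, cast), value in zip(SCHEMA, fields)}


def read_csv(text):
    """Parse CSV text whose first line is a header matching SCHEMA."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    if header != COLUMN_NAMES:
        raise ValueError("header does not match SCHEMA")
    # Skip blank lines, which csv.reader yields as empty lists.
    return [parse_row(fields) for fields in reader if fields]
```

Keeping Churn as a raw string here defers the choice of how to encode the label to the preprocessing step, and checking the header catches files whose columns are in a different order.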
PROPOSED SOLUTION
- Predict whether or not a customer has a high probability of unsubscribing from the service.
- Churn is the label, taking the value True or False.
Machine learning algorithms we want to use:
⮚ Random forest
⮚ Logistic regression
⮚ Decision tree
We then compare the accuracy of the three algorithms.
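Accuracy here means the fraction of test rows whose predicted churn label equals the true label. A minimal plain-Python sketch of that comparison follows; the per-model prediction lists are assumed to come from the trained models, and the helper names accuracy and compare are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of rows whose predicted label equals the true label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def compare(predictions_by_model, labels):
    """Return (model name, accuracy) pairs sorted from most to least accurate."""
    scores = [(name, accuracy(preds, labels)) for name, preds in predictions_by_model.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because churners are usually a minority of customers, a model that always predicts False can still score a high accuracy, so it may be worth reporting precision and recall on the True class alongside it.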
CODE (Preprocessing)
- Creating a schema that matches the dataset
- Importing the training and test datasets and using the Churn attribute as the label
- Feature engineering
- Dropping columns that do not affect the churn label
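The preprocessing steps above might look like the following plain-Python sketch, which works on one record (a dict from column name to value). The document does not say which columns are dropped, so DROP_COLUMNS is a hypothetical placeholder, as are the function names; the "Yes"/"No" and "True"/"False" spellings are likewise assumptions about the raw values.

```python
# Hypothetical choice of columns assumed not to affect churn; the real
# list should come from the project's own feature analysis.
DROP_COLUMNS = ["State", "Area code"]


def encode_yes_no(value):
    """Map a "Yes"/"No" plan flag to 1.0/0.0 (anything else counts as No)."""
    return 1.0 if str(value).strip().lower() == "yes" else 0.0


def preprocess(record, drop=DROP_COLUMNS):
    """Return (features, label) for one record.

    Churn becomes a numeric label (1.0 for "True", else 0.0), the two plan
    flags are encoded as numbers, and the dropped columns are removed.
    The input dict is not modified.
    """
    row = {k: v for k, v in record.items() if k not in drop}
    label = 1.0 if str(row.pop("Churn")).strip().lower() == "true" else 0.0
    for flag in ("International plan", "Voice mail plan"):
        if flag in row:
            row[flag] = encode_yes_no(row[flag])
    return row, label
```

Converting the label and flags to numbers up front means every later model sees only numeric features.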
PIPELINE CONSTRUCTION
A Transformer is an abstraction that includes feature transformers and learned models. Technically, a Transformer implements a method transform(), which converts one DataFrame into another, generally by appending one or more columns.
An Estimator abstracts the concept of a learning algorithm, or any algorithm that fits or trains on data. Technically, an Estimator implements a method fit(), which accepts a DataFrame and produces a Model, which is a Transformer.
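To make this contract concrete without depending on Spark, the toy classes below mimic it in plain Python: a "DataFrame" is just a list of dict rows, transform() returns new rows with a column appended, and fit() returns a model that is itself a Transformer. The class names echo the concepts but this is not the Spark API; SumColumns, ThresholdEstimator and ThresholdModel are hypothetical stand-ins for a feature transformer and a learning algorithm.

```python
class Transformer:
    """Converts one table (here: a list of dict rows) into another."""

    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Fits on a table and returns a Model, which is itself a Transformer."""

    def fit(self, rows):
        raise NotImplementedError


class SumColumns(Transformer):
    """Feature transformer: appends a column holding the sum of other columns."""

    def __init__(self, inputs, output):
        self.inputs, self.output = inputs, output

    def transform(self, rows):
        # Build new dicts so the input rows are left unchanged.
        return [{**r, self.output: sum(r[c] for c in self.inputs)} for r in rows]


class ThresholdModel(Transformer):
    """Learned model: appends a "prediction" column (feature > threshold)."""

    def __init__(self, feature, threshold):
        self.feature, self.threshold = feature, threshold

    def transform(self, rows):
        return [{**r, "prediction": r[self.feature] > self.threshold} for r in rows]


class ThresholdEstimator(Estimator):
    """Toy learning algorithm: picks the observed value that, used as a
    threshold, gives the most correct predictions on the training rows."""

    def __init__(self, feature, label="Churn"):
        self.feature, self.label = feature, label

    def fit(self, rows):
        def n_correct(t):
            return sum((r[self.feature] > t) == r[self.label] for r in rows)

        candidates = sorted({r[self.feature] for r in rows})
        return ThresholdModel(self.feature, max(candidates, key=n_correct))


class PipelineModel(Transformer):
    """A fitted pipeline: applies every stage's transform() in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Chains stages; fit() replaces each Estimator with the Model it produces."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)  # feed the transformed table onward
            fitted.append(model)
        return PipelineModel(fitted)
```

Because a fitted PipelineModel is itself a Transformer, the same object can be applied unchanged to the test set. In Spark ML the corresponding classes are Transformer, Estimator, Pipeline and PipelineModel in the pyspark.ml package (org.apache.spark.ml in Scala), operating on DataFrames instead of lists of dicts.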
DECISION TREE
- A decision tree model is selected because the rules it produces help explain the root causes of churn.
- The sentiment score is derived from the text of customer comments and is an important predictor of churn.
- Other important predictors are identified during the data understanding and modeling phases.
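To show where those human-readable rules come from, the sketch below finds only the first split of a tree (a decision stump) in plain Python, scoring candidate splits by Gini impurity, which is one common criterion; a full tree would repeat the search recursively on each side. The function names are hypothetical and the rows are assumed to carry a boolean Churn label.

```python
def gini(labels):
    """Gini impurity of boolean labels: 1 - p^2 - (1 - p)^2, p = share of True."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows, features, label="Churn"):
    """Return (weighted_impurity, feature, threshold) for the best rule
    "feature <= threshold" versus "feature > threshold"."""
    best = None
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r[label] for r in rows if r[f] <= t]
            right = [r[label] for r in rows if r[f] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def describe(rows, split, label="Churn"):
    """Turn a split into a readable rule with the churn rate on each side."""
    _, f, t = split
    left = [r[label] for r in rows if r[f] <= t]
    right = [r[label] for r in rows if r[f] > t]
    return (f"if {f} <= {t}: churn rate {sum(left) / len(left):.0%}; "
            f"else: churn rate {sum(right) / len(right):.0%}")
```

A rule of the form "if Customer service calls <= t then churn is rare" is the kind of output that makes a tree easier to interpret than a weight vector.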
LOGISTIC REGRESSION
- Logistic regression is a popular method to predict a categorical response. It is a special case of generalized linear models that predicts the probability of the outcomes.
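For a binary label such as Churn, the model estimates P(churn | x) = 1 / (1 + exp(-(w · x + b))). The plain-Python sketch below fits w and b by batch gradient descent on the mean log loss; it illustrates the technique rather than any library's solver, and lr, epochs and the function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated P(churn) for feature vector x: sigmoid(w . x + b)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log loss.

    X: list of numeric feature lists; y: list of 0/1 labels (1 = churn).
    """
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log loss, d(loss)/d(score) is simply (predicted - actual).
            err = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

Thresholding the returned probability (for example at 0.5) turns it into a True/False churn prediction; with raw features such as minutes, standardizing them first usually makes gradient descent converge faster.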
SVM
- A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples, where "optimal" means the hyperplane with the largest margin to the nearest training points of each class. In two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on one side.
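The two-dimensional case can be sketched in plain Python: a linear SVM (no kernel) trained by batch subgradient descent on the regularised hinge loss, which is one simple way to approximate the maximum-margin line, not the solver any particular library uses. The defaults for lam, lr and epochs and the function names are illustrative assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Fit a 2-D linear SVM by batch subgradient descent.

    points: list of (x1, x2) pairs; labels: +1 or -1.
    Objective: lam/2 * |w|^2 + mean(max(0, 1 - y * (w . x + b))).
    Returns (w, b); the separating line is w[0]*x1 + w[1]*x2 + b = 0.
    """
    n = len(points)
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw = [lam * w[0], lam * w[1]]  # gradient of the L2 regulariser
        gb = 0.0  # the bias is not regularised
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:  # hinge loss is active
                gw[0] -= y * x1 / n
                gw[1] -= y * x2 / n
                gb -= y / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def classify(w, b, point):
    """Return +1 or -1 depending on which side of the line the point falls."""
    x1, x2 = point
    return 1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1
```

Only points inside the margin (hinge loss active) contribute to each update, which mirrors the idea that the boundary is determined by the training points closest to it, the support vectors.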