Telecom Churn Prediction

Churn prediction addresses customer defection by predicting which customers are likely to cancel a subscription to a service.

Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, and insurance firms.

To analyse customer behaviour, the following factors are considered:

  • Customer demographic data (age, marital status, etc.)
  • Sentiment analysis of social media
  • Customer usage patterns and geographical usage trends
  • Calling-circle data



2. Account length
3. Area code
4. International plan
5. Voice mail plan
6. Number vmail messages
7. Total day minutes
8. Total day calls
9. Total day charge
10. Total eve minutes
11. Total eve calls, Total eve charge
12. Total night minutes
13. Total night calls
14. Total night charge
15. Total intl minutes
16. Total intl calls
17. Total intl charge
18. Customer service calls




  • Whether or not a customer has a high probability of unsubscribing from the service
  • Churn is the label: True or False

Machine learning algorithms we want to use:

⮚ Random forest

⮚ Logistic regression

⮚ Decision tree

Comparing the accuracy of three different algorithms
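As a rough sketch of what such a comparison involves, accuracy is the fraction of test rows whose predicted churn label matches the true one. The model names and prediction lists below are made-up placeholders for illustration, not results from the dataset or from the three algorithms listed above.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# Placeholder values for illustration only; not real model output.
labels = [True, False, False, True]
predictions_by_model = {
    "model_a": [True, False, False, True],
    "model_b": [True, False, True, False],
    "model_c": [True, False, False, False],
}

# Score every model on the same labels so the numbers are comparable.
scores = {name: accuracy(labels, preds)
          for name, preds in predictions_by_model.items()}
best = max(scores, key=scores.get)
```

Scoring all models against the same held-out labels is what makes the comparison fair.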


  • Creating a schema according to the dataset
  • Importing the training and test data sets and using the churn attribute as the label
  • Feature engineering
  • Dropping columns that do not affect the churn label
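The steps above can be sketched in plain Python, without any particular ML library. This is only an illustration under stated assumptions: the schema covers a handful of columns, the inline sample rows are illustrative, and the choice of "Area code" as a dropped column is hypothetical, since the text does not say which columns were removed.

```python
import csv
import io

# Hypothetical schema: column name -> converter. Only a few of the
# dataset's columns are shown to keep the sketch short.
SCHEMA = {
    "Account length": int,
    "Area code": str,
    "International plan": str,
    "Total day minutes": float,
    "Customer service calls": int,
    "Churn": str,
}

# Assumption: which columns are uninformative is decided by the analyst;
# "Area code" is used here only as an example.
DROP_COLUMNS = {"Area code"}

# Tiny inline sample standing in for the real training file.
SAMPLE_CSV = """Account length,Area code,International plan,Total day minutes,Customer service calls,Churn
128,415,No,265.1,1,False
65,415,No,129.1,4,True
"""

def load_rows(text):
    """Parse CSV text and cast every field according to SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: cast(row[name]) for name, cast in SCHEMA.items()}
            for row in reader]

def prepare(row):
    """Turn one typed row into a (features, label) pair."""
    features = {k: v for k, v in row.items()
                if k not in DROP_COLUMNS and k != "Churn"}
    # Feature engineering: encode the Yes/No plan flag as 1.0/0.0.
    features["International plan"] = (
        1.0 if features["International plan"] == "Yes" else 0.0
    )
    # The churn attribute becomes the numeric label.
    label = 1.0 if row["Churn"] == "True" else 0.0
    return features, label

prepared = [prepare(r) for r in load_rows(SAMPLE_CSV)]
```

Declaring the schema up front means a malformed value fails loudly at load time instead of silently reaching the model as a string.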


A Transformer is an abstraction that includes feature transformers and learned models. Technically, a Transformer implements a method transform(), which converts one DataFrame into another, generally by appending one or more columns. An Estimator abstracts the concept of a learning algorithm or any algorithm that fits or trains on data. Technically, an Estimator implements a method fit(), which accepts a DataFrame and produces a Model, which is a Transformer.
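The fit()/transform() contract can be illustrated with a toy stand-in. The classes below are hypothetical and are not the library's API; a list of dicts plays the role of a DataFrame, and the "learning" step merely computes a column mean to use as a threshold.

```python
class Transformer:
    """Anything that maps one table to another via transform()."""
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    """Anything that learns from a table via fit() and returns a Transformer."""
    def fit(self, rows):
        raise NotImplementedError

class ThresholdModel(Transformer):
    """Fitted model: appends a 'prediction' column, leaving inputs untouched."""
    def __init__(self, column, threshold):
        self.column = column
        self.threshold = threshold

    def transform(self, rows):
        return [
            {**row,
             "prediction": 1.0 if row[self.column] > self.threshold else 0.0}
            for row in rows
        ]

class MeanThresholdEstimator(Estimator):
    """Toy learner: the 'training' step just computes the column mean."""
    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        mean = sum(row[self.column] for row in rows) / len(rows)
        return ThresholdModel(self.column, mean)

# A list of dicts stands in for a DataFrame in this sketch.
train = [{"Customer service calls": 1}, {"Customer service calls": 5}]
model = MeanThresholdEstimator("Customer service calls").fit(train)
scored = model.transform(train)
```

Note that fit() returns a new object rather than mutating the estimator, and transform() returns new rows with an extra column, mirroring the "appending one or more columns" behaviour described above.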


  • A decision tree model is selected because the rules that come out of the decision tree help to understand the root cause of churn better.
  • The sentiment score is derived from the customer comments text and is an important predictor of churn.
  • Other important predictors are identified during the data understanding and modeling phases.

Logistic Regression

  • Logistic regression is a popular method to predict a categorical response. It is a special case of generalized linear models that predicts the probability of the outcomes.
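As a minimal sketch of how such a model turns features into a probability, the linear combination of the inputs is passed through the sigmoid function. The weights, bias, and choice of two features below are hypothetical values picked by hand, not coefficients learned from the churn data.

```python
import math

def predict_probability(features, weights, bias):
    """Logistic regression: sigmoid of a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(features, weights, bias, threshold=0.5):
    """Classify as churn (True) when the probability reaches the threshold."""
    return predict_probability(features, weights, bias) >= threshold

# Hypothetical coefficients, chosen by hand rather than learned from data.
weights = [0.8, 0.01]   # [customer service calls, total day minutes]
bias = -4.0

# Probability of churn for a customer with 4 service calls and 200 day minutes.
p = predict_probability([4, 200.0], weights, bias)
```

The 0.5 threshold is only a default; lowering it flags more customers as likely churners at the cost of more false alarms.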




  • A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on one side.
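In two dimensions, classifying a new example against a separating line reduces to checking the sign of w · x + b. The sketch below shows only that prediction step; the values of w and b are hypothetical, whereas a trained SVM would choose them to maximise the margin between the two classes.

```python
def decision_value(point, w, b):
    """Signed score w . x + b; its sign says which side of the line x lies on."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b

def classify(point, w, b):
    """Return +1 or -1 depending on the side of the separating line."""
    return 1 if decision_value(point, w, b) >= 0 else -1

# Hypothetical line x1 + x2 - 3 = 0, not one fitted to data.
w, b = (1.0, 1.0), -3.0

side_a = classify((3.0, 2.0), w, b)   # point above the line
side_b = classify((0.5, 1.0), w, b)   # point below the line
```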