Informatics, Computer Science and Control
Analysis of the texts for predicting the churn of ISP
A. A. Karyakina, D. S. Botov Chelyabinsk State University, Chelyabinsk, Russia
Abstract:
The possibility of forecasting customer churn from the data of a Russian ISP is considered. The basic stages of, and approaches to, preprocessing the texts of operators’ comments are determined. It is proposed to use classification algorithms such as logistic regression, the $k$-nearest neighbors method, gradient boosting, and the naive Bayes algorithm. As a sample, an input data set of 23 features for 380 000 subscribers was formed. Typos in the textual information are corrected using the Damerau-Levenshtein distance and the text is lemmatized; it is then converted into a feature vector using the TF-IDF method and added to the model. The main approaches to encoding categorical features are determined. Forecast models are constructed. The results obtained with the different classifiers are compared and conclusions are drawn.
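The typo-correction step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything below is an assumption made for illustration: the function names `osa_distance` and `correct_token`, the sample vocabulary, and the distance threshold of 1 are hypothetical. The sketch also uses the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, which may differ from the variant used in the paper.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters needed to turn `a` into `b`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def correct_token(token: str, vocabulary: set, max_distance: int = 1) -> str:
    """Replace `token` with the closest vocabulary word, if close enough.

    `vocabulary` and `max_distance` are hypothetical parameters chosen
    for this sketch; ties are broken alphabetically for determinism.
    """
    if not vocabulary or token in vocabulary:
        return token
    best = min(vocabulary, key=lambda w: (osa_distance(token, w), w))
    return best if osa_distance(token, best) <= max_distance else token


# Hypothetical vocabulary of correctly spelled words.
vocab = {"internet", "router", "tariff"}
corrected = [correct_token(t, vocab) for t in "intrenet roter xyz".split()]
```

In a pipeline like the one the abstract describes, tokens corrected this way would then be lemmatized and passed to a TF-IDF vectorizer. A small threshold keeps the correction conservative, so tokens far from every vocabulary word are left unchanged.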
Keywords:
prediction, clients churn, ISP, python, customers calls, classification, analysis of texts, tf-idf.
Received: 31.12.2017 Revised: 04.05.2018
Citation:
A. A. Karyakina, D. S. Botov, “Analysis of the texts for predicting the churn of ISP”, Chelyab. Fiz.-Mat. Zh., 3:2 (2018), 227–236
Linking options:
https://www.mathnet.ru/eng/chfmj102 https://www.mathnet.ru/eng/chfmj/v3/i2/p227