Trudy SPIIRAN, 2019, Issue 18, volume 2, Pages 354–389
(Mi trspy1049)

This article is cited in 2 scientific papers (total in 2 papers)

Artificial Intelligence, Knowledge and Data Engineering

Sentiment analysis of "AUTOSTRADA.INFO/RU" users’ comments

Ya. A. Seliverstov (a, b), V. I. Chigur (c), A. M. Sazanov (a), S. A. Seliverstov (a, b), A. S. Svistunova (d)

a Peter the Great St. Petersburg Polytechnic University
b Solomenko Institute of Transport Problems of the Russian Academy of Sciences
c St. Petersburg State University
d St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)
Abstract: Analysis shows that social networks (Vkontakte, Facebook), thematic communities on microblogging platforms (Twitter), travel resources (TripAdvisor), and transport portals (Autostrada) are sources of current, timely information about traffic conditions, the quality of transport services, and passenger satisfaction with those services. However, existing transport monitoring systems lack software tools for collecting and analyzing traffic information published on the Internet. This paper addresses the task of building a system that automatically retrieves and classifies road traffic information from transport Internet portals, and tests the system on the transport networks of Crimea and the city of Sevastopol. To this end, open-source libraries for thematic data collection and analysis were reviewed, and an algorithm for extracting and analyzing texts was developed. A crawler was written with the Scrapy package in Python 3 and used to collect user feedback from the portal on the state of the transport system of Crimea and the city of Sevastopol. For lemmatization and vectorization of texts, the tf, idf, and tf-idf methods were considered along with their implementations in the Scikit-Learn library, CountVectorizer and TfidfVectorizer. For text representation, the Bag-of-Words and n-gram methods were considered. The classifier models were built with the naive Bayes algorithm (MultinomialNB) and a linear classifier optimized by stochastic gradient descent (SGDClassifier). A corpus of 225,000 labeled texts from Twitter served as the training sample. The classifier was trained using a cross-validation strategy with the ShuffleSplit method. The sentiment classification results were then tested and compared.
According to the validation results, the linear model with the n-gram range [1, 3] and the TF-IDF vectorizer performed best. In the trial of the developed system, reviews concerning the quality of the transport networks of the Republic of Crimea and the city of Sevastopol were collected and analyzed. Conclusions are drawn and directions for further functional development of the tools are outlined.
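The comparison described in the abstract can be illustrated with a minimal Scikit-Learn sketch. It is not the authors' code: the twelve example texts, the labels `pos`/`neg`, the dictionary keys, and all parameter values other than the n-gram range (1, 3) are assumptions made for illustration, and the paper's actual preprocessing (including lemmatization) is omitted.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; the paper trains on 225,000 labeled Twitter texts.
texts = [
    "the new road is smooth and fast",
    "great highway with clear signs",
    "excellent bus service on this route",
    "the bridge was repaired and traffic is fine",
    "comfortable trip and good road surface",
    "quick and pleasant drive along the coast",
    "terrible potholes on every street",
    "the bus was late and overcrowded",
    "awful traffic jam near the station",
    "the road is broken and dangerous",
    "bad signs and no lighting at night",
    "slow and unpleasant trip with long delays",
]
labels = ["pos"] * 6 + ["neg"] * 6

# Random train/test splits, as produced by ShuffleSplit.
cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)

# Two candidate models: naive Bayes on Bag-of-Words counts, and a linear
# classifier trained by SGD on TF-IDF features over 1- to 3-grams.
candidates = {
    "nb_bow": make_pipeline(CountVectorizer(), MultinomialNB()),
    "sgd_tfidf_1_3": make_pipeline(
        TfidfVectorizer(ngram_range=(1, 3)),
        SGDClassifier(random_state=0),
    ),
}

# Mean cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, texts, labels, cv=cv).mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.2f}")
```

On a corpus this small the scores are not meaningful; the sketch only shows how the vectorizer, the classifier, and the ShuffleSplit cross-validation fit together.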
Keywords: automatic text analysis, crawlers, classification of texts, intelligent transport systems, machine learning, TF-IDF, naive Bayes algorithm, linear classifier, sentiment analysis.
Funding agency: Russian Foundation for Basic Research
Grant number: 18-410-920016
The research was supported by the Russian Foundation for Basic Research, project No. 18-410-920016 р_а.
Received: 19.02.2019
Document Type: Article
UDC: 656, 004.8, 007.5, 51-74, 510.67
Language: Russian
Citation: Ya. A. Seliverstov, V. I. Chigur, A. M. Sazanov, S. A. Seliverstov, A. S. Svistunova, “Sentiment analysis of "AUTOSTRADA.INFO/RU" users’ comments”, Tr. SPIIRAN, 18:2 (2019), 354–389
Citation in format AMSBIB
\by Ya.~A.~Seliverstov, V.~I.~Chigur, A.~M.~Sazanov, S.~A.~Seliverstov, A.~S.~Svistunova
\paper Sentiment analysis of "AUTOSTRADA.INFO/RU" users’ comments
\jour Tr. SPIIRAN
\yr 2019
\vol 18
\issue 2
\pages 354--389