Modelirovanie i Analiz Informatsionnykh Sistem
Modelirovanie i Analiz Informatsionnykh Sistem, 2022, Volume 29, Number 4, Pages 316–332
DOI: https://doi.org/10.18255/1818-1015-2022-4-316-332
(Mi mais782)
 

Theory of data

Detecting mentions of green practices in social media based on text classification

A. V. Glazkova (a), O. V. Zakharova (a), A. V. Zakharov (a), N. N. Moskvina (a), T. R. Enikeev (b), A. N. Hodyrev (a), V. K. Borovinskiy (a), I. N. Pupysheva (a)

(a) University of Tyumen, 6 Volodarskogo str., Tyumen 625003, Russia
(b) Novosibirsk State University, 1 Pirogova str., Novosibirsk 630090, Russia
Abstract: This paper addresses the task of detecting mentions of green practices in social media texts. The task is motivated by the need to expand existing knowledge about how green practices are used in society and how they spread. The study uses a text corpus of posts published in environmental communities of the VKontakte social network. The corpus is annotated by experts for mentions of nine types of green practices. A semi-automatic approach to collecting additional texts is proposed to reduce the class imbalance in the corpus. The approach includes the following steps: detecting the most frequent words for each practice type; automatically collecting social media texts that contain the detected frequent words; expert verification and filtering of the collected texts. Four machine learning models are compared on the task of detecting mentions of green practices using two variants of the corpus: the original one and the one augmented with the proposed approach. Among these models, the highest averaged F1-score (81.32%) was achieved by Conversational RuBERT fine-tuned on the augmented corpus. The Conversational RuBERT model was chosen for the application prototype. The main function of the prototype is to detect mentions of the nine types of green practices in a text. The prototype is implemented as a Telegram chatbot.
Keywords: text classification, social network analysis, machine learning, BERT, green practices, natural language processing.
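
The first two steps of the semi-automatic collection approach described in the abstract (finding frequent words per practice type, then selecting texts that contain them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label strings, the toy texts, and the exact-token matching are all assumptions made for illustration. Russian-language data would likely also need lemmatization, which is omitted here, and the third step (expert verification and filtering) is manual and is not modeled.

```python
import re
from collections import Counter

# Simple word tokenizer; an assumption, the paper's preprocessing is not described here.
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def frequent_words(labeled_texts, top_k=3, stopwords=frozenset()):
    """Step 1: return the top_k most frequent words for each practice label.

    labeled_texts is an iterable of (text, label) pairs.
    """
    counts = {}
    for text, label in labeled_texts:
        counter = counts.setdefault(label, Counter())
        counter.update(t for t in tokenize(text) if t not in stopwords)
    return {
        label: [word for word, _ in counter.most_common(top_k)]
        for label, counter in counts.items()
    }


def collect_candidates(unlabeled_texts, keywords_by_label):
    """Step 2: keep texts that contain at least one frequent word of a label.

    Returns (text, label) candidates that would then go to expert verification.
    """
    candidates = []
    for text in unlabeled_texts:
        tokens = set(tokenize(text))
        for label, words in keywords_by_label.items():
            if tokens & set(words):
                candidates.append((text, label))
    return candidates


if __name__ == "__main__":
    # Hypothetical toy corpus; the labels are placeholders, not the paper's nine types.
    labeled = [
        ("Recycling paper, recycling glass", "waste_sorting"),
        ("Clothes exchange this weekend", "exchange"),
    ]
    keywords = frequent_words(labeled, top_k=2, stopwords={"this"})
    print(collect_candidates(["New recycling point opened"], keywords))
```

Candidates returned by `collect_candidates` are only proposals for the underrepresented classes; in the approach described above, an expert still decides which of them enter the augmented corpus.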
Funding agency: Ministry of Science and Higher Education of the Russian Federation
The work was carried out during the Big Mathematical Workshop of the Mathematical Center in Akademgorodok.
Received: 06.10.2022
Revised: 11.11.2022
Accepted: 16.11.2022
Document Type: Article
UDC: 004.912
MSC: 68T50
Language: Russian
Citation: A. V. Glazkova, O. V. Zakharova, A. V. Zakharov, N. N. Moskvina, T. R. Enikeev, A. N. Hodyrev, V. K. Borovinskiy, I. N. Pupysheva, “Detecting mentions of green practices in social media based on text classification”, Model. Anal. Inform. Sist., 29:4 (2022), 316–332
Citation in format AMSBIB
\Bibitem{GlaZakZak22}
\by A.~V.~Glazkova, O.~V.~Zakharova, A.~V.~Zakharov, N.~N.~Moskvina, T.~R.~Enikeev, A.~N.~Hodyrev, V.~K.~Borovinskiy, I.~N.~Pupysheva
\paper Detecting mentions of green practices in social media based on text classification
\jour Model. Anal. Inform. Sist.
\yr 2022
\vol 29
\issue 4
\pages 316--332
\mathnet{http://mi.mathnet.ru/mais782}
\crossref{https://doi.org/10.18255/1818-1015-2022-4-316-332}
Linking options:
  • https://www.mathnet.ru/eng/mais782
  • https://www.mathnet.ru/eng/mais/v29/i4/p316