Abstract:
This work addresses the identification of texts generated automatically (artificially) by software algorithms. The problem is topical because such texts are widely distributed on the Internet. The generated «copies» of web pages are used to attract readers to online resources and to disseminate large numbers of unique copies of pages whose content has a particular orientation.
The article describes the specifics of determining a text's origin, using as an example texts generated by synonymization, the most common method of producing artificial web content. The author defines an invariant of artificial texts as a set of values of text characteristics, which allows texts to be classified by the process that created them. The article proposes a method for identifying artificial texts based on calculating a measure of belonging to the invariants, which supports a decision about the origin of a text. It also presents values obtained in experiments on identifying artificial texts.
Keywords:
automatically generated texts; artificial texts; massively generated texts; text features; text attribution.
Document Type:
Article
UDC:
004.072.7
Language: Russian
Citation:
A. O. Shumskaya, “Method of the artificial texts identification based on the calculation of the belonging measure to the invariants”, Tr. SPIIRAN, 49 (2016), 104–121
\Bibitem{Shu16}
\by A.~O.~Shumskaya
\paper Method of the artificial texts identification based on the calculation of the belonging measure to the invariants
\jour Tr. SPIIRAN
\yr 2016
\vol 49
\pages 104--121
\mathnet{http://mi.mathnet.ru/trspy919}
\crossref{https://doi.org/10.15622/sp.49.6}
\elib{https://elibrary.ru/item.asp?id=27657125}
Linking options:
https://www.mathnet.ru/eng/trspy919
https://www.mathnet.ru/eng/trspy/v49/p104
This publication is cited in the following 7 articles:
Anastasia Fedotova, Aleksandr Romanov, Anna Kurtukova, Alexander Shelupanov, “Authorship Attribution of Social Media and Literary Russian-Language Texts Using Machine Learning Methods and Feature Selection”, Future Internet, 14:1 (2021), 4
Alexander Rogov, Nikolai Moskin, Kirill Kulakov, Roman Abramov, 2021 30th Conference of Open Innovations Association FRUCT, 2021, 229
A Iskhakova, A Iskhakov, R Meshcheryakov, “Research of the estimated emotional components for the content analysis”, J. Phys.: Conf. Ser., 1203 (2019), 012065
A. O. Iskhakova, A. Yu. Iskhakov, R. V. Mescheryakov, “Podkhod k avtomatizirovannomu vyyavleniyu materialov destruktivnoi napravlennosti”, Izvestiya Kabardino-Balkarskogo nauchnogo tsentra RAN, 2018, no. 6-2, 203–209
A. D. Khomonenko, V. L. Dashonok, K. A. Ivanova, D. T. Kassymova, 2017 XX IEEE International Conference on Soft Computing and Measurements (SCM), 2017, 737
Botir Usmonov, Oleg Evsutin, Andrei Iskhakov, Alexander Shelupanov, Anastasia Iskhakova, Roman Meshcheryakov, 2017 International Conference on Information Science and Communications Technologies (ICISCT), 2017, 1
Anastasia Iskhakova, Roman Meshcheryakov, 2017 Second Russia and Pacific Conference on Computer Technology and Applications (RPC), 2017, 85