Modelirovanie i Analiz Informatsionnykh Sistem

Modelirovanie i Analiz Informatsionnykh Sistem, 2021, Volume 28, Number 3, Pages 250–259
DOI: https://doi.org/10.18255/1818-1015-2021-3-250-259
(Mi mais748)
 

This article is cited in 1 scientific paper (total in 1 paper)

Theory of data

Comparison of style features for the authorship verification of literary texts

K. V. Lagutina

P. G. Demidov Yaroslavl State University, 14 Sovetskaya str., Yaroslavl 150003, Russia
Abstract: The article compares character-level, word-level, and rhythm features for the authorship verification of literary texts of the 19th–21st centuries. The text corpus consists of novel fragments, each about 50,000 characters long, with 40 fragments per author. The study considers 20 authors who wrote in English, Russian, and French, and 8 Spanish-language authors.
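The abstract does not say how the fragments were cut from the novels. Assuming consecutive, non-overlapping chunks, a minimal Python sketch of such a split could look as follows. The function name `split_into_fragments` and its parameters are hypothetical; the defaults simply mirror the figures stated above (about 50,000 characters, 40 fragments per author), and ending each cut at the last space is an illustrative choice, not the paper's documented procedure.

```python
def split_into_fragments(text: str, size: int = 50_000,
                         max_fragments: int = 40) -> list[str]:
    """Hypothetical helper: cut `text` into consecutive, non-overlapping
    fragments of at most `size` characters, ending each fragment at the
    last space before the limit so that no word is split. A trailing
    remainder shorter than `size` is discarded."""
    fragments = []
    start = 0
    while len(fragments) < max_fragments and len(text) - start >= size:
        limit = start + size
        # Last space with index in [start + 1, limit); -1 if there is none.
        cut = text.rfind(" ", start + 1, limit)
        if cut == -1:
            cut = limit  # no space available: fall back to a hard cut
        fragments.append(text[start:cut])
        start = cut
        # Skip the spaces at the boundary so the next fragment starts on a word.
        while start < len(text) and text[start] == " ":
            start += 1
    return fragments
```

For example, `split_into_fragments(novel_text)` would return up to 40 fragments from a single novel; texts shorter than one fragment yield an empty list.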
The paper uses existing algorithms to compute low-level features, popular in computational linguistics, and rhythm features, common in literary texts. Low-level features include word n-grams, frequencies of letters and punctuation marks, average word and sentence lengths, etc. Rhythm features are based on lexico-grammatical figures: anaphora, epiphora, symploce, aposiopesis, epanalepsis, anadiplosis, diacope, epizeuxis, chiasmus, polysyndeton, and repetitive exclamatory and interrogative sentences. These features include the frequency of each rhythm figure per 100 sentences, the number of unique words within rhythm figures, and the percentage of nouns, adjectives, adverbs, and verbs within rhythm figures. Authorship verification is treated as a binary classification problem: does the text belong to a particular author or not? AdaBoost and a neural network with an LSTM layer are used as classifiers. The experiments show that rhythm features are effective for verifying particular authors and that, on average, combinations of feature types outperform single feature types. For the AdaBoost classifier, the best precision, recall, and F-measure exceed 90% when all three feature types are combined.
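The abstract names the feature families but not their exact definitions. As a rough, hypothetical illustration (not the paper's implementation), the Python sketch below computes several of the listed low-level features, a single rhythm feature, and one-vs-rest labels that frame verification as binary classification. The anaphora detector is a deliberately simplified assumption: it counts adjacent sentences that begin with the same word, whereas the paper relies on existing rhythm-figure algorithms that are not described here. All function names (`split_sentences`, `word_ngrams`, `low_level_features`, `anaphora_per_100_sentences`, `build_verification_set`) are hypothetical.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")          # Unicode-aware: also matches Cyrillic
PUNCTUATION = ".,;:!?-\"'()"          # assumed set of punctuation marks


def split_sentences(text: str) -> list[str]:
    # Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_ngrams(words: list[str], n: int) -> Counter:
    # Count contiguous word n-grams as tuples.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def low_level_features(text: str) -> dict[str, float]:
    words = WORD_RE.findall(text.lower())
    sentences = split_sentences(text)
    letters = [c for c in text.lower() if c.isalpha()]
    features = {
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        # Average sentence length measured in words.
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
    }
    # Relative letter frequencies (they sum to 1 over the letters present).
    for letter, count in Counter(letters).items():
        features[f"letter_{letter}"] = count / len(letters)
    # Occurrences of each punctuation mark per character of text.
    for mark in PUNCTUATION:
        features[f"punct_{mark}"] = text.count(mark) / len(text) if text else 0.0
    return features


def anaphora_per_100_sentences(text: str) -> float:
    # Simplified anaphora: two adjacent sentences starting with the same word.
    first_words = [(WORD_RE.findall(s.lower()) or [""])[0]
                   for s in split_sentences(text)]
    hits = sum(1 for a, b in zip(first_words, first_words[1:]) if a and a == b)
    return 100 * hits / len(first_words) if first_words else 0.0


def build_verification_set(fragments_by_author: dict[str, list[str]],
                           target: str) -> tuple[list[dict[str, float]], list[int]]:
    # One-vs-rest labels: 1 if the fragment is by `target`, otherwise 0.
    X, y = [], []
    for author, fragments in fragments_by_author.items():
        for fragment in fragments:
            features = low_level_features(fragment)
            features["anaphora_per_100"] = anaphora_per_100_sentences(fragment)
            X.append(features)
            y.append(1 if author == target else 0)
    return X, y
```

Because the set of `letter_*` keys varies between fragments, a real pipeline would map these dictionaries onto one fixed feature vector before passing them to a classifier such as AdaBoost or an LSTM network.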
Keywords: stylometry, natural language processing, style features, rhythm features, authorship verification.
Funding agency: Russian Foundation for Basic Research, grant number 20-37-90045
The reported study was funded by RFBR, project number 20-37-90045.
Received: 04.05.2021
Revised: 20.08.2021
Accepted: 25.08.2021
Document Type: Article
UDC: 004.912
MSC: 68T50
Language: English
Citation: K. V. Lagutina, “Comparison of style features for the authorship verification of literary texts”, Model. Anal. Inform. Sist., 28:3 (2021), 250–259
Citation in format AMSBIB
\Bibitem{Lag21}
\by K.~V.~Lagutina
\paper Comparison of style features for the authorship verification of literary texts
\jour Model. Anal. Inform. Sist.
\yr 2021
\vol 28
\issue 3
\pages 250--259
\mathnet{http://mi.mathnet.ru/mais748}
\crossref{https://doi.org/10.18255/1818-1015-2021-3-250-259}
Linking options:
  • https://www.mathnet.ru/eng/mais748
  • https://www.mathnet.ru/eng/mais/v28/i3/p250