Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2021, Volume 33, Issue 1, Pages 209–224
DOI: https://doi.org/10.15514/ISPRAS-2021-33(1)-14
(Mi tisp582)
 

This article is cited in 1 scientific paper (total in 1 paper)

Plagiarism detection in Armenian texts using intrinsic stylometric analysis

Ye. M. Yeshilbashian, A. A. Asatryan, Ts. G. Ghukasyan

Russian-Armenian University
Abstract: In this work we study the application of intrinsic stylometric methods to the task of plagiarism detection in Armenian texts. We use two task setups from PAN's series of conferences on text forensics and stylometry: style change detection and style breach detection. Style change detection aims to determine whether a text was written by more than one author, while style breach detection locates the boundaries between stylistically distinct text fragments. For these tasks, we generate synthetic test sets for three genres of text (academic, literary, and news) and use them to evaluate the effectiveness of hierarchical clustering and other relevant models from PAN conferences. We employ a standard set of character-level, lexical, and readability features, and additionally perform morphological and dependency parsing of text fragments to extract syntactic features that encode information about author style. The evaluation results show that the clustering-based approach fails to correctly detect style changes in longer texts and is only marginally better on shorter texts. For style breach detection, the hierarchical clustering-based approach performs better than a random baseline classifier, but the difference is not sufficient to warrant its practical use. In a complementary experiment, we show that reducing the number of features, and the multicollinearity among them, via PCA helps to increase the precision of style breach detection methods for certain text categories.
Keywords: stylometric analysis, plagiarism detection.
Document Type: Article
Language: Russian
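To make the clustering-based style breach setup from the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a text already split into segments, replaces the paper's feature set with four toy features (mean word length, mean sentence length, punctuation rate, type-token ratio), and uses a small hand-written average-linkage agglomerative clustering. All function names (style_features, standardize, agglomerative, style_breaches) are hypothetical and chosen for illustration only.

```python
import math
import re
from itertools import combinations


def style_features(segment):
    """Toy style vector (an assumption, not the paper's feature set)."""
    words = re.findall(r"\w+", segment.lower())
    # Split on Latin sentence enders and the Armenian full stop (U+0589).
    sentences = [s for s in re.split(r"[.!?\u0589]+", segment) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,                        # mean word length
        n_words / max(len(sentences), 1),                            # mean sentence length
        sum(ch in ",;:-" for ch in segment) / max(len(segment), 1),  # punctuation rate
        len(set(words)) / n_words,                                   # type-token ratio
    ]


def standardize(rows):
    """Z-score each feature column; constant columns are only centred."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def agglomerative(points, n_clusters=2):
    """Average-linkage hierarchical clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def style_breaches(segments, n_clusters=2):
    """Indices i where segment i falls in a different cluster than segment i - 1."""
    points = standardize([style_features(s) for s in segments])
    labels = agglomerative(points, n_clusters)
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Calling style_breaches on a list of paragraphs returns the positions at which the cluster label changes between neighbouring segments, which is one simple way to turn cluster assignments into breach boundaries. The number of clusters is fixed in advance here purely for simplicity.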
Citation: Ye. M. Yeshilbashian, A. A. Asatryan, Ts. G. Ghukasyan, “Plagiarism detection in armenian texts using intrinsic stylometric analysis”, Proceedings of ISP RAS, 33:1 (2021), 209–224
Citation in format AMSBIB
\Bibitem{YesAsaGhu21}
\by Ye.~M.~Yeshilbashian, A.~A.~Asatryan, Ts.~G.~Ghukasyan
\paper Plagiarism detection in armenian texts using intrinsic stylometric analysis
\jour Proceedings of ISP RAS
\yr 2021
\vol 33
\issue 1
\pages 209--224
\mathnet{http://mi.mathnet.ru/tisp582}
\crossref{https://doi.org/10.15514/ISPRAS-2021-33(1)-14}
Linking options:
  • https://www.mathnet.ru/eng/tisp582
  • https://www.mathnet.ru/eng/tisp/v33/i1/p209