Numerical methods and programming, 2011, Volume 12, Issue 3, Pages 58–72
(Mi vmp220)
Programming
A detection method for mass-generated unnatural texts based on the topical structure analysis
A. S. Pavlov (a), B. V. Dobrov (b)
(a) M. V. Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics
(b) M. V. Lomonosov Moscow State University, Research Computing Center
Abstract:
Web spam is considered to be one of the greatest threats to modern search engines.
Spammers use a wide range of algorithms to generate unnatural texts in large numbers.
A new general model is proposed for texts generated from samples of natural texts.
A new algorithm for detecting unnatural texts based on topical structure analysis
is also proposed. The algorithm is evaluated on both synthetic and real-world data.
Keywords:
web spam; topical structure; modeling.
Citation:
A. S. Pavlov, B. V. Dobrov, “A detection method for mass-generated unnatural texts based on the
topical structure analysis”, Num. Meth. Prog., 12:3 (2011), 58–72
Linking options:
https://www.mathnet.ru/eng/vmp220 https://www.mathnet.ru/eng/vmp/v12/i3/p58