Vestnik Yuzhno-Ural'skogo Universiteta. Seriya Matematicheskoe Modelirovanie i Programmirovanie
Vestnik Yuzhno-Ural'skogo Universiteta. Seriya Matematicheskoe Modelirovanie i Programmirovanie, 2012, Issue 13, Pages 119–127 (Mi vyuru74)  

Programming & Computer Software

A Distributed Dictionary-Based Morphological Analysis Framework for Russian Language Processing

D. A. Ustalov, M. L. Goldstein

Institute of Mathematics and Mechanics, Ural Branch of the Russian Academy of Sciences (Yekaterinburg, Russian Federation)
Abstract: This article describes an approach to scaling a morphological analysis service for natural-language words, intended for processing diverse collections of documents in Russian. It gives an overview and critical analysis of existing solutions and establishes the requirements for a workbench for dictionary-based morphological analysis. A distributed architecture for a morphological analysis web service, designed to handle large collections of documents in Russian, is presented as a structural model. This architecture is implemented as a prototype system in the Ruby programming language. The morphological dictionary is structured as a relational schema. Tests of this method in a distributed computing environment showed linear scalability of the proposed solution. The experimental configuration comprises a load generator that issues HTTP requests, a load balancer that distributes them across the worker nodes of the distributed system, application servers running the analyzer together with the morphological dictionary database, and a caching node that reduces the cost of dictionary queries. Applying this approach yields a linear increase in performance for distributed systems that automatically process large volumes of text.
Keywords: distributed computing, natural language processing, corpus linguistics, data-intensive computing, morphological analysis.
Received: 08.06.2012
Document Type: Article
UDC: 004.912
MSC: 68T50
Language: Russian
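The experimental configuration in the abstract (a balancer in front of worker nodes, with a cache shielding the dictionary) can be illustrated with a minimal Ruby sketch. This is not the authors' prototype. The names `DICTIONARY`, `Worker`, and `RoundRobinBalancer` are hypothetical, an in-memory Hash stands in for both the relational dictionary and the caching node, round-robin dispatch is an assumed balancing policy, and the two dictionary entries are illustrative only.

```ruby
# Hypothetical stand-in for the relational morphological dictionary:
# maps a word form to a list of [lemma, grammatical tags] analyses.
DICTIONARY = {
  'кошки' => [['кошка', %w[noun gen sing]], ['кошка', %w[noun nom plur]]],
  'стали' => [['сталь', %w[noun gen sing]], ['стать', %w[verb past plur]]]
}.freeze

# A worker node: answers lookups, consulting the cache before the dictionary.
class Worker
  attr_reader :dictionary_hits

  def initialize(dictionary, cache)
    @dictionary = dictionary
    @cache = cache
    @dictionary_hits = 0
  end

  def analyze(word)
    key = word.downcase
    # On a cache miss, query the dictionary once and remember the answer
    # (including the empty answer for unknown words).
    @cache.fetch(key) do
      @dictionary_hits += 1
      @cache[key] = @dictionary.fetch(key, [])
    end
  end
end

# Assumed policy: hand each request to the next worker in turn.
class RoundRobinBalancer
  def initialize(workers)
    @workers = workers
    @next = 0
  end

  def analyze(word)
    worker = @workers[@next]
    @next = (@next + 1) % @workers.size
    worker.analyze(word)
  end
end

shared_cache = {} # plays the role of the caching node
workers = Array.new(3) { Worker.new(DICTIONARY, shared_cache) }
balancer = RoundRobinBalancer.new(workers)

results = %w[кошки стали кошки Стали слово].map { |w| balancer.analyze(w) }
total_dictionary_hits = workers.sum(&:dictionary_hits)
```

In this sketch the five requests are spread across three workers, but only three of them reach the dictionary. The repeated forms are served from the shared cache, which is the cost reduction the caching node is meant to provide.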
Citation: D. A. Ustalov, M. L. Goldstein, “A Distributed Dictionary-Based Morphological Analysis Framework for Russian Language Processing”, Vestnik YuUrGU. Ser. Mat. Model. Progr., 2012, no. 13, 119–127
Citation in format AMSBIB
\Bibitem{UstGol12}
\by D.~A.~Ustalov, M.~L.~Goldstein
\paper A Distributed Dictionary-Based Morphological Analysis Framework for Russian Language Processing
\jour Vestnik YuUrGU. Ser. Mat. Model. Progr.
\yr 2012
\issue 13
\pages 119--127
\mathnet{http://mi.mathnet.ru/vyuru74}
Linking options:
  • https://www.mathnet.ru/eng/vyuru74
  • https://www.mathnet.ru/eng/vyuru/y2012/i13/p119