Artificial intelligence
Extracting named entities from Russian-language documents with different expressiveness of structure
M. D. Averina, O. A. Levanova
P.G. Demidov Yaroslavl State University, 14 Sovetskaya str., Yaroslavl 150003, Russia
Abstract:
This work addresses named entity recognition in Russian-language texts using a conditional random field (CRF) model. Two datasets were considered: well-structured refinancing documents and semi-structured court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for the structured documents and 0.86 for the semi-structured ones.
Keywords:
named entity extraction, CRF.
Received: 13.10.2023 Revised: 10.11.2023 Accepted: 15.11.2023
Citation:
M. D. Averina, O. A. Levanova, “Extracting named entities from Russian-language documents with different expressiveness of structure”, Model. Anal. Inform. Sist., 30:4 (2023), 382–393
Linking options:
https://www.mathnet.ru/eng/mais810 https://www.mathnet.ru/eng/mais/v30/i4/p382