This article is cited in 3 scientific papers.
The tasks of identification of informational objects in area-spread data arrays
M. M. Gershkovich, T. K. Birukova
Institute of Informatics Problems, Russian Academy of Sciences,
44-2 Vavilov Str., Moscow 119333, Russian Federation
Abstract:
An approach to identifying informational objects (IOs) in automated information systems used for data collection, storage, and processing is presented. Such systems consist of multiple nodes and acquire data from multiple sources. In most cases, the data array of an information system takes the form of a continuously updated event log. Each event record describes the participant of the event (an IO) and the conditions under which the event occurred. To solve analytical problems involving IOs, one must first identify them, i.e., determine the set of IOs that, with a certain probability, represent the same entity. The paper defines the typical IO identification tasks that arise in developing large-scale information systems: IO fusion, and IO clustering (forming groups of IOs that are similar with respect to given criteria). The identification task is closely related to the task of detecting links between IOs, since two IOs are more likely to be identical if each of them is linked to the same other object. Methods for solving these tasks are presented, the specific features of IO identification in a flow of events are studied, and a correlation search method for detecting associations between IOs is described. A method is presented for comparing proper names that accounts for probable distortions (phonetic and transcriptional) and misprints. The effectiveness of using Cyrillic and Latin first name and surname blocks jointly for personal identification is substantiated, and methods for Cyrillic-to-Latin and Latin-to-Cyrillic transcription are presented.
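The abstract does not spell out the algorithms, so the following is only a rough illustration of how name comparison and IO fusion can fit together, not the authors' method. It assumes a simplified, hypothetical Cyrillic-to-Latin table, compares the Latin forms with the Levenshtein edit distance under a hypothetical relative threshold (`max_ratio`), and fuses records whose names match into groups by taking the transitive closure of pairwise matches with a union-find structure. The phonetic step (e.g. Metaphone, listed in the keywords) is omitted, and all function names are illustrative.

```python
from itertools import combinations

# Hypothetical, simplified Cyrillic-to-Latin table; the paper's own
# transcription rules are not reproduced here.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "iu", "я": "ia",
}


def to_latin(name: str) -> str:
    """Lower-case a name and transliterate any Cyrillic letters to Latin."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in name.lower())


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similar(a: str, b: str, max_ratio: float = 0.3) -> bool:
    """Hypothetical match rule: the edit distance between the Latin forms
    is at most max_ratio of the longer form's length."""
    la, lb = to_latin(a), to_latin(b)
    longest = max(len(la), len(lb), 1)
    return levenshtein(la, lb) / longest <= max_ratio


def fuse(names: list[str]) -> list[set[int]]:
    """Group record indices whose names match, closing pairwise matches
    transitively with a union-find structure."""
    parent = list(range(len(names)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        if similar(names[i], names[j]):
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(names)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())
```

For example, `fuse(["Иванов", "Ivanov", "Ivanoff", "Petrov"])` places the first three records in one group and leaves "Petrov" on its own. Note that transitive closure can chain together names that are not directly similar, which is one reason a real system would weigh additional attributes and links between IOs rather than names alone.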
Keywords:
identification of informational objects; identification of objects; correlation search; search for associations; identity of objects; fusion of informational objects; fusion of objects; text attributes; data distortions; phonetic distortions; transcriptional errors; Latin to Cyrillic transcription; Cyrillic to Latin transcription; Metaphone; Levenshtein distance; spread systems; area-spread systems; hierarchical systems; flow of events.
Received: 26.02.2014
Citation:
M. M. Gershkovich, T. K. Birukova, “The tasks of identification of informational objects in area-spread data arrays”, Sistemy i Sredstva Inform., 24:1 (2014), 224–243
Linking options:
https://www.mathnet.ru/eng/ssi339 https://www.mathnet.ru/eng/ssi/v24/i1/p224