Informatika i Ee Primeneniya [Informatics and its Applications]
Informatika i Ee Primeneniya [Informatics and its Applications], 2014, Volume 8, Issue 4, Pages 94–109
DOI: https://doi.org/10.14357/19922264140412
(Mi ia348)
 

This article is cited in 4 scientific papers.

Methods of entity resolution and data fusion in the ETL-process and their implementation in the Hadoop environment

A. E. Vovchenko (a), L. A. Kalinichenko (a, b), D. Yu. Kovalev (a)

(a) Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
(b) Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation
Abstract: Extracting entities, transforming them, and loading them into an integrated repository are the central problems of data integration. These actions form the ETL (extract-transform-load) process. An entity is a digital representation of a real-world object (for example, information about a person). Entity resolution covers duplicate detection, deduplication, record linkage, object identification, reference matching, and other ETL-related tasks. After the entity resolution step, the related entities should be merged into a single reference entity that contains information from all of them. This merging, known as data fusion, is the final step in the data integration process. The paper gives an overview of entity resolution and data fusion methods. It also presents techniques for programming these methods to implement the ETL process in the Hadoop environment. This part of the paper uses the High-Level Integration Language (HIL), a declarative language focused on the resolution and fusion of entities in the Hadoop infrastructure.
Keywords: data integration; ETL; entity resolution; data fusion; big data; Hadoop; Jaql; HIL.
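The two steps named in the abstract, entity resolution followed by data fusion, can be illustrated with a minimal sketch. This is plain Python, not HIL or Jaql, and it is not the authors' method: the matching rule (equal normalized names), the fusion rule (first non-empty value per attribute), the sample records, and all function names are assumptions chosen only for illustration.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Hypothetical matching key: lowercase, drop punctuation, sort tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def resolve(records):
    """Entity resolution: group records whose normalized names coincide."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[normalize(rec["name"])].append(rec)
    return list(clusters.values())


def fuse(cluster):
    """Data fusion: merge a cluster into one reference entity,
    keeping the first non-empty value seen for each attribute."""
    reference = {}
    for rec in cluster:
        for key, value in rec.items():
            if value and not reference.get(key):
                reference[key] = value
    return reference


# Made-up sample data: the first two records describe the same person.
records = [
    {"name": "Ivanov, Ivan", "email": "", "city": "Moscow"},
    {"name": "ivan ivanov", "email": "ivan@example.org", "city": ""},
    {"name": "Petr Petrov", "email": "petr@example.org", "city": "Kazan"},
]

entities = [fuse(cluster) for cluster in resolve(records)]
```

Here the first two records fall into one cluster and are fused into a single entity that combines the city from the first record with the e-mail address from the second. Real entity resolution typically relies on approximate similarity measures and blocking rather than exact key equality; the exact-key rule is used here only to keep the sketch short.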
Received: 09.11.2014
Document Type: Article
Language: Russian
Citation: A. E. Vovchenko, L. A. Kalinichenko, D. Yu. Kovalev, “Methods of entity resolution and data fusion in the ETL-process and their implementation in the Hadoop environment”, Inform. Primen., 8:4 (2014), 94–109
Citation in format AMSBIB
\Bibitem{VovKalKov14}
\by A.~E.~Vovchenko, L.~A.~Kalinichenko, D.~Yu.~Kovalev
\paper Methods of entity resolution and~data fusion in~the~ETL-process and their implementation in the Hadoop environment
\jour Inform. Primen.
\yr 2014
\vol 8
\issue 4
\pages 94--109
\mathnet{http://mi.mathnet.ru/ia348}
\crossref{https://doi.org/10.14357/19922264140412}
\elib{https://elibrary.ru/item.asp?id=22846470}
Linking options:
  • https://www.mathnet.ru/eng/ia348
  • https://www.mathnet.ru/eng/ia/v8/i4/p94