Informatika i Ee Primeneniya [Informatics and its Applications]
Informatika i Ee Primeneniya [Informatics and its Applications], 2013, Volume 7, Issue 3, Pages 2–13
DOI: https://doi.org/10.14357/19922264130301
(Mi ia267)
 

This article is cited in 1 scientific paper.

Unsupervised approach to web wrapper maintenance

A. M. Andreev, D. V. Berezkin, I. A. Kozlov, K. V. Simakov

Bauman Moscow State Technical University
Abstract: HTML wrapper applications rely on formatting regularities of the targeted websites, so maintaining such applications requires detecting changes in the markup of web pages. This article describes an unsupervised approach to this problem. The proposed detection method consists of two parts: a real-time part based on clustering, which treats an HTML document as a vector of features, and a time-lagged part based on comparing the distributions of these features over the learning and testing sets of HTML documents. Several experiments were carried out with data obtained from a real wrapper. The results show the feasibility of the suggested approach.
Keywords: wrapper maintenance; web-site parsing; clustering; HTML-markup statistical processing.
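
The two-part detection scheme outlined in the abstract can be illustrated with a minimal sketch. The abstract does not name the features, the clustering algorithm, or the statistical comparison, so everything concrete here is an assumption made for illustration: relative tag frequencies stand in for the features, a single-centroid distance check stands in for the clustering step, and a two-sample Kolmogorov-Smirnov statistic stands in for the distribution comparison. The names `TAGS`, `features`, `is_outlier`, and `ks_statistic` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from collections import Counter
from html.parser import HTMLParser
import math

# Hypothetical feature set: relative frequencies of a few tags.
TAGS = ["div", "p", "table", "tr", "td"]


class TagCounter(HTMLParser):
    """Counts opening tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def features(html, tags=TAGS):
    """Map an HTML document to a vector of relative tag frequencies."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    total = sum(parser.counts.values()) or 1
    return [parser.counts[t] / total for t in tags]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_outlier(vec, train_vectors, slack=2.0):
    """Real-time check (assumed form): flag a document whose vector lies
    farther from the training centroid than slack times the largest
    distance observed among the training documents."""
    c = centroid(train_vectors)
    radius = max(distance(v, c) for v in train_vectors)
    return distance(vec, c) > slack * radius + 1e-9


def ks_statistic(sample_a, sample_b):
    """Time-lagged check (assumed form): two-sample Kolmogorov-Smirnov
    statistic, the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


# Learning set: pages with the layout the wrapper was built for.
train_docs = [
    "<html><body><div><p>a</p><p>b</p></div></body></html>",
    "<html><body><div><p>a</p><p>b</p><p>c</p></div></body></html>",
]
train_vectors = [features(d) for d in train_docs]

# A page whose markup switched from div/p to a table layout.
changed_doc = "<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>"
print(is_outlier(features(changed_doc), train_vectors))
```

In this sketch the per-document check can run as each page is fetched, while `ks_statistic` would be applied to one feature at a time once enough new pages have accumulated, for example comparing the share of `p` tags in the learning set against the same share in recently collected pages. The threshold on the statistic, like `slack`, is a tuning choice that the abstract does not specify.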
Document Type: Article
Language: Russian
Citation: A. M. Andreev, D. V. Berezkin, I. A. Kozlov, K. V. Simakov, “Unsupervised approach to web wrapper maintenance”, Inform. Primen., 7:3 (2013), 2–13
Citation in AMSBIB format:
\Bibitem{AndBerKoz13}
\by A.~M.~Andreev, D.~V.~Berezkin, I.~A.~Kozlov, K.~V.~Simakov
\paper Unsupervised approach to~web wrapper maintenance
\jour Inform. Primen.
\yr 2013
\vol 7
\issue 3
\pages 2--13
\mathnet{http://mi.mathnet.ru/ia267}
\crossref{https://doi.org/10.14357/19922264130301}
\elib{https://elibrary.ru/item.asp?id=20446779}
Linking options:
  • https://www.mathnet.ru/eng/ia267
  • https://www.mathnet.ru/eng/ia/v7/i3/p2