Vestnik Sankt-Peterburgskogo Universiteta. Seriya 10. Prikladnaya Matematika. Informatika. Protsessy Upravleniya, 2018, Volume 14, Issue 4, Pages 334–345
DOI: https://doi.org/10.21638/11701/spbu10.2018.406
(Mi vspui381)
 

Computer science

Setting lower bounds on Jensen–Shannon divergence and its application to nearest neighbor document search

V. Yu. Dobrynin (a), N. Rooney (b), J. A. Serdyuk (c)

(a) St. Petersburg State University, 7–9, Universitetskaya nab., St. Petersburg, 199034, Russian Federation
(b) Sophia Ltd, Northern Ireland Science Park, the Innovation Centre, Queen’s Road, Queen’s Island, Belfast, BT3 9DT, Northern Ireland
(c) Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russian Federation
Abstract: The Jensen–Shannon divergence provides a mechanism for finding the nearest neighbours of a query document in a document collection. The mechanism is effective, but exhaustive search can be time-consuming. In this paper, we show that by setting lower bounds on the Jensen–Shannon divergence we can reduce the amount of calculation by up to 60% for exhaustive search and by up to 98% for approximate search, based on nearest neighbour search in a real-world document collection. The experiments used a corpus of 1 854 654 articles published in the New York Times from 1987-01-01 to 2007-06-19 (The New York Times Annotated Corpus); 100 documents selected at random from the same corpus served as queries. We assess the effect on performance by the reduction in the number of log function calculations. Approximate nearest neighbour search is based on clustering the documents with the Contextual Document Clustering algorithm: we find the set of cluster attractors that best match a query and limit the search for documents to the attractors' corresponding clusters.
Keywords: Jensen–Shannon divergence, nearest neighbors search, dimensionality reduction.
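A minimal illustration of the pruning idea in Python is given below. It relies on the fact that the Jensen–Shannon divergence decomposes into a sum of non-negative per-term contributions, so any partial sum is a lower bound and a candidate document can be abandoned once that bound exceeds the best divergence found so far. This is a sketch under that assumption only; the specific bounds derived in the paper, as well as the function names and toy data below, are hypothetical and not taken from the article.

import math

def jsd_terms(p, q):
    # Yield the non-negative per-term contributions to the (base-2)
    # Jensen-Shannon divergence of two discrete distributions given
    # as {term: probability} dictionaries.
    for t in set(p) | set(q):
        pi, qi = p.get(t, 0.0), q.get(t, 0.0)
        m = 0.5 * (pi + qi)
        c = 0.0
        if pi > 0.0:
            c += 0.5 * pi * math.log2(pi / m)
        if qi > 0.0:
            c += 0.5 * qi * math.log2(qi / m)
        yield c

def nearest_neighbour(query, docs):
    # Exhaustive search that abandons a candidate as soon as the
    # partial JSD (a lower bound) reaches the best divergence seen
    # so far, saving the remaining log computations.
    best_id, best_div = None, float("inf")
    for doc_id, dist in docs.items():
        partial = 0.0
        pruned = False
        for c in jsd_terms(query, dist):
            partial += c
            if partial >= best_div:
                pruned = True
                break
        if not pruned:
            best_id, best_div = doc_id, partial
    return best_id, best_div

# Toy usage with hypothetical unigram distributions:
docs = {
    "d1": {"jensen": 0.5, "shannon": 0.5},
    "d2": {"nearest": 0.4, "neighbour": 0.6},
}
print(nearest_neighbour({"jensen": 0.4, "shannon": 0.6}, docs))

The approximate variant described in the abstract would apply the same pruned search only within the clusters whose attractors best match the query.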
Received: June 5, 2018
Accepted: September 25, 2018
Document Type: Article
UDC: 519.2
MSC: 60E05
Language: English
Citation: V. Yu. Dobrynin, N. Rooney, J. A. Serdyuk, “Setting lower bounds on Jensen–Shannon divergence and its application to nearest neighbor document search”, Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr., 14:4 (2018), 334–345
Citation in format AMSBIB
\Bibitem{DobRooSer18}
\by V.~Yu.~Dobrynin, N.~Rooney, J.~A.~Serdyuk
\paper Setting lower bounds on Jensen--Shannon divergence and its application to nearest neighbor document search
\jour Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr.
\yr 2018
\vol 14
\issue 4
\pages 334--345
\mathnet{http://mi.mathnet.ru/vspui381}
\crossref{https://doi.org/10.21638/11701/spbu10.2018.406}
\elib{https://elibrary.ru/item.asp?id=36687361}
Linking options:
  • https://www.mathnet.ru/eng/vspui381
  • https://www.mathnet.ru/eng/vspui/v14/i4/p334