S. A. Krasnov, A. S. Ilatovsky, A. D. Khomonenko, V. N. Arseniev, “Assessment of semantic similarity of documents on the basis of the latent semantic analysis with the automatic choice of rank values”, Tr. SPIIRAN, 54 (2017), 185

Trudy SPIIRAN

Trudy SPIIRAN, 2017, Issue 54, Pages 185–204
DOI: https://doi.org/10.15622/sp.54.8 (Mi trspy971)

Algorithms and Software

Assessment of semantic similarity of documents on the basis of the latent semantic analysis with the automatic choice of rank values

S. A. Krasnov^a, A. S. Ilatovsky^a, A. D. Khomonenko^b, V. N. Arseniev^a

^a Mozhaisky Military Space Academy
^b Emperor Alexander I St. Petersburg State Transport University

Abstract: A method is proposed for assessing the semantic similarity of documents. It is based on latent semantic analysis, on the dynamics of change in the singular values of the term-document matrix, and on automatic determination of a range of rank values. The assessment of semantic similarity is considered in the context of identifying duplicates and contradictions in databases and data stores.
A short review is given of the approaches used to assess the semantic similarity of documents and to identify duplicates and contradictions in databases. Results of numerical examples are presented in which semantic dependencies between the terms of documents are assessed in order to identify duplicates and contradictions in databases and data stores. In these examples, the degree of correspondence between the compared documents is calculated as the resulting characteristic.
Comparative estimates are given of the accuracy with which the degree of correspondence $\lambda$ of documents is calculated by the main methods (cosine similarity measure, vector space model, Spearman rank correlation coefficient, and the statistical tf-idf measure, i.e. term frequency and inverse document frequency).
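The baseline measures named in this paragraph can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy documents, the smoothed idf formula, and the helper names (`tfidf_vectors`, `cosine`, `ranks`, `spearman`) are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, tf-idf vectors) for a list of tokenized documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        # Assumption: smoothed idf, log((1 + n) / (1 + df)) + 1.
        vectors.append([
            (counts[t] / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t in vocab
        ])
    return vocab, vectors


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(u, v):
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    ru, rv = ranks(u), ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    su = math.sqrt(sum((a - mu) ** 2 for a in ru))
    sv = math.sqrt(sum((b - mv) ** 2 for b in rv))
    return cov / (su * sv) if su and sv else 0.0


# Hypothetical toy corpus: documents 0 and 1 overlap, document 2 does not.
docs = [
    "database stores duplicate records".split(),
    "duplicate records in the database".split(),
    "singular values of a matrix".split(),
]
vocab, vectors = tfidf_vectors(docs)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
rho_related = spearman(vectors[0], vectors[1])
print(sim_related, sim_unrelated, rho_related)
```

In this sketch the two documents that share terms receive a higher cosine score than the pair with no common terms. These measures work on surface term overlap only, which is the limitation that latent semantic analysis is meant to address.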
It is shown that the proposed method of latent semantic analysis with automatic detection of a range of rank values eliminates the dependence of the results of latent semantic analysis on the chosen rank.
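The idea of scoring similarity over a range of ranks rather than a single hand-picked rank can be sketched as follows. The abstract does not state the authors' rule for deriving the range from the singular values, so the cumulative-energy thresholds `lo` and `hi`, the averaging over ranks, and all function names below are assumptions made purely for illustration. The sketch assumes NumPy is available.

```python
import numpy as np


def rank_range(s, lo=0.6, hi=0.95):
    """HYPOTHETICAL rule (not taken from the paper): keep every rank k whose
    leading k squared singular values capture between `lo` and `hi` of the
    total. `s` must be sorted in descending order, as numpy.linalg.svd returns."""
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k_min = min(int(np.searchsorted(energy, lo)) + 1, len(s))
    k_max = min(max(k_min, int(np.searchsorted(energy, hi)) + 1), len(s))
    return list(range(k_min, k_max + 1))


def cosine(u, v):
    """Cosine similarity of two 1-D arrays (0.0 if either has zero norm)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0


def lsa_similarity(A, i, j, lo=0.6, hi=0.95):
    """Mean cosine similarity of documents i and j over a range of LSA ranks.

    A is a term-document matrix (rows are terms, columns are documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    ks = rank_range(s, lo, hi)
    # Document coordinates in the latent space: one row per document.
    doc_coords = (s[:, None] * Vt).T
    sims = [cosine(doc_coords[i, :k], doc_coords[j, :k]) for k in ks]
    return float(np.mean(sims)), ks


# Hypothetical toy term-document matrix: 5 terms, 4 documents.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 3],
], dtype=float)

sim_01, ks = lsa_similarity(A, 0, 1)
sim_02, _ = lsa_similarity(A, 0, 2)
print(ks, sim_01, sim_02)
```

Averaging over the selected ranks is only one possible way to combine the per-rank scores. The point of the sketch is that the final score no longer depends on a single manually chosen rank.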

Keywords: assessment of semantic similarity of documents; identification of duplications and contradictions; databases; latent semantic analysis; statistical analysis; cosine measure of proximity; vector model.

Document Type: Article

UDC: 004.912

Language: Russian

Citation: S. A. Krasnov, A. S. Ilatovsky, A. D. Khomonenko, V. N. Arseniev, “Assessment of semantic similarity of documents on the basis of the latent semantic analysis with the automatic choice of rank values”, Tr. SPIIRAN, 54 (2017), 185–204

Citation in format AMSBIB

\Bibitem{KraIlaKho17}
\by S.~A.~Krasnov, A.~S.~Ilatovsky, A.~D.~Khomonenko, V.~N.~Arseniev
\paper Assessment of semantic similarity of documents on the basis of the latent semantic analysis with the automatic choice of rank values
\jour Tr. SPIIRAN
\yr 2017
\vol 54
\pages 185--204
\mathnet{http://mi.mathnet.ru/trspy971}
\crossref{https://doi.org/10.15622/sp.54.8}
\elib{https://elibrary.ru/item.asp?id=30282025}

Linking options:

https://www.mathnet.ru/eng/trspy971

https://www.mathnet.ru/eng/trspy/v54/p185
