Proceedings of the Institute for System Programming of the RAS

Proceedings of the Institute for System Programming of the RAS, 2021, Volume 33, Issue 4, Pages 131–146
(Mi tisp618)

Text documents marking algorithm based on interword distances shifting invariant to format conversion

A. V. Kozachok (a), S. A. Kopylov (a), P. N. Gorbachev (a), A. E. Gaynov (b), B. V. Kondrat'ev (b)

a Academy of Federal Guard Service
b Ministry of Defence of the Russian Federation
Abstract: The article presents an algorithm for marking electronic text documents that embeds identification information by changing the intervals between words (interword distance shifting). The algorithm is intended to better protect documents containing text information against leakage through the transfer of documents printed on paper, as well as of the corresponding electronic copies of paper documents. In developing the marking algorithm, the authors analyzed existing tools for protecting paper documents from leakage, considered practical solutions for protecting text documents, and determined their advantages and disadvantages. Interword distance shifting serves as the approach to embedding information in electronic documents. The interword distances are changed by embedding a normalized space in selected areas of the text lines and adjusting the remaining interword spacings by calculated values. To make the embedded marker invariant to printing and subsequent scanning or photographing, algorithms for forming the embedding regions and the embedding matrix have been developed. When the embedding regions are formed, arrays of spaces are built from the text lines of the source document, consisting of pairs: four and two spaces, or two spaces. The information being embedded determines where in the formed regions the normalized space is inserted. When a marker is embedded, an embedding matrix containing the word displacement values is formed and embedded in the original document during printing. The developed marking algorithm makes it possible to introduce into the text structure of an electronic document a marker that is invariant to the conversion of the electronic document into a paper one and vice versa. In addition, the features and limitations of the developed marking algorithm are presented, and directions for further research are identified.
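The general idea of interword distance shifting can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes, for illustration only, that an embedding region is simply two adjacent interword gaps, that one bit selects which of the two gaps receives the normalized space, and that the other gap absorbs the difference so the line width stays unchanged. The names `embed_bits`, `extract_bits`, and `normalized` are hypothetical, and the sketch does not model the embedding matrix or robustness to printing and scanning.

```python
from typing import List, Sequence


def embed_bits(gaps: Sequence[float], bits: Sequence[int],
               normalized: float) -> List[float]:
    """Return new gap widths carrying `bits`, using two adjacent gaps per bit.

    Assumption (not from the paper): bit 0 puts the normalized space in the
    first gap of the region, bit 1 puts it in the second gap.
    """
    if len(gaps) < 2 * len(bits):
        raise ValueError("not enough interword gaps for the payload")
    out = list(gaps)
    for i, bit in enumerate(bits):
        first, second = 2 * i, 2 * i + 1
        total = out[first] + out[second]
        # The normalized space must be narrower than the compensating gap,
        # otherwise the two gaps could not be told apart on extraction.
        if normalized >= total / 2:
            raise ValueError("normalized space too wide for this region")
        marked, other = (first, second) if bit == 0 else (second, first)
        out[marked] = normalized
        out[other] = total - normalized  # keeps the region width constant
    return out


def extract_bits(gaps: Sequence[float], n_bits: int) -> List[int]:
    """Recover bits by checking which gap of each region is the narrower one."""
    return [0 if gaps[2 * i] < gaps[2 * i + 1] else 1 for i in range(n_bits)]


# Five equal gaps (arbitrary units) on one text line, two payload bits.
original = [4.0, 4.0, 4.0, 4.0, 4.0]
marked = embed_bits(original, [1, 0], normalized=3.0)
recovered = extract_bits(marked, 2)
```

Because each region keeps its total width, the line length and justification are unchanged, and only the relative positions of the words inside a region shift.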
Keywords: information leakage protection, marking, pattern recognition, image processing, text documents.
Document Type: Article
Language: Russian
Citation: A. V. Kozachok, S. A. Kopylov, P. N. Gorbachev, A. E. Gaynov, B. V. Kondrat'ev, “Text documents marking algorithm based on interword distances shifting invariant to format conversion”, Proceedings of ISP RAS, 33:4 (2021), 131–146
Citation in format AMSBIB
\by A.~V.~Kozachok, S.~A.~Kopylov, P.~N.~Gorbachev, A.~E.~Gaynov, B.~V.~Kondrat'ev
\paper Text documents marking algorithm based on interword distances shifting invariant to format conversion
\jour Proceedings of ISP RAS
\yr 2021
\vol 33
\issue 4
\pages 131--146