Text documents marking algorithm based on interword distances shifting invariant to format conversion
A. V. Kozachok (a), S. A. Kopylov (a), P. N. Gorbachev (a), A. E. Gaynov (b), B. V. Kondrat’ev (b)
(a) Academy of Federal Guard Service
(b) Ministry of Defence of the Russian Federation
Abstract:
The article presents an algorithm for marking electronic text documents that embeds identification information by changing the intervals between words (interword distance shifting). The algorithm is intended to improve the protection of documents containing text information against leakage through the channel formed by transferring printed paper documents, as well as the corresponding electronic copies of paper documents. While developing the marking algorithm, existing tools for protecting paper documents from leakage were analyzed, practical solutions for protecting text documents were reviewed, and their advantages and disadvantages were identified. Interword distance shifting serves as the approach to embedding information in electronic documents. The interword distances are changed by inserting a normalized space in selected areas of text lines and adjusting the remaining spaces between words by calculated values. To make the embedded marker invariant to printing followed by scanning or photographing, algorithms for forming the embedding regions and the embedding matrix were developed. When the embedding regions are formed from the text lines of the source document, arrays of spaces are created that consist of pairs: four and two spaces, or two spaces. The embedded information determines where in the formed regions the normalized space is inserted. During marker embedding, an embedding matrix containing the word displacement values is formed and applied to the original document when it is printed. The developed marking algorithm makes it possible to introduce a marker into the text structure of an electronic document that remains invariant when the electronic document is converted into a paper one and vice versa.
In addition, the features and limitations of the developed marking algorithm are presented, and directions for further research are identified.
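The abstract describes the general idea (a normalized space placed in a region selected by the embedded bit, the other gaps adjusted so the line keeps its width, and word displacements collected for printing) without giving formulas. The following is a minimal sketch under loudly stated assumptions, not the authors' method: each embedding region is simply a pair of adjacent interword gaps; bit 1 puts the normalized space in the first gap of the pair and bit 0 in the second; the width removed is shared equally among the remaining gaps; and all gaps are assumed to be wider than the normalized space, which is typical of justified text. The names `embed_bits_in_line`, `extract_bits` and `normal_space`, as well as the pairing rule, are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def embed_bits_in_line(
    gaps: List[float], bits: List[int], normal_space: float
) -> Tuple[List[float], List[float]]:
    """Hypothetical simplified interword-shifting embedder for one justified line.

    gaps: widths of the interword gaps in the line (e.g. in points).
    Returns the new gap widths and the horizontal shift of every word,
    which is the kind of per-word displacement an embedding matrix could hold.
    """
    if len(bits) > len(gaps) // 2:
        raise ValueError("line has too few interword gaps for the given bits")
    if any(g <= normal_space for g in gaps):
        # Assumption: every gap must be wider than the normalized space,
        # otherwise the marked gap could not be told apart from the other one.
        raise ValueError("all gaps must be wider than the normalized space")

    new_gaps = list(gaps)
    fixed = set()
    for i, bit in enumerate(bits):
        # Region i is the pair of gaps (2i, 2i + 1); the bit picks which one
        # receives the normalized space.
        idx = 2 * i if bit == 1 else 2 * i + 1
        new_gaps[idx] = normal_space
        fixed.add(idx)

    # Give the removed width back to the remaining gaps in equal shares,
    # so the line width (sum of gaps) is preserved.
    free = [j for j in range(len(gaps)) if j not in fixed]
    surplus = sum(gaps) - sum(new_gaps)
    if free:
        share = surplus / len(free)
        for j in free:
            new_gaps[j] += share

    # Word k moves by the accumulated change of all gaps to its left;
    # the first word never moves, and the last word stays put as well.
    gap_deltas = [new - old for new, old in zip(new_gaps, gaps)]
    word_shifts = [0.0] + list(accumulate(gap_deltas))
    return new_gaps, word_shifts


def extract_bits(gaps: List[float], n_bits: int, normal_space: float) -> List[int]:
    """Recover bits by checking which gap of each pair is closer to the normalized space."""
    bits = []
    for i in range(n_bits):
        first, second = gaps[2 * i], gaps[2 * i + 1]
        bits.append(1 if abs(first - normal_space) < abs(second - normal_space) else 0)
    return bits
```

For example, `embed_bits_in_line([6.0, 7.0, 6.5, 8.0, 7.5], [1, 0], 4.0)` yields the gaps `[4.0, 9.0, 8.5, 4.0, 9.5]` and the word shifts `[0.0, -2.0, 0.0, 2.0, -2.0, 0.0]`, and `extract_bits` on the new gaps returns `[1, 0]`. Decoding only compares the two gaps within a pair, so a uniform change of scale from printing and scanning does not affect it. This loosely reflects the invariance goal stated in the abstract, although the real algorithm's region and matrix construction is more involved.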
Keywords:
information leakage protection, marking, pattern recognition, image processing, text documents.
Citation:
A. V. Kozachok, S. A. Kopylov, P. N. Gorbachev, A. E. Gaynov, B. V. Kondrat'ev, “Text documents marking algorithm based on interword distances shifting invariant to format conversion”, Proceedings of ISP RAS, 33:4 (2021), 131–146
Linking options:
https://www.mathnet.ru/eng/tisp618 https://www.mathnet.ru/eng/tisp/v33/i4/p131