D. O. Kushchuk, M. A. Ryndin, A. K. Yatskov, M. I. Varlamov, “Using domain adversarial learning for text captchas recognition”, Proceedings of ISP RAS, 32:4 (2020), 203

Proceedings of the Institute for System Programming of the RAS

Proceedings of the Institute for System Programming of the RAS, 2020, Volume 32, Issue 4, Pages 203–216
DOI: https://doi.org/10.15514/ISPRAS-2020-32(4)-15 (Mi tisp535)

This article is cited in 1 scientific paper (total in 1 paper)

Using domain adversarial learning for text captchas recognition

D. O. Kushchuk^a, M. A. Ryndin^b, A. K. Yatskov^b, M. I. Varlamov^b

^a Moscow Institute of Physics and Technology
^b Ivannikov Institute for System Programming of the RAS

Abstract: The legal regulation of automated data collection from websites is now being actively worked out. As a result, interest in tools for automated data collection is growing, and with it interest in programs that automatically solve CAPTCHAs («Completely Automated Public Turing test to tell Computers and Humans Apart»). Although more advanced types of captcha have been created, text captchas remain quite common: large services such as Yandex, Google, Wikipedia, and VK continue to use them. The literature describes many methods for breaking text captchas, but most of them require the length of the text in the image to be known a priori, which is not always the case in practice. Many methods are also multi-stage, which complicates their implementation and use. Moreover, some methods need a large number of labeled images for training, whereas real data collection requires solving captchas from many different sites, and manually labeling tens of thousands of examples for each new type of captcha is unrealistic. In this paper we propose a one-step algorithm for attacking text captchas that does not require a priori knowledge of the length of the text in the image. Experiments show that combining this algorithm with adversarial learning achieves high quality on real data using only a small number (200-500) of labeled training examples. An experimental comparison with modern analogs shows that, given the same number of real training examples, our algorithm achieves comparable or higher quality while training and running faster.
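
The abstract does not say how the adversarial component is implemented. Domain adversarial training is commonly realized with a gradient reversal layer placed between a shared feature extractor and a domain classifier, so the sketch below illustrates only that general mechanism and is not taken from the paper. The class name `GradientReversal`, the function `reversal_strength`, and the schedule parameter `gamma` are assumptions introduced for illustration.

```python
import math


class GradientReversal:
    """Illustrative gradient reversal layer (an assumption, not the paper's code).

    Forward pass: identity. Backward pass: gradients are negated and scaled
    by lam, which pushes the feature extractor toward features that the
    domain classifier cannot separate.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged on the way to the domain classifier.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # Gradients flowing back to the feature extractor are sign-flipped.
        return [-self.lam * g for g in grad_from_domain_classifier]


def reversal_strength(progress, gamma=10.0):
    """Schedule often paired with gradient reversal.

    progress is the fraction of training completed, in [0, 1]; the result
    rises smoothly from 0 toward 1.
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


if __name__ == "__main__":
    layer = GradientReversal(lam=reversal_strength(0.5))
    print(layer.forward([0.2, -1.0]))
    print(layer.backward([0.5, -0.25]))
```

In a real training loop this layer would sit inside an automatic differentiation framework; the plain-list version here only shows the sign flip and the scaling.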

Keywords: machine learning, captcha solving, OCR, adversarial learning.

Document Type: Article

Language: Russian

Citation: D. O. Kushchuk, M. A. Ryndin, A. K. Yatskov, M. I. Varlamov, “Using domain adversarial learning for text captchas recognition”, Proceedings of ISP RAS, 32:4 (2020), 203–216

Citation in format AMSBIB

\Bibitem{KusRynYat20}
\by D.~O.~Kushchuk, M.~A.~Ryndin, A.~K.~Yatskov, M.~I.~Varlamov
\paper Using domain adversarial learning for text captchas recognition
\jour Proceedings of ISP RAS
\yr 2020
\vol 32
\issue 4
\pages 203--216
\mathnet{http://mi.mathnet.ru/tisp535}
\crossref{https://doi.org/10.15514/ISPRAS-2020-32(4)-15}

Linking options:

https://www.mathnet.ru/eng/tisp535

https://www.mathnet.ru/eng/tisp/v32/i4/p203
