T. Ter-Hovhannisyan, H. Aleksanyan, K. Avetisyan, “Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms”, Investigations on applied mathematics and informatics. Part II–2, Zap. Nauchn. Sem. POMI, 530, POMI, St. Petersburg, 2023, 80

Zapiski Nauchnykh Seminarov POMI

Zapiski Nauchnykh Seminarov POMI, 2023, Volume 530, Pages 80–95 (Mi znsl7434)

Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms

T. Ter-Hovhannisyan, H. Aleksanyan, K. Avetisyan

Russian-Armenian University, ISP RAS, Yerevan, Armenia

Abstract: Adversarial attacks on text have gained significant attention in recent years because they can undermine the reliability of NLP models. We present novel black-box character-level and word-level approaches for generating adversarial examples against BERT-based models. The character-level approach inserts natural typos into a word according to its WordPiece tokenization. At the word level, we present three techniques that use synonymous substitutes generated by ChatGPT and post-corrected into the grammatical form required by the given context. Additionally, we aim to minimize the perturbation rate by taking into account the damage that each perturbation does to the model. By combining the character-level and word-level approaches with the perturbation-rate minimization technique, we achieve a state-of-the-art attack rate. Our best approach runs 30–65% faster than the previously best method, Tampers, with a comparable perturbation rate. At the same time, the proposed perturbations preserve the semantic similarity between the original and adversarial examples and yield a relatively low Levenshtein distance.
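The abstract describes the character-level idea only in outline: typos are placed according to a word's WordPiece tokenization. The Python sketch below is one hypothetical way to make that concrete; it is not the authors' implementation. It assumes BERT-style greedy longest-match-first WordPiece with `##` continuation pieces, a toy vocabulary, adjacent swaps and single deletions as the "natural" typo set, and a ranking that prefers typos which fragment the word into more subwords. The names `wordpiece_tokenize`, `typo_candidates`, and `rank_typos`, and the scoring rule itself, are illustrative assumptions. The sketch also computes the Levenshtein distance that the abstract uses as a measure of how small a perturbation is.

```python
from typing import List, Set, Tuple


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of one word, BERT style.

    Non-initial pieces carry the "##" continuation prefix; if any part of the
    word cannot be matched, the whole word maps to the unknown token.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertion, deletion, substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def typo_candidates(word: str) -> List[str]:
    """Hypothetical typo set: adjacent-character swaps and single deletions."""
    raw = [word[:i] + word[i + 1] + word[i] + word[i + 2:] for i in range(len(word) - 1)]
    raw += [word[:i] + word[i + 1:] for i in range(len(word))]
    out: List[str] = []
    for cand in raw:
        # Drop empty strings, no-op swaps of identical letters, and duplicates.
        if cand and cand != word and cand not in out:
            out.append(cand)
    return out


def rank_typos(word: str, vocab: Set[str]) -> List[Tuple[str, List[str]]]:
    """Keep typos that change the WordPiece split; most fragmented first.

    The ranking heuristic (more subword pieces = more disruptive) is an
    illustrative assumption, not the paper's scoring rule.
    """
    original = wordpiece_tokenize(word, vocab)
    kept = [(c, wordpiece_tokenize(c, vocab)) for c in typo_candidates(word)]
    kept = [(c, p) for c, p in kept if p != original]
    kept.sort(key=lambda item: len(item[1]), reverse=True)  # stable for ties
    return kept


if __name__ == "__main__":
    toy_vocab = {"play", "##ing", "p", "##l", "##a", "##y", "##i", "##n", "##g"}
    for typo, pieces in rank_typos("playing", toy_vocab)[:3]:
        print(typo, pieces, levenshtein("playing", typo))
```

Running the script prints the three top-ranked typos of `playing` with their toy WordPiece splits and edit distances. In a real attack the vocabulary would come from the target model's tokenizer, and candidates would presumably be checked against the model's predictions rather than ranked by fragmentation alone.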

Key words and phrases: adversarial attacks, character-level attacks, word-level attacks, ChatGPT synonyms, WordPiece.

Received: 06.09.2023

Document Type: Article

UDC: 81.322.2

Language: English

Citation: T. Ter-Hovhannisyan, H. Aleksanyan, K. Avetisyan, “Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms”, Investigations on applied mathematics and informatics. Part II–2, Zap. Nauchn. Sem. POMI, 530, POMI, St. Petersburg, 2023, 80–95

Citation in AMSBIB format

\Bibitem{TerAleAve23}
\by T.~Ter-Hovhannisyan, H.~Aleksanyan, K.~Avetisyan
\paper Adversarial attacks on language models: WordPiece filtration and ChatGPT synonyms
\inbook Investigations on applied mathematics and informatics. Part~II--2
\serial Zap. Nauchn. Sem. POMI
\yr 2023
\vol 530
\pages 80--95
\publ POMI
\publaddr St.~Petersburg
\mathnet{http://mi.mathnet.ru/znsl7434}

Linking options:

https://www.mathnet.ru/eng/znsl7434

https://www.mathnet.ru/eng/znsl/v530/p80
