Zapiski Nauchnykh Seminarov POMI
Zapiski Nauchnykh Seminarov POMI, 2023, Volume 529, Pages 157–175 (Mi znsl7425)  

What do text-to-image models know about the languages of the world?

V. Firsanova

St. Petersburg State University, St. Petersburg, Russia
Abstract: Text-to-image models produce images from user-written prompts. Models such as DALL-E 2, Imagen, Stable Diffusion, and Midjourney can generate photorealistic images or images resembling human drawings. Beyond imitating human art, large text-to-image models have learned to produce pixel patterns reminiscent of captions in natural languages. For example, a generated image might contain the figure of an animal together with a combination of symbols resembling human-readable words, as if naming the species. Although the words that occasionally appear on generated images can look readable, they are not rooted in natural-language vocabularies and make no sense to non-linguists. At the same time, we find that a semiotic and linguistic analysis of this so-called hidden vocabulary of text-to-image models can contribute to explainable AI and prompt engineering. The results of such analysis can help reduce the risks of applying these models to real-life problems and support deepfake detection. The proposed study is one of the first attempts to analyze text-to-image models from the point of view of semiotics and linguistics. Our approach combines prompt engineering, image generation, and comparative analysis. The source code, generated images, and prompts are available at https://github.com/vifirsanova/text-to-image-explainable.
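The comparative setup named in the abstract (prompt engineering, then image generation, then comparison) can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the `build_prompts` helper and the prompt templates are hypothetical, and the actual image generation would be done with a text-to-image API (e.g. a Stable Diffusion pipeline fed each prompt), which is omitted here.

```python
from itertools import product

def build_prompts(subjects, styles):
    """Cross each subject with each style descriptor to get a grid of
    comparable prompts, so generated captions can be compared across
    controlled prompt variations (hypothetical templates)."""
    return [f"{style} of {subject} with a caption naming it"
            for subject, style in product(subjects, styles)]

subjects = ["a songbird", "a deep-sea fish"]
styles = ["a photorealistic image", "a botanical-atlas drawing"]

# Each prompt would then be passed to a text-to-image model;
# the generated pseudo-captions are what the study analyzes.
for prompt in build_prompts(subjects, styles):
    print(prompt)
```

Holding the subject fixed while varying the style (and vice versa) is what makes the later semiotic comparison of the generated pseudo-words meaningful.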
Key words and phrases: explainable artificial intelligence, text-to-image synthesis, diffusion models.
Received: 06.09.2023
Document Type: Article
UDC: 81.322.2
Language: English
Citation: V. Firsanova, “What do text-to-image models know about the languages of the world?”, Investigations on applied mathematics and informatics. Part II–1, Zap. Nauchn. Sem. POMI, 529, POMI, St. Petersburg, 2023, 157–175
Citation in format AMSBIB
\Bibitem{Fir23}
\by V.~Firsanova
\paper What do text-to-image models know about the languages of the world?
\inbook Investigations on applied mathematics and informatics. Part~II--1
\serial Zap. Nauchn. Sem. POMI
\yr 2023
\vol 529
\pages 157--175
\publ POMI
\publaddr St.~Petersburg
\mathnet{http://mi.mathnet.ru/znsl7425}
Linking options:
  • https://www.mathnet.ru/eng/znsl7425
  • https://www.mathnet.ru/eng/znsl/v529/p157