Zapiski Nauchnykh Seminarov POMI, 2023, Volume 529, Pages 176–196
(Mi znsl7426)
Blending of predictions boosts understanding for multimodal advertisements
A. Alekseev^a, A. Savchenko^b, E. Tutubalina^{c,d}, E. Myasnikov^e, S. Nikolenko^a
a Steklov Institute of Mathematics at St. Petersburg, Russia
b Sber AI Lab, Russia
c Sber AI, Russia
d Kazan Federal University, Russia
e Samara National Research University, Russia
Abstract:
The advertising industry employs several content modalities to deliver implied messages: images, videos, text, music, and combinations of them. “Decoding” a message implied by multimodal content often requires both the text and the visual components. We study the tasks of multimodal symbolism prediction, topic detection, and sentiment type classification. Motivated by the difference between the parts of the message conveyed by the two modalities in advertisements, we train separate models for images and texts and significantly improve upon the current state of the art by blending image- and text-based predictions (with OCR-extracted text). We provide a comprehensive experimental validation of our approach.
Key words and phrases:
multimodal, ads understanding, topic detection, sentiment, sentiment classification.
Received: 12.10.2023
Citation:
A. Alekseev, A. Savchenko, E. Tutubalina, E. Myasnikov, S. Nikolenko, “Blending of predictions boosts understanding for multimodal advertisements”, Investigations on applied mathematics and informatics. Part II–1, Zap. Nauchn. Sem. POMI, 529, POMI, St. Petersburg, 2023, 176–196
Linking options:
https://www.mathnet.ru/eng/znsl7426
https://www.mathnet.ru/eng/znsl/v529/p176