IMAGE PROCESSING, PATTERN RECOGNITION
Visual preferences prediction for a photo gallery based on image captioning methods
A. S. Kharchevnikova, A. V. Savchenko National Research University Higher School of Economics, Nizhny Novgorod, Russia
Abstract:
The paper considers the problem of extracting user preferences from a personal photo gallery. We propose a novel approach based on image captioning, i.e., automatic generation of textual descriptions of photos, followed by classification of those descriptions. Known image captioning methods based on convolutional and recurrent (long short-term memory, LSTM) neural networks are analyzed. We train several models that combine the visual features of a photograph with the outputs of an LSTM block, using Google's Conceptual Captions dataset. We examine the application of natural language processing algorithms to transform the obtained textual annotations into user preferences. Experimental studies are carried out on Microsoft COCO Captions, Flickr8k, and a specially collected dataset reflecting users' interests. It is demonstrated that the best quality of preference prediction is achieved by the keyword search and text summarization methods from the Watson API, which are 8% more accurate than traditional latent Dirichlet allocation. Moreover, descriptions generated by the trained neural models are classified 1–7% more accurately than those of known image captioning models.
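The second stage of the described pipeline, turning generated captions into preference categories via keyword search, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the category names, keyword sets, and the `predict_preferences` helper are all hypothetical, and the Watson API summarization step is omitted.

```python
# Hypothetical sketch of caption-to-preference classification by keyword search.
# Categories and keyword lists are illustrative, not the paper's actual taxonomy.

PREFERENCE_KEYWORDS = {
    "sports": {"ball", "tennis", "skiing", "surfboard", "bicycle"},
    "food": {"pizza", "sandwich", "cake", "dinner", "fruit"},
    "travel": {"beach", "mountain", "airplane", "train", "city"},
    "pets": {"dog", "cat", "puppy", "kitten", "horse"},
}

def predict_preferences(captions):
    """Aggregate keyword hits over all gallery captions and return
    (category, score) pairs ranked by descending score."""
    scores = {category: 0 for category in PREFERENCE_KEYWORDS}
    for caption in captions:
        tokens = set(caption.lower().split())
        for category, keywords in PREFERENCE_KEYWORDS.items():
            scores[category] += len(tokens & keywords)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Captions as an image captioning model might produce them for a gallery:
captions = [
    "a dog playing with a ball on the beach",
    "a man riding a surfboard on a wave",
    "a plate with a sandwich and fruit",
]
print(predict_preferences(captions))  # "sports" ranks first with 2 hits
```

In the paper's setting, the scores would be aggregated over the whole gallery rather than a handful of captions, and keyword extraction would come from the Watson API instead of a fixed dictionary.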
Keywords:
user modeling, image processing, image captioning, convolutional neural networks.
Received: 13.12.2019; Accepted: 06.03.2020
Citation:
A. S. Kharchevnikova, A. V. Savchenko, “Visual preferences prediction for a photo gallery based on image captioning methods”, Computer Optics, 44:4 (2020), 618–626
Linking options:
https://www.mathnet.ru/eng/co828
https://www.mathnet.ru/eng/co/v44/i4/p618