Alexander V. Kolnogorov, Alexander V. Nazin, Dmitry N. Shiyan, “Two-armed bandit problem and batch version of the mirror descent algorithm”, Mat. Teor. Igr Pril., 13:2 (2021), 9

Matematicheskaya Teoriya Igr i Ee Prilozheniya

RUS ENG

JOURNALS PEOPLE ORGANISATIONS CONFERENCES SEMINARS VIDEO LIBRARY PACKAGE AMSBIB

JavaScript is disabled in your browser. Please switch it on to enable full functionality of the website

	General information
	Latest issue
	Archive
	Impact factor

	Search papers
	Search references

	RSS
	Latest issue
	Current issues
	Archive issues
	What is RSS

Mat. Teor. Igr Pril.:
Year:
Volume:
Issue:
Page:
	Find

Personal entry:
Login:
Password:
	Save password
	Enter
	Forgotten password?
	Register

Matematicheskaya Teoriya Igr i Ee Prilozheniya, 2021, Volume 13, Issue 2, Pages 9–39 (Mi mgta279)

This article is cited in 2 scientific papers (total in 2 papers)

Two-armed bandit problem and batch version of the mirror descent algorithm

Alexander V. Kolnogorov^a, Alexander V. Nazin^b, Dmitry N. Shiyan^a

^a Yaroslav-the-Wise Novgorod State University
^b V.A.Trapeznikov Institute of Control Sciences Russian Academy of Sciences

Full-text PDF (382 kB) Citations (2)

References:

PDF

HTML

Abstract: We consider the minimax setup for the two-armed bandit problem as applied to data processing if there are two alternative processing methods with different a priori unknown efficiencies. One should determine the most efficient method and provide its predominant application. To this end, we use the mirror descent algorithm (MDA). It is well-known that corresponding minimax risk has the order of $N^{1/2}$ with $N$ being the number of processed data and this bound is unimprovable in order. We propose a batch version of the MDA which allows processing data by packets that is especially important if parallel data processing can be provided. In this case, the processing time is determined by the number of batches rather than by the total number of data. Unexpectedly, it turned out that the batch version behaves unlike the ordinary one even if the number of packets is large. Moreover, the batch version provides significantly smaller value of the minimax risk, i.e., it considerably improves a control performance. We explain this result by considering another batch modification of the MDA which behavior is close to behavior of the ordinary version and minimax risk is close as well. Our estimates use invariant descriptions of the algorithms based on Gaussian approximations of incomes in batches of data in the domain of “close” distributions and are obtained by Monte-Carlo simulations.

Keywords: two-armed bandit problem, minimax approach, mirror descent algorithm, EXP3, batch processing.

Funding agency	Grant number
Russian Foundation for Basic Research	20-01-00062
Russian Science Foundation	16-11-10015

Received: 20.11.2020
Revised: 08.12.2020
Accepted: 01.03.2021

Document Type: Article

UDC: 519.832, 519.245

BBC: 22.18

Language: Russian

Citation: Alexander V. Kolnogorov, Alexander V. Nazin, Dmitry N. Shiyan, “Two-armed bandit problem and batch version of the mirror descent algorithm”, Mat. Teor. Igr Pril., 13:2 (2021), 9–39

Citation in format AMSBIB

\Bibitem{KolNazShi21}

\by Alexander~V.~Kolnogorov, Alexander~V.~Nazin, Dmitry~N.~Shiyan

\paper Two-armed bandit problem and batch version of the mirror descent algorithm

\jour Mat. Teor. Igr Pril.

\yr 2021

\vol 13

\issue 2

\pages 9--39

\mathnet{http://mi.mathnet.ru/mgta279}

Linking options:

https://www.mathnet.ru/eng/mgta279

https://www.mathnet.ru/eng/mgta/v13/i2/p9

This publication is cited in the following 2 articles:

Citing articles in Google Scholar: Russian citations, English citations
Related articles in Google Scholar: Russian articles, English articles

Математическая теория игр и её приложения

Statistics & downloads:
Abstract page:	225
Full-text PDF :	107
References:	22

Что такое QR-код?

Registration to the website

Logotypes