Dmitry N. Shiyan, “One-armed bandit problem and the mirror descent algorithm”, Mat. Teor. Igr Pril., 15:3 (2023), 88

Matematicheskaya Teoriya Igr i Ee Prilozheniya

Matematicheskaya Teoriya Igr i Ee Prilozheniya, 2023, Volume 15, Issue 3, Pages 88–106 (Mi mgta337)

One-armed bandit problem and the mirror descent algorithm

Dmitry N. Shiyan

Yaroslav-the-Wise Novgorod State University

Abstract: We consider the application of the mirror descent algorithm (MDA) to the one-armed bandit problem in the minimax setting, as applied to data processing. This problem is also known as a game with nature, in which the player's payoff function is the mathematical expectation of the total income. The player must determine which of the two available methods is more effective and ensure that it is predominantly used. The a priori efficiency of one of the methods is known. This article proposes a modification of the MDA that improves the efficiency of control by using this additional a priori information. The proposed strategy retains the characteristic property of strategies for one-armed bandits: once the known action is applied, it is applied until the end of the control. Modifications are considered both for the one-by-one processing algorithm and for its batch version. Batch processing is of interest because, if the data within a batch can be processed in parallel, the total processing time is determined by the number of batches rather than by the original amount of data. For the proposed algorithms, the optimal values of the tunable parameters were calculated and estimates of the minimax risk were obtained by Monte Carlo simulation.

Keywords: two-armed bandit problem, one-armed bandit problem, minimax approach, mirror descent algorithm, EXP3, batch processing.
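
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it is a plain EXP3-style exponential-weights rule for one unknown Bernoulli arm and one arm with a known mean, with the commit rule from the abstract (once the known arm is chosen, it is used until the end). The function names, the parameters `eta`, `horizon` and `runs`, the centring of rewards on the known mean, and the clamp on the log-weight are all assumptions made for illustration; the batch version is not covered.

```python
import math
import random


def run_one_armed_exp3(p_unknown, p_known, horizon, eta, rng):
    """Simulate one run and return the total income (illustrative sketch)."""
    log_w = 0.0  # log-weight of the unknown arm relative to the known arm
    income = 0.0
    for t in range(horizon):
        # Probability of trying the unknown arm (two-arm softmax).
        prob_unknown = 1.0 / (1.0 + math.exp(-log_w))
        if rng.random() >= prob_unknown:
            # Known arm chosen: commit to it for all remaining steps.
            # Its expected income is credited instead of being sampled.
            income += (horizon - t) * p_known
            break
        reward = 1.0 if rng.random() < p_unknown else 0.0
        income += reward
        # Importance-weighted update, centred on the known mean (assumption).
        log_w += eta * (reward - p_known) / prob_unknown
        # Numerical safeguard (assumption): keep exp() within range.
        log_w = max(-30.0, min(30.0, log_w))
    return income


def estimate_regret(p_unknown, p_known, horizon, eta, runs, seed=0):
    """Monte Carlo estimate of the expected loss against the better arm."""
    rng = random.Random(seed)
    total = sum(
        run_one_armed_exp3(p_unknown, p_known, horizon, eta, rng)
        for _ in range(runs)
    )
    return horizon * max(p_unknown, p_known) - total / runs


def estimate_minimax_risk(p_known, horizon, eta, runs, grid):
    """Worst-case estimated regret over a grid of unknown-arm means."""
    return max(estimate_regret(p, p_known, horizon, eta, runs) for p in grid)
```

A call such as `estimate_minimax_risk(0.5, 200, 0.1, 2000, [i / 10 for i in range(11)])` would give a rough worst-case regret for one value of `eta`; repeating it over several values of `eta` mirrors, in a simplified way, the tuning of parameters by simulation that the abstract mentions.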

Funding: Russian Foundation for Basic Research, grant no. 20-01-00062

Received: 04.04.2023
Revised: 10.06.2023
Accepted: 01.09.2023

Document Type: Article

UDC: 519.832, 519.245

BBC: 22.18

Language: Russian

Citation: Dmitry N. Shiyan, “One-armed bandit problem and the mirror descent algorithm”, Mat. Teor. Igr Pril., 15:3 (2023), 88–106

Citation in format AMSBIB

\Bibitem{Shi23}
\by Dmitry~N.~Shiyan
\paper One-armed bandit problem and the mirror descent algorithm
\jour Mat. Teor. Igr Pril.
\yr 2023
\vol 15
\issue 3
\pages 88--106
\mathnet{http://mi.mathnet.ru/mgta337}

Linking options:

https://www.mathnet.ru/eng/mgta337

https://www.mathnet.ru/eng/mgta/v15/i3/p88
