This article is cited in 1 scientific paper (total in 1 paper)
Gittins index for simple family of Markov bandit processes with switching cost and no discounting
M. P. Savelov, Novosibirsk State University
Abstract:
We consider the multiarmed bandit problem (the problem of Markov bandits)
with switching penalties and no discounting in the case where the state spaces of all bandits are finite.
A strategy is optimal if it attains the largest average reward per unit time
over an infinite time horizon.
For this problem it is shown that an optimal strategy can be specified by a Gittins index
under the natural assumption that the switching penalties are nonnegative.
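To make the optimality criterion concrete, the following is a minimal sketch that estimates the average reward per unit time of a fixed strategy over a long but finite horizon. It does not reproduce the paper's index construction. The function `average_reward`, the two toy bandits, the policies `stay_on` and `alternate`, and the convention that the first pull incurs no switching penalty are all assumptions made for illustration.

```python
import random


def average_reward(bandits, policy, switch_cost, horizon, seed=0):
    """Estimate the reward per unit time of `policy` over `horizon` steps.

    bandits: list of dicts with keys 'P' (row-stochastic transition matrix),
             'r' (reward for each state) and 'start' (initial state).
    policy:  function (states, last_arm) -> index of the bandit to play.
    switch_cost: nonnegative penalty paid whenever the played bandit changes
                 (assumption: the very first pull is free).
    """
    rng = random.Random(seed)
    states = [b["start"] for b in bandits]
    last_arm = None
    total = 0.0
    for _ in range(horizon):
        arm = policy(states, last_arm)
        if last_arm is not None and arm != last_arm:
            total -= switch_cost
        s = states[arm]
        total += bandits[arm]["r"][s]
        # Only the played bandit moves; all other bandits stay frozen.
        row = bandits[arm]["P"][s]
        states[arm] = rng.choices(range(len(row)), weights=row)[0]
        last_arm = arm
    return total / horizon


# Hypothetical toy family: bandit 0 alternates between a rewarding and a
# non-rewarding state; bandit 1 has a single state with a constant reward.
BANDITS = [
    {"P": [[0.0, 1.0], [1.0, 0.0]], "r": [1.0, 0.0], "start": 0},
    {"P": [[1.0]], "r": [0.6], "start": 0},
]


def stay_on(arm):
    """Policy that always plays the same bandit and never switches."""
    return lambda states, last_arm: arm


def alternate(states, last_arm):
    """Policy that switches between bandits 0 and 1 at every step."""
    return 0 if last_arm != 0 else 1


if __name__ == "__main__":
    for name, pol in [("stay on 0", stay_on(0)),
                      ("stay on 1", stay_on(1)),
                      ("alternate", alternate)]:
        print(name, average_reward(BANDITS, pol, switch_cost=0.1, horizon=4000))
```

In this toy family, alternating collects more raw reward per step than staying on bandit 0, but it pays the penalty at almost every step, so a nonnegative switching cost can make a non-switching strategy preferable.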
Keywords:
multicomponent systems, Gittins index,
simple family of alternative Markov bandit processes,
multiarmed bandit problem, Markov decision process, controlled Markov processes,
long run average return, no discounting, switching penalties,
optimal strategy.
Received: 26.03.2019 Accepted: 20.06.2019
Citation:
M. P. Savelov, “Gittins index for simple family of Markov bandit processes with switching cost and no discounting”, Teor. Veroyatnost. i Primenen., 64:3 (2019), 442–455; Theory Probab. Appl., 64:3 (2019), 355–364
Linking options:
https://www.mathnet.ru/eng/tvp5303
https://doi.org/10.4213/tvp5303
https://www.mathnet.ru/eng/tvp/v64/i3/p442