Problemy Peredachi Informatsii, 2000, Volume 36, Issue 4, Pages 117–127
(Mi ppi501)
This article is cited in 2 scientific papers (total in 2 papers)
Automata Theory
On Optimal Prior Learning Time in the Two-Armed Bandit Problem
A. V. Kolnogorov
Abstract:
For the two-armed bandit problem on a known finite time horizon $T$, a strategy with an a priori fixed learning time is proposed. Using the loss balance equation, an exact asymptotic estimate of this learning time is established; it is of order $T^{2/3}$. For close distributions, the estimate changes: for a Bernoulli two-armed bandit, the learning time in this case is approximately $T/3$.
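As a rough illustration of what a strategy with a fixed learning time looks like, here is a minimal sketch for a Bernoulli two-armed bandit. It is not the paper's construction. The function names, the even split of the learning phase between the arms, and the constant `c` in front of $T^{2/3}$ are all assumptions made for the example; the abstract gives only the order $T^{2/3}$, not the constant or the loss balance equation itself.

```python
import random


def prior_learning_time(horizon, c=1.0):
    """Learning time of order horizon**(2/3).

    The constant `c` is a hypothetical placeholder, not a value from the paper.
    """
    return min(horizon, max(2, round(c * horizon ** (2.0 / 3.0))))


def explore_then_commit(p1, p2, horizon, learning_time, rng):
    """Play a Bernoulli two-armed bandit with a learning phase fixed in advance.

    Returns the total reward collected over `horizon` pulls.
    """
    probs = (p1, p2)
    per_arm = learning_time // 2  # assumption: split the learning phase evenly
    successes = [0, 0]
    total = 0

    # Learning phase: pull each arm `per_arm` times and record the rewards.
    for arm in (0, 1):
        for _ in range(per_arm):
            reward = 1 if rng.random() < probs[arm] else 0
            successes[arm] += reward
            total += reward

    # Commit phase: use the arm with the larger empirical mean for the rest.
    best = 0 if successes[0] >= successes[1] else 1
    for _ in range(horizon - 2 * per_arm):
        total += 1 if rng.random() < probs[best] else 0
    return total


if __name__ == "__main__":
    T = 10_000
    n = prior_learning_time(T)
    reward = explore_then_commit(0.6, 0.4, T, n, random.Random(0))
    # Loss relative to always pulling the better arm (expected reward 0.6 * T).
    print(n, 0.6 * T - reward)
```

With well-separated arms, a learning phase of order $T^{2/3}$ is usually enough to identify the better arm. When the arm probabilities are very close, a much longer learning phase is needed, which is the regime where the abstract reports a learning time of about $T/3$.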
Received: 22.06.1999 Revised: 24.07.2000
Citation:
A. V. Kolnogorov, “On Optimal Prior Learning Time in the Two-Armed Bandit Problem”, Probl. Peredachi Inf., 36:4 (2000), 117–127; Problems Inform. Transmission, 36:4 (2000), 387–396
Linking options:
https://www.mathnet.ru/eng/ppi501 https://www.mathnet.ru/eng/ppi/v36/i4/p117