Teoriya Veroyatnostei i ee Primeneniya, 2019, Volume 64, Issue 1, Pages 53–74
DOI: https://doi.org/10.4213/tvp5231
(Mi tvp5231)
 

This article is cited in 1 scientific paper.

$Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower

D. B. Rokhlin

Institute of Mathematics, Mechanics and Computer Sciences, Southern Federal University
Abstract: We consider a game between a leader and a follower in which the players' actions affect the stochastic dynamics of the state process $x_t$, $t\in\mathbb Z_+$. The players observe their own rewards and the system state $x_t$; the transition kernel of the process $x_t$ and the opponent's rewards are unobservable. At each stage of the game, the leader selects an action $a_t$ first. When selecting the action $b_t$, the follower knows the action $a_t$. The follower's actions are unknown to the leader (an uninformed leader). Each player tries to maximize the discounted criterion by applying the $Q$-learning algorithm. The players' randomized strategies are uniquely determined by Boltzmann distributions depending on the $Q$-functions, which are updated in the course of learning. The specific feature of the algorithm is that, when updating its $Q$-function, the follower assumes that the leader's action in the next state is the same as in the current one (a naive follower). It is shown that the convergence of the algorithm is ensured by the existence of deterministic stationary strategies that generate an irreducible Markov chain. The limiting large-time behavior of the players' $Q$-functions is described in terms of controlled Markov processes, and the distributions of the players' actions converge to Boltzmann distributions depending on the limiting $Q$-functions.
Keywords: $Q$-learning, leader, follower, stochastic Stackelberg game, discounted criterion, Boltzmann distribution.
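A minimal tabular sketch of the scheme described in the abstract, not the paper's actual construction: the randomly generated transition kernel P, the reward tables r_L and r_F, the step-size schedule, and the temperature tau are all illustrative assumptions. The leader runs ordinary $Q$-learning on pairs $(x_t,a_t)$, since it never observes $b_t$; the naive follower bootstraps from $\max_b Q_F(x_{t+1},a_t,b)$, i.e., as if the leader repeated its current action in the next state.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_a, n_b = 5, 3, 3   # |X|, leader actions, follower actions
gamma, tau = 0.9, 0.5          # discount factor, Boltzmann temperature

# Hypothetical environment, unknown to the players and only sampled:
# transition kernel P(. | x, a, b) and reward tables r_L, r_F.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_a, n_b))
r_L = rng.random((n_states, n_a, n_b))
r_F = rng.random((n_states, n_a, n_b))

Q_L = np.zeros((n_states, n_a))       # the leader never observes b_t
Q_F = np.zeros((n_states, n_a, n_b))  # the follower observes a_t before acting

def boltzmann(q):
    """Boltzmann (softmax) distribution over actions for the Q-values q."""
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()

x = 0
for t in range(1, 50_000):
    alpha = 1.0 / t**0.7                          # illustrative step sizes
    a = rng.choice(n_a, p=boltzmann(Q_L[x]))      # leader moves first
    b = rng.choice(n_b, p=boltzmann(Q_F[x, a]))   # follower reacts to a_t
    x_next = rng.choice(n_states, p=P[x, a, b])

    # Leader: ordinary Q-learning on (x, a); the hidden action b only
    # enters through the observed reward.
    Q_L[x, a] += alpha * (r_L[x, a, b] + gamma * Q_L[x_next].max() - Q_L[x, a])

    # Naive follower: bootstrap as if the leader repeats action a in x_next.
    Q_F[x, a, b] += alpha * (r_F[x, a, b]
                             + gamma * Q_F[x_next, a].max() - Q_F[x, a, b])
    x = x_next

After training, each player's randomized strategy in state x is read off as boltzmann(Q_L[x]) for the leader and boltzmann(Q_F[x, a]) for the follower, matching the Boltzmann-distribution form of the limiting strategies described in the abstract.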
This work was supported by the Russian Science Foundation (grant 17-19-01038).
Received: 18.06.2018
Revised: 12.10.2018
Accepted: 18.10.2018
English version:
Theory of Probability and its Applications, 2019, Volume 64, Issue 1, Pages 41–58
DOI: https://doi.org/10.1137/S0040585X97T989386
Document Type: Article
Language: Russian
Citation: D. B. Rokhlin, “$Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower”, Teor. Veroyatnost. i Primenen., 64:1 (2019), 53–74; Theory Probab. Appl., 64:1 (2019), 41–58
Citation in format AMSBIB
\Bibitem{Rok19}
\by D.~B.~Rokhlin
\paper $Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower
\jour Teor. Veroyatnost. i Primenen.
\yr 2019
\vol 64
\issue 1
\pages 53--74
\mathnet{http://mi.mathnet.ru/tvp5231}
\crossref{https://doi.org/10.4213/tvp5231}
\mathscinet{http://mathscinet.ams.org/mathscinet-getitem?mr=3904805}
\zmath{https://zbmath.org/?q=an:07062745}
\elib{https://elibrary.ru/item.asp?id=37090012}
\transl
\jour Theory Probab. Appl.
\yr 2019
\vol 64
\issue 1
\pages 41--58
\crossref{https://doi.org/10.1137/S0040585X97T989386}
\isi{https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=Publons&SrcAuth=Publons_CEL&DestLinkType=FullRecord&DestApp=WOS_CPL&KeyUT=000466860200004}
\scopus{https://www.scopus.com/record/display.url?origin=inward&eid=2-s2.0-85067334309}
Linking options:
  • https://www.mathnet.ru/eng/tvp5231
  • https://doi.org/10.4213/tvp5231
  • https://www.mathnet.ru/eng/tvp/v64/i1/p53