D. B. Rokhlin, “$Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower”, Teor. Veroyatnost. i Primenen., 64:1 (2019), 53–74; Theory Probab. Appl., 64:1 (2019), 41

Teoriya Veroyatnostei i ee Primeneniya


Teoriya Veroyatnostei i ee Primeneniya, 2019, Volume 64, Issue 1, Pages 53–74
DOI: https://doi.org/10.4213/tvp5231 (Mi tvp5231)

This article is cited in 1 scientific paper (total in 1 paper)

$Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower

D. B. Rokhlin

Institute of Mathematics, Mechanics and Computer Sciences, Southern Federal University


Abstract: We consider a game between a leader and a follower, where the players' actions affect the stochastic dynamics of the state process $x_t$, $t\in\mathbb Z_+$. The players observe their own rewards and the system state $x_t$. The transition kernel of the process $x_t$ and the opponent's rewards are unobservable. At each stage of the game, the leader selects the action $a_t$ first. When selecting the action $b_t$, the follower knows the action $a_t$. The follower's actions are unknown to the leader (an uninformed leader). Each player tries to maximize the discounted criterion by applying the $Q$-learning algorithm. The players' randomized strategies are uniquely determined by Boltzmann distributions depending on the $Q$-functions, which are updated in the course of learning. The specific feature of the algorithm is that, when updating the $Q$-function, the follower believes that the action of the leader in the next state is the same as in the current one (a naive follower). It is shown that the convergence of the algorithm is secured by the existence of deterministic stationary strategies that generate an irreducible Markov chain. The limiting large-time behavior of the players' $Q$-functions is described in terms of controlled Markov processes. The distributions of the players' actions converge to Boltzmann distributions depending on the limiting $Q$-functions.
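The learning scheme described in the abstract can be illustrated with a minimal tabular sketch. It is not the paper's algorithm: the abstract does not state the exact update rules, step sizes, or temperature, so the following are assumptions made only for illustration: the array layouts `QL[x, a]` and `QF[x, a, b]`, the constant learning rate `lr`, the discount factor `beta`, the `temperature` parameter, and the randomly generated environment in `simulate`. The sketch shows the two features the abstract names: actions drawn from Boltzmann distributions of the current $Q$-values, and a follower whose update evaluates the next state at the leader's current action.

```python
import numpy as np

def boltzmann(q_values, temperature=1.0):
    """Boltzmann (softmax) distribution over actions for a vector of Q-values."""
    z = (np.asarray(q_values, dtype=float) - np.max(q_values)) / temperature
    w = np.exp(z)  # shifted by the max for numerical stability
    return w / w.sum()

def leader_update(QL, x, a, reward, x_next, lr=0.1, beta=0.9):
    """Q-learning step for the leader, who never observes the follower's action.
    Assumed form: the usual max over the leader's own next actions."""
    target = reward + beta * QL[x_next].max()
    QL[x, a] += lr * (target - QL[x, a])

def follower_update(QF, x, a, b, reward, x_next, lr=0.1, beta=0.9):
    """Naive follower step (assumed form): the next-state value is taken
    at the SAME leader action a that was just observed."""
    target = reward + beta * QF[x_next, a].max()
    QF[x, a, b] += lr * (target - QF[x, a, b])

def simulate(steps=2000, n_states=3, n_a=2, n_b=2, seed=0):
    """Run both learners on a hypothetical random environment."""
    rng = np.random.default_rng(seed)
    # Hypothetical transition kernel P[x, a, b] and reward tables, unknown to the players.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_a, n_b))
    RL = rng.uniform(size=(n_states, n_a, n_b))  # leader rewards
    RF = rng.uniform(size=(n_states, n_a, n_b))  # follower rewards
    QL = np.zeros((n_states, n_a))
    QF = np.zeros((n_states, n_a, n_b))
    x = 0
    for _ in range(steps):
        a = rng.choice(n_a, p=boltzmann(QL[x]))     # leader moves first
        b = rng.choice(n_b, p=boltzmann(QF[x, a]))  # follower sees a
        x_next = rng.choice(n_states, p=P[x, a, b])
        leader_update(QL, x, a, RL[x, a, b], x_next)
        follower_update(QF, x, a, b, RF[x, a, b], x_next)
        x = x_next
    return QL, QF
```

Calling `QL, QF = simulate()` returns the learned tables, and `boltzmann(QL[x])` then gives the leader's action distribution in state `x`. The constant learning rate is a simplification chosen for brevity; convergence results for $Q$-learning typically rely on decreasing step sizes.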

Keywords: $Q$-learning, leader, follower, stochastic Stackelberg game, discounted criterion, Boltzmann distribution.

Funding agency	Grant number
Russian Science Foundation	17-19-01038
This work was supported by the Russian Science Foundation (grant 17-19-01038).

Received: 18.06.2018
Revised: 12.10.2018
Accepted: 18.10.2018

English version:
Theory of Probability and its Applications, 2019, Volume 64, Issue 1, Pages 41–58
DOI: https://doi.org/10.1137/S0040585X97T989386

Document Type: Article

Language: Russian

Citation: D. B. Rokhlin, “$Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower”, Teor. Veroyatnost. i Primenen., 64:1 (2019), 53–74; Theory Probab. Appl., 64:1 (2019), 41–58

Citation in format AMSBIB

\Bibitem{Rok19}
\by D.~B.~Rokhlin
\paper $Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower
\jour Teor. Veroyatnost. i Primenen.
\yr 2019
\vol 64
\issue 1
\pages 53--74
\mathnet{http://mi.mathnet.ru/tvp5231}
\crossref{https://doi.org/10.4213/tvp5231}
\mathscinet{http://mathscinet.ams.org/mathscinet-getitem?mr=3904805}
\zmath{https://zbmath.org/?q=an:07062745}
\elib{https://elibrary.ru/item.asp?id=37090012}
\transl
\jour Theory Probab. Appl.
\yr 2019
\vol 64
\issue 1
\pages 41--58
\crossref{https://doi.org/10.1137/S0040585X97T989386}
\isi{https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=Publons&SrcAuth=Publons_CEL&DestLinkType=FullRecord&DestApp=WOS_CPL&KeyUT=000466860200004}
\scopus{https://www.scopus.com/record/display.url?origin=inward&eid=2-s2.0-85067334309}

Linking options:

https://www.mathnet.ru/eng/tvp5231

https://doi.org/10.4213/tvp5231

https://www.mathnet.ru/eng/tvp/v64/i1/p53
