Computer Research and Modeling
RUS  ENG    JOURNALS   PEOPLE   ORGANISATIONS   CONFERENCES   SEMINARS   VIDEO LIBRARY   PACKAGE AMSBIB  
General information
Latest issue
Archive

Search papers
Search references

RSS
Latest issue
Current issues
Archive issues
What is RSS



Computer Research and Modeling:
Year:
Volume:
Issue:
Page:
Find






Personal entry:
Login:
Password:
Save password
Enter
Forgotten password?
Register


Computer Research and Modeling, 2023, Volume 15, Issue 2, Pages 329–353
DOI: https://doi.org/10.20537/2076-7633-2023-15-2-329-353
(Mi crm1063)
 

MATHEMATICAL MODELING AND NUMERICAL SIMULATION

Survey of convex optimization of Markov decision processes

V. D. Rudenkoab, N. E. Yudinac, A. A. Vasind

a Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow region, 141701, Russia
b National Research University Higher School of Economics, 11 Pokrovsky Bulvar, Moscow, 109028, Russia
c Federal Research Center “Informatics and Control” of Russian Academy of Sciences, 44/2 Vavilova st., Moscow, 119333, Russia
d Lomonosov Moscow State University, 1/52 Leninskiye Gory, Moscow, 119991, Russia
References:
Abstract: This article reviews both historical achievements and modern results in the field of Markov Decision Process (MDP) and convex optimization. This review is the first attempt to cover the field of reinforcement learning in Russian in the context of convex optimization. The fundamental Bellman equation and the criteria of optimality of policy — strategies based on it, which make decisions based on the known state of the environment at the moment, are considered. The main iterative algorithms of policy optimization based on the solution of the Bellman equations are also considered. An important section of this article was the consideration of an alternative to the $Q$-learning approach — the method of direct maximization of the agent’s average reward for the chosen strategy from interaction with the environment. Thus, the solution of this convex optimization problem can be represented as a linear programming problem. The paper demonstrates how the convex optimization apparatus is used to solve the problem of Reinforcement Learning (RL). In particular, it is shown how the concept of strong duality allows us to naturally modify the formulation of the RL problem, showing the equivalence between maximizing the agent’s reward and finding his optimal strategy. The paper also discusses the complexity of MDP optimization with respect to the number of state–action–reward triples obtained as a result of interaction with the environment. The optimal limits of the MDP solution complexity are presented in the case of an ergodic process with an infinite horizon, as well as in the case of a non-stationary process with a finite horizon, which can be restarted several times in a row or immediately run in parallel in several threads. The review also reviews the latest results on reducing the gap between the lower and upper estimates of the complexity of MDP optimization with average remuneration (Averaged MDP, AMDP). In conclusion, the real-valued parametrization of agent policy and a class of gradient optimization methods through maximizing the $Q$-function of value are considered. In particular, a special class of MDPs with restrictions on the value of policy (Constrained Markov Decision Process, CMDP) is presented, for which a general direct-dual approach to optimization with strong duality is proposed.
Keywords: MDP, convex optimization, $Q$-learning, linear programming, policy gradient methods.
Funding agency Grant number
Russian Science Foundation 21-71-30005
The research was supported by Russian Science Foundation (project No. 21-71-30005), https://rscf.ru/en/project/21-71-30005/.
Received: 19.02.2023
Accepted: 23.02.2023
Document Type: Article
UDC: 519.8
Language: Russian
Citation: V. D. Rudenko, N. E. Yudin, A. A. Vasin, “Survey of convex optimization of Markov decision processes”, Computer Research and Modeling, 15:2 (2023), 329–353
Citation in format AMSBIB
\Bibitem{RudYudVas23}
\by V.~D.~Rudenko, N.~E.~Yudin, A.~A.~Vasin
\paper Survey of convex optimization of Markov decision processes
\jour Computer Research and Modeling
\yr 2023
\vol 15
\issue 2
\pages 329--353
\mathnet{http://mi.mathnet.ru/crm1063}
\crossref{https://doi.org/10.20537/2076-7633-2023-15-2-329-353}
Linking options:
  • https://www.mathnet.ru/eng/crm1063
  • https://www.mathnet.ru/eng/crm/v15/i2/p329
  • Citing articles in Google Scholar: Russian citations, English citations
    Related articles in Google Scholar: Russian articles, English articles
    Computer Research and Modeling
     
      Contact us:
     Terms of Use  Registration to the website  Logotypes © Steklov Mathematical Institute RAS, 2025