This article is cited in 1 scientific paper (total in 1 paper)
Optimal control
TT-QI: Faster value iteration in tensor train format for stochastic optimal control
A. I. Boyko (a), I. V. Oseledets (a, b), G. Ferrer (a)
(a) Skolkovo Institute of Science and Technology, 121205, Moscow, Russia
(b) Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow
Abstract:
The problem of general nonlinear stochastic optimal control with small Wiener noise is studied. The problem is approximated by a Markov decision process, and the Bellman equation is solved using the Value Iteration (VI) algorithm in the low-rank Tensor Train format (TT-VI). In this paper, we propose a modification of the TT-VI algorithm called TT-Q-Iteration (TT-QI), in which the nonlinear Bellman optimality operator is applied iteratively to the solution as a composition of internal Tensor Train algebraic operations and the TT-CROSS algorithm. We show that, provided the TT-ranks of the transition probabilities are small, TT-QI has lower asymptotic complexity per iteration than the existing method in the literature. On test examples of an underpowered inverted pendulum and Dubins cars, our method converges up to 3–10 times faster in wall-clock time than the original method.
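To make the abstract's two ideas concrete (applying the Bellman optimality operator in Q form, and why small transition ranks lower the per-iteration cost), here is a minimal dense sketch. It is not the authors' TT-QI implementation: it assumes a small finite MDP stored as plain Python lists, whereas TT-QI keeps these tables in Tensor Train format and evaluates the maximum with TT-CROSS. The names `bellman_q`, `bellman_q_lowrank`, `q_iteration` and `chain_mdp`, the chain example, and the rank-2 mixture model are all hypothetical. The low-rank variant uses an ordinary matrix factorization `P = U·W` as a simplified stand-in for the TT-rank condition.

```python
"""Dense toy sketch of Q-form value iteration (hypothetical; not the TT-QI code)."""
from typing import List, Tuple

Matrix = List[List[float]]


def bellman_q(P: List[Matrix], R: Matrix, V: List[float], gamma: float) -> Matrix:
    """Q[s][a] = R[s][a] + gamma * sum_t P[s][a][t] * V[t] (dense contraction)."""
    n_s, n_a = len(R), len(R[0])
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
             for a in range(n_a)] for s in range(n_s)]


def bellman_q_lowrank(U: List[Matrix], W: Matrix, R: Matrix,
                      V: List[float], gamma: float) -> Matrix:
    """Same step when P[s][a][t] = sum_k U[s][a][k] * W[k][t] has rank r.

    Contracting W with V first costs r*|S|; afterwards each (s, a) pair needs
    only r multiplications instead of |S| for the dense contraction.
    """
    WV = [sum(w * v for w, v in zip(row, V)) for row in W]
    return [[R[s][a] + gamma * sum(u * wv for u, wv in zip(U[s][a], WV))
             for a in range(len(R[0]))] for s in range(len(R))]


def q_iteration(P: List[Matrix], R: Matrix, gamma: float,
                tol: float = 1e-10, max_iter: int = 10_000) -> Tuple[List[float], Matrix]:
    """Iterate V <- max_a Q(V) until the sup-norm change drops below tol."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        Q = bellman_q(P, R, V, gamma)
        V_new = [max(row) for row in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, Q
        V = V_new
    return V, Q


def chain_mdp(n: int = 5, p_move: float = 0.8) -> Tuple[List[Matrix], Matrix]:
    """Hypothetical 1-D chain: action 0 = left, 1 = right. The move succeeds
    with probability p_move, otherwise the agent stays; reward 1 at the right end."""
    P = [[[0.0] * n for _ in range(2)] for _ in range(n)]
    for s in range(n):
        for a, step in enumerate((-1, 1)):
            t = min(max(s + step, 0), n - 1)  # clip at the chain boundaries
            P[s][a][t] += p_move
            P[s][a][s] += 1.0 - p_move
    R = [[1.0 if s == n - 1 else 0.0 for _ in range(2)] for s in range(n)]
    return P, R


if __name__ == "__main__":
    P, R = chain_mdp()
    V, Q = q_iteration(P, R, gamma=0.9)
    policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(len(R))]
    print(policy)  # [1, 1, 1, 1, 1]: move right in every state
```

In the chain example the value at the goal state satisfies V = 1 + 0.9·V, so it converges to 10. The design choice in `bellman_q_lowrank` is the contraction order: summing over next states once per rank component, rather than once per state-action pair, is the matrix analogue of why small transition ranks reduce the cost of one Bellman step.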
Key words:
dynamic programming, optimal control, Markov decision process, MDP, Markov chain approximation, MCA, low rank decomposition.
Received: 24.11.2020 Revised: 24.11.2020 Accepted: 14.01.2021
Citation:
A. I. Boyko, I. V. Oseledets, G. Ferrer, “TT-QI: Faster value iteration in tensor train format for stochastic optimal control”, Zh. Vychisl. Mat. Mat. Fiz., 61:5 (2021), 865–877; Comput. Math. Math. Phys., 61:5 (2021), 836–846
Linking options:
https://www.mathnet.ru/eng/zvmmf11244
https://www.mathnet.ru/eng/zvmmf/v61/i5/p865