Teoriya Veroyatnostei i ee Primeneniya, 1982, Volume 27, Issue 1, Pages 109–119
(Mi tvp2274)
This article is cited in 33 scientific papers (total in 33 papers)
Nonrandomized Markov and semi-Markov policies in dynamic programming
E. A. Faĭnberg Moscow
Abstract:
A discrete-time, infinite-horizon, non-stationary Markov decision model with Borel state and action spaces and the expected total reward criterion is considered. For an arbitrary fixed policy $\pi$, the following two statements are proved:
a) for an arbitrary initial measure $\mu$ and any constant $K<\infty$ there exists a nonrandomized Markov policy $\varphi$ such that
\begin{gather*}
w(\mu,\varphi)\ge w(\mu,\pi)\ \text{if}\ w(\mu,\pi)<\infty,
\\
w(\mu,\varphi)\ge K\ \text{if}\ w(\mu,\pi)=\infty,
\end{gather*}
b) for an arbitrary measurable function $K(x)<\infty$ on the initial state space $X_0$ there exists a nonrandomized semi-Markov policy $\varphi'$ such that
\begin{gather*}
w(x,\varphi')\ge w(x,\pi)\ \text{if}\ w(x,\pi)<\infty,
\\
w(x,\varphi')\ge K(x)\ \text{if}\ w(x,\pi)=\infty\ \text{for every}\ x\in X_0.
\end{gather*}
For every policy $\sigma$, the numbers $w(\mu,\sigma)$ and $w(x,\sigma)$ are the values of the criterion for the initial measure $\mu$ and the initial state $x$, respectively.
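Statement a) can be illustrated numerically in a much simpler setting than the one treated in the article. The sketch below is not the article's proof. It assumes a finite horizon, finite state and action sets, and a randomized Markov policy $\pi$, whereas the article covers arbitrary policies, Borel spaces and an infinite horizon. All names (`evaluate`, `derandomize`, `rewards`, `trans`) and the randomly generated model are hypothetical. Working backward in time, the sketch picks in each state the action from the support of $\pi$ that has the largest expected continuation reward. This yields a nonrandomized Markov policy $\varphi$ with $w(\mu,\varphi)\ge w(\mu,\pi)$.

```python
import random


def evaluate(policy, rewards, trans, mu, horizon):
    """Expected total reward w(mu, policy) of a Markov policy (toy finite model).

    policy[t][x]     : dict mapping action -> probability at time t in state x
    rewards[t][x][a] : one-step reward
    trans[t][x][a]   : list of next-state probabilities
    """
    n = len(mu)
    v = [0.0] * n  # value after the final step
    for t in reversed(range(horizon)):
        v = [
            sum(
                p * (rewards[t][x][a]
                     + sum(q * v[y] for y, q in enumerate(trans[t][x][a])))
                for a, p in policy[t][x].items()
            )
            for x in range(n)
        ]
    return sum(m * vx for m, vx in zip(mu, v))


def derandomize(policy, rewards, trans, horizon, n_states):
    """Build a nonrandomized Markov policy that is at least as good as `policy`.

    At each time t (going backward) and state x, choose the action among those
    listed in policy[t][x] with the largest reward-to-go under the new policy.
    """
    v = [0.0] * n_states
    phi = [None] * horizon
    for t in reversed(range(horizon)):
        # Reward-to-go of each candidate action, given the new policy afterwards.
        q = [
            {
                a: rewards[t][x][a]
                + sum(p * v[y] for y, p in enumerate(trans[t][x][a]))
                for a in policy[t][x]
            }
            for x in range(n_states)
        ]
        choice = [max(q[x], key=q[x].get) for x in range(n_states)]
        phi[t] = [{a: 1.0} for a in choice]
        v = [q[x][choice[x]] for x in range(n_states)]
    return phi


def rand_dist(k):
    """Random probability vector of length k."""
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


# Hypothetical toy model: 4 time steps, 3 states, 2 actions.
random.seed(0)
T, N, A = 4, 3, 2
rewards = [[[random.uniform(-1, 1) for _ in range(A)] for _ in range(N)]
           for _ in range(T)]
trans = [[[rand_dist(N) for _ in range(A)] for _ in range(N)] for _ in range(T)]
pi = [[dict(enumerate(rand_dist(A))) for _ in range(N)] for _ in range(T)]
mu = rand_dist(N)

phi = derandomize(pi, rewards, trans, T, N)
w_pi = evaluate(pi, rewards, trans, mu, T)
w_phi = evaluate(phi, rewards, trans, mu, T)
print(w_pi, w_phi)
```

In this toy case the same $\varphi$ dominates $\pi$ from every initial state, so the inequality holds for any initial measure $\mu$. The case $w(\mu,\pi)=\infty$ and the semi-Markov policies of statement b) do not arise here, because all rewards are bounded and the horizon is finite.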
Received: 28.11.1979
Citation:
E. A. Faĭnberg, “Nonrandomized Markov and semi-Markov policies in dynamic programming”, Teor. Veroyatnost. i Primenen., 27:1 (1982), 109–119; Theory Probab. Appl., 27:1 (1982), 116–126
Linking options:
https://www.mathnet.ru/eng/tvp2274 https://www.mathnet.ru/eng/tvp/v27/i1/p109