Abstract
Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy iteration. In this paper, we present and analyze an API algorithm for discounted reward based on (i) the classical temporal differences update for policy evaluation and (ii) simulation-based mean estimation for policy improvement. Further, we analyze the convergence of API algorithms based on Q-factors for (i) discounted reward and (ii) average reward MDPs. The average reward algorithm is based on relative value iteration; we also present results from some numerical experiments with it. © 2012 Published by Elsevier B.V.
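To illustrate the style of algorithm the abstract describes, below is a minimal Python sketch of approximate policy iteration for a discounted-reward MDP, with (i) a classical TD(0) update for policy evaluation and (ii) simulation-based mean estimation of one-step Q-values for policy improvement. It is not the paper's algorithm or experiments: the two-state MDP, transition/reward numbers, step counts, and learning rate are illustrative assumptions.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used only for illustration;
# P[a][s, s'] are transition probabilities, R[a][s, s'] immediate rewards.
P = {0: np.array([[0.7, 0.3], [0.4, 0.6]]),
     1: np.array([[0.9, 0.1], [0.2, 0.8]])}
R = {0: np.array([[6.0, -5.0], [7.0, 12.0]]),
     1: np.array([[10.0, 17.0], [-14.0, 13.0]])}
GAMMA = 0.8                      # discount factor (assumed)
STATES, ACTIONS = [0, 1], [0, 1]
rng = np.random.default_rng(0)

def step(s, a):
    """Simulate one transition from state s under action a."""
    s_next = rng.choice(STATES, p=P[a][s])
    return s_next, R[a][s, s_next]

def td0_evaluate(policy, episodes=200, horizon=200, alpha=0.01):
    """Policy evaluation with the classical TD(0) update."""
    V = np.zeros(len(STATES))
    for _ in range(episodes):
        s = rng.choice(STATES)
        for _ in range(horizon):
            s_next, r = step(s, policy[s])
            V[s] += alpha * (r + GAMMA * V[s_next] - V[s])  # TD(0) update
            s = s_next
    return V

def improve(V, samples=500):
    """Policy improvement via simulation-based mean estimation of
    Q(s, a) = E[r + gamma * V(s')]; pick the greedy action in each state."""
    policy = {}
    for s in STATES:
        q_estimates = []
        for a in ACTIONS:
            returns = []
            for _ in range(samples):
                s_next, r = step(s, a)
                returns.append(r + GAMMA * V[s_next])
            q_estimates.append(np.mean(returns))
        policy[s] = int(np.argmax(q_estimates))
    return policy

# Approximate policy iteration loop (discounted reward).
policy = {s: 0 for s in STATES}
for k in range(5):
    V = td0_evaluate(policy)
    policy = improve(V)
    print(f"iteration {k}: V = {np.round(V, 2)}, policy = {policy}")
```

The average-reward variant mentioned in the abstract (Q-P-Learning based on relative value iteration) would replace the discounted TD target with a relative-value update; that version is not sketched here.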
Recommended Citation
A. Gosavi, "Approximate Policy Iteration for Markov Control Revisited," Procedia Computer Science, vol. 12, pp. 90 - 95, Elsevier, Jan 2012.
The definitive version is available at https://doi.org/10.1016/j.procs.2012.09.036
Department(s)
Engineering Management and Systems Engineering
Publication Status
Open Access
Keywords and Phrases
Approximate policy iteration; Average reward; Q-P-Learning; Relative value iteration
International Standard Serial Number (ISSN)
1877-0509
Document Type
Article - Conference proceedings
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 2024 Elsevier, All rights reserved.
Creative Commons Licensing
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Publication Date
01 Jan 2012