Semi-Markov Adaptive Critic Heuristics with Application to Airline Revenue Management
The adaptive critic heuristic has been a popular algorithm in reinforcement learning (RL) and approximate dynamic programming (ADP) alike. It is one of the first RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decision processes (SMDPs) in which the time spent in each transition of the underlying Markov chains is itself a random variable. Unfortunately for the average reward case, unlike the discounted reward case, the MDP does not have an easy extension to the SMDP. Examples of SMDPs can be found in the area of supply chain management, maintenance management, and airline revenue management. In this paper, we propose an adaptive critic heuristic for the SMDP under the long-run average reward criterion. We present the convergence analysis of the algorithm which shows that under certain mild conditions, which can be ensured within a simulator, the algorithm converges to an optimal solution with probability 1. We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon. The problem has a large scale, suffering from the curse of dimensionality, and hence it is difficult to solve it via classical methods of dynamic programming. Our numerical results are encouraging and show that the algorithm outperforms an existing heuristic used widely in the airline industry.
K. Kulkarni et al., "Semi-Markov Adaptive Critic Heuristics with Application to Airline Revenue Management," Journal of Control Theory and Applications, vol. 9, no. 3, pp. 421-430, Springer Science and Business Media Deutschland GmbH, Jan 2011.
The definitive version is available at https://doi.org/10.1007/s11768-011-0161-9
Engineering Management and Systems Engineering
Keywords and Phrases
Actor critic; Adaptive critic; Airline industry; Airline tickets; Approximate dynamic programming; Average reward; Average reward criteria; Classical methods; Convergence analysis; Curse of dimensionality; Discounted reward; Maintenance management; Markov Decision Processes; Numerical results; Optimal solutions; Real-world problem; Revenue management; Semi-Markov; Semi-Markov decision process; Time spent; Air transportation; Algorithms; Commerce; Convergence of numerical methods; Heuristic programming; Maintenance; Management; Markov processes; Random variables; Real variables; Reinforcement learning; Supply chain management; Dynamic programming
International Standard Serial Number (ISSN)
Article - Journal
© 2011 Springer Science and Business Media Deutschland GmbH, All rights reserved.