Location
Havener Center, Miner Lounge / Wiese Atrium, 9:30am-11:30am
Start Date
4-2-2026 9:30 AM
End Date
4-2-2026 11:30 AM
Presentation Date
April 2, 2026; 9:30am-11:30am
Description
While reinforcement learning has been increasingly applied to stochastic control, limited work examines policy-based methods in queuing environments modeled as semi-Markov decision processes (SMDPs). This study investigates how policy-based reinforcement learning (RL) algorithms perform when applied to service rate control in an M/M/1 queue, a common queuing model for manufacturing and service systems. The problem is formulated as an SMDP in which decisions occur at the start of each service, allowing an agent to select a service rate from a finite set of speeds with the aim of minimizing an objective function that balances system congestion and energy costs. Three policy-based RL algorithms, namely REINFORCE, Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), are trained within a simulated environment using two state representations: the instantaneous queue length and an augmented state including a one-step history. Performance is evaluated in terms of convergence speed, sample efficiency, policy quality, and pseudo-regret relative to the steady-state optimum.
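The SMDP formulation above (a decision at each service start, choosing a rate from a finite set, with a cost combining congestion and energy) can be sketched as a minimal simulation environment. This is an illustrative sketch only, not the authors' implementation; the class name, cost weights, and rate set are assumptions for demonstration.

```python
import random

class MM1RateControl:
    """Hypothetical sketch of an M/M/1 service-rate-control SMDP.

    At each decision epoch the agent picks a service rate from a finite
    set; cost accrues continuously as (holding cost * queue length +
    energy cost * chosen rate) until the next arrival or departure.
    All parameter values below are illustrative assumptions.
    """

    def __init__(self, arrival_rate=0.8, service_rates=(0.5, 1.0, 2.0),
                 holding_cost=1.0, energy_cost=0.5, seed=0):
        self.lam = arrival_rate
        self.rates = service_rates      # finite set of selectable speeds
        self.c_h = holding_cost         # congestion cost per customer per unit time
        self.c_e = energy_cost          # energy cost per unit of rate per unit time
        self.rng = random.Random(seed)
        self.queue = 0                  # number of customers in system

    def step(self, action):
        """Advance to the next event given the chosen rate index.

        Returns (next_state, reward, sojourn_time); the reward is the
        negative accrued cost over the random sojourn, as in an SMDP.
        """
        mu = self.rates[action]
        # Competing exponential clocks: next arrival vs. next service completion
        t_arrival = self.rng.expovariate(self.lam)
        t_service = self.rng.expovariate(mu) if self.queue > 0 else float("inf")
        dt = min(t_arrival, t_service)
        cost = (self.c_h * self.queue + self.c_e * mu) * dt
        if t_arrival <= t_service:
            self.queue += 1             # arrival occurs first
        else:
            self.queue -= 1             # service completes first
        return self.queue, -cost, dt
```

A policy-gradient agent would observe the returned queue length (or a one-step history of it, matching the augmented state representation described above) and select the rate index at each epoch.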
Biography
Joseph Walton is a first-year Ph.D. student in the Department of Engineering Management and Systems Engineering at Missouri University of Science and Technology, supervised by Dr. Gabriel Nicolosi, and a Kummer Innovation and Entrepreneurship Doctoral Fellow. His research interests include operations research, machine learning, and stochastic systems.
He has professional experience in process improvement and data analytics, where he applied statistical and computational methods to optimize manufacturing performance and support data-driven decision making.
Joseph’s work centers on improving decision-making in complex operational systems.
Meeting Name
2026 - Miners Solving for Tomorrow Research Conference
Department(s)
Engineering Management and Systems Engineering
Document Type
Poster
Document Version
Final Version
File Type
event
Language(s)
English
Rights
© 2026 The Authors. All rights reserved.
Empirical Evaluation of Policy-Based Reinforcement Learning for Dynamic Service Control in an M/M/1 Queue

Comments
Advisor: Gabriel Nicolosi, gabrielnicolosi@mst.edu