Title

Empirical Evaluation of Policy-Based Reinforcement Learning for Dynamic Service Control in an M/M/1 Queue

Location

Havener Center, Miner Lounge / Wiese Atrium, 9:30am-11:30am

Start Date

4-2-2026 9:30 AM

End Date

4-2-2026 11:30 AM

Presentation Date

April 2, 2026; 9:30am-11:30am

Description

While reinforcement learning (RL) has been increasingly applied to stochastic control, limited work examines policy-based methods in queuing environments modeled as semi-Markov decision processes (SMDPs). This study investigates how policy-based RL algorithms perform when applied to service rate control in an M/M/1 queue, a common queuing model for manufacturing and service systems. The problem is formulated as an SMDP in which decisions occur at the start of each new service, allowing an agent to select a service rate from a finite set of speeds with the aim of minimizing an objective function that balances system congestion and energy costs. Three policy-based RL algorithms, namely REINFORCE, Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), are trained within a simulated environment using two state representations: the instantaneous queue length and an augmented state that includes a one-step history. Performance is evaluated in terms of convergence speed, sample efficiency, policy quality, and pseudo-regret relative to the steady-state optimum.
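For concreteness, the sketch below outlines one way the environment just described could be simulated in Python. It is a minimal illustration only: the class name, the arrival rate, the set of service speeds, and the cost coefficients are hypothetical stand-ins, not the authors' actual implementation or parameter choices.

import numpy as np

# Minimal sketch of the SMDP described above: decision epochs occur at
# the start of each service, and the agent commits to one rate from a
# finite set. All names and numeric values here are illustrative.
class MM1ServiceRateSMDP:
    def __init__(self, arrival_rate=0.8, service_rates=(0.6, 1.0, 1.4, 2.0),
                 holding_cost=1.0, energy_cost=0.5, seed=None):
        self.lam = arrival_rate          # Poisson arrival rate
        self.mus = service_rates         # finite set of selectable service speeds
        self.c_h = holding_cost          # congestion cost per job per unit time
        self.c_e = energy_cost           # energy cost per unit of service effort
        self.rng = np.random.default_rng(seed)
        self.queue = 1

    def reset(self):
        self.queue = 1                   # one job present, about to enter service
        return self.queue

    def step(self, action):
        mu = self.mus[action]
        s = self.rng.exponential(1.0 / mu)         # random service duration
        arrivals = self.rng.poisson(self.lam * s)  # Poisson arrivals during service
        # Congestion cost is approximated with the queue length at the epoch
        # start; an exact version would integrate queue length over the interval.
        cost = (self.c_h * self.queue + self.c_e * mu) * s
        self.queue = self.queue - 1 + arrivals     # the completed job departs
        sojourn = s
        if self.queue == 0:                        # idle until the next arrival
            sojourn += self.rng.exponential(1.0 / self.lam)
            self.queue = 1
        return self.queue, -cost, sojourn          # next state, reward, elapsed time

A policy-gradient learner such as REINFORCE, A2C, or PPO would treat the negated cost as the reward and, in the SMDP setting, discount or weight returns by the returned sojourn times; the augmented state representation mentioned above would simply pair the current queue length with the previous one.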

Biography

Joseph Walton is a first-year Ph.D. student in the Department of Engineering Management and Systems Engineering at Missouri University of Science and Technology, under the supervision of Dr. Gabriel Nicolosi, and a Kummer Innovation and Entrepreneurship Doctoral Fellow. His research interests include operations research, machine learning, and stochastic systems.

He has professional experience in process improvement and data analytics, applying statistical and computational methods to optimize manufacturing performance and support data-driven decision making.

Joseph’s work centers on improving decision-making in complex operational systems.

Meeting Name

2026 - Miners Solving for Tomorrow Research Conference

Department(s)

Engineering Management and Systems Engineering

Comments

Advisor: Gabriel Nicolosi, gabrielnicolosi@mst.edu

Document Type

Poster

Document Version

Final Version

File Type

event

Language(s)

English

Rights

© 2026 The Authors. All rights reserved.
