Transformer-Guided Deep Reinforcement Learning for Trajectory Design

Nathan Roberts, Missouri University of Science and Technology

Advisor: Xiaosong Du, xiaosongdu@mst.edu

Description

Electric vertical takeoff and landing (eVTOL) aircraft are limited in range by battery capacity, making energy-efficient takeoff trajectory design critical for practical operation. While deep reinforcement learning (DRL) enables adaptive control, it often struggles to discover feasible solutions due to highly nonlinear dynamics, strict constraints, and high sample complexity. This work introduces a transformer-guided deep reinforcement learning (TDRL) framework for eVTOL takeoff trajectory design. A transformer trained on optimal control trajectories learns temporal relationships in control sequences and guides policy exploration by restricting actions to feasible, energy-conscious regions. The approach is evaluated across a design space varying aircraft parameters, namely efficiency and wing planform scale. Transformer-guided agents achieved feasible trajectories at all evaluated design points, maintaining energy consumption within 5% of optimal solutions, whereas vanilla DRL failed in most cases. Results demonstrate that transformer guidance improves training reliability and performance across vehicle configurations.
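The abstract does not say how the transformer restricts actions, so the following is only a minimal sketch of one plausible form: clamping each control proposed by the policy to a band around a reference control predicted by the transformer, then reporting the energy gap to the optimal solution as a percentage. The function names, the band half-width, and the example numbers are all hypothetical and are not taken from this work.

```python
from typing import List, Sequence


def restrict_action(
    proposed: Sequence[float],
    reference: Sequence[float],
    half_width: float,
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Clamp each proposed control to a band around a reference control.

    Assumption: `reference` stands in for a transformer's predicted control
    at the current step; the actual guidance mechanism may differ.
    """
    restricted = []
    for a, r, lo, hi in zip(proposed, reference, lower, upper):
        # The band around the reference, intersected with the actuator limits.
        band_lo = max(lo, r - half_width)
        band_hi = min(hi, r + half_width)
        restricted.append(min(max(a, band_lo), band_hi))
    return restricted


def energy_gap_percent(energy: float, optimal_energy: float) -> float:
    """Relative excess energy over the optimal-control solution, in percent."""
    return 100.0 * (energy - optimal_energy) / optimal_energy


# Hypothetical two-channel example with illustrative values only.
action = restrict_action(
    proposed=[0.9, -0.5],
    reference=[0.5, 0.0],
    half_width=0.25,
    lower=[0.0, -1.0],
    upper=[1.0, 1.0],
)
gap = energy_gap_percent(energy=105.0, optimal_energy=100.0)
```

In this sketch the policy still chooses the action; the reference only narrows the region it can explore, and a gap of at most 5.0 would correspond to the "within 5% of optimal" criterion in the abstract.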

Apr 1st, 1:30 PM to 3:30 PM

Havener Center, Miner Lounge / Wiese Atrium, 1:30pm-3:30pm
