STARD-Net: SpatioTemporal Attention for Robust Detection of Tiny Airborne Objects from Moving Drones
Abstract
The rapid adoption of drones across various domains, alongside advancements in computer vision, has driven growing interest in vision-based airborne object detection from moving aerial platforms. However, this task remains challenging due to the small scale of objects, camouflage within cluttered backgrounds, and occlusions. To address these challenges, we introduce an end-to-end detection framework that integrates a Drone Receptive Field Block (DRFB) to extract multiscale and geometrically diverse features, specifically designed to enhance the detection of small and camouflaged airborne objects. To model motion patterns over time while preserving spatial structure, particularly for detecting camouflaged, cluttered, and occluded objects with limited appearance cues, we incorporate a Convolutional Long Short-Term Memory (ConvLSTM) module, which captures temporal dependencies across consecutive frames. Additionally, we introduce a SpatioTemporal Attention Block (STAB), inspired by Multi-Head Attention, to aggregate spatial and temporal context for improved semantic understanding. The detection head combines a Swin Transformer with a Cross Stage Partial (CSP) Bottleneck, offering lightweight yet powerful global context modeling for robust detection in complex aerial scenes. We evaluate our model on four publicly available datasets for airborne object detection from moving drones, achieving significant improvements in accuracy while maintaining real-time inference speed. Moreover, when integrated into various You Only Look Once (YOLO) architectures, our spatial feature extraction module (DRFB) consistently boosts performance, demonstrating its broad applicability and effectiveness. The code is available online.
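To illustrate how a ConvLSTM can carry motion information across frames while keeping the spatial layout of the feature map, the following is a minimal sketch of one update step in the standard ConvLSTM formulation (without peephole connections). It is not the authors' implementation: the function names `conv2d_same` and `convlstm_step`, the single-channel maps, and the `params` layout are assumptions made purely for illustration, and the variant used in the paper may differ.

```python
import math


def conv2d_same(img, kernel):
    """Zero-padded 'same' 2D cross-correlation of one channel with an odd-sized kernel."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # outside the map counts as zero
                        acc += img[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def convlstm_step(x, h_prev, c_prev, params):
    """One ConvLSTM update on single-channel maps (hypothetical, illustrative layout).

    params maps each gate name ('i', 'f', 'o', 'g') to a tuple
    (kernel applied to x, kernel applied to h_prev, scalar bias).
    Returns the new hidden map and the new cell map, both the same size as x.
    """
    rows, cols = len(x), len(x[0])
    pre = {}
    for gate, (k_x, k_h, bias) in params.items():
        a_x = conv2d_same(x, k_x)
        a_h = conv2d_same(h_prev, k_h)
        pre[gate] = [[a_x[r][c] + a_h[r][c] + bias for c in range(cols)]
                     for r in range(rows)]

    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            i_gate = sigmoid(pre["i"][r][c])   # input gate
            f_gate = sigmoid(pre["f"][r][c])   # forget gate
            o_gate = sigmoid(pre["o"][r][c])   # output gate
            g_cand = math.tanh(pre["g"][r][c])  # candidate cell update
            c_new[r][c] = f_gate * c_prev[r][c] + i_gate * g_cand
            h_new[r][c] = o_gate * math.tanh(c_new[r][c])
    return h_new, c_new
```

Because every gate is computed by a convolution rather than a fully connected layer, the hidden and cell states remain 2D maps aligned with the input frame. Applying `convlstm_step` frame by frame, feeding each returned `h_new` and `c_new` into the next call, is how temporal context would accumulate in this sketch.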
Recommended Citation
H. Rahman and S. K. Madria, "STARD-Net: SpatioTemporal Attention for Robust Detection of Tiny Airborne Objects from Moving Drones," ACM Transactions on Spatial Algorithms and Systems, vol. 12, no. 1, article no. 3, Association for Computing Machinery (ACM), Mar 2026.
The definitive version is available at https://doi.org/10.1145/3787467
Department(s)
Computer Science
Publication Status
Open Access
Keywords and Phrases
Airborne object detection; ConvLSTM; drone to drone detection; multihead attention; receptive field block; swin transformer; UAV detection; YOLO
International Standard Serial Number (ISSN)
2374-0361; 2374-0353
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2026 Association for Computing Machinery (ACM), All rights reserved.
Creative Commons Licensing

This work is licensed under a Creative Commons Attribution 4.0 License.
Publication Date
01 Mar 2026
