Abstract

The rapid adoption of drones across many domains, together with advances in computer vision, has driven growing interest in vision-based detection of airborne objects from moving aerial platforms. The task remains challenging because the objects are small, are often camouflaged against cluttered backgrounds, and are frequently occluded. To address these challenges, we introduce an end-to-end detection framework that integrates a Drone Receptive Field Block (DRFB), which extracts multiscale and geometrically diverse features and is designed specifically to improve the detection of small and camouflaged airborne objects. To model motion patterns over time while preserving spatial structure, which matters most for camouflaged, cluttered, and occluded objects with limited appearance cues, we incorporate a Convolutional Long Short-Term Memory (ConvLSTM) module that captures temporal dependencies across consecutive frames. We also introduce a SpatioTemporal Attention Block (STAB), inspired by Multi-Head Attention, that aggregates spatial and temporal context for improved semantic understanding. The detection head combines a Swin Transformer with a Cross Stage Partial (CSP) Bottleneck, providing lightweight yet powerful global context modeling for robust detection in complex aerial scenes. We evaluate our model on four publicly available datasets for airborne object detection from moving drones, achieving significant improvements in accuracy while maintaining real-time inference speed. Moreover, when integrated into various You Only Look Once (YOLO) architectures, our spatial feature extraction module (DRFB) consistently boosts performance, demonstrating its broad applicability and effectiveness. The code is available online.
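
For readers unfamiliar with the ConvLSTM module named in the abstract, a commonly used formulation (without peephole connections) is sketched below. This is the generic form of the technique and not necessarily the exact variant used in the article. The notation is assumed here for illustration: X_t is the input feature map at frame t, H_t the hidden state, C_t the cell state, W the convolution kernels, and b the biases, with * denoting convolution and \odot the element-wise product.

```latex
% Generic ConvLSTM update (assumed notation; not taken from the article)
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
H_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

Because the gates and states are feature maps rather than flat vectors, and the matrix products of a standard LSTM are replaced by convolutions, the recurrence carries information across frames while keeping the spatial layout of each frame intact.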

Department(s)

Computer Science

Publication Status

Open Access

Keywords and Phrases

Airborne object detection; ConvLSTM; drone-to-drone detection; multi-head attention; receptive field block; Swin Transformer; UAV detection; YOLO

International Standard Serial Number (ISSN)

2374-0361; 2374-0353

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2026 Association for Computing Machinery (ACM), All rights reserved.

Creative Commons Licensing

This work is licensed under a Creative Commons Attribution 4.0 License.

Publication Date

01 Mar 2026
