YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention
Abstract
We introduce YOGA, a deep-learning-based yet lightweight object detection model that can operate on low-end edge devices while still achieving competitive accuracy. The YOGA architecture consists of a two-phase feature learning pipeline with a cheap linear transformation, which learns feature maps using only half of the convolution filters required by conventional convolutional neural networks. In addition, it performs multi-scale feature fusion in its neck using an attention mechanism instead of the naive concatenation used by conventional detectors. YOGA is a flexible model that can be easily scaled up or down by several orders of magnitude to fit a broad range of hardware constraints. We evaluate YOGA on the COCO-val and COCO-testdev datasets against over 10 state-of-the-art object detectors. The results show that YOGA strikes the best trade-off between model size and accuracy (up to 22% increase of AP and 23–34% reduction of parameters and FLOPs), making it an ideal choice for deployment in the wild on low-end edge devices. This is further affirmed by our hardware implementation and evaluation on the NVIDIA Jetson Nano.
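The abstract describes two architectural ideas: learning feature maps with only half the usual convolution filters via a cheap linear transformation, and fusing multi-scale neck features with attention rather than plain concatenation. The sketch below is a minimal, PyTorch-style illustration of what such components could look like; it is not the authors' implementation. The module names (CheapFeatureBlock, AttentionFusion), the choice of a depthwise convolution as the "cheap" transform, and the squeeze-and-excitation-style channel gating are all assumptions made for illustration, since the abstract does not specify these details.

```python
# Illustrative sketch only; layer shapes, kernel sizes, and the attention
# formulation are assumptions, not the published YOGA architecture.
import torch
import torch.nn as nn


class CheapFeatureBlock(nn.Module):
    """Two-phase feature learning: half the maps from a standard convolution,
    the other half derived by a cheap linear (depthwise) transformation."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        half = out_ch // 2
        # Phase 1: ordinary convolution, but with only half the filters.
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, half, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(inplace=True),
        )
        # Phase 2: cheap depthwise transform that generates the remaining maps
        # from the first half (assumed here; the paper defines the exact op).
        self.cheap = nn.Sequential(
            nn.Conv2d(half, half, kernel_size, padding=kernel_size // 2,
                      groups=half, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        primary = self.primary(x)
        cheap = self.cheap(primary)
        return torch.cat([primary, cheap], dim=1)


class AttentionFusion(nn.Module):
    """Fuses two same-resolution feature maps with learned channel-attention
    weights instead of naive concatenation alone."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels // reduction, 1, bias=False),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        x = torch.cat([a, b], dim=1)   # concatenate the two inputs
        w = self.gate(self.pool(x))    # per-channel attention weights
        return x * w                   # reweight channels before the next layer


if __name__ == "__main__":
    feats = CheapFeatureBlock(in_ch=3, out_ch=64)(torch.randn(1, 3, 224, 224))
    fused = AttentionFusion(channels=64)(feats, feats)
    print(feats.shape, fused.shape)  # (1, 64, 224, 224) and (1, 128, 224, 224)
```

In this sketch, CheapFeatureBlock halves the number of learned filters relative to a plain convolution producing the same output width, and AttentionFusion reweights the concatenated channels so the network can emphasize the more informative scale; the definitive design is in the paper linked below.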
Recommended Citation
R. Sunkara and T. Luo, "YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention," Pattern Recognition, vol. 139, article no. 109451, Elsevier, Jul 2023.
The definitive version is available at https://doi.org/10.1016/j.patcog.2023.109451
Department(s)
Computer Science
International Standard Serial Number (ISSN)
0031-3203
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2023 Elsevier. All rights reserved.
Publication Date
01 Jul 2023