YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention


We introduce YOGA, a deep-learning-based yet lightweight object detection model that can operate on low-end edge devices while still achieving competitive accuracy. The YOGA architecture consists of a two-phase feature-learning pipeline with a cheap linear transformation, which learns feature maps using only half of the convolution filters required by conventional convolutional neural networks. In addition, it performs multi-scale feature fusion in its neck using an attention mechanism instead of the naive concatenation used by conventional detectors. YOGA is a flexible model that can easily be scaled up or down by several orders of magnitude to fit a broad range of hardware constraints. We evaluate YOGA on the COCO-val and COCO-testdev datasets against over 10 state-of-the-art object detectors. The results show that YOGA strikes the best trade-off between model size and accuracy (up to a 22% increase in AP and a 23–34% reduction in parameters and FLOPs), making it an ideal choice for deployment in the wild on low-end edge devices. This is further affirmed by our hardware implementation and evaluation on the NVIDIA Jetson Nano.
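The abstract's claim that the two-phase pipeline with a cheap linear transformation needs only half the convolution filters can be illustrated with a parameter-count sketch. This is a minimal back-of-the-envelope model, assuming the cheap transformation behaves like a GhostConv-style module (a primary convolution producing half the output channels, followed by an inexpensive depthwise convolution generating the rest); the paper's exact layer shapes are not given here, so all sizes below are illustrative, not YOGA's actual configuration.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * k * k * c_out


def two_phase_conv_params(c_in: int, c_out: int, k: int, d: int) -> int:
    """Phase 1: standard conv producing half the filters (intrinsic maps).
    Phase 2: cheap d x d depthwise transform generating the other half.
    (GhostConv-style assumption; illustrative only.)"""
    intrinsic = c_out // 2
    primary = c_in * k * k * intrinsic   # phase 1: half the usual filters
    cheap = intrinsic * d * d            # phase 2: one d x d kernel per map
    return primary + cheap


if __name__ == "__main__":
    # Illustrative layer: 64 input channels, 128 output channels, 3x3 kernels.
    std = standard_conv_params(64, 128, 3)        # 73728 weights
    cheap = two_phase_conv_params(64, 128, 3, 3)  # 37440 weights
    print(f"standard: {std}, two-phase: {cheap}, "
          f"saving: {1 - cheap / std:.1%}")       # roughly half the parameters
```

Because the depthwise phase-2 kernels are tiny relative to the primary filters, the total parameter count falls by nearly 50% for this hypothetical layer, consistent with the abstract's "half of the convolution filters" claim.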


Computer Science


Document Type

Article - Journal




© 2023 Elsevier. All rights reserved.

Publication Date

01 Jul 2023