OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

CVPR 2021

Abstract

Figure: (1) Single-path FPN super-net from the SPOS search space. (2) Our OPANAS: (a) super-net training, i.e., the optimization of super-net weights; (b) optimal sub-net search with an evolutionary algorithm; (c) the searched optimal architecture.

Recently, neural architecture search (NAS) has been exploited to design feature pyramid networks (FPNs) and has achieved promising results for visual object detection. Encouraged by this success, we propose a novel One-Shot Path Aggregation Network Architecture Search (OPANAS) algorithm, which significantly improves both search efficiency and detection accuracy. First, we introduce six heterogeneous information paths to build our search space, namely top-down, bottom-up, fusing-splitting, scale-equalizing, skip-connect and none. Second, we propose a novel search space of FPNs, in which each FPN candidate is represented by a densely-connected directed acyclic graph (each node is a feature pyramid and each edge is one of the six heterogeneous information paths). Third, we propose an efficient one-shot search method to find the optimal path aggregation architecture: we first train a super-net and then find the optimal candidate with an evolutionary algorithm. Experimental results demonstrate the efficacy of the proposed OPANAS for object detection: (1) OPANAS is more efficient than state-of-the-art methods (e.g., NAS-FPN and Auto-FPN), at a significantly smaller search cost (only 4 GPU days on MS-COCO); (2) the optimal architecture found by OPANAS significantly improves mainstream detectors including RetinaNet, Faster R-CNN and Cascade R-CNN, by 2.3-3.2% mAP compared to their FPN counterparts; and (3) it achieves a new state-of-the-art accuracy-speed trade-off (52.2% mAP at 7.6 FPS) at a smaller training cost than comparable state-of-the-art methods.
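To make the search-space encoding and the second stage of the one-shot search more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the released implementation: the number of nodes, population size, mutation probability, and the `toy_fitness` function are all hypothetical. In the actual method, the fitness of a candidate would come from evaluating the corresponding sub-net with weights inherited from the trained super-net; the placeholder below only stands in for that step.

```python
import itertools
import random

# The six heterogeneous information paths named in the abstract.
PATHS = ("top-down", "bottom-up", "fusing-splitting",
         "scale-equalizing", "skip-connect", "none")


def dag_edges(num_nodes):
    """All edges (i, j) with i < j of a densely-connected DAG."""
    return list(itertools.combinations(range(num_nodes), 2))


def random_candidate(edges, rng):
    """An FPN candidate: one path type chosen per edge."""
    return tuple(rng.choice(PATHS) for _ in edges)


def mutate(candidate, rng, prob=0.2):
    """Resample each edge's path type with probability `prob` (assumed value)."""
    return tuple(rng.choice(PATHS) if rng.random() < prob else p
                 for p in candidate)


def crossover(a, b, rng):
    """Take each edge's path type from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def evolutionary_search(fitness, num_nodes=5, population=20,
                        generations=10, top_k=5, seed=0):
    """Generic evolutionary search; all hyperparameters are illustrative."""
    rng = random.Random(seed)
    edges = dag_edges(num_nodes)
    pop = [random_candidate(edges, rng) for _ in range(population)]
    for _ in range(generations):
        # Keep the top_k candidates and refill the population from them.
        parents = sorted(pop, key=fitness, reverse=True)[:top_k]
        children = []
        while len(children) < population - top_k:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(parents), rng))
            else:
                children.append(crossover(*rng.sample(parents, 2), rng))
        pop = parents + children
    best = max(pop, key=fitness)
    return dict(zip(edges, best))


def toy_fitness(candidate):
    # Placeholder only: penalizes "none" edges. A real run would instead
    # score the sub-net using the trained super-net's weights.
    return -sum(p == "none" for p in candidate)


best = evolutionary_search(toy_fitness)
```

Here `best` maps each edge `(i, j)` of the graph to the path type selected for it. Because candidates are only scored, never retrained, inside the loop, the cost of this stage depends on how cheap a single fitness evaluation is, which is the motivation for training the super-net once up front.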

Figure: The six proposed heterogeneous information paths, which map 4-level pyramid features $\{P_2, P_3, P_4, P_5\}$ to $\{F_2, F_3, F_4, F_5\}$. Paths (a)-(d) are parameterized; paths (e)-(f) are parameter-free.

Further Information

For more detailed information, check out our paper and code. We are happy to receive your feedback!

@inproceedings{DBLP:conf/cvpr/LiangWTHL21,
  author    = {Tingting Liang and
               Yongtao Wang and
               Zhi Tang and
               Guosheng Hu and
               Haibin Ling},
  title     = {OPANAS: One-Shot Path Aggregation Network Architecture Search for
               Object Detection},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition, {CVPR}
               2021, virtual, June 19-25, 2021},
  pages     = {10195--10203},
  publisher = {Computer Vision Foundation / {IEEE}},
  year      = {2021},
}