DEIM: DETR with Improved Matching for Fast Convergence

Shihua Huang1, Zhichao Lu2, Xiaodong Cun3, Yongjun Yu1, Xiao Zhou4, Xi Shen1

1 Intellindust AI Lab   2 City University of Hong Kong   3 Great Bay University   4 Hefei Normal University


Abstract


    We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR). To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM employs a Dense O2O matching strategy. This approach increases the number of positive samples per image by incorporating additional targets, using standard data augmentation techniques. While Dense O2O matching speeds up convergence, it also introduces numerous low-quality matches that could affect performance. To address this, we propose the Matchability-Aware Loss (MAL), a novel loss function that optimizes matches across various quality levels, enhancing the effectiveness of Dense O2O. Extensive experiments on the COCO dataset validate the efficacy of DEIM. When integrated with RT-DETR and D-FINE, it consistently boosts performance while reducing training time by 50%. Notably, paired with RT-DETRv2, DEIM achieves 53.2% AP in a single day of training on an NVIDIA 4090 GPU. Additionally, DEIM-trained real-time models outperform leading real-time object detectors, with DEIM-D-FINE-L and DEIM-D-FINE-X achieving 54.7% and 56.5% AP at 124 and 78 FPS on an NVIDIA T4 GPU, respectively, without the need for additional data. We believe DEIM sets a new baseline for advancements in real-time object detection.
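To make the idea behind MAL concrete, here is a minimal illustrative sketch of a matchability-aware classification loss. The weighting scheme, the γ value, and the function name below are our assumptions, loosely patterned on IoU-aware (VariFocal-style) losses; the paper's exact formulation and normalization should be taken as authoritative. The point of the sketch is that low-quality matches produced by Dense O2O still receive supervision, just down-weighted by their match quality, rather than being discarded.

```python
import math

def matchability_aware_loss(p, q, gamma=1.5):
    """Illustrative sketch (not the paper's exact formula) of a
    matchability-aware classification loss.

    p: predicted classification score in (0, 1)
    q: match quality of the assigned target (e.g. IoU); 0 for negatives
    gamma: focusing parameter (the value 1.5 is an assumption)
    """
    eps = 1e-9  # numerical guard for log(0)
    if q > 0:
        # Positive (matched) query: the target is softened by q**gamma,
        # so low-quality matches still contribute, with reduced weight.
        w = q ** gamma
        return -(w * math.log(p + eps) + (1 - w) * math.log(1 - p + eps))
    # Negative (unmatched) query: focal-style term that down-weights
    # easy negatives (small p).
    return -(p ** gamma) * math.log(1 - p + eps)

# A well-matched query is rewarded for a confident score, while a
# low-quality match still receives a (weaker) training signal.
high_q = matchability_aware_loss(0.9, q=0.9)
low_q = matchability_aware_loss(0.9, q=0.3)
```

In a real DETR-style pipeline this would be applied per query over batched tensors (e.g. in PyTorch) after Hungarian matching assigns targets; the scalar version here is only meant to show the weighting behavior.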

Method


[Figure: overview of the DEIM framework]

Results


Please refer to our paper for more experiments.

Faster Convergence on COCO

Better Performance on COCO


Resources


arXiv

Code

Slides

BibTeX

If you find this work useful for your research, please cite:

    @misc{huang2024deim,
      title={DEIM: DETR with Improved Matching for Fast Convergence},
      author={Shihua Huang and Zhichao Lu and Xiaodong Cun and Yongjun Yu and Xiao Zhou and Xi Shen},
      year={2024},
      eprint={2412.04234},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }

© This webpage was in part inspired by this template.