Skaldheim

Abstract

The YOLO series has become the most popular framework for real-time object detection due to its reasonable trade-off between speed and accuracy.

However, we observe that the speed and accuracy of YOLOs are negatively affected by the NMS.

The Trouble with YOLO:

YOLO relies on a post-processing step called Non-Maximum Suppression (NMS). You can think of YOLO as "casting a net to catch fish": it produces many overlapping boxes at once, and NMS filters out the redundant ones, keeping only the highest-scoring box for each object. This filtering step adds latency and can hurt accuracy.
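
To make the filtering step concrete, here is a minimal sketch of the common greedy form of NMS in plain Python. The box format `(x1, y1, x2, y2)`, the function names, and the default `iou_threshold=0.5` are assumptions chosen for illustration; they are not taken from the paper or from any particular YOLO implementation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes on one object, plus one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

Every candidate is compared against the boxes already kept, so the work grows with the number of boxes that reach NMS. This is why the step becomes a bottleneck when many boxes survive the earlier score filter.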

Recently, end-to-end Transformer-based detectors (DETRs) have provided an alternative to eliminating NMS. Nevertheless, the high computational cost limits their practicality and hinders them from fully exploiting the advantage of excluding NMS.

E2E vs Non-E2E

E2E:

E2E is a design approach for a model or system: a single, unified neural network maps the raw input ("one end") directly to the final output ("the other end"). All intermediate processing is learned by that network, with no hand-designed stages and no split into independent modules.

Non-E2E:

A non-E2E system splits a complex task into multiple independent, sequential subtasks (modules). Each module handles one part of the work and passes its output to the next module as input. These modules are usually designed and trained independently.

In this paper, we propose the Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma. ...

Specifically, we design an efficient hybrid encoder to expeditiously process multi-scale features... to improve speed.

Innovation 01 (To speed up)

The encoder is the part of the model that "reads" the image features and builds an understanding of them. In a traditional DETR, the encoder is very computationally expensive.

RT-DETR's hybrid encoder decouples the interaction within each feature scale from the fusion across scales, so multi-scale features are processed at a much lower computational cost, which improves speed.
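
A rough back-of-the-envelope calculation shows why limiting where attention is applied saves so much work. The sketch below only counts pairwise token interactions in self-attention. The 640-pixel input, the strides 8/16/32, and the idea of restricting attention to the coarsest feature level are assumptions made for illustration; the notes above do not spell out these details, and the cost of cross-scale fusion is not counted.

```python
def num_tokens(image_size, stride):
    """Number of spatial positions in a feature map at the given stride."""
    side = image_size // stride
    return side * side


image_size = 640        # assumed input resolution
strides = (8, 16, 32)   # assumed strides of the three feature levels

tokens = [num_tokens(image_size, s) for s in strides]  # [6400, 1600, 400]

# Self-attention compares every token with every other token,
# so its pairwise cost grows with the square of the token count.
all_scales_pairs = sum(tokens) ** 2   # attention over all levels concatenated
top_level_pairs = tokens[-1] ** 2     # attention over the coarsest level only

print(tokens)                               # [6400, 1600, 400]
print(all_scales_pairs, top_level_pairs)    # 70560000 160000
print(all_scales_pairs / top_level_pairs)   # 441.0
```

Under these assumptions, attention over the concatenated multi-scale tokens involves 441 times more token pairs than attention over the coarsest level alone. That gap gives a sense of the savings available.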

Then, we propose the uncertainty-minimal query selection to provide high-quality initial queries to the decoder, thereby improving accuracy.

Innovation 02 (For accuracy)

Think of a query as a "probe" that the model sends out to find an object. The initial quality of the probe determines whether the object can be found quickly and accurately in the end.

This method helps the model select high-quality initial "probes", which makes the subsequent detection more accurate.
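
The general idea of query selection can be sketched as picking the top-scoring encoder features and handing them to the decoder as initial queries. The function name, the string placeholders for features, and the `scores` values below are all hypothetical. How uncertainty-minimal query selection actually scores features is not described in these notes, so this shows plain top-k selection only.

```python
def select_initial_queries(features, scores, k):
    """Pick the k features with the highest scores as initial queries."""
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    return [features[i] for i in top], top


# Hypothetical encoder features and per-feature quality scores.
features = ["f0", "f1", "f2", "f3", "f4"]
scores = [0.10, 0.85, 0.30, 0.92, 0.05]

queries, indices = select_initial_queries(features, scores, k=2)
print(queries, indices)  # ['f3', 'f1'] [3, 1]
```

The better the scores reflect how useful a feature really is, the better the starting "probes" that reach the decoder.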

In addition, RT-DETR supports flexible speed tuning by adjusting the number of decoder layers to adapt to various scenarios without retraining.
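
The speed tuning described above can be pictured as running only the first few decoder layers at inference time. The sketch below uses toy stand-in "layers" (plain Python functions). It assumes that each layer refines the output of the previous one, so stopping early still yields a usable but less refined result. The names `run_decoder` and `num_layers` are hypothetical and do not come from any RT-DETR code base.

```python
def run_decoder(queries, layers, num_layers=None):
    """Apply the first `num_layers` decoder layers (all of them by default)."""
    active = layers if num_layers is None else layers[:num_layers]
    for layer in active:
        queries = layer(queries)
    return queries


# Toy "layers": each one moves every value halfway toward a target of 1.0.
layers = [lambda q: [x + 0.5 * (1.0 - x) for x in q] for _ in range(6)]

full = run_decoder([0.0], layers)                # all 6 layers
fast = run_decoder([0.0], layers, num_layers=3)  # fewer layers, less compute
print(full, fast)  # [0.984375] [0.875]
```

The same weights serve both runs; only the number of layers executed changes. That is the sense in which no retraining is needed.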

RT-DETR Core Performance Comparison Table

Model	Comparison Target	Dataset	Accuracy (AP)	Speed (FPS)	Notes
RT-DETR-R50	Advanced YOLOs	COCO	53.1%	108	Outperforms YOLOs in both speed and accuracy.
RT-DETR-R101	Advanced YOLOs	COCO	54.3%	74	Outperforms YOLOs in both speed and accuracy.
RT-DETR-R50	DINO-R50	COCO	2.2% higher	~21x faster	Values are relative to DINO-R50, not absolute.
RT-DETR-R50	(Performance Showcase)	Objects365 (pre-training)	55.3%	-	Accuracy after pre-training on a larger dataset.
RT-DETR-R101	(Performance Showcase)	Objects365 (pre-training)	56.2%	-	Accuracy after pre-training on a larger dataset.

Introduction

NMS not only slows down inference; it also introduces hyperparameters that must be set manually. These hyperparameters can make both the speed and the accuracy of the model unstable.

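Hyperparameters

Hyperparameters (such as the learning rate or the number of network layers) are settings chosen in advance rather than learned from data. By analogy, they are study rules that a teacher sets beforehand, such as "review your notes several times a day". For NMS, they are thresholds such as the confidence threshold and the IoU threshold.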

End-to-end Speed of Detectors

Analysis of NMS

Experiment 1: Sensitivity of NMS to “Confidence Threshold” (Corresponding to Figure 2)

Confidence threshold

Before NMS runs, prediction boxes with very low scores are filtered out. The score cutoff used for this filter is the confidence threshold.

The trend is clear. With a very low confidence threshold (such as 0.001), tens of thousands of boxes remain for NMS to process; as the threshold increases, that number drops sharply.

Conclusion

This shows that the computational load of NMS is highly sensitive to the confidence threshold.
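
The effect of the threshold on NMS workload can be illustrated with a small simulation. The scores below are synthetic: random numbers skewed toward zero to mimic the fact that most raw predictions are low-confidence. They are not data from the paper, and the count of 8400 predictions and the list of thresholds are likewise assumptions.

```python
import random


def count_boxes_above(scores, threshold):
    """Number of predicted boxes that survive the confidence filter."""
    return sum(1 for s in scores if s > threshold)


random.seed(0)
# Synthetic scores: raising to the 4th power pushes most values toward 0.
scores = [random.random() ** 4 for _ in range(8400)]

thresholds = [0.001, 0.01, 0.05, 0.25, 0.5]
counts = [count_boxes_above(scores, t) for t in thresholds]
for t, c in zip(thresholds, counts):
    print(f"threshold={t:<6} boxes passed to NMS={c}")
```

Each higher threshold passes fewer boxes to NMS. Because NMS compares boxes against each other, a lower threshold means both more boxes and more comparisons per box.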

Contributors

Tuntun Yuchiha
