Dense Heterogeneous Vehicle Object Detection Model Based on RT-DETR-DV

Ou Dongyuan; Hu Yaqi; Luo Weinan; Yu Rong

doi:10.12052/gdutxb.260017

Ou Dongyuan, Hu Yaqi, Luo Weinan, et al. Dense heterogeneous vehicle object detection model based on RT-DETR-DV[J]. Journal of Guangdong University of Technology. DOI: 10.12052/gdutxb.260017


Abstract


In complex traffic scenarios, vehicle flow detection and tracking rely heavily on accurate vehicle detection and localization, an area in which significant progress has been made. However, in dense traffic environments, challenges such as multi-scale vehicles, overlapping, and occlusion frequently arise, imposing new demands on vehicle detection. To address these issues, an improved vehicle detection model for dense traffic scenarios, termed real-time detection transformer for dense vehicles (RT-DETR-DV), is proposed. Based on the real-time detection transformer (RT-DETR) framework, a multi-scale vehicle detection (MSVD) module is first introduced to enhance the extraction and fusion of features across different scales, thereby reducing missed detections of heterogeneous vehicles. Second, to better handle overlap and occlusion, a dense vehicle feature separation (DVFS) module is designed to separate overlapping vehicle features through a feature pyramid network (FPN) branch, thereby enhancing feature discriminability. Finally, to improve the detection of small vehicles and accelerate training convergence, a dynamic loss function mechanism is proposed. Comparative experiments conducted on the BIT-Vehicle and Venom datasets show that the RT-DETR-DV model contains only 19.8 M parameters, a 9.4% reduction compared to the baseline model. Its floating point operations (FLOPs) decrease to 27.9 G, a reduction of 7.7%, while the detection frame rate also improves. Meanwhile, the mean average precision (mAP50:95) increases by 0.6 and 1.8 percentage points on the two datasets, respectively. Additionally, gradient-weighted class activation mapping (Grad-CAM) is used to validate the model's ability to focus on object features and its robustness in dense traffic detection scenarios.
