Dense Heterogeneous Vehicle Object Detection Model Based on RT-DETR-DV
Abstract
In complex traffic scenarios, vehicle flow detection and tracking rely heavily on accurate vehicle detection and localization, an area in which significant breakthroughs have been achieved. However, in dense traffic environments, challenges such as multi-scale vehicles, overlapping, and occlusion frequently arise, imposing new demands on vehicle detection. To address these issues, an improved vehicle detection model for dense traffic scenarios, termed real-time detection transformer for dense vehicles (RT-DETR-DV), is proposed. Based on the real-time detection transformer (RT-DETR) framework, a multi-scale vehicle detection (MSVD) module is first introduced to enhance the extraction and fusion of features across different scales, thereby reducing missed detections of heterogeneous vehicles. Second, to better handle overlap and occlusion, a dense vehicle feature separation (DVFS) module is designed to separate overlapping vehicle features through a feature pyramid network (FPN) branch, thereby enhancing feature discriminability. Finally, to improve the detection of small vehicles and accelerate training convergence, a dynamic loss function mechanism is proposed. Comparative experiments on the BIT-Vehicle and Venom datasets show that the RT-DETR-DV model contains only 19.8 M parameters, a 9.4% reduction compared to the baseline model. Its floating point operations (FLOPs) decrease to 27.9 G, a reduction of 7.7%, while the detection frame rate also improves. Meanwhile, the mean average precision (mAP50:95) increases by 0.6 and 1.8 percentage points on the two datasets, respectively. Additionally, gradient-weighted class activation mapping (Grad-CAM) is used to validate the model's ability to focus on object features and its robustness in dense traffic detection scenarios.
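As a rough consistency check on the efficiency figures above, the baseline model's size and compute can be back-calculated from the reported values and percentage reductions. The sketch below is illustrative arithmetic only: the helper name `implied_baseline` is hypothetical, and because the inputs are the rounded figures quoted in the abstract, the results are approximate rather than values reported by the authors.

```python
def implied_baseline(reduced: float, reduction_pct: float) -> float:
    """Return the baseline value implied by a reduced value and a
    percentage reduction, i.e. solve reduced = baseline * (1 - pct/100)."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return reduced / (1.0 - reduction_pct / 100.0)


# Figures quoted in the abstract (rounded, so the outputs are approximate).
baseline_params_m = implied_baseline(19.8, 9.4)  # parameters, in millions
baseline_flops_g = implied_baseline(27.9, 7.7)   # FLOPs, in billions (G)

print(f"Implied baseline parameters: ~{baseline_params_m:.1f} M")
print(f"Implied baseline FLOPs:      ~{baseline_flops_g:.1f} G")
```

Under these assumptions, the baseline would have roughly 21.9 M parameters and 30.2 G FLOPs.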