Abstract:
RGB-Thermal (RGB-T) tracking methods exploit the complementarity of visible-light and thermal infrared images to improve target tracking accuracy in low-light conditions and adverse weather. However, most existing methods rely only on image-level appearance matching, which makes it difficult for them to cope with target deformation and interference in complex environments. To address this problem, ETMTrack, a tracking method based on efficient temporal modeling, is proposed. First, temporal information is modeled, and the feature fusion module is improved so that it can process this information. Second, a lightweight adapter is used to fine-tune the feature extraction module for thermal infrared images, which enhances the model's ability to extract features from the different modalities while reducing memory usage and improving training efficiency. Finally, a dynamic template update and selection method is proposed to make fuller use of temporal information and thereby improve the model's performance. ETMTrack achieves state-of-the-art performance on three public datasets and performs particularly well under challenges such as occlusion and similar appearances, demonstrating the effectiveness and robustness of tracking based on temporal modeling.