Journal of Guangdong University of Technology, 2019, Vol. 36, Issue (04): 18-23. doi: 10.12052/gdutxb.190039

Object Tracking Combined with Attention and Feature Fusion

Gao Jun-yan, Liu Wen-yin, Yang Zhen-guo   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2019-03-15   Online: 2019-06-18   Published: 2019-05-31

Abstract: The fully-convolutional Siamese network treats object tracking as a similarity-learning problem and has attracted growing attention. To extract more discriminative object features and improve tracking accuracy and robustness, an object tracking model that combines an attention mechanism with feature fusion is proposed. First, the first frame and the frame immediately preceding the current frame are combined as target templates, and a shared feature extraction network extracts features from multiple convolutional layers of both the target templates and the current frame. Next, a channel attention mechanism is applied to the multi-layer convolutional features of the target templates to improve their discriminative power. Finally, the template features are cross-correlated with the features of the current frame to obtain a response map, from which the position and scale of the object in the current frame are predicted. Experimental results show that the proposed model achieves competitive performance compared with several advanced tracking models.
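
The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the attention is computed or how the two templates are combined, so the squeeze-and-excitation style gate, the weight matrices `w1` and `w2`, the function names, and the summing of the two response maps are all assumptions made for illustration. The sketch works on a single feature layer and estimates position only; multi-layer fusion and scale estimation are omitted.

```python
import numpy as np


def channel_attention(feat, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    Assumed squeeze-and-excitation style gate; w1 has shape (C // r, C)
    and w2 has shape (C, C // r) for some reduction ratio r.
    """
    squeezed = feat.mean(axis=(1, 2))                # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid weights in (0, 1)
    return feat * gate[:, None, None]


def cross_correlate(template, search):
    """Slide a (C, h, w) template over a (C, H, W) search map (valid mode)."""
    _, h, w = template.shape
    _, H, W = search.shape
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search[:, i:i + h, j:j + w]
            response[i, j] = np.sum(template * window)
    return response


def track_step(first_feat, prev_feat, search_feat, w1, w2):
    """Return the fused response map and the (row, col) of its peak.

    Assumption: the first-frame and previous-frame templates are combined
    by summing their response maps.
    """
    response = sum(
        cross_correlate(channel_attention(tmpl, w1, w2), search_feat)
        for tmpl in (first_feat, prev_feat)
    )
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, tuple(int(k) for k in peak)
```

In a real tracker the feature maps would come from the shared convolutional network and the attention weights would be learned; here they are plain arrays supplied by the caller.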

Key words: object tracking, Siamese network, feature fusion, attention mechanism, discriminative feature

CLC Number: TP931