Journal of Guangdong University of Technology ›› 2019, Vol. 36 ›› Issue (04): 18-23. doi: 10.12052/gdutxb.190039
Gao Jun-yan, Liu Wen-yin, Yang Zhen-guo
Abstract: The fully-convolutional Siamese network addresses object tracking through similarity learning, and the approach has attracted growing attention. To extract more discriminative target features and to improve tracking accuracy and robustness, an object-tracking model combining an attention mechanism with feature fusion is proposed. First, the first frame and the frame immediately preceding the current frame are combined to form the target template, and a shared feature-extraction network extracts features from multiple convolutional layers of both the template and the current frame. Then, the multi-layer convolutional features of the template are processed with a channel-attention mechanism to strengthen their discriminative power. Finally, the template features are cross-correlated with the features of the current frame to produce a response map, from which the position and scale of the target in the current frame are predicted. Experimental results show that, compared with several state-of-the-art trackers, the proposed model achieves competitive performance.
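The two core steps of the abstract — reweighting template channels with channel attention, then sliding the template over the search-region features to obtain a response map — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the attention weights here are randomly initialised squeeze-and-excitation-style gates (in the paper they are learned end-to-end), and the feature tensors are random stand-ins for CNN activations.

```python
import numpy as np

def channel_attention(feat, reduction=4, rng=None):
    """Reweight channels of feat (C, H, W) with squeeze-and-excitation-style gates.

    The two fully-connected layers are randomly initialised for illustration;
    in a real tracker they would be learned.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c = feat.shape[0]
    squeezed = feat.mean(axis=(1, 2))                       # global average pool -> (C,)
    w1 = rng.standard_normal((c // reduction, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c // reduction)) / np.sqrt(c // reduction)
    hidden = np.maximum(w1 @ squeezed, 0.0)                 # ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))            # sigmoid -> (C,) in (0, 1)
    return feat * gates[:, None, None]                      # per-channel reweighting

def cross_correlate(template, search):
    """Slide the template over the search features, summing over channels
    (SiamFC-style cross-correlation) to produce a single-channel response map."""
    c, hz, wz = template.shape
    _, hx, wx = search.shape
    out = np.zeros((hx - hz + 1, wx - wz + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(template * search[:, y:y + hz, x:x + wz])
    return out

rng = np.random.default_rng(42)
z = rng.standard_normal((8, 6, 6))      # template features (stand-in for CNN output)
x = rng.standard_normal((8, 22, 22))    # search-region features
response = cross_correlate(channel_attention(z), x)
print(response.shape)                   # (17, 17)
peak = np.unravel_index(response.argmax(), response.shape)
```

The peak of the response map gives the predicted target location in the search region; running the correlation at several scales, as the paper's pipeline does, additionally yields a scale estimate.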