Journal of Guangdong University of Technology ›› 2019, Vol. 36 ›› Issue (04): 18-23. doi: 10.12052/gdutxb.190039

• Comprehensive Studies •

Object Tracking Combined with Attention and Feature Fusion

Gao Jun-yan, Liu Wen-yin, Yang Zhen-guo

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2019-03-15  Online: 2019-06-18  Published: 2019-05-31
  • About the author: Gao Jun-yan (b. 1993), female, master's candidate; her research interests include deep learning, object tracking, and pattern recognition.
  • Supported by: the National Natural Science Foundation of China (61703109, 91748107), the China Postdoctoral Science Foundation (2018M643024), and the Introduction of Innovative R&D Team Program of Guangdong Province (2014ZT05G157)

Abstract: The fully-convolutional Siamese network addresses object tracking through similarity learning and has attracted increasing attention. To extract more discriminative object features and to improve tracking accuracy and robustness, an object tracking model combining an attention mechanism with feature fusion is proposed. First, the first frame and the frame preceding the current frame are combined as target templates, and features from multiple convolutional layers of the target templates and the current frame are extracted by a shared feature extraction network. Then, a channel attention mechanism is applied to the multi-layer convolutional features of the target templates to improve the discriminative power of the template features. Finally, the template features are cross-correlated with the features of the current frame to obtain a response map, from which the position and scale of the target in the current frame are predicted. Experimental results show that, compared with several state-of-the-art trackers, the proposed model achieves competitive performance.
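
As a rough illustration of the pipeline described above, the following is a minimal PyTorch sketch of two of its building blocks: squeeze-and-excitation-style channel attention applied to the template features, and SiamFC-style cross-correlation that turns template and search-region features into a response map. The names (ChannelAttention, cross_correlate), the SE-style gating, and all tensor sizes are assumptions for illustration, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """SE-style gating (a hypothetical stand-in for the paper's channel
    attention): pool each channel to a scalar, squeeze through a bottleneck
    MLP, and reweight the channels with sigmoid scores."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))   # (B, C) per-channel weights
        return x * weights.view(b, c, 1, 1)     # emphasize discriminative channels

def cross_correlate(template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
    """Slide template features over search features to score every location.
    template: (B, C, h, w); search: (B, C, H, W) -> (B, 1, H-h+1, W-w+1)."""
    b, c, h, w = template.shape
    # Fold the batch into channels; groups=b makes each sample's template the
    # convolution kernel for its own search region.
    search = search.view(1, b * c, search.size(2), search.size(3))
    response = F.conv2d(search, template, groups=b)
    return response.view(b, 1, response.size(2), response.size(3))

# Toy usage with SiamFC-like feature sizes (assumed, not taken from the paper):
z = torch.randn(2, 256, 6, 6)                    # template (exemplar) features
x = torch.randn(2, 256, 22, 22)                  # search-region features
response_map = cross_correlate(ChannelAttention(256)(z), x)  # (2, 1, 17, 17)

The grouped convolution folds the batch into the channel dimension so that each sample's template serves as the matching kernel for its own search region, which is the usual way to batch this operation.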

Key words: object tracking, Siamese network, feature fusion, attention mechanism, discriminative feature

CLC Number:

  • TP391