广东工业大学学报 ›› 2020, Vol. 37 ›› Issue (06): 41-49. doi: 10.12052/gdutxb.200027
梁观术1, 曹江中1, 戴青云1,2, 黄云飞1
Liang Guan-shu1, Cao Jiang-zhong1, Dai Qing-yun1,2, Huang Yun-fei1
Abstract: To address two problems with existing trademark feature extraction methods, namely their failure to effectively capture the key information in salient regions and the excessive cost of image annotation, an unsupervised trademark retrieval method based on an attention mechanism is proposed. Building on the instance discrimination algorithm, the method applies attention modules to both the spatial and channel dimensions of the network's feature maps: by assigning a weight to each channel and learning spatial transformation parameters, it strengthens the feature representation capability of the unsupervised network. The method is evaluated on a public trademark dataset, and experiments show that its retrieval performance surpasses that of traditional trademark retrieval methods and even that of some supervised trademark retrieval methods.
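To make the mechanism concrete, the sketch below is a minimal PyTorch rendering of what the abstract describes: a channel-attention branch that assigns a learned weight to each channel, and a spatial branch that learns transformation parameters in the spirit of spatial transformer networks. This is our illustration under stated assumptions, not the authors' implementation; all class names, the reduction ratio, and the localization-network sizes are ours.

```python
# A minimal sketch (ours, not the paper's code) of attention applied to both
# the channel and spatial dimensions of a CNN feature map. Hyperparameters
# (reduction ratio, localization-net width) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Assigns a learned weight to each channel of the feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                   # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))  # global average pooling -> MLP
        mx = self.mlp(x.amax(dim=(2, 3)))   # global max pooling -> MLP
        w = torch.sigmoid(avg + mx)         # per-channel weights in (0, 1)
        return x * w[:, :, None, None]


class SpatialTransform(nn.Module):
    """Learns affine transform parameters and resamples the map (STN-style)."""

    def __init__(self, channels: int):
        super().__init__()
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(channels * 16, 32),
            nn.ReLU(inplace=True),
            nn.Linear(32, 6),               # parameters of a 2x3 affine matrix
        )
        # Start from the identity transform so early training is stable.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):                   # x: (B, C, H, W)
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


class DualAttention(nn.Module):
    """Channel re-weighting followed by the learned spatial transform."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialTransform(channels)

    def forward(self, x):
        return self.spatial(self.channel(x))
```

In the paper's setting, such a block would presumably sit after a backbone stage, with the resulting embeddings trained under the instance-discrimination objective; that wiring is likewise an assumption on our part.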