Journal of Guangdong University of Technology ›› 2020, Vol. 37 ›› Issue (06): 41-49. DOI: 10.12052/gdutxb.200027

An Unsupervised Trademark Retrieval Method Based on Attention Mechanism

Liang Guan-shu1, Cao Jiang-zhong1, Dai Qing-yun1,2, Huang Yun-fei1   

  1. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China;
    2. Guangdong Key Laboratory of Intellectual Property Big Data, Guangdong Polytechnic Normal University, Guangzhou 510665, China
  • Received: 2020-02-20; Online: 2020-11-02; Published: 2020-11-02

Abstract: Existing trademark retrieval methods struggle to capture the key information in salient regions and incur a high cost of image annotation. To address these problems, this paper proposes an unsupervised trademark retrieval method based on an attention mechanism. The method applies attention modules to both the spatial and channel dimensions of the feature maps in an instance-discrimination network. By assigning a weight to each channel and learning spatial transformation parameters, the unsupervised network improves its feature extraction ability. Experiments on public trademark datasets validate the effectiveness of the method: it outperforms traditional trademark retrieval methods and even surpasses some supervised trademark retrieval methods.
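
As a rough illustration of the mechanism the abstract describes (a learned weight per channel plus a learned spatial map over the feature grid), the following PyTorch sketch implements a CBAM-style attention block. It is a minimal sketch, not the authors' released code: the module names, reduction ratio, and kernel size are assumptions.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Assigns a learned weight to each channel of a feature map."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x):
            # x: (N, C, H, W); pool out the spatial dims, then re-weight channels
            avg = self.mlp(x.mean(dim=(2, 3)))
            mx = self.mlp(x.amax(dim=(2, 3)))
            weights = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
            return x * weights

    class SpatialAttention(nn.Module):
        """Learns a 2-D weight map that emphasizes salient regions."""
        def __init__(self, kernel_size: int = 7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            # Summarize the channel dim by mean and max, then convolve to a map
            summary = torch.cat([x.mean(dim=1, keepdim=True),
                                 x.amax(dim=1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.conv(summary))

    class AttentionBlock(nn.Module):
        """Channel attention followed by spatial attention (CBAM ordering)."""
        def __init__(self, channels: int):
            super().__init__()
            self.channel = ChannelAttention(channels)
            self.spatial = SpatialAttention()

        def forward(self, x):
            return self.spatial(self.channel(x))

    # Example: refine a backbone feature map before the embedding head
    # (shapes are illustrative)
    feats = torch.randn(8, 512, 7, 7)
    refined = AttentionBlock(512)(feats)
    print(refined.shape)  # torch.Size([8, 512, 7, 7])

In an instance-discrimination setup, such a block would typically sit between backbone stages, so that the refined features feed the L2-normalized embedding on which the non-parametric classifier operates.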

Key words: attention mechanism, instance discrimination, trademark retrieval

CLC Number: TP391.4