Journal of Guangdong University of Technology, 2024, Vol. 41, Issue 03: 102-109. doi: 10.12052/gdutxb.230011
郑侠聪, 程良伦, 黄国恒, 王敬超
Zheng Xia-cong, Cheng Liang-lun, Huang Guo-heng, Wang Jing-chao
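Abstract: In conventional anchor-box-based methods for natural scene text detection, anchor boxes are easily disturbed by other text instances, which causes false detections or reduced accuracy. Text instances also carry strong topological features that have received little attention, so these methods perform poorly on curved and ring-shaped text. To address this problem, a novel neural network structure is proposed. It introduces graph convolutional networks to take full account of the relationships between neighboring anchor boxes, and it incorporates the topological features of the anchor boxes to assist the learning of the graph neural network, improving the effectiveness of the overall network. Ablation experiments were conducted on two public natural scene text detection datasets. On CTW1500, the proposed method improves the model's recall, precision, and F-score by 3.0%, 1.9%, and 2.5%, respectively; on Total-Text, the corresponding figures are 2.2%, 1.8%, and 2.0%. The method is also compared with other text detection algorithms proposed in recent years. The experimental results show that it detects text well in complex natural scenes and that the proposed modules help improve text detection performance.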
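To make the idea of relating neighboring anchor boxes through graph convolution more concrete, here is a minimal sketch. It is not the authors' architecture, which the abstract does not specify. The sketch assumes that each anchor box is a graph node, that edges link each box to its k nearest neighbors by center distance, and that node degree can stand in for a "topological feature". The names `knn_adjacency` and `gcn_layer`, the toy coordinates, and the feature sizes are all hypothetical.

```python
import numpy as np


def knn_adjacency(centers: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency linking each box to its k nearest neighbors."""
    n = centers.shape[0]
    # Pairwise Euclidean distances between box centers.
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a box is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


# Toy data (assumed): five anchor boxes along a curved text line.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.8], [3.0, 1.8], [4.0, 3.2]])
visual = rng.normal(size=(5, 8))  # placeholder for per-box visual features

adj = knn_adjacency(centers, k=2)
# Node degree as a stand-in topological feature (an assumption of this sketch).
degree = adj.sum(axis=1, keepdims=True)
feats = np.concatenate([visual, centers, degree], axis=1)  # 8 + 2 + 1 columns

weight = rng.normal(size=(feats.shape[1], 4))  # random, untrained weights
out = gcn_layer(adj, feats, weight)
print(out.shape)
```

After one layer, each box's output mixes its own features with those of its graph neighbors, which is the mechanism that lets a prediction for one anchor box depend on nearby boxes. A trained model would learn `weight` and would typically stack several such layers.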