Journal of Guangdong University of Technology ›› 2024, Vol. 41 ›› Issue (03): 102-109. doi: 10.12052/gdutxb.230011

• Computer Science and Technology •

Text Detection in Natural Scenes Embedded Topological Feature

Zheng Xia-cong, Cheng Liang-lun, Huang Guo-heng, Wang Jing-chao

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2023-01-28 Online: 2024-05-25 Published: 2024-06-14
  • Corresponding author: Huang Guo-heng (b. 1985), male, associate professor, Ph.D.; main research interests include computer vision, machine learning, and pattern recognition. E-mail: kevinwong@gdut.edu.cn
  • About the author: Zheng Xia-cong (b. 1996), male, master's student; main research interests include computer vision and artificial intelligence. E-mail: 413169248@qq.com
  • Funding:
    National Natural Science Foundation of China (U20A6003); National Natural Science Foundation of China-Guangdong Joint Fund (U1801263, U1701262, U2001201); Guangdong Provincial Key Laboratory of Cyber-Physical Systems (2020B1212060069); Foshan Key Field Science and Technology Research Project (2020001006832)

Abstract: In traditional anchor-box-based methods for text detection in natural scenes, anchor boxes are prone to interference from other text instances, which causes misjudgments or reduces accuracy. Moreover, text instances carry strong topological features that are usually ignored, which leads to poor performance on curved and circular text. To address this problem, a novel neural network structure is proposed. It introduces graph convolutional networks to fully account for the relationships between neighboring anchor boxes, and it incorporates the topological features of the anchor boxes to assist the learning of the graph neural network, improving the effectiveness of the overall network. Ablation experiments were conducted on two public natural scene text detection datasets. On CTW1500, the proposed method improved recall, precision, and F-score by approximately 3.0%, 1.9%, and 2.5%, respectively; on Total-Text, the corresponding improvements were approximately 2.2%, 1.8%, and 2.0%. The proposed method was also compared with other text detection algorithms proposed in recent years. The experimental results show that it performs well for text detection in complex natural scenes and that the proposed module helps improve text detection performance.
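The idea of relating neighboring anchor boxes through a graph and adding a topological feature can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the k-nearest-neighbor graph over box centers, the use of node degree as the topological feature, the standard symmetric-normalized graph convolution, the random weights, and the names `knn_adjacency` and `gcn_layer`.

```python
import numpy as np


def knn_adjacency(centers: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency linking each box to its k nearest neighbours."""
    n = centers.shape[0]
    # Pairwise Euclidean distances between box centres.
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a box is never its own neighbour
    nearest = np.argsort(dists, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def gcn_layer(features: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)


# Five hypothetical anchor boxes along a curved text line: (cx, cy, w, h).
boxes = np.array(
    [[10, 50, 8, 12], [20, 44, 8, 12], [30, 41, 8, 12], [40, 44, 8, 12], [50, 50, 8, 12]],
    dtype=float,
)
adj = knn_adjacency(boxes[:, :2], k=2)

# Illustrative topological feature (an assumption): node degree in the box graph.
degree = adj.sum(axis=1, keepdims=True)
node_features = np.hstack([boxes, degree])

# Random weights stand in for learned parameters.
rng = np.random.default_rng(0)
weight = rng.normal(size=(node_features.shape[1], 4))
out = gcn_layer(node_features, adj, weight)  # one 4-dimensional embedding per box
```

Each output row mixes a box's own features with those of its graph neighbors, which is the sense in which a graph convolution lets neighboring anchor boxes inform one another. A real detector would learn the weights and would likely use richer node and topological features than this sketch does.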

Key words: text detection, natural scene, graph convolutional networks (GCN), topological feature

CLC number:

  • TP391