Journal of Guangdong University of Technology ›› 2024, Vol. 41 ›› Issue (03): 102-109. doi: 10.12052/gdutxb.230011

• Computer Science and Technology •

Text Detection in Natural Scenes Embedded Topological Feature

Zheng Xia-cong, Cheng Liang-lun, Huang Guo-heng, Wang Jing-chao   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2023-01-28 Online: 2024-05-25 Published: 2024-06-14

Abstract: Traditional anchor box-based methods for text detection in natural scenes are prone to interference from neighboring text instances, which leads to misjudgments and reduced accuracy. Moreover, text instances carry strong topological features that are usually ignored, resulting in poor performance on curved and circular text. To address these problems, a novel neural network structure is proposed. It introduces graph convolutional networks (GCN) to model the relationships between adjacent anchor boxes and incorporates the topological features of the anchor boxes to assist the learning of the graph network, improving the effectiveness of the overall network. Ablation experiments were conducted on two publicly available natural scene text detection datasets. On the CTW1500 dataset, the proposed method improved the model's recall, precision, and F-score by approximately 3.0%, 1.9%, and 2.5%, respectively; on the Total-Text dataset, the three metrics improved by approximately 2.2%, 1.8%, and 2.0%, respectively. The proposed method was also compared with other text detection algorithms proposed in recent years. Experimental results show that it performs well for text detection in complex natural scenes, demonstrating the effectiveness of the proposed module for improving text detection performance.
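The idea of propagating information between adjacent anchor boxes, with a topological feature attached to each box, can be illustrated with a small sketch. This is not the authors' network: the k-nearest-neighbour graph, the use of node degree as the topological feature, the symmetric normalization, and all function names (`knn_adjacency`, `append_degree`, `gcn_layer`) are assumptions chosen only to show one generic graph convolution step over box centres.

```python
import math


def knn_adjacency(centers, k):
    """Symmetric k-nearest-neighbour adjacency with self-loops (assumed graph rule)."""
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(centers[i], centers[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0
        adj[i][i] = 1.0  # self-loop so a node keeps its own features
    return adj


def append_degree(features, adj):
    """Append node degree (self-loop excluded) as a simple topological feature."""
    return [f + [sum(row) - 1.0] for f, row in zip(features, adj)]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(D^-1/2 A D^-1/2 X W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    norm = [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [
        [sum(norm[i][j] * features[j][c] for j in range(n)) for c in range(n_in)]
        for i in range(n)
    ]
    return [
        [max(0.0, sum(agg[i][c] * weight[c][o] for c in range(n_in)))
         for o in range(n_out)]
        for i in range(n)
    ]


# Hypothetical anchor-box centres: three boxes along a text line, one far away.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
adj = knn_adjacency(centers, k=1)
feats = append_degree([[1.0] for _ in centers], adj)
weight = [[1.0], [0.0]]  # illustrative fixed weights, not learned
out = gcn_layer(adj, feats, weight)
```

In a trained model the weight matrix would be learned and the node features would come from the visual backbone; the sketch only shows how adjacency and a topological descriptor can enter the same propagation step.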

Key words: text detection, natural scene, graph convolutional networks (GCN), topological feature

CLC Number: TP391