Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (01): 1-9. doi: 10.12052/gdutxb.220055

Prediction Method of Gene Methylation Sites Based on LSTM with Compound Coding Characteristics

Liu Dong-ning, Wang Zi-qi, Zeng Yan-jiao, Wen Fu-yan, Wang Yang   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2022-03-23 Online: 2023-01-25 Published: 2023-01-12

Abstract: DNA N6-methyladenine (6-mA) modification is one of the most important epigenetic markers. Aberrant 6-mA modification can affect gene expression and lead to serious diseases, so predicting 6-mA sites is of great significance for understanding the pathogenesis and treatment of diseases. This paper proposes a long short-term memory (LSTM) neural network that combines K-mer encoding with one-hot encoding to predict methylation sites. First, K-mer encoding increases the information content of the gene sequence. Second, one-hot encoding is applied and the resulting information is converted into a composite encoding matrix. The LSTM model can extract more feature dimensions and feature types from this matrix, which improves its prediction performance on gene sequences. Cross-validation experiments show that the proposed method achieves an accuracy of 93.7% on benchmark datasets. The sensitivity, specificity, and Matthews correlation coefficient of the trained model are 93.0%, 94.5%, and 0.875, respectively, outperforming existing 6-mA prediction methods. On datasets from six other species, the proposed method achieves area under the curve (AUC) values from 0.9055 to 0.9262, which shows its applicability to methylation prediction tasks in animals, plants, and microorganisms. The proposed method was also applied to the rice gene NC_029258.1, and its predictions were checked against recently published online prediction tools. Between 86% and 96% of the predictions are supported by these tools, indicating that the proposed method can be applied to large-scale site prediction and analysis across different species.
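
The abstract does not spell out how the composite encoding matrix is built, so the following is only a minimal sketch of one plausible reading: split the sequence into overlapping K-mers, then one-hot encode each K-mer over the 4^K possible K-mers. The function names, the stride of 1, the default K = 3, and the handling of ambiguous bases are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Slide a window of width k over the sequence with stride 1 (assumed)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot_kmers(seq: str, k: int = 3) -> np.ndarray:
    """Return a (len(seq) - k + 1, 4**k) matrix with at most one 1 per row.

    Each row is the one-hot vector of one K-mer. K-mers that contain a
    character outside ACGT (for example N) are left as all-zero rows.
    """
    vocab = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    tokens = kmer_tokens(seq.upper(), k)
    matrix = np.zeros((len(tokens), len(vocab)), dtype=np.float32)
    for row, token in enumerate(tokens):
        col = vocab.get(token)
        if col is not None:
            matrix[row, col] = 1.0
    return matrix


if __name__ == "__main__":
    encoded = one_hot_kmers("ACGTA", k=2)
    print(encoded.shape)           # rows = K-mers, columns = 4**k
    print(encoded.argmax(axis=1))  # column index of each K-mer
```

The resulting matrix has one row per K-mer position and one column per possible K-mer, which matches the (time steps, features) layout that recurrent layers such as an LSTM typically consume.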

Key words: methylation site prediction, deep learning, long short-term memory network, compound features

CLC Number: TP301.6