Journal of Guangdong University of Technology ›› 2023, Vol. 40 ›› Issue (01): 1-9. doi: 10.12052/gdutxb.220055
刘冬宁, 王子奇, 曾艳姣, 文福燕, 王洋
Liu Dong-ning, Wang Zi-qi, Zeng Yan-jiao, Wen Fu-yan, Wang Yang
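Abstract: DNA N6-methyladenine (6-mA) methylation is one of the important epigenetic modification marks. Abnormal 6-mA sites affect gene expression and can in turn lead to a variety of major diseases, so predicting 6-mA sites is important for understanding pathogenic mechanisms and for treating disease. This paper proposes a long short-term memory (LSTM) neural network for predicting gene methylation sites, built on a composite feature encoding that combines the K-mer and One-hot methods. K-mer encoding first increases the amount of information carried by each character of the gene sequence, and One-hot encoding then expands the K-mer-encoded character sequence into a composite encoding matrix. The improved sequence encoding matrix increases the dimensionality and variety of features that the LSTM model can extract from gene sequence data, which improves the model's performance on gene sequences. Cross-validation experiments show that the method reaches an accuracy of 93.7% on a public dataset, with sensitivity, specificity, and Matthews correlation coefficient (MCC) of 93.0%, 94.5%, and 0.875, respectively, all better than existing methods. Furthermore, on gene datasets from six other species, the area under the receiver operating characteristic curve (Area Under the Curve, AUC) ranges from 0.9055 to 0.9262, indicating that the method is applicable to methylation site prediction in animals, plants, and microorganisms. The method was also used to predict every base position of the rice NC_029258.1 gene sequence. When checked against four different online tools, 86% to 96% of the potential methylation sites it predicted received similar conclusions from the other tools, so the predictions are reliable and the method can be applied to the prediction and analysis of methylation sites in gene sequences.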
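The composite encoding described in the abstract can be illustrated with a minimal sketch. The abstract does not give the value of k, the window stride, the ordering of the K-mer vocabulary, or the handling of ambiguous bases, so the choices below (overlapping windows with stride 1, lexicographic vocabulary over A/C/G/T, all-zero rows for K-mers containing other characters) and the function names `kmer_tokens` and `kmer_one_hot` are assumptions for illustration, not the authors' implementation.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def kmer_tokens(seq, k=3):
    """Slide a window of width k over seq (stride 1, an assumption) and return the K-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_one_hot(seq, k=3):
    """Encode seq as a (len(seq) - k + 1) x 4**k matrix of 0/1 values.

    Each row is the One-hot vector of one K-mer, so a sequence of single
    characters becomes a sequence of wider vectors, one per time step.
    """
    # Assumed vocabulary order: lexicographic over the alphabet (AA, AC, AG, ...).
    vocab = {"".join(p): idx for idx, p in enumerate(product(ALPHABET, repeat=k))}
    matrix = []
    for token in kmer_tokens(seq.upper(), k):
        row = [0] * len(vocab)
        if token in vocab:  # K-mers with other characters (e.g. N) stay all-zero
            row[vocab[token]] = 1
        matrix.append(row)
    return matrix


# Example: a 5-base sequence with k=2 gives 4 rows of width 4**2 = 16.
encoded = kmer_one_hot("ACGTA", k=2)
```

Under these assumptions, each row of `encoded` would be one input step for a sequence model such as an LSTM. A larger k widens each row (4**k columns) while shortening the sequence by k - 1 steps.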