Journal of Guangdong University of Technology


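ScRNA-seq Data Analysis Based on Adaptive Graph Regularization and Low-rank Representation

Feng Si-fan, Wang Zhen-you, Jin Ying-hua

  1. School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, China
  • Received: 2024-01-29; Online: 2024-09-27; Published: 2024-09-27
  • Corresponding author: Jin Ying-hua (1982–), male, associate professor, Ph.D.; research interests include regression models, cluster analysis and deep learning. E-mail: jyh@mail.ustc.edu.cn
  • About the author: Feng Si-fan (1996–), female, master's student; research interest: biostatistics. E-mail: 649232944@qq.com
  • Funding:
    Natural Science Foundation of Guangdong Province (2023A1515012891)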



Abstract: Single-cell RNA sequencing (scRNA-seq) measures gene expression in individual cells and produces large volumes of single-cell gene expression data. Such data are high-dimensional and structurally complex, so dimensionality reduction and clustering are needed to reveal differences between cell types and states. This paper proposes scLRRAGR, an scRNA-seq data analysis method based on low-rank representation with adaptive graph regularization. The method makes full use of both the global and the local information in scRNA-seq data for graph learning, and captures the similarities and interactions between cells through adaptive graph regularization and a rank constraint. As a result, the learned graph better reflects the cluster structure of the cells and helps reveal differences between cell types and states. To apply the method, the scRNA-seq data are represented as a graph in which each node is a single cell and each edge encodes the similarity or interaction between two cells. The method then learns and optimizes this graph to obtain an optimal graph representation, on which a standard clustering algorithm identifies cell types and states. Experimental results show that the proposed method significantly improves clustering performance on scRNA-seq datasets.

Key words: scRNA-seq data, cell clustering, graph regularization, low-rank representation, rank constraint
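The abstract does not state the scLRRAGR objective, so the sketch below does not reproduce the method itself. Assuming the "rank constraint" is meant in the usual graph-learning sense (as in the constrained Laplacian rank family of methods), it illustrates why such a constraint exposes cluster structure. For a non-negative symmetric affinity matrix W with Laplacian L = D − W, the number of zero eigenvalues of L equals the number of connected components c, so rank(L) = n − c, and forcing this rank on the learned graph forces exactly c groups of cells. The function names (`graph_laplacian`, `count_components`, `component_labels`), the tolerance, and the toy 6-cell matrix are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def graph_laplacian(S):
    """Unnormalized Laplacian L = D - W of the symmetrized affinity matrix W."""
    # Common LRR post-processing: W = (|S| + |S^T|) / 2 gives a symmetric, non-negative graph.
    W = (np.abs(S) + np.abs(S.T)) / 2.0
    return np.diag(W.sum(axis=1)) - W


def count_components(S, tol=1e-8):
    """Number of connected components c = multiplicity of the zero eigenvalue of L."""
    eigvals = np.linalg.eigvalsh(graph_laplacian(S))  # L is symmetric PSD, eigenvalues >= 0
    return int(np.sum(eigvals < tol))


def component_labels(S, tol=1e-8):
    """Give each cell the index of its connected component (iterative depth-first search)."""
    W = (np.abs(S) + np.abs(S.T)) / 2.0
    adjacency = W > tol
    n = W.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue  # already reached from an earlier start cell
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical "learned" affinity matrix: 6 cells forming 2 ideal, disconnected groups.
rng = np.random.default_rng(0)
S = np.zeros((6, 6))
S[:3, :3] = rng.uniform(0.5, 1.0, size=(3, 3))
S[3:, 3:] = rng.uniform(0.5, 1.0, size=(3, 3))
np.fill_diagonal(S, 0.0)

n = S.shape[0]
c = count_components(S)
rank_L = np.linalg.matrix_rank(graph_laplacian(S), tol=1e-8)
print("components:", c, "rank(L):", rank_L, "n - c:", n - c)  # components: 2 rank(L): 4 n - c: 4
print("labels:", component_labels(S))                        # labels: [0 0 0 1 1 1]

# A single weak spurious edge between the groups merges them into one component.
S_noisy = S.copy()
S_noisy[0, 5] = S_noisy[5, 0] = 0.01
print("components with noise:", count_components(S_noisy))   # components with noise: 1
```

The last lines show the motivation for the design choice: one weak edge between the two groups is enough to merge them, which illustrates why a rank condition is imposed as a constraint while the graph is learned rather than expected to emerge from noisy similarities on its own.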

CLC number: Q332