广东工业大学学报 ›› 2022, Vol. 39 ›› Issue (01): 50-55.doi: 10.12052/gdutxb.210043

• 综合研究 • 上一篇    下一篇

基于路径适配的大规模RDF数据子图匹配算法

胡新苗1, 林穗1, 姜文超1,2, 熊梦2, 贺忠堂2   

  1. 1. 广东工业大学 计算机学院,广东 广州 510006;
    2. 中国科学院 云计算产业技术创新与育成中心,广东 东莞 523808
  • 收稿日期:2021-03-11 发布日期:2022-01-20
  • 通信作者: 林穗(1972-),女,副教授,主要研究方向为复杂网络、云计算和推荐系统,E-mail:linsui@gdut.edu.cn
  • 作者简介:胡新苗(1991-),女,硕士研究生,主要研究方向为大数据、复杂网络和推荐系统
  • 基金资助:
    国家重点研发计划项目(2018YFB1003603);广东省自然科学基金资助项目(2018A030313061);广东省科技计划项目(2019B010139001);广州市科技计划项目(201902020016)

A Path Adaptation-based Subgraph Matching Algorithm for Large-scale RDF Graph Data

Hu Xin-miao1, Lin Sui1, Jiang Wen-chao1,2, Xiong Meng2, He Zhong-tang2   

  1. 1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China;
    2. Cloud Computing Center, Chinese Academy of Sciences, Dongguan 523808, China
  • Received:2021-03-11 Published:2022-01-20

摘要: 子图查询与匹配是社会网络分析和大规模网络图知识发现中的核心技术, 也是决定大规模社会网络分析和知识发现准确性的关键。针对当前大规模图数据环境下子图查询算法准确率低、开销大的问题, 提出基于路径适配的子图匹配算法。首先基于路径建立图数据的RDF(Resource Description Framework, 资源描述框架)索引; 然后将查询子图分解为一组路径, 在分解过程中为每条路径获得一组候选匹配路径; 最后通过k-partition交集图将候选路径连接在一起, 从而构建出查询图的结果子图。实验测试了在不同数据集上的路径索引构建时间以及F-measure值, 与Spath(Shortest Path, 最短路径)算法、Sapper算法和SQM(Subgraph Query Matching, 子图查询匹配)算法相比, 在处理大规模数据时, 该算法的查询准确率提高了15%。

关键词: 社会网络, 知识发现, 大规模数据图, 子图查询, 子图匹配

Abstract: Subgraph pattern matching is the core technology in social network analysis and knowledge discovery, and also the key to determine the accuracy of social network analysis and knowledge discovery. In view of the low accuracy and high cost of the subgraph query algorithm based on backtracking in large-scale data graph environment, a subgraph matching algorithm based on path adaptation is proposed. First, an index based on the path RDF (Resource Description Framework) graph is built. Then, the query graph is decomposed into a set of paths, for each path in the process of decomposition to obtain a set of candidate paths. Finally, the candidate paths are concatenated together through the k-partition intersection graph to build the result subgraph of the query graph. Compared with Spath (Shortest Path) algorithm, Sapper algorithm and SQM (Subgraph Query Matching) algorithm, this algorithm improves the query accuracy by 15% when dealing with large-scale data.

Key words: social network, knowledge discovery, large-scale data graphs, subgraph query, subgraph matching

中图分类号: 

  • TP391
[1] 关皓元, 朱斌, 李冠宇, 等. 基于资源描述框架图切分与顶点选择性的高效子图匹配算法[J]. 计算机应用, 2019, 39(2): 360-369.
GUAN H Y, ZHU B, LI G Y, et al. Efficient subgraph matching method based on resource description framework graph segmentation and vertex selectivity [J]. Journal of Computer Applications, 2019, 39(2): 360-369.
[2] YU J, LIU X M, LIU Y B, et al. Multiple pattern graph correlations for efficient graph pattern matching[C]//2017 IEEE/ACS 14th International Conference on Computer Systems and Applications. Hammamet: IEEE, 2017: 469-474.
[3] 关皓元, 朱斌, 李冠宇, 等. 基于RDF图结构切分的高效子图匹配方法[J]. 计算机应用, 2018, 38(7): 1898-1904.
GUAN H Y, ZHU B, LI G Y, et al. Efficient subgraph matching method based on structure segmentation of RDF graph [J]. Journal of Computer Applications, 2018, 38(7): 1898-1904.
[4] MA Z M, CAPRETZ M A M, YAN L. Storing massive resource description framework(RDF) data: a survey [J]. Knowledge Engineering Review, 2016, 31(4): 391-413.
[5] 罗国辉, 林穗, 姜文超. 一种基于二进制编码处理的数字保序匹配算法[J]. 广东工业大学学报, 2017, 34(5): 56-59.
LUO G H, LIN S, JIANG W C. A digital order-preserving-matching algorithm based on binary coding processing [J]. Journal of Guangdong University of Technology, 2017, 34(5): 56-59.
[6] 于静, 刘燕兵, 张宇, 等. 大规模图数据匹配技术综述[J]. 计算机研究与发展, 2015, 52(2): 391-409.
YU J, LIU Y B, ZHANG Y, et al. Survey on large-scale graph pattern matching [J]. Journal of Computer Research and Development, 2015, 52(2): 391-409.
[7] 许文, 宋文爱, 富丽贞, 等. 面向大规模图数据的分布式子图匹配算法[J]. 计算机科学, 2019, 46(4): 28-35.
XU W, SONG W A, FU L Z, et al. Distributed subgraph matching algorithm for large scale graph data [J]. Computer Science, 2019, 46(4): 28-35.
[8] 李龙洋, 董一鸿, 施炜杰, 等. SQM: 基于Spark的大规模单图上的子图匹配算法[J]. 计算机应用, 2019, 39(1): 46-50.
LI L Y, DONG Y H, SHI W J, et al. SQM: subgraph matching algorithm for single large-scale graphs under Spark [J]. Journal of Computer Applications, 2019, 39(1): 46-50.
[9] ULLMANN J R. An algorithm for subgraph isomorphism [J]. Journal of the ACM, 1976, 23(1): 31-42.
[10] CORDELLA L P, FOGGIA P. A (sub)graph isomorpism algorithm for matching large graphs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(10): 1367-1372.
[11] HE H, SINGH A K. Graphs-at-a-time: query language and access methods for graph data bases[C]//Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2008: 405-418.
[12] ZHANG S, LI S, YANG J. GADDI: distance index based subgraph matching in biological networks[C]//Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. New York: ACM, 2009: 192-203.
[13] 张硕, 李建中, 高宏, 等. 一种多到一子图同构检测方法[J]. 软件学报, 2010, 21(3): 401-414.
ZHANG S, LI J Z, GAO H, et al. Approach for efficient subgraph isomorphism testing for multiple graphs [J]. Journal of Software, 2010, 21(3): 401-414.
[14] HAN W S, LEE J, LEE J H. TurboISO: towards ultrafast and roubst subgraph isomorphism search in large graph databases[C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2013: 337-348.
[15] 孙云浩, 李逢雨, 李冠宇, 等. 面向RDF图的多模式匹配方法[J]. 计算机工程与应用, 2020, 56(13): 84-92.
SUN Y H, LI F Y, LI G Y, et al. Multi-pattern matching method for RDF graph [J]. Computer Engineer and Applications, 2020, 56(13): 84-92.
[16] LI G F, YAN L, MA Z M. An approach for approximate subgraph matching in fuzzy RDF graph [J]. Fuzzy Sets and Systems, 2019, 376: 106-126.
[1] 杜松华, 罗子婵, 陈扬森. 基于社会网络分析与反事实方法的价值共创研究——以小米虚拟社区为例[J]. 广东工业大学学报, 2020, 37(02): 11-21.
[2] 杨春燕, 李志明. 基于可拓学的社会网络结构研究[J]. 广东工业大学学报, 2014, 31(1): 1-6.
[3] 张立厚; 郑大庆; 高京广; . 可拓学在Web搜索与Web文本挖掘中的应用[J]. 广东工业大学学报, 2004, 21(2): 1-6.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!