Journal of Guangdong University of Technology ›› 2016, Vol. 33 ›› Issue (01): 51-56. doi: 10.3969/j.issn.1007-7162.2016.01.010

• Comprehensive Studies •

Parallel DBSCAN Algorithm Based on Cloud Computing Platform

Cai Yong-qiang, Chen Ping-hua, Li Hui

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2014-04-02 Online: 2016-01-16 Published: 2016-01-16
  • About the author: Cai Yong-qiang (born 1987), male, master's student; research interests: web data mining and cloud computing.
  • Supported by:

    Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2012B091000058); Guangdong Province Specialized Town Service Platform Construction Project for Micro, Small and Medium Enterprises (2012B040500034)

Abstract: DBSCAN is a typical density-based clustering algorithm that is fast and can identify noise points. On large-scale data, however, it suffers from low clustering efficiency, heavy memory and I/O consumption, and reduced clustering accuracy. Advances in cluster computing, and in cloud computing in particular, offer a way to address these shortcomings. This paper proposes PMDBSCAN, a parallel DBSCAN algorithm with data pre-partitioning: the data are partitioned in a preprocessing step before clustering, DBSCAN is parallelized with the MapReduce programming model, and overlapping partitions are used to reduce I/O consumption. Experimental results show that, on large-scale data sets, PMDBSCAN effectively increases clustering speed, reduces I/O consumption, and improves clustering quality.

Key words: large-scale database; DBSCAN algorithm; overlapping partition; MapReduce
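
The abstract names three ingredients (pre-partitioning, a MapReduce-style split into map and reduce steps, and overlapping partitions) without giving PMDBSCAN's details. The following is only a minimal single-process sketch of that general idea, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function names, partitioning into strips along the first coordinate, a halo of `2 * eps` on each side of a strip, and merging local clusters that share a core point.

```python
from math import dist  # Python 3.8+


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns (labels, is_core); label -1 marks noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in nbrs]  # a point counts itself
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels, is_core


def map_partition(part_id, lo, hi, points, eps, min_pts):
    """'Map' step (assumed design): cluster strip [lo, hi) plus a 2*eps halo."""
    ids = [i for i, p in enumerate(points)
           if lo - 2 * eps <= p[0] < hi + 2 * eps]
    labels, is_core = dbscan([points[i] for i in ids], eps, min_pts)
    records = []
    for k, i in enumerate(ids):
        if labels[k] != -1:
            home = lo <= points[i][0] < hi  # True only in the owning strip
            records.append((i, (part_id, labels[k]), is_core[k], home))
    return records


def reduce_merge(records):
    """'Reduce' step (assumed design): union local clusters sharing a core point."""
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    core_seen, home_label = {}, {}
    for i, cid, core, home in records:
        find(cid)
        if core:
            if i in core_seen:
                parent[find(cid)] = find(core_seen[i])
            else:
                core_seen[i] = cid
        if home:
            home_label[i] = cid
    return {i: find(c) for i, c in home_label.items()}


def partitioned_dbscan(points, eps, min_pts, n_parts):
    """Hypothetical driver: strip partitions along x, map each, then reduce."""
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_parts or 1.0
    records = []
    for k in range(n_parts):
        a = lo + k * width
        b = lo + (k + 1) * width if k < n_parts - 1 else float("inf")
        records += map_partition(k, a, b, points, eps, min_pts)
    merged = reduce_merge(records)
    return [merged.get(i, -1) for i in range(len(points))]
```

In this sketch each strip can be clustered independently because the halo carries the neighbouring points it needs, so no strip has to read another strip's data while clustering; only the compact `(point, cluster, is_core, home)` records go to the merge step. The returned labels are `(partition, local cluster)` pairs, with `-1` for noise. On a real MapReduce platform the two steps would be written as mapper and reducer jobs rather than plain function calls.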
