Journal of Guangdong University of Technology, 2016, Vol. 33, Issue (01): 51-56. doi: 10.3969/j.issn.1007-7162.2016.01.010

• Comprehensive Studies •

Parallel DBSCAN Algorithm Based on Cloud Computing Platform

Cai Yong-qiang, Chen Ping-hua, Li Hui

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2014-04-02 Online: 2016-01-16 Published: 2016-01-16

Abstract: DBSCAN, a representative clustering algorithm, is fast and can identify noise in data. On large data sets, however, it suffers from low clustering efficiency, high memory and I/O requirements, and poor clustering precision. Cluster computing, and in particular the development of cloud computing, offers a way to address these problems. This paper proposes PMDBSCAN, a parallel DBSCAN algorithm based on data partitioning. It partitions the data in a pre-processing step before clustering, parallelizes DBSCAN with the MapReduce programming model, and uses overlapping partitions to reduce I/O consumption. The results show that, on large-scale data, PMDBSCAN significantly increases clustering speed, reduces I/O consumption, and improves cluster quality.

Key words: large-scale database; DBSCAN algorithm; data overlapping partition; MapReduce
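
The abstract names three ingredients: overlapping data partitions, a per-partition DBSCAN run, and a MapReduce-style combination of the local results. The sketch below illustrates that general idea in plain Python on a single machine. It is not the paper's PMDBSCAN implementation: the strip-shaped partitions along the x-axis, the eps-wide overlap, the union-find merge, and every function name (`overlapping_partitions`, `map_phase`, `reduce_phase`, `parallel_dbscan`) are assumptions made for illustration only.

```python
from collections import defaultdict
from math import dist


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns (labels, core_flags); label -1 means noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]  # a point counts itself
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]  # holds core points only
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if core[q]:
                        stack.append(q)
        cluster += 1
    return labels, core


def overlapping_partitions(points, eps, n_parts):
    """Assumed scheme: cut the x-range into strips, each widened by eps per side."""
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_parts or 1.0
    parts = []
    for k in range(n_parts):
        left, right = lo + k * width - eps, lo + (k + 1) * width + eps
        parts.append([i for i, p in enumerate(points) if left <= p[0] <= right])
    return parts


def map_phase(points, parts, eps, min_pts):
    """'Map': cluster each partition on its own; emit (point, local cluster, is_core)."""
    records = []
    for k, idx in enumerate(parts):
        labels, core = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab, is_core in zip(idx, labels, core):
            if lab != -1:
                records.append((i, (k, lab), is_core))
    return records


def reduce_phase(n_points, records):
    """'Reduce': merge local clusters that share a core point (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    by_point, core_points = defaultdict(list), set()
    for i, cid, is_core in records:
        by_point[i].append(cid)
        if is_core:
            core_points.add(i)
    for i in core_points:
        root = find(by_point[i][0])
        for cid in by_point[i][1:]:
            parent[find(cid)] = root
    final, ids = [-1] * n_points, {}
    for i in range(n_points):
        if by_point[i]:
            final[i] = ids.setdefault(find(by_point[i][0]), len(ids))
    return final


def parallel_dbscan(points, eps, min_pts, n_parts):
    parts = overlapping_partitions(points, eps, n_parts)
    return reduce_phase(len(points), map_phase(points, parts, eps, min_pts))
```

In this sketch the eps-wide overlap gives every point its full neighborhood inside its own strip, so neighboring strips both see the core points near a shared boundary and the merge step can join clusters that cross it. On a real MapReduce platform each partition would be clustered by a separate worker; here the map phase is just a loop.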
