Journal of Guangdong University of Technology ›› 2016, Vol. 33 ›› Issue (01): 51-56. doi: 10.3969/j.issn.1007-7162.2016.01.010

• Comprehensive Studies •

Parallel DBSCAN Algorithm Based on Cloud Computing Platform

Cai Yong-qiang, Chen Ping-hua, Li Hui

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2014-04-02 Online: 2016-01-16 Published: 2016-01-16

Abstract: DBSCAN is a typical clustering algorithm that is fast and can identify noise in the data. When applied to big data, however, it suffers from low clustering efficiency, high memory and I/O requirements, and poor clustering precision. Cluster computing, and in particular the development of cloud computing, makes it possible to address these problems. This paper proposes PMDBSCAN, a parallel DBSCAN algorithm based on data partitioning. It partitions the data in a pre-processing step before clustering, parallelizes DBSCAN with the MapReduce parallel programming model, and reduces I/O consumption by using overlapping partitions. The results show that, on large-scale data, PMDBSCAN increases clustering speed, reduces I/O consumption, and improves cluster quality significantly.

Key words: large-scale database; DBSCAN algorithm; data overlapping partition; MapReduce
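
The abstract does not give the partitioning scheme or the merge rule used by PMDBSCAN, so the following is only a minimal sketch of the general idea of DBSCAN over overlapping partitions, not the authors' algorithm. It makes several assumptions: the data are 2-D points, partitions are equal-width strips along x widened by eps on each side, and local clusters are merged when they share a point that is a core point in at least one partition. The "map" and "reduce" phases are simulated with plain Python loops instead of a MapReduce framework, and all function names (local_dbscan, overlapping_strips, partitioned_dbscan) are hypothetical.

```python
from math import dist

NOISE = -1

def local_dbscan(pts, eps, min_pts):
    """Textbook DBSCAN on {point_id: (x, y)}; returns ({id: label}, core ids)."""
    ids = list(pts)
    nbrs = {i: [j for j in ids if dist(pts[i], pts[j]) <= eps] for i in ids}
    core = {i for i in ids if len(nbrs[i]) >= min_pts}  # a point counts itself
    labels = {i: NOISE for i in ids}
    next_label = 0
    for seed in ids:
        if seed not in core or labels[seed] != NOISE:
            continue
        labels[seed] = next_label
        stack = [seed]                    # only core points are expanded
        while stack:
            cur = stack.pop()
            for j in nbrs[cur]:
                if labels[j] == NOISE:
                    labels[j] = next_label
                    if j in core:
                        stack.append(j)
        next_label += 1
    return labels, core

def overlapping_strips(points, n_parts, eps):
    """Assumed scheme: equal-width strips along x, each widened by eps."""
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_parts or 1.0
    parts = [dict() for _ in range(n_parts)]
    for pid, (x, y) in enumerate(points):
        for k in range(n_parts):
            if lo + k * width - eps <= x <= lo + (k + 1) * width + eps:
                parts[k][pid] = (x, y)
    return parts

def partitioned_dbscan(points, eps, min_pts, n_parts):
    parts = overlapping_strips(points, n_parts, eps)
    # "Map" phase: each partition is clustered independently.
    results = [local_dbscan(part, eps, min_pts) for part in parts]

    parent = {}
    def find(c):                          # union-find over (partition, label)
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # "Reduce" phase: collect every non-noise label each point received.
    seen, core_anywhere = {}, set()
    for k, (labels, core) in enumerate(results):
        core_anywhere |= core
        for pid, lab in labels.items():
            if lab != NOISE:
                seen.setdefault(pid, []).append((k, lab))
    # Merge local clusters that share a point which is core somewhere.
    for pid, keys in seen.items():
        if pid in core_anywhere:
            for other in keys[1:]:
                parent[find(other)] = find(keys[0])

    roots, out = {}, []
    for pid in range(len(points)):
        if pid not in seen:
            out.append(NOISE)
        else:
            out.append(roots.setdefault(find(seen[pid][0]), len(roots)))
    return out

# Usage: a line of points crossing several strips, a small second cluster,
# and one isolated point.
line = [(0.5 * i, 0.0) for i in range(21)]
blob = [(0.5 * i, 50.0) for i in range(5)]
data = line + blob + [(9.0, 25.0)]
labels = partitioned_dbscan(data, eps=0.6, min_pts=3, n_parts=4)
```

In this sketch the eps-wide overlap is what lets the merge step reconnect a cluster that was cut by a strip boundary. Each boundary point is clustered in more than one partition, at the cost of some duplicated work.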
