Journal of Guangdong University of Technology ›› 2017, Vol. 34 ›› Issue (03): 49-53. doi: 10.12052/gdutxb.170011


An Improved mpts-HDBSCAN Algorithm

Wang Rong-rong, Fu Xiu-fen   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2017-01-15 Online: 2017-05-09 Published: 2017-05-09

Abstract:

Cluster analysis is an important branch of unsupervised pattern classification, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is one of the most widely used density-based clustering algorithms. It has been widely studied and applied in many fields because it can find clusters of arbitrary shape in noisy data. This paper reviews the shortcomings of DBSCAN and of recent algorithms that improve on it. A new data partitioning method is proposed to address the degradation in mpts-HDBSCAN clustering quality on datasets of varying density. First, the proposed method calculates the number of groups from the histogram of the data distribution. Second, it decides whether to partition the dataset based on a threshold value. mpts-HDBSCAN is then applied to each sub-dataset produced by the partitioning, and the resulting sub-clusters are finally merged. Experiments show that the combined algorithm finds clusters more effectively than mpts-HDBSCAN alone when the dataset density is uneven.
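
The partitioning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which quantity is histogrammed or how the threshold is defined. The sketch assumes the histogram is taken over each point's distance to its k-th nearest neighbor, that a "group" is a run of adjacent non-empty bins, and that partitioning happens when the group count exceeds a hypothetical `max_groups` threshold. All function and parameter names are illustrative.

```python
from math import dist


def kth_neighbor_distances(points, k):
    """Brute-force distance from each point to its k-th nearest neighbor."""
    out = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(others[k - 1])
    return out


def bin_index(value, lo, width, bins):
    """Histogram bin of a value; the maximum value falls in the last bin."""
    return min(int((value - lo) / width), bins - 1)


def group_per_bin(counts):
    """Label each non-empty bin with the index of its run of non-empty bins."""
    groups, current, prev_empty = [], -1, True
    for c in counts:
        if c > 0 and prev_empty:
            current += 1  # a new run of non-empty bins starts here
        groups.append(current if c > 0 else None)
        prev_empty = c == 0
    return groups


def partition_by_density(points, k=2, bins=10, max_groups=1):
    """Split points by density level (assumed criterion, see lead-in)."""
    dists = kth_neighbor_distances(points, k)
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all equal
    counts = [0] * bins
    for d in dists:
        counts[bin_index(d, lo, width, bins)] += 1
    groups = group_per_bin(counts)
    n_groups = max(g for g in groups if g is not None) + 1
    if n_groups <= max_groups:
        return [list(points)]  # density looks even: do not partition
    parts = [[] for _ in range(n_groups)]
    for p, d in zip(points, dists):
        parts[groups[bin_index(d, lo, width, bins)]].append(p)
    return parts
```

In the pipeline the abstract outlines, each returned sub-dataset would then be clustered with mpts-HDBSCAN and the sub-clusters merged. Those two steps are left out here because the abstract gives no detail on them.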

Key words: clustering, data partitioning, mpts-HDBSCAN, merging sub clusters

CLC Number: 

  • TP391

