一种改进的mpts-HDBSCAN算法

    An Improved mpts-HDBSCAN Algorithm

    • 摘要: 聚类分析是非监督模式分类的一个重要分支.DBSCAN算法是基于密度聚类的最常见算法,且具有可发现任意形状的簇并且对噪声点不敏感等优点而得到广泛研究与应用.本文首先研究了DBSCAN所存在的一些问题,以及当前基于DBSCAN算法改进算法所存在的不足.其次,对于mpts-HDBSCAN算法处理密度分布不均匀数据聚类效果不理想的情况,提出了一种新的分区算法.分区算法根据数据分布的直方图确定分组数据,根据分区阈值这个标准来确定是否对数据进行划分处理;然后运用mpts-HDBSCAN算法对划分后的子数据进行聚类,并对聚类的结果进行合并.实验结果表明,改进后的算法对于处理密度不均匀数据具有更好的效果.

       

      Abstract: Cluster analysis is an important branch of non-supervised model classification, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is one of the most common algorithms in density-based clustering methods. It's widely researched and applied in many fields as it can find clusters of arbitrary shapes with noises. Some shortcomings of DBSCAN and also recently improved algorithms based on DBSCAN are focused on. A new data partitioning method is proposed to solve the problem that mpts-HDBSCAN clustering quality will degrade when applied in varied density dataset. Firstly the proposed partitioning method calculates the numbers of the group based on the histogram of the data distribution. Secondly it is determined whether to partition the dataset based on the threshold value. Sub-datasets generated by partitioning method will bind with mpts-HDBSCAN to find clusters and finally merge the sub-clusters to one. Experiment shows the proposed binding algorithm is more effective than mpts-HDBSCAN in finding clusters when dataset density is not even.

       

    /

    返回文章
    返回