Journal of Guangdong University of Technology ›› 2018, Vol. 35 ›› Issue (06): 77-82. doi: 10.12052/gdutxb.180053


A Research on Local Outlier Factor De-noising Method for Kernel Fuzzy Spectral Clustering

Zhang Wei, Mai Zhi-shen

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2018-03-22 Online: 2018-11-23 Published: 2018-11-23
  • About the author: Zhang Wei (1964-), female, associate professor; main research interests are collaborative computing, data mining, and big data. E-mail: weizhang@gdut.edu.cn
  • Funding:
    National Natural Science Foundation of China projects (61673123, 61603100, 61772141, 61702110); Guangdong Provincial Science and Technology Plan projects (2015B090901016, 2016B010108007); Guangdong Provincial Department of Education projects (粤教高函[2014]97号, 粤教高函2015[133]号); Guangzhou Science and Technology Plan projects (2016201604030034, 201604046017, 201604020145, 201508010067)

Abstract: To address sample-point noise reduction in spectral clustering based on the kernel fuzzy similarity measure, and to improve clustering quality and stability, this paper starts from an analysis of how outliers are distributed, introduces the local outlier factor (LOF) algorithm, and proposes the concept of the cluster center candidate object. Noise data are filtered out of the data set, which improves the computation of the initial cluster centers and highlights the influence of normal sample points when the cluster centers are adjusted, so that the clustering algorithm reaches accurate results more easily. A local filter factor, derived from the local outlier factors of two arbitrary objects, is also proposed as a weighting factor to correct the similarity measure. It enlarges the weights between normal data and shrinks the weights between normal data and noise data, which greatly reduces the sensitivity of the improved kernel fuzzy spectral clustering algorithm to outliers. Validity and stability experiments show that the corrected similarity measure makes kernel fuzzy spectral clustering more stable and robust.

Key words: spectral clustering, kernel fuzzy similarity measure, cluster center candidate object, local filter factor

CLC number:

  • TP391
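
The two ideas in the abstract, LOF-based filtering of cluster center candidates and an LOF-derived weight on the similarity measure, can be illustrated with a minimal sketch. Only the standard LOF definition (Breunig et al.) is implemented as published. Everything else is an assumption made for illustration, because this page does not give the paper's formulas: the function names `lof_scores`, `candidate_indices`, and `filtered_similarity`, the threshold `1.5`, the Gaussian kernel used in place of the kernel fuzzy similarity measure, and the `1 / max(LOF_i, LOF_j)` weight used in place of the paper's local filter factor are all hypothetical.

```python
import math


def lof_scores(points, k):
    """Brute-force local outlier factor (LOF) for each point; needs k < len(points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neighbors = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # k-distance neighborhood; ties at distance kd are included
        neighbors.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        lrd.append(len(neighbors[i]) / max(reach, 1e-12))  # guard for duplicate points

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def candidate_indices(lof, threshold=1.5):
    """Assumed rule: points with LOF at or below the threshold stay as candidates."""
    return [i for i, score in enumerate(lof) if score <= threshold]


def filtered_similarity(points, lof, sigma=1.0):
    """Gaussian similarity scaled by a stand-in filter weight 1 / max(LOF_i, LOF_j)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            base = math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            # Pairs that involve a likely outlier (large LOF) are down-weighted.
            w[i][j] = w[j][i] = base / max(lof[i], lof[j])
    return w


# Usage: five clustered points plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
lof = lof_scores(points, k=3)
candidates = candidate_indices(lof)
weights = filtered_similarity(points, lof, sigma=5.0)
```

In this toy data set the isolated point receives an LOF well above 1 and is dropped from the candidates, and every weight that connects it to the cluster shrinks relative to the plain Gaussian similarity. The resulting matrix could then be passed to an ordinary spectral clustering step in place of the unweighted similarity matrix.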