Journal of Guangdong University of Technology ›› 2018, Vol. 35 ›› Issue (05): 20-25, 50. doi: 10.12052/gdutxb.180023

• Comprehensive Studies •

Multi-label Feature Selection Algorithm Based on ReliefF and Mutual Information

Chen Ping-hua1, Huang Hui1, Mai Miao2, Zhou Hong-hong3

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China;
  2. Guangdong Nanfang Media Group, Guangzhou 510601, China;
  3. Guangdong Science and Technology Innovation Monitoring and Research Center, Guangzhou 510033, China
  • Received: 2018-01-29 Online: 2018-07-10 Published: 2018-07-10
  • Corresponding author: Huang Hui (b. 1991), male, master's student; research interests: data mining and machine learning. E-mail: daniel_allo@163.com
  • About the author: Chen Ping-hua (b. 1967), male, professor; research interests: big data and recommender systems.
  • Supported by:
    National Natural Science Foundation of China (61572144); Science and Technology Planning Project of Guangdong Province (2013B091300009, 2014B070706007, 2017B030307002)

Abstract: Traditional single-label feature selection algorithms cannot be applied directly to multi-label data. To address this problem, a multi-label feature selection algorithm, MML-RF, is proposed. Building on ReliefF, MML-RF introduces a new way of finding nearest-neighbor samples within the same class and revises the feature-weight calculation to take the contribution values of the different labels into account, so that it suits the characteristics of multi-label data. To reduce feature redundancy, MML-RF uses mutual information as the redundancy measure and proposes a redundancy-removal method that yields a smaller feature subset. Experiments show that the feature subsets selected by MML-RF are small and achieve good classification performance on multi-label datasets, which can improve the efficiency of subsequent multi-label learning and data mining.

Key words: feature selection, multi-label learning, ReliefF, mutual information, feature redundancy
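
The redundancy-removal idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' MML-RF procedure: the abstract does not give the exact redundancy criterion, so the greedy rule, the `threshold` parameter, and the function names `mutual_information` and `drop_redundant` are all assumptions made for illustration. The sketch also assumes discrete features and takes the feature weights (for example, from a ReliefF-style ranking) as already computed.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Joint probability table estimated from co-occurrence counts.
    joint = np.zeros((x_vals.size, y_vals.size))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, ny)
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def drop_redundant(X, weights, threshold):
    """Hypothetical greedy filter (an assumption, not the paper's rule).

    Features are visited in order of decreasing weight. A feature is kept
    only if its mutual information with every already-kept feature is at
    most `threshold`.
    """
    order = np.argsort(-np.asarray(weights, dtype=float), kind="stable")
    kept = []
    for j in order:
        if all(mutual_information(X[:, j], X[:, k]) <= threshold for k in kept):
            kept.append(int(j))
    return kept


# Toy data: feature 1 duplicates feature 0; feature 2 is independent of both.
X_demo = np.array([[0, 0, 0],
                   [0, 0, 1],
                   [1, 1, 0],
                   [1, 1, 1]])
selected = drop_redundant(X_demo, weights=[0.9, 0.8, 0.5], threshold=0.1)
```

In this toy case the duplicated feature is discarded because it shares all of its information with a higher-weighted feature, while the independent feature is retained. Continuous features would need discretization or a different mutual-information estimator first.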

CLC Number: 

  • TP181