Journal of Guangdong University of Technology ›› 2017, Vol. 34 ›› Issue (03): 1-7. doi: 10.12052/gdutxb.170008

• Special Topic: Fundamental Theory and Applications of Big Data •

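Classification Method Based on Dimension Reduction

(Original Chinese title: 基于正交投影的降维分类方法研究, literally "Research on a Dimension-Reduction Classification Method Based on Orthogonal Projection")

Teng Shao-hua (滕少华), Lu Dong-lue (卢东略), Huo Ying-xiang (霍颖翔), Zhang Wei (张巍)

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2017-01-11 Online: 2017-05-09 Published: 2017-05-09
  • Corresponding author: Zhang Wei (b. 1964), female, associate professor. Main research interests: collaborative computing, data mining, network security, and big data. E-mail: weizhang@gdut.edu.cn
  • About the author: Teng Shao-hua (b. 1962), male, professor, Ph.D. Main research interests: collaborative computing, data mining, network security, and big data.
  • Supported by:

    National Natural Science Foundation of China (61402118, 61673123); Guangdong Provincial Science and Technology Plan Projects (2015B090901016, 2016B010108007); Guangdong Provincial Department of Education Projects (document nos. 粤教高函2015[133]号, 粤教高函[2014]97号); Guangzhou Science and Technology Plan Projects (201604020145, 2016201604030034, 201508010067)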

Abstract:

In the era of big data, data mining algorithms must process massive datasets efficiently. Traditional classification algorithms take a long time to build a model and to classify data, and they can be difficult to understand. To address these problems, this paper proposes a dimension-reduction classification method based on orthogonal projection. Projection transforms the multidimensional classification problem into a combination of multiple two-dimensional projection planes, and a density model of the projection planes is built for classification. Compared with commonly used classifiers such as Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbor (KNN), the proposed method achieves higher model training efficiency and classification efficiency without loss of classification accuracy. The algorithm is easy to implement and is suitable for applications with strict real-time requirements, such as intrusion detection and traffic scheduling.

Key words: data mining, classification, orthogonal projection, dimension reduction
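
The abstract does not say how the projection planes are chosen, how the density model is built, or how the planes are combined, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes axis-aligned planes (keeping two coordinates and dropping the rest, which is an orthogonal projection onto a coordinate plane), a fixed-grid histogram per class as the density model, and a sum of smoothed log densities across planes as the decision rule. The class name `ProjectionDensityClassifier` and the parameters `bins` and `alpha` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


class ProjectionDensityClassifier:
    """Illustrative sketch: per-class 2D histograms on axis-aligned planes."""

    def __init__(self, bins=8, alpha=1.0):
        self.bins = bins    # grid cells per axis (assumed hyperparameter)
        self.alpha = alpha  # Laplace smoothing constant (assumed)

    def _cell(self, value, k):
        # Map feature k's value to an integer cell index in [0, bins - 1].
        scaled = (value - self.lo[k]) / self.span[k]
        return min(max(int(scaled * self.bins), 0), self.bins - 1)

    def fit(self, X, y):
        dims = len(X[0])
        self.lo = [min(row[k] for row in X) for k in range(dims)]
        hi = [max(row[k] for row in X) for k in range(dims)]
        # Guard against constant features (zero range).
        self.span = [(h - l) or 1.0 for h, l in zip(hi, self.lo)]
        # Every unordered pair of features defines one 2D projection plane.
        self.planes = list(combinations(range(dims), 2))
        self.class_sizes = Counter(y)
        # counts[(plane, label)] maps a 2D cell to its number of training points.
        self.counts = {
            (p, c): Counter() for p in self.planes for c in self.class_sizes
        }
        for row, label in zip(X, y):
            cells = [self._cell(v, k) for k, v in enumerate(row)]
            for i, j in self.planes:
                self.counts[((i, j), label)][(cells[i], cells[j])] += 1
        return self

    def _log_density(self, plane, label, cell):
        # Smoothed fraction of the class's points that fall in this cell.
        hits = self.counts[(plane, label)][cell]
        total = self.class_sizes[label] + self.alpha * self.bins ** 2
        return math.log((hits + self.alpha) / total)

    def predict_one(self, row):
        cells = [self._cell(v, k) for k, v in enumerate(row)]
        # Sum the log densities over all planes and pick the best class.
        scores = {
            label: sum(
                self._log_density((i, j), label, (cells[i], cells[j]))
                for i, j in self.planes
            )
            for label in self.class_sizes
        }
        return max(scores, key=scores.get)
```

In this sketch, training is a single counting pass over the data and prediction is one table lookup per plane and class, which illustrates why a histogram-style density model can be cheap to train and to query. The number of planes grows quadratically with the number of features, so a practical variant would likely need to select a subset of planes. The sketch expects at least two numeric features per row.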

CLC number:

  • TP391

