广东工业大学学报 ›› 2017, Vol. 34 ›› Issue (02): 86-91.doi: 10.12052/gdutxb.150120

• 综合研究 • 上一篇    下一篇

IFAMR:一种基于MapReduce的高效频繁项挖掘算法

刘祥佳, 程良伦   

  1. 广东工业大学 计算机学院, 广东 广州 510006
  • 收稿日期:2015-11-14 出版日期:2017-03-09 发布日期:2017-03-09
  • 作者简介:刘祥佳(1989-),男,硕士研究生,主要研究方向为大数据处理、数据挖掘.
  • 基金资助:

    国家基金广东省联合基金重点项目(U2012A002D01);国家自然科学基金青年科学基金资助项目(61502110)

IFAMR: An Efficent Frequent Itemset Mining Algorithm Based on MapReduce

Liu Xiang-jia, Cheng Liang-lun   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received:2015-11-14 Online:2017-03-09 Published:2017-03-09

摘要:

针对单机无法高效地利用现有的并行计算框架来加快对海量数据进行频繁项挖掘的问题,根据Apriori算法基本原理,结合MapReduce并行计算模型的优势,在FAMR算法的基础上提出了一种改进的高效频繁项挖掘算法IFAMR.该算法首先采用AprioriTID算法来对原始数据进行预处理,删除所有的低频1-项集,然后计算出每次处理集(L)和最小支持度(N)的长度来确定Map操作结束后的最大合并候选集合.IFAMR算法减少了Map函数中的低频项集的生成,通过与已有的算法进行实验对比表明,该算法有效地减少了内存占用,提高了算法的挖掘效率.

关键词: 频繁项集挖掘, MapReduce, FAMR, Apriori

Abstract:

Considering that single host in the existing parallel computing framework is not efficient in accelerating massive data mining frequent item, an improved efficient algorithm for mining frequent item IFAMR is proposed, combining the advantages of the MapReduce parallel computing model according to the basic principle of Apriori algorithm and based on the algorithm FAMR. This algorithm first uses AprioriTID algorithm to preprocess the raw data, deleting all the low-frequency 1-itemsets, and then calculate the length of each transaction set (L) and minimum support (N) to determine the maximum merger candidate Map sets of operations. IFAMR algorithm reduces the Map function to generate a low-frequency item set, and the algorithms are proved by experimental comparison to have greatly reduced memory footprint and effectively improved the efficiency of the mining process.

Key words: FIM (frequent itemset mining), MapReduce, FAMR (frequent itemset mining algorithm based on MapReduce), Apriori

中图分类号: 

  • TP311

[1] 李文俊. 基于MapReduce的Skyline查询算法研究[D]. 长沙:湖南大学计算机学院, 2014:22-45.
[2] 王乐. 数据流模式挖掘算法及应用研究[D]. 大连, 大连理工大学计算机学院, 2013:33-50.
[3] 张启徽. 关联规则挖掘中查找频繁项集的改进算法[J]. 统计与决策, 2015, 23(4):32-35.ZHANG Q H. Lookup in association rules mining algorithm of frequent itemset[J].Journal of Statistics and Decesion, 2015, 23(4):32-35.
[4] CHEN X S, ZHANG S, TONG T. FP-Growth algorithm based on boolean matrix and MapReduce[J]. Journal of South China University of Technology, 2014, 42(1):135-141.
[5] 陈平, 王利钢, 李博涵. 基于项约束的关联规则挖掘研究综述[J]. 制造业自动化, 2014, 19(16):46-53.CHEN P, WANG L G, LI B H. Based on constrained association rules mining research were reviewed[J]. Journal of Manufacturing Automation 2014, 19(16):46-53.
[6] 张巍, 刘峰, 滕少华. 改进的PrefixSpan算法及其在序列模式挖掘中的应用[J]. 广东工业大学学报, 2013, 30(4):49-54.ZHANG W, LIU F, TENG S H. Improved prefixspan algorithm and its application in sequential pattern mining[J].Journal of Guangdong University of Technology, 2013, 30(4):49-54.
[7] 寇香霞, 任永功, 宋奎勇. 一种基于滑动窗口的数据流频繁项集挖掘算法[J]. 计算机应用与软件, 2013, 30(1):143-146.KOU X X, REN Y G, SONG K Y. A sliding window based on the data flow algorithm for mining frequent itemset[J]. Journal of Computer Applications and Software, 2013, 30(1):143-146.
[8] LI N, Z ENG L, HE Q, et al. Parallel implementation of Apriori algorithm based on MapReduce[C]//Proc of the 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing. Bellinghan, USA:SNPD, 2012:236-24.
[9] YAHYA O, HEGAZY O, EZAT E. An efficient implementation of apriori algorithm based on hadoop-mapReduce model[C]//International Journal of Reviews in Computing. Washington D C, USA:IJRIC, 2012:760-771
[10] HUANG Y S. An efficient frequent patterns mining algorithm based on mapReduce framework[C]//Proc of the Human Computer Interaction with Mobile Devices and Services (HCIMD).Lisbon:ACM, 2013:260-269.
[11] 吴建章, 韩立新, 曾晓勤. 一种基于多核微机的闭频繁项集挖掘算法[J]. 计算机应用与软件, 2013, 30(3):44-46.WU J Z, HAN L X, ZENG X Q. A multi-core microcomputer based algorithm for mining frequent closed itemset[J]. Journal of Computer Applications and Software, 2013, 30(3):44-46.
[12] 杨鹏坤, 彭慧, 周晓锋. 改进的基于频繁模式树的最大频繁项集挖掘算法-FP-MFIA[J]. 计算机应用, 2015, 35(3):775-778.YANG P K, PENG H, ZHOU X F. Improved based on frequent pattern tree algorithm for mining maximum frequent itemset--FPMFIA[J]. Journal of Computer Applications, 2015, 35(3):775-778.
[13] 郑海雁, 王远方, 熊政. 标签集约束近似频繁模式的并行挖掘[J].计算机工程与应用, 2015, 6(3):1-10.ZHENG H Y, WANG Y F, XIONG Z. Label set constraint approximation parallel mining frequent patterns[J].Journal of Computer Engineering and Applications, 2015, 6(3):1-10.
[14] LIN M, LEE P, HSUEH S. Apriori-based frequent itemset mining algorithms on MapReduce[C]//Proc of the 16th International Conference on Ubiquitous Information Management and Communication. Washington D C, USA:ICUIMC, 2012:1029-1040.

[1] 林穗, 郑志豪. 基于关联规则的客户行为建模与商品推荐研究[J]. 广东工业大学学报, 2018, 35(03): 90-94.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!