Journal of Guangdong University of Technology ›› 2017, Vol. 34 ›› Issue (02): 86-91.doi: 10.12052/gdutxb.150120

Previous Articles     Next Articles

IFAMR: An Efficent Frequent Itemset Mining Algorithm Based on MapReduce

Liu Xiang-jia, Cheng Liang-lun   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received:2015-11-14 Online:2017-03-09 Published:2017-03-09

Abstract:

Considering that single host in the existing parallel computing framework is not efficient in accelerating massive data mining frequent item, an improved efficient algorithm for mining frequent item IFAMR is proposed, combining the advantages of the MapReduce parallel computing model according to the basic principle of Apriori algorithm and based on the algorithm FAMR. This algorithm first uses AprioriTID algorithm to preprocess the raw data, deleting all the low-frequency 1-itemsets, and then calculate the length of each transaction set (L) and minimum support (N) to determine the maximum merger candidate Map sets of operations. IFAMR algorithm reduces the Map function to generate a low-frequency item set, and the algorithms are proved by experimental comparison to have greatly reduced memory footprint and effectively improved the efficiency of the mining process.

Key words: FIM (frequent itemset mining), MapReduce, FAMR (frequent itemset mining algorithm based on MapReduce), Apriori

CLC Number: 

  • TP311

[1] 李文俊. 基于MapReduce的Skyline查询算法研究[D]. 长沙:湖南大学计算机学院, 2014:22-45.
[2] 王乐. 数据流模式挖掘算法及应用研究[D]. 大连, 大连理工大学计算机学院, 2013:33-50.
[3] 张启徽. 关联规则挖掘中查找频繁项集的改进算法[J]. 统计与决策, 2015, 23(4):32-35.ZHANG Q H. Lookup in association rules mining algorithm of frequent itemset[J].Journal of Statistics and Decesion, 2015, 23(4):32-35.
[4] CHEN X S, ZHANG S, TONG T. FP-Growth algorithm based on boolean matrix and MapReduce[J]. Journal of South China University of Technology, 2014, 42(1):135-141.
[5] 陈平, 王利钢, 李博涵. 基于项约束的关联规则挖掘研究综述[J]. 制造业自动化, 2014, 19(16):46-53.CHEN P, WANG L G, LI B H. Based on constrained association rules mining research were reviewed[J]. Journal of Manufacturing Automation 2014, 19(16):46-53.
[6] 张巍, 刘峰, 滕少华. 改进的PrefixSpan算法及其在序列模式挖掘中的应用[J]. 广东工业大学学报, 2013, 30(4):49-54.ZHANG W, LIU F, TENG S H. Improved prefixspan algorithm and its application in sequential pattern mining[J].Journal of Guangdong University of Technology, 2013, 30(4):49-54.
[7] 寇香霞, 任永功, 宋奎勇. 一种基于滑动窗口的数据流频繁项集挖掘算法[J]. 计算机应用与软件, 2013, 30(1):143-146.KOU X X, REN Y G, SONG K Y. A sliding window based on the data flow algorithm for mining frequent itemset[J]. Journal of Computer Applications and Software, 2013, 30(1):143-146.
[8] LI N, Z ENG L, HE Q, et al. Parallel implementation of Apriori algorithm based on MapReduce[C]//Proc of the 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing. Bellinghan, USA:SNPD, 2012:236-24.
[9] YAHYA O, HEGAZY O, EZAT E. An efficient implementation of apriori algorithm based on hadoop-mapReduce model[C]//International Journal of Reviews in Computing. Washington D C, USA:IJRIC, 2012:760-771
[10] HUANG Y S. An efficient frequent patterns mining algorithm based on mapReduce framework[C]//Proc of the Human Computer Interaction with Mobile Devices and Services (HCIMD).Lisbon:ACM, 2013:260-269.
[11] 吴建章, 韩立新, 曾晓勤. 一种基于多核微机的闭频繁项集挖掘算法[J]. 计算机应用与软件, 2013, 30(3):44-46.WU J Z, HAN L X, ZENG X Q. A multi-core microcomputer based algorithm for mining frequent closed itemset[J]. Journal of Computer Applications and Software, 2013, 30(3):44-46.
[12] 杨鹏坤, 彭慧, 周晓锋. 改进的基于频繁模式树的最大频繁项集挖掘算法-FP-MFIA[J]. 计算机应用, 2015, 35(3):775-778.YANG P K, PENG H, ZHOU X F. Improved based on frequent pattern tree algorithm for mining maximum frequent itemset--FPMFIA[J]. Journal of Computer Applications, 2015, 35(3):775-778.
[13] 郑海雁, 王远方, 熊政. 标签集约束近似频繁模式的并行挖掘[J].计算机工程与应用, 2015, 6(3):1-10.ZHENG H Y, WANG Y F, XIONG Z. Label set constraint approximation parallel mining frequent patterns[J].Journal of Computer Engineering and Applications, 2015, 6(3):1-10.
[14] LIN M, LEE P, HSUEH S. Apriori-based frequent itemset mining algorithms on MapReduce[C]//Proc of the 16th International Conference on Ubiquitous Information Management and Communication. Washington D C, USA:ICUIMC, 2012:1029-1040.

[1] Lin Sui, Zheng Zhi-hao. A Research of a Recommender System Based on Customer Behavior Modeling by Mining Association Rules [J]. Journal of Guangdong University of Technology, 2018, 35(03): 90-94.
[2] CAI Yong-Qiang, CHEN Ping-Hua, LI Hui. Parallel DBSCAN Algorithm Based on Cloud Computing Platform [J]. Journal of Guangdong University of Technology, 2016, 33(01): 51-56.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!