Journal of Guangdong University of Technology ›› 2013, Vol. 30 ›› Issue (4): 49-54. doi: 10.3969/j.issn.1007-7162.2013.04.008

• Comprehensive Studies •

Improved PrefixSpan Algorithm and Its Application in Sequential Pattern Mining

Zhang Wei, Liu Feng, Teng Shao-hua

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2013-10-08 Online: 2013-12-30 Published: 2013-12-30
  • About the author: Zhang Wei (b. 1964), female, associate professor; main research interests are data mining and collaborative computing.
  • Supported by:

    Key Laboratory Fund of the Ministry of Education (110411); Natural Science Foundation of Guangdong Province (10451009001004804, 9151009001000007); Science and Technology Planning Project of Guangdong Province (2012B091000173); Science and Technology Planning Projects of Guangzhou (2012J5100054, 2013J4500028); Science and Technology Planning Project of Shaoguan (2010CXY/C05)

Abstract: Sequential pattern mining is expensive in both computing time and storage space, so reducing computation and storage overhead is the key problem for sequential pattern mining algorithms. The PrefixSpan algorithm generates no candidate sequences, and an appropriately applied bitmap data structure avoids repeated scans of the database. Building on these two properties, this paper proposes BM-PrefixSpan, an improved sequential pattern mining algorithm. A PFPBM (Prefix of First Position on BitMap) table is designed and constructed to record the position in the bitmap at which each item of a sequence first appears. Experimental results show that BM-PrefixSpan combines the advantages of PrefixSpan and SPAM and mines sequential patterns faster and more effectively.

Key words: sequential pattern; PrefixSpan (Prefix-projected Sequential Pattern Mining); SPAM (Sequential Pattern Mining); bitmap; data mining
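
The abstract does not specify the layout of the PFPBM table or the details of BM-PrefixSpan, so the following is only a minimal Python sketch of the general idea it describes: prefix growth without candidate generation, with per-item position bitmaps and a first-position table replacing repeated scans of the database. All names here (`build_index`, `next_pos`, `mine`) are hypothetical, and sequences are simplified to lists of single items; this is not the authors' implementation.

```python
from collections import defaultdict


def build_index(db):
    """One pass over the database. For each sequence, build
    item -> bitmap of positions (bit i set if the item is at position i)
    and item -> first position (an assumed stand-in for a PFPBM-style table)."""
    bitmaps, first_pos = [], []
    for seq in db:
        bm, fp = defaultdict(int), {}
        for i, item in enumerate(seq):
            bm[item] |= 1 << i
            fp.setdefault(item, i)  # keep only the first occurrence
        bitmaps.append(dict(bm))
        first_pos.append(fp)
    return bitmaps, first_pos


def next_pos(bitmap, after):
    """Smallest set position strictly greater than `after`, or -1 if none."""
    rest = bitmap >> (after + 1)
    if rest == 0:
        return -1
    # (rest & -rest) isolates the lowest set bit of `rest`.
    return after + 1 + (rest & -rest).bit_length() - 1


def mine(db, min_support):
    """Return {pattern tuple: support} for all frequent sequential patterns."""
    bitmaps, first_pos = build_index(db)
    patterns = {}

    def grow(prefix, proj):
        # proj holds (sequence id, end position of the earliest match of prefix).
        ext = defaultdict(list)
        for sid, pos in proj:
            for item, bm in bitmaps[sid].items():
                fp = first_pos[sid][item]
                # The first-position lookup answers directly when it lies
                # beyond the prefix; otherwise fall back to the bitmap.
                p = fp if fp > pos else next_pos(bm, pos)
                if p >= 0:
                    ext[item].append((sid, p))
        for item in sorted(ext):
            occ = ext[item]
            if len(occ) >= min_support:
                pattern = prefix + (item,)
                patterns[pattern] = len(occ)
                grow(pattern, occ)

    grow((), [(sid, -1) for sid in range(len(db))])
    return patterns
```

For example, under these assumptions `mine([list("abcb"), list("acb"), list("bc")], 2)` reports the pattern `('a', 'c', 'b')` with support 2, while `('a', 'b', 'c')` is not frequent because it occurs in only one sequence. The database is read once in `build_index`; every later extension step uses only bit operations and table lookups.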
