Journal of Guangdong University of Technology ›› 2017, Vol. 34 ›› Issue (03): 83-88. doi: 10.12052/gdutxb.170042


An Online Product Review Information Collection Method Based on Storm

Luo Kui-yong, Hao Zhi-feng, Cai Rui-chu, Wen Wen, Yuan Qin   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2017-02-28 Online: 2017-05-09 Published: 2017-05-09

Abstract:

To collect product reviews from e-commerce websites as soon as they appear and to track public opinion about products in real time, a Storm-based method for collecting online product review information is presented. The concept of stream computing is applied to the web crawler, and the SHHD (Simhash Hamming Distance) algorithm is used to dynamically adjust the acquisition period. Experimental results show that Storm-based information collection offers high throughput and is easy to update. The SHHD algorithm effectively reduces the network bandwidth and system resources consumed by the acquisition system and achieves an adaptive, incremental process for collecting online product reviews. Compared with the Poisson and SART methods, SHHD also has some advantage in the lag of product review acquisition.
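The abstract does not say how SHHD builds its fingerprints or how a Hamming distance is turned into a new acquisition period, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes a 64-bit Simhash over whitespace-separated tokens hashed with MD5, and a simple rule that halves the revisit interval when two successive snapshots of a review page differ by more than a threshold and doubles it otherwise. The names `simhash`, `hamming_distance`, `next_interval`, and all parameter values are hypothetical and are not taken from the paper.

```python
import hashlib

BITS = 64  # fingerprint width assumed for this sketch


def simhash(text: str) -> int:
    """Return a 64-bit Simhash fingerprint of whitespace-separated tokens."""
    weights = [0] * BITS
    for token in text.split():
        # Use the first 8 bytes of the MD5 digest as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def next_interval(current_s: float, distance: int, threshold: int = 3,
                  factor: float = 2.0, min_s: float = 60.0,
                  max_s: float = 86400.0) -> float:
    """Hypothetical adjustment rule: revisit sooner after a large change.

    A distance above the threshold is treated as "the page changed", so the
    interval shrinks; otherwise it grows. The result is clamped to
    [min_s, max_s].
    """
    if distance > threshold:
        proposed = current_s / factor
    else:
        proposed = current_s * factor
    return max(min_s, min(max_s, proposed))


if __name__ == "__main__":
    old_page = "great phone battery lasts two days"
    new_page = "great phone battery lasts two days screen cracked after a week"
    d = hamming_distance(simhash(old_page), simhash(new_page))
    print("distance:", d, "next interval (s):", next_interval(600.0, d))
```

In a Storm topology, a rule of this kind could sit in the component that schedules page fetches, so that pages whose reviews change often are fetched more frequently while stable pages consume less bandwidth. That placement is likewise an assumption rather than something the abstract states.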

Key words: product review information, Storm, adaptability, incremental acquisition

CLC Number: 

  • TP391

