[1] China Internet Network Information Center (CNNIC). 2015 China online shopping market research report[R]. Beijing:CNNIC, 2016. 6
[2] NIRAJ S, ASHUTOSH D, SHARMA A K. Design of a priority based frequency regulated incremental crawler[J]. International Journal of Computer Applications, 2010, 1(1):42-47.
[3] SHARMA A K, DIXIT A. Self adjusting refresh time based architecture for incremental web crawler[J]. International Journal of Computer Science and Network Security, 2008, 8(12):349-354.
[4] TESSERA D, CALZAROSSA M. Modeling and predicting temporal patterns of web content changes[J]. Journal of Network and Computer Applications, 2015, 56:115-123.
[5] SIA K C, CHO J, CHO H K. Efficient monitoring algorithm for fast news alerts[J]. IEEE Transactions on Knowledge & Data Engineering, 2007, 19(7):950-961.
[6] MENG T, WANG J M, YAN H F. Web evolution and incremental crawling[J]. Journal of Software, 2006, 17(5):1051-1067.
[7] CHO J, GARCIA-MOLINA H. Estimating frequency of change[J]. ACM Transactions on Internet Technology, 2003, 3(3):256-290.
[8] DIXIT A, SHARMA A K. A mathematical model for crawler revisit frequency[C]//Advance Computing Conference (IACC), 2010 IEEE 2nd International.[S.l.]:IEEE, 2010:316-319.
[9] CUI X C, YU X H, LIU Y, et al. Distributed stream processing:a survey[J]. Journal of Computer Research and Development, 2015, 52(2):318-332.
[10] DENG L L, XU H S. Research on applied models based on Storm[J]. Journal of Guangdong University of Technology, 2014, 31(3):114-118.
[11] YANG W, LIU X, ZHANG L, et al. Big data real-time processing based on Storm[C]//2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.[S.l.]:IEEE, 2013:1784-1787.
[12] UDAPURE T V. Study of web crawler and its different types[J]. IOSR Journals (IOSR Journal of Computer Engineering), 2014, 1(16):1-5.
[13] DONG B, ZHENG Q H, SONG K L, et al. Efficient near-duplicate detection based on multiple simhash fingerprints[J]. Journal of Chinese Computer Systems, 2011, 32(11):2152-2157.
[14] KOU Y, LI D, SHEN D R, et al. D-EEM:A DOM-tree based entity extraction mechanism for deep web[J]. Journal of Computer Research and Development, 2010, 47(5):858-86.
[15] MANKU G S, JAIN A, DAS SARMA A. Detecting near-duplicates for web crawling[C]//International Conference on World Wide Web.[S.l.]:ACM, 2007:141-150.