Journal of Guangdong University of Technology, 2020, Vol. 37, Issue (05): 51-61. doi: 10.12052/gdutxb.200019
谭有新, 滕少华
Tan You-xin, Teng Shao-hua
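Abstract: Text sentiment analysis is a typical task in natural language processing, but the accuracy of existing sentiment analysis methods is not high, and one important reason is the way words are turned into features. This paper proposes a Combined Weighting method for Short Text Features (CWSTF), which can effectively improve the accuracy of sentiment analysis. CWSTF uses a random forest to evaluate and rank the contribution of each feature to sentiment, and then selects features according to that ranking. It next considers the importance of each feature in the document, measured by TF-IDF (Term Frequency-Inverse Document Frequency), and determines the weight of the feature from both its importance in the document and its sentiment contribution. Finally, comparative experiments are carried out with Support Vector Machine (SVM), Naive Bayes (NB), Maximum Entropy (ME), and K-Nearest Neighbor (KNN) classifiers. The experimental results show that features processed with the proposed method improve sentiment classification accuracy more effectively than features processed with the other methods.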
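The weighting pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so several things here are assumptions: the TF-IDF variant (relative term frequency times the natural log of inverse document frequency), the combination rule (a plain product of TF-IDF and contribution), and the `contribution` scores themselves, which are hand-written placeholders standing in for the random-forest contribution estimates. The function names `tf_idf`, `select_features`, and `combined_weights` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def select_features(contribution, k):
    """Rank terms by sentiment contribution and keep the top k."""
    ranked = sorted(contribution, key=contribution.get, reverse=True)
    return ranked[:k]


def combined_weights(docs, contribution, k):
    """Build one vector per document over the selected terms.

    Assumption: the combined weight is tf-idf multiplied by contribution;
    the abstract only says both quantities determine the weight.
    """
    selected = select_features(contribution, k)
    vectors = [
        [doc_weights.get(term, 0.0) * contribution[term] for term in selected]
        for doc_weights in tf_idf(docs)
    ]
    return selected, vectors


# Toy data; the contribution scores are placeholders, not random-forest output.
docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
contribution = {"good": 0.5, "bad": 0.4, "movie": 0.05, "plot": 0.05}

selected, vectors = combined_weights(docs, contribution, k=2)
print(selected)
for vector in vectors:
    print(vector)
```

The resulting vectors could then be passed to any of the classifiers named in the abstract (SVM, NB, ME, KNN). In practice, the placeholder scores would be replaced by importance estimates from a trained random forest.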