Journal of Guangdong University of Technology ›› 2020, Vol. 37 ›› Issue (05): 51-61. doi: 10.12052/gdutxb.200019


Combined Weighting Method for Short Text Features

Tan You-xin, Teng Shao-hua   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2020-02-02 Online: 2020-09-17 Published: 2020-09-17

Abstract: Text sentiment analysis is a typical natural language processing task, but the accuracy of existing methods remains limited, and word representation is an important reason. This paper proposes a combined weighting method for short text features (CWSTF) to improve the accuracy of sentiment analysis. First, CWSTF uses random forests to evaluate and rank the contribution of each feature to sentiment, and filters the features according to that ranking. Next, it computes the importance of each feature in the document with TF-IDF (Term Frequency-Inverse Document Frequency), and determines the final weight of the feature from both its importance in the document and its contribution to sentiment. Finally, four classifiers, SVM (Support Vector Machine), NB (Naive Bayes), ME (Maximum Entropy), and KNN (K-Nearest Neighbor), are used in comparison experiments. The experimental results show that features processed by the proposed method improve the accuracy of sentiment classification more effectively than features produced by other methods.

Key words: sentiment analysis, feature selection, combined weighting

CLC Number: TP391
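
The pipeline outlined in the abstract (rank features by their contribution to sentiment, filter by rank, then combine that contribution with TF-IDF) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the combination formula, so the product of TF-IDF and importance is an assumption. The "forest" here is a simplified stand-in built from one-split trees on bootstrap samples, scored by Gini impurity decrease. The smoothed IDF and the names `tfidf`, `forest_importance`, and `cwstf_weights` are likewise hypothetical.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: TF-IDF} dict per tokenized document (smoothed IDF, an assumption)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                    for t, c in tf.items()})
    return out


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(docs, labels, term):
    """Gini decrease obtained by splitting the documents on presence of `term`."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (gini(labels)
            - len(present) / n * gini(present)
            - len(absent) / n * gini(absent))


def forest_importance(docs, labels, vocab, n_trees=200, seed=0):
    """Simplified stand-in for random-forest importance: each 'tree' is a single
    split chosen among a random feature subset on a bootstrap sample."""
    rng = random.Random(seed)
    k = max(1, int(math.sqrt(len(vocab))))
    score = Counter()
    for _ in range(n_trees):
        idx = [rng.randrange(len(docs)) for _ in docs]   # bootstrap sample
        b_docs = [docs[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        candidates = rng.sample(vocab, k)                # random feature subset
        best = max(candidates, key=lambda t: impurity_decrease(b_docs, b_labels, t))
        score[best] += impurity_decrease(b_docs, b_labels, best)
    total = sum(score.values()) or 1.0
    return {t: score[t] / total for t in vocab}


def cwstf_weights(docs, labels, keep, n_trees=200, seed=0):
    """Rank features, keep the top `keep`, and weight each kept feature by
    TF-IDF * importance (the product is an assumed combination rule)."""
    vocab = sorted({t for doc in docs for t in doc})
    imp = forest_importance(docs, labels, vocab, n_trees, seed)
    kept = sorted(vocab, key=lambda t: imp[t], reverse=True)[:keep]
    kept_set = set(kept)
    weights = [{t: w * imp[t] for t, w in doc.items() if t in kept_set}
               for doc in tfidf(docs)]
    return kept, imp, weights


# Toy short texts: label 1 = positive, 0 = negative.
docs = [["good", "movie"], ["great", "movie"], ["good", "great", "movie"],
        ["bad", "movie"], ["awful", "movie"], ["bad", "awful", "movie"]]
labels = [1, 1, 1, 0, 0, 0]
kept, imp, weights = cwstf_weights(docs, labels, keep=4)
```

In this toy data the word "movie" appears in every text, so splitting on it never reduces impurity and it drops out at the filtering step. The resulting per-document weight dictionaries could then be fed to any of the classifiers named in the abstract.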