Journal of Guangdong University of Technology, 2016, Vol. 33, Issue (06): 85-90. doi: 10.3969/j.issn.1007-7162.2016.06.015

An Approach to Sentiment Analysis of Chinese Microblogs Based on Gaussian Mixture Distribution Pseudo-sample Generation

Liang Li-xin, Hao Zhi-feng, Cai Rui-chu, Wen Wen   

  1. School of Computers, Guangdong University of Technology, Guangzhou, 510006
  • Received: 2016-03-23  Online: 2016-11-18  Published: 2016-11-18

Abstract:

Because informal words and expressions are widely used in microblogs, sentiment analysis of microblogs is a difficult problem, especially when the sentiment classes are imbalanced. This paper presents GWCRF (Gaussian Mixture Distribution Word2vec CRF), a method for sentiment analysis of microblogs with imbalanced class distribution that is based on pseudo-sample generation and the Conditional Random Field (CRF). First, a Gaussian mixture distribution is used to generate pseudo-samples, which increases the number of minority-class samples and balances the training data set. Second, Word2vec is used to enrich the microblog messages and to compensate for the limited coverage of the sentiment lexicon. Finally, a CRF model is trained on the balanced and extended training data set. Experimental results on microblog data show that the method outperforms state-of-the-art methods for sentiment analysis of microblog data sets with imbalanced sentiment distribution.
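
The pseudo-sample generation step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which feature space, number of mixture components, or covariance structure GWCRF uses, so the sketch assumes numeric feature vectors, a diagonal-covariance mixture fitted by EM, and hypothetical function names (fit_diag_gmm, sample_gmm, balance_with_pseudo_samples). The Word2vec and CRF stages are not shown.

```python
import math
import random


def fit_diag_gmm(points, k, iters=50, seed=0):
    """Fit a k-component Gaussian mixture (diagonal covariance) by EM.

    Assumption: points is a list of equal-length numeric feature vectors.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # init means from data
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    diff = x[t] - means[j][t]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + diff * diff / v)
                logs.append(lp)
            top = max(logs)  # subtract the max for numerical stability
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9  # guard against empty components
            weights[j] = nj / n
            for t in range(d):
                mu = sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[t] - mu) ** 2 for r, x in zip(resp, points)) / nj
                means[j][t] = mu
                variances[j][t] = max(var, 1e-6)  # variance floor
    return weights, means, variances


def sample_gmm(model, count, seed=1):
    """Draw count pseudo-samples from a fitted mixture."""
    weights, means, variances = model
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        # Pick a component by its weight, then sample each dimension.
        j = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(
            [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[j], variances[j])]
        )
    return samples


def balance_with_pseudo_samples(minority, majority_size, k=2):
    """Pad the minority class with pseudo-samples up to majority_size."""
    model = fit_diag_gmm(minority, k)
    needed = max(0, majority_size - len(minority))
    return list(minority) + sample_gmm(model, needed)
```

Calling balance_with_pseudo_samples(minority_vectors, len(majority_vectors)) returns the original minority vectors followed by enough generated vectors to match the majority class size. The generated vectors are synthetic and would be used only for training, not for evaluation.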

Key words: sentiment analysis; Gaussian mixture distribution; conditional random field; sentiment; imbalance; Word2vec
