Journal of Guangdong University of Technology ›› 2016, Vol. 33 ›› Issue (06): 85-90. doi: 10.3969/j.issn.1007-7162.2016.06.015
Liang Li-xin, Hao Zhi-feng, Cai Rui-chu, Wen Wen
Abstract:
Because informal words and expressions are widely used in microblogs, sentiment analysis of microblogs is a difficult problem, especially when the sentiment classes are imbalanced. This paper presents GWCRF (Gaussian Mixture Distribution Word2vec CRF), a method for sentiment analysis of microblogs with imbalanced sentiment distribution that combines pseudo-sample generation with a Conditional Random Field (CRF). First, a Gaussian mixture distribution is used to generate pseudo-samples, which increases the number of minority-class samples and balances the training set. Second, Word2vec is used to enrich the microblog messages and to compensate for a sentiment lexicon that is not large enough. Finally, the CRF model is applied to the balanced and extended training set. Experimental results on microblog data show that this method outperforms state-of-the-art methods for sentiment analysis of microblog data sets with imbalanced sentiment distribution.
Key words: sentiment analysis; Gaussian mixture distribution; conditional random field; sentiment; imbalance; Word2vec
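The pseudo-sample generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each microblog has already been mapped to a numeric feature vector, fits a diagonal-covariance Gaussian mixture per class with a small EM loop, and draws samples until every class matches the largest one. The function names (`fit_diag_gmm`, `sample_gmm`, `balance`), the component count `k`, and the toy data are all hypothetical.

```python
import math
import random


def fit_diag_gmm(X, k, iters=50, seed=0):
    """Fit a k-component diagonal-covariance Gaussian mixture to X with EM.

    Diagonal covariance is an assumption made here to keep the sketch short.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    means = [list(x) for x in rng.sample(X, k)]  # init means from data points
    gmean = [sum(x[j] for x in X) / n for j in range(d)]
    gvar = [sum((x[j] - gmean[j]) ** 2 for x in X) / n + 1e-6 for j in range(d)]
    variances = [list(gvar) for _ in range(k)]   # init with global variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in X:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (x[j] - means[c][j]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            p = [math.exp(l - top) for l in logp]
            total = sum(p)
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means, and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            for j in range(d):
                mu = sum(r[c] * x[j] for r, x in zip(resp, X)) / nc
                var = sum(r[c] * (x[j] - mu) ** 2 for r, x in zip(resp, X)) / nc
                means[c][j] = mu
                variances[c][j] = var + 1e-6  # floor to avoid collapse
    return weights, means, variances


def sample_gmm(params, n, seed=0):
    """Draw n pseudo-samples from a fitted diagonal Gaussian mixture."""
    weights, means, variances = params
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        c = rng.choices(range(len(weights)), weights=weights)[0]
        out.append([rng.gauss(m, math.sqrt(v))
                    for m, v in zip(means[c], variances[c])])
    return out


def balance(data, k=2):
    """Pad every class with pseudo-samples up to the size of the largest class.

    data maps a sentiment label to a list of feature vectors.
    """
    target = max(len(X) for X in data.values())
    balanced = {}
    for label, X in data.items():
        need = target - len(X)
        extra = sample_gmm(fit_diag_gmm(X, min(k, len(X))), need) if need else []
        balanced[label] = X + extra
    return balanced


# Toy imbalanced data: 40 "positive" vectors versus 8 "negative" ones.
_rng = random.Random(1)
data = {
    "positive": [[_rng.gauss(1.0, 0.5), _rng.gauss(1.0, 0.5)] for _ in range(40)],
    "negative": [[_rng.gauss(-1.0, 0.5), _rng.gauss(-1.0, 0.5)] for _ in range(8)],
}
balanced = balance(data)
```

After `balance` runs, both classes hold the same number of vectors, and the original minority samples are kept unchanged at the front of each list. The balanced set would then be passed on to the downstream classifier (a CRF in the paper's setting), which this sketch does not cover.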
LIANG Li-Xin, HAO Zhi-Feng, CAI Rui-Chu, WEN Wen. An Approach to Sentiment Analysis of Chinese Microblogs Based on Gaussian Mixture Distribution Pseudo-sample Generation[J]. Journal of Guangdong University of Technology, 2016, 33(06): 85-90.
URL: https://xbzrb.gdut.edu.cn/EN/10.3969/j.issn.1007-7162.2016.06.015
https://xbzrb.gdut.edu.cn/EN/Y2016/V33/I06/85