Journal of Guangdong University of Technology ›› 2016, Vol. 33 ›› Issue (06): 85-90. doi: 10.3969/j.issn.1007-7162.2016.06.015

• Comprehensive Studies •

A Sentiment Analysis Method Based on Gaussian Mixture Distribution Pseudo-sample Generation

梁礼欣, 郝志峰, 蔡瑞初, 温雯   

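  1. School of Computers, Guangdong University of Technology, Guangzhou, Guangdong 510006
  • Received: 2016-03-23 Online: 2016-11-18 Published: 2016-11-18
  • About the author: Liang Li-xin (b. 1990), male, master's student; main research interests: text sentiment analysis and data mining.
  • Funding:

    Supported by the National Natural Science Foundation of China (61472089, 61572143)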

An Approach to Sentiment Analysis of Chinese Microblogs Based on Gaussian Mixture Distribution Pseudo-sample Generation

Liang Li-xin, Hao Zhi-feng, Cai Rui-chu, Wen Wen   

  1. School of Computers, Guangdong University of Technology, Guangzhou, 510006
  • Received: 2016-03-23 Online: 2016-11-18 Published: 2016-11-18

Abstract:

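Microblog text is written in a highly informal style, which makes its sentiment orientation hard to identify. To address this problem, a new method is proposed that combines Gaussian mixture distribution pseudo-sample generation with the conditional random field (CRF) model. The method first uses a Gaussian mixture model to generate pseudo-samples for the minority classes in the training set, producing a training set with a balanced sentiment distribution. It then uses Word2vec to extend microblog sentences and enrich their sentiment information, which mitigates the negative effect that an insufficiently large sentiment lexicon has on sentiment classification. Finally, the CRF model is applied to the balanced and extended training set. Experimental results show that, when the sentiment distribution of the data set is imbalanced, the method identifies the sentiment orientation of microblogs more effectively than existing methods.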

Keywords: sentiment analysis; Gaussian mixture distribution; conditional random field; sentiment orientation; imbalance; Word2vec
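
The first step named in the abstract, generating pseudo-samples for a minority class from a Gaussian mixture, can be illustrated with a minimal sketch. The abstract does not specify the feature representation, the number of mixture components, or the covariance structure, so everything below is an assumption made for illustration: samples are plain numeric feature vectors, the mixture uses diagonal covariances fitted with a small hand-written EM loop, and the function names (`fit_diag_gmm`, `sample_gmm`, `balance_with_pseudo_samples`) are hypothetical.

```python
import math
import random


def fit_diag_gmm(points, k, iters=50, seed=0, min_var=1e-3):
    """Fit a k-component diagonal-covariance Gaussian mixture with EM."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # init from data points
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point,
        # computed in log space for numerical stability.
        resp = []
        for x in points:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (x[t] - means[j][t]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            for t in range(d):
                mu = sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[t] - mu) ** 2
                          for r, x in zip(resp, points)) / nj
                means[j][t] = mu
                variances[j][t] = max(var, min_var)
    return weights, means, variances


def sample_gmm(model, count, rng):
    """Draw `count` pseudo-samples from a fitted diagonal mixture."""
    weights, means, variances = model
    samples = []
    for _ in range(count):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append([rng.gauss(mu, math.sqrt(v))
                        for mu, v in zip(means[j], variances[j])])
    return samples


def balance_with_pseudo_samples(features, labels, k=2, seed=0):
    """Top up every smaller class to the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(pts) for pts in by_class.values())
    rng = random.Random(seed)
    new_x, new_y = list(features), list(labels)
    for y, pts in by_class.items():
        deficit = target - len(pts)
        if deficit == 0:
            continue
        model = fit_diag_gmm(pts, min(k, len(pts)), seed=seed)
        new_x.extend(sample_gmm(model, deficit, rng))
        new_y.extend([y] * deficit)
    return new_x, new_y


# Toy 2-D data: 20 "positive" vectors and 5 "negative" vectors.
demo_rng = random.Random(1)
majority = [[demo_rng.gauss(2.0, 0.5), demo_rng.gauss(2.0, 0.5)] for _ in range(20)]
minority = [[demo_rng.gauss(-2.0, 0.5), demo_rng.gauss(-2.0, 0.5)] for _ in range(5)]
features = majority + minority
labels = ["positive"] * 20 + ["negative"] * 5
bal_x, bal_y = balance_with_pseudo_samples(features, labels)
print(bal_y.count("positive"), bal_y.count("negative"))
```

The original samples are kept unchanged and only the deficit is filled with generated vectors, so the balanced set can be passed to any downstream classifier. In practice a library implementation of Gaussian mixtures would likely replace the hand-written EM loop.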

Abstract:

Because informal words and expressions are widely used in microblogs, sentiment analysis of microblogs is difficult, especially when the sentiment distribution of the data is imbalanced. This paper presents GWCRF (Gaussian Mixture Distribution Word2vec CRF), a method for sentiment analysis of microblogs with an imbalanced sentiment distribution, based on a pseudo-sample generation technique and the Conditional Random Field (CRF) model. First, a Gaussian mixture distribution is used to generate pseudo-samples, which increase the number of minority-class samples and balance the training set. Second, Word2vec is used to enrich microblog messages and to mitigate the problem that the sentiment lexicon is not large enough. Finally, the CRF model is applied to the balanced and extended training set. Experimental results on microblog data show that the method outperforms state-of-the-art methods for sentiment analysis on microblog data sets with an imbalanced sentiment distribution.

Key words: sentiment analysis; Gaussian mixture distribution; conditional random field; sentiment; imbalance; Word2vec
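
The second step in the abstract, enriching a microblog message with Word2vec, might look roughly like the following sketch. The abstract does not say how sentences are extended, so this assumes one plausible reading: each token is supplemented with its nearest neighbours in embedding space when they are similar enough. The function `expand_sentence`, its `top_n` and `min_sim` parameters, and the hand-made two-dimensional vectors are all hypothetical stand-ins for trained Word2vec embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_sentence(tokens, embeddings, top_n=1, min_sim=0.8):
    """Append, for each known token, its closest words in embedding space."""
    expanded = list(tokens)
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary tokens are left as they are
        scored = sorted(
            ((cosine(embeddings[tok], vec), other)
             for other, vec in embeddings.items() if other != tok),
            reverse=True,
        )
        for sim, other in scored[:top_n]:
            # Only add sufficiently similar words, and never add duplicates.
            if sim >= min_sim and other not in expanded:
                expanded.append(other)
    return expanded


# Toy vectors chosen by hand; real embeddings would come from a trained model.
toy_embeddings = {
    "good": [0.9, 0.1],
    "great": [0.85, 0.2],
    "bad": [-0.9, 0.1],
    "awful": [-0.8, 0.25],
    "phone": [0.0, 1.0],
}
print(expand_sentence(["phone", "good"], toy_embeddings))
```

The similarity threshold keeps neutral words such as "phone" from pulling in unrelated neighbours, while a sentiment word missing from a lexicon can still be linked to a close neighbour that the lexicon does contain.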
