Journal of Guangdong University of Technology ›› 2018, Vol. 35 ›› Issue (03): 37-42. doi: 10.12052/gdutxb.180016
饶东宁, 黄思宏
Rao Dong-ning, Huang Si-hong
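Abstract: Sentiment analysis has attracted growing interest in recent years; in financial applications, it can serve as a reference for investors before they invest. Existing methods, however, tend to be too application-specific, suffer from data bias, and produce results that are too general and insufficiently precise. This paper therefore optimizes a general-purpose Chinese text classifier for sentiment analysis of online review data and stock news data. A corpus of 20,000 items was collected and organized, with each item labeled independently by three annotators. THUCTC was then optimized in three respects. First, for word segmentation, a stem-dictionary method was combined with different segmentation schemes, and experimental comparison showed that the bigram method gave the best results. Second, in choosing the best kernel for the classifier, Liblinear was found to suit investors who need near-real-time results, whereas Libsvm has the advantage in accuracy. Third, a finance-oriented sentiment dictionary was built with the Chi-square and TF-IDF methods and can be used with general text classifiers. In this way, the results of this paper can be generalized without loss of accuracy.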
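As an illustration of two steps named in the abstract, the sketch below splits Chinese text into overlapping character bigrams and ranks those bigrams by a Chi-square score against a sentiment label. It is a minimal sketch under stated assumptions, not the authors' implementation and not the THUCTC API: the function names `char_bigrams` and `chi_square_scores`, the four-item toy corpus, and the `"pos"`/`"neg"` labels are all hypothetical, and the TF-IDF weighting that the abstract also mentions is left out.

```python
from collections import Counter


def char_bigrams(text):
    """Split a string into overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def chi_square_scores(docs, labels, target):
    """Score each bigram by the Chi-square statistic of a 2x2 table:
    (document contains term or not) x (document label is target or not)."""
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    n_other = n - n_target

    # Document frequencies of each bigram, counted separately per class.
    df_target, df_other = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(char_bigrams(text))
        (df_target if y == target else df_other).update(terms)

    scores = {}
    for term in set(df_target) | set(df_other):
        a = df_target[term]   # target docs containing the term
        b = df_other[term]    # other docs containing the term
        c = n_target - a      # target docs without the term
        d = n_other - b       # other docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical toy corpus: two positive and two negative snippets.
docs = ["股票上涨", "股票下跌", "上涨", "下跌"]
labels = ["pos", "neg", "pos", "neg"]

scores = chi_square_scores(docs, labels, target="pos")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:3])
```

In this toy corpus the bigram "股票" appears equally often in both classes and so scores zero, while "上涨" and "下跌" each appear in only one class and score highest; a sentiment dictionary built this way would keep the high-scoring terms.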