Journal of Guangdong University of Technology, 2018, Vol. 35, Issue (03): 37-42. doi: 10.12052/gdutxb.180016

Model Optimization of Financial Corpus Sentiment Analysis Based on THUCTC

Rao Dong-ning, Huang Si-hong   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  Received: 2018-01-01; Online: 2018-05-09; Published: 2018-04-26

Abstract: Sentiment analysis has recently attracted growing interest. In finance, it can serve as a reference for investors. However, existing approaches are either so domain-specific that they suffer from data drift or so general that they lack precision. This paper therefore optimizes a general-purpose Chinese text classifier for online stock reviews and stock news. A corpus of 20000 items is first collected, and each item is labeled by three annotators to establish the ground truth. THUCTC, a general Chinese text classifier, is then optimized in three aspects. First, for tokenization, THUCTC is modified to use 2-grams combined with a stemming dictionary, which yields better results. Second, the best kernel is selected for the classifier: the Liblinear kernel is preferable when training time is limited, whereas the Libsvm kernel achieves higher accuracy. Third, a finance-oriented sentiment dictionary is built with a Chi-square and TF-IDF approach; this dictionary can be used by off-the-shelf general text classifiers. In this way, the results can be generalized without loss of precision.
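The abstract names character 2-gram tokenization and a Chi-square/TF-IDF sentiment dictionary but gives no formulas. The following is a minimal Python sketch that assumes the textbook definitions: overlapping character bigrams as features, the 2x2 contingency-table Chi-square statistic for each term and class, and a smoothed TF-IDF weight. All function names (`char_bigrams`, `contingency`, `chi_square`, `build_lexicon`, `tfidf_vector`), the top-k selection rule, the positive-association filter, the idf smoothing and the toy corpus are hypothetical illustrations, not the authors' THUCTC code.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split a Chinese string into overlapping character 2-grams."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def contingency(docs, labels, term, target):
    """Count the 2x2 table for `term` vs. class `target`.

    a: target docs containing term      b: other docs containing term
    c: target docs without term         d: other docs without term
    """
    a = b = c = d = 0
    for feats, label in zip(docs, labels):
        has, in_target = term in feats, label == target
        if has and in_target:
            a += 1
        elif has:
            b += 1
        elif in_target:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Standard Chi-square statistic of a 2x2 contingency table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def build_lexicon(docs, labels, k):
    """Hypothetical lexicon builder: top-k bigrams per class by Chi-square,
    keeping only terms positively associated with that class (ad > bc)."""
    vocab = sorted({t for feats in docs for t in feats})
    lexicon = {}
    for cls in sorted(set(labels)):
        scored = []
        for term in vocab:
            a, b, c, d = contingency(docs, labels, term, cls)
            if a * d > b * c:  # Chi-square alone is symmetric; keep one direction
                scored.append((chi_square(a, b, c, d), term))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        lexicon[cls] = [term for _, term in scored[:k]]
    return lexicon


def tfidf_vector(feats, terms, docs):
    """TF-IDF weights of `terms` in one document (smoothed idf is an assumed variant)."""
    n = len(docs)
    tf = Counter(feats)
    vec = []
    for term in terms:
        df = sum(1 for doc in docs if term in doc)
        idf = math.log((1 + n) / (1 + df)) + 1
        vec.append(tf[term] / max(len(feats), 1) * idf)
    return vec


# Toy corpus (hypothetical examples, not the paper's 20000-item dataset).
texts = ["股价大涨利好", "业绩大涨看好", "股价暴跌利空", "业绩暴跌看空"]
labels = ["pos", "pos", "neg", "neg"]
docs = [char_bigrams(t) for t in texts]

lexicon = build_lexicon(docs, labels, k=1)
terms = lexicon["pos"] + lexicon["neg"]
features = tfidf_vector(char_bigrams("今日大涨"), terms, docs)
print(lexicon)  # {'neg': ['暴跌'], 'pos': ['大涨']}
print(features)
```

The resulting feature vector is the kind of input an off-the-shelf classifier, such as one trained with Liblinear or Libsvm, could consume. How the paper actually combines the Chi-square and TF-IDF scores is not stated in the abstract, so the selection and weighting here should be read as one plausible arrangement.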

Key words: sentiment analysis, text categorization, stock price trend prediction, Chinese word segmentation

CLC Number: TP181