Journal of Guangdong University of Technology, 2016, Vol. 33, Issue (05): 49-53. doi: 10.3969/j.issn.1007-7162.2016.05.009

A Research on Text Classification Method Based on Improved TF-IDF Algorithm

He Ke-da, Zhu Zheng-tao, Cheng Yu

  1. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2015-09-22 Online: 2016-09-10 Published: 2016-09-10

Abstract:

Establishing category keywords is the key problem in text classification and must be solved first. Building on text classification with category keywords and the TF-IDF algorithm, this paper proposes an improved TF-IDF algorithm to overcome a shortcoming of the vector space model, which cannot adjust term weights well. First, a category keyword library is established, then expanded and deduplicated. The weight of each keyword in a document is then modified by incorporating the length of the document, which effectively addresses the weak class-discrimination ability of the original feature terms. Experiments with a Bayesian classifier verify the effectiveness of the algorithm and show improved text classification accuracy.

Key words: keyword extraction; feature selection; text classification; preprocessing
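
The abstract does not state the modified weighting formula, so the following is only an illustrative sketch of the general idea of folding document length into a TF-IDF weight. It assumes the simplest form: term frequency divided by document length, multiplied by the standard logarithmic inverse document frequency. The function name `tfidf_length_normalized` and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_length_normalized(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weight is (count / doc length) * log(N / df),
    which is an assumed form, not the formula from the paper.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / length  # term frequency scaled by document length
            idf = math.log(n_docs / df[term])
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


docs = [
    ["stock", "market", "stock", "price"],
    ["football", "match", "score"],
    ["stock", "football"],
]
weights = tfidf_length_normalized(docs)
```

Under this assumed form, a term that occurs in every document receives weight zero, and a given raw count contributes less in a long document than in a short one. The resulting weights could then serve as features for a Bayesian classifier, as the abstract describes.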
