Journal of Guangdong University of Technology, 2016, Vol. 33, Issue (05): 49-53. doi: 10.3969/j.issn.1007-7162.2016.05.009


A Research on Text Classification Method Based on Improved TF-IDF Algorithm

He Ke-da, Zhu Zheng-tao, Cheng Yu

1. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
Received: 2015-09-22; Online: 2016-09-10; Published: 2016-09-10

Abstract:

Establishing category keywords is the key problem in text classification and must be solved first. Building on text classification with category keywords and the TF-IDF algorithm, this paper proposes an improved TF-IDF algorithm that overcomes a shortcoming of the vector space model, namely its inability to adjust term weights well. First, a category keyword library is established, then expanded and de-duplicated. Next, the weight of each keyword in a document is adjusted by taking the document length into account, which effectively remedies the weak category-discriminating ability of the original term features. Experiments with a Bayesian classifier verify the effectiveness of the algorithm and show that it improves text classification accuracy.
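The abstract describes three steps (build and de-duplicate a category keyword library, weight keywords with a length-adjusted TF-IDF, classify with a Bayesian method) but does not give the exact formulas. The following is a minimal sketch under stated assumptions, not the authors' method: the helper names `build_keyword_library`, `fit_idf`, `weigh` and `WeightedNaiveBayes`, as well as the toy keywords and documents, are hypothetical. The length adjustment is assumed to divide a keyword's raw count by the document's token count, and the IDF uses a common smoothed form, log((1 + N) / (1 + df)) + 1. Classification uses multinomial naive Bayes with Laplace smoothing, fed with real-valued keyword weights instead of raw counts.

```python
import math
from collections import Counter, defaultdict


def build_keyword_library(seeds, expansions):
    """Merge seed and expansion terms for each category, dropping duplicates."""
    library = {}
    for category, terms in seeds.items():
        merged = list(terms) + list(expansions.get(category, []))
        # dict.fromkeys keeps first-seen order while removing repeated terms
        library[category] = list(dict.fromkeys(merged))
    return library


def fit_idf(docs, vocab):
    """Smoothed IDF over tokenized training docs (assumed variant, not the paper's)."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc) if t in vocab)
    return {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}


def weigh(doc, idf):
    """Assumed length adjustment: raw keyword count divided by doc length, times IDF."""
    counts = Counter(t for t in doc if t in idf)  # keep library keywords only
    length = max(len(doc), 1)  # avoid division by zero on empty docs
    return {t: (c / length) * idf[t] for t, c in counts.items()}


class WeightedNaiveBayes:
    """Multinomial naive Bayes that accepts real-valued term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, vectors, labels, vocab):
        self.vocab = list(vocab)
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
        mass = defaultdict(Counter)  # total keyword weight per class
        for vec, label in zip(vectors, labels):
            mass[label].update(vec)
        self.log_likelihood = {}
        for c in class_counts:
            denom = sum(mass[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((mass[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, vec):
        def score(c):
            # log P(c) plus weight * log P(t | c) summed over the doc's keywords
            return self.log_prior[c] + sum(
                w * self.log_likelihood[c][t] for t, w in vec.items()
            )
        return max(self.log_prior, key=score)


# Toy usage with hypothetical keywords and documents.
seeds = {"sports": ["football", "match"], "finance": ["stock", "market"]}
expansions = {"sports": ["goal", "match"], "finance": ["shares", "stock"]}
library = build_keyword_library(seeds, expansions)
vocab = {t for terms in library.values() for t in terms}

train_docs = [
    "the football match ended with a late goal".split(),
    "fans cheered every goal in the match".split(),
    "the stock market rallied as shares rose".split(),
    "investors sold shares when the market fell".split(),
]
train_labels = ["sports", "sports", "finance", "finance"]

idf = fit_idf(train_docs, vocab)
model = WeightedNaiveBayes().fit([weigh(d, idf) for d in train_docs], train_labels, vocab)
print(model.predict(weigh("a dramatic goal decided the match".split(), idf)))  # prints: sports
```

Dividing by document length means a keyword occurring once in a short document carries more weight than the same single occurrence in a long one; whether this matches the paper's adjustment cannot be confirmed from the abstract alone.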

Key words: keyword extraction; feature selection; text classification; preprocessing
