Journal of Guangdong University of Technology ›› 2016, Vol. 33 ›› Issue (05): 49-53. doi: 10.3969/j.issn.1007-7162.2016.05.009
He Ke-da, Zhu Zheng-tao, Cheng Yu
Abstract:
Establishing category keywords is a key problem in text classification and must be solved first. Building on text classification with category keywords and the TF-IDF algorithm, an improved TF-IDF algorithm is proposed to overcome a shortcoming of the vector space model, namely that it cannot adjust term weights well. First, a category keyword library is established, then expanded and de-duplicated. The weight of each keyword in a document is then modified by taking the length of the document into account, which effectively addresses the weak ability of the original term features to distinguish between classes. Experiments using a Bayesian classification method verify the effectiveness of the algorithm and show that the accuracy of text classification is improved.
Key words: keyword extraction; feature selection; text classification; preprocessing
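The abstract does not give the paper's actual weighting formula, so the following is only a minimal sketch of the general idea it describes: a TF-IDF weight in which the term count is adjusted by document length. The function name `length_adjusted_tfidf`, the choice of dividing the raw count by the number of tokens, and the unsmoothed `log(N / df)` IDF are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def length_adjusted_tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weight is
    (count / document length) * log(N / df), which is an assumed
    form of length adjustment, not the formula from the paper.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        doc_len = len(tokens)
        counts = Counter(tokens)
        doc_weights = {}
        for term, count in counts.items():
            # Dividing by document length keeps long documents from
            # dominating purely because they contain more tokens.
            tf = count / doc_len
            idf = math.log(n_docs / df[term])
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["sports", "ball", "ball"],
    ["finance", "bank"],
    ["sports", "bank", "game", "game"],
]
weights = length_adjusted_tfidf(docs)
```

In this sketch a term that occurs in every document receives weight zero, and an empty document simply yields an empty dictionary. The resulting weights could be passed as features to a Bayesian classifier, as the abstract describes, but that step is not shown here.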
HE Ke-da, ZHU Zheng-tao, CHENG Yu. A Research on Text Classification Method Based on Improved TF-IDF Algorithm[J]. Journal of Guangdong University of Technology, 2016, 33(05): 49-53.
URL: https://xbzrb.gdut.edu.cn/EN/10.3969/j.issn.1007-7162.2016.05.009
https://xbzrb.gdut.edu.cn/EN/Y2016/V33/I05/49