Journal of Guangdong University of Technology ›› 2012, Vol. 29 ›› Issue (4): 65-68. doi: 10.3969/j.issn.1007-7162.2012.04.013

• Comprehensive Studies •

A Two-Level Text Classification Method Based on Feature Extraction

Zou Li-na, Ling Jie

  1. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2012-02-17 Online: 2012-12-25 Published: 2012-12-25
  • About the author: Zou Li-na (b. 1987), female, master's student; main research interest: network information collection and processing.
  • Supported by:

    Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011A090200068); Natural Science Foundation of Guangdong Province (9151009001000043)

Abstract: An improved two-level text classification method based on feature extraction is proposed. Feature terms are first extracted from each text and their weights are computed, so that the text is represented as a vector of feature terms and weight values. Under the two-level classification model, the similarity between texts is then computed as the cosine of the angle between their vectors, which allows relevant items to be located more accurately and quickly within large volumes of information. Experimental results show that the proposed method achieves higher classification accuracy than the traditional class-center classification method and improves the adaptability and classification ability of the system.

Key words: text classification; feature extraction; vector space model; KNN algorithm
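
The pipeline summarized in the abstract (weighted feature vectors compared by the cosine of their angle) can be illustrated with a minimal Python sketch. Several points are assumptions rather than details from the paper: the abstract does not name a weighting scheme, so a smoothed TF-IDF weight is used here; documents are assumed to be already tokenized; and the function names `tfidf_vectors`, `cosine_similarity`, and `knn_classify` are hypothetical. The abstract also does not say how the two levels are combined, so the sketch shows only a plain single-level KNN vote, not the authors' two-level model.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} vector.

    Assumption: smoothed TF-IDF weighting; the abstract names no scheme.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, labeled, k=1):
    """Majority label among the k labeled vectors most similar to the query.

    Illustrative single-level KNN only, not the paper's two-level model.
    """
    ranked = sorted(
        labeled,
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ["stock", "market", "price"],
        ["stock", "price", "rise"],
        ["football", "match", "goal"],
    ]
    labels = ["finance", "finance", "sports"]
    query = ["football", "goal", "score"]

    # Weight the query together with the training texts so all vectors
    # share the same document-frequency statistics.
    vectors = tfidf_vectors(train + [query])
    labeled = list(zip(vectors[:3], labels))
    print(knn_classify(vectors[3], labeled, k=1))  # prints "sports"
```

Sparse dictionaries keep the sketch short; with a large vocabulary, a fixed feature list selected in the feature-extraction step would bound the vector length, which is presumably where the speed benefit mentioned in the abstract comes from.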
