Recognition and Normalization of Chinese Time Expressions Based on Rules
Abstract: Time expressions in text are diverse in form and unstructured, which makes them difficult to recognize and normalize. To address this problem, this paper proposes characterizing the temporal elements of an expression, and uses that characterization to define the categories of time expressions and the normalized form of each category. On this basis, regular expressions are combined with a Trie structure to build a time-phrase recognition tree that automatically recognizes and classifies Chinese time expressions. Finally, a normalization algorithm and a correction algorithm process the recognized results to produce the normalized form. In experiments on a Chinese corpus, the approach achieves good results for both recognition and normalization of Chinese time expressions.
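The pipeline described in the abstract (regex plus Trie recognition, followed by normalization) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the paper's actual recognition tree or algorithms: the lexicon `RELATIVE_DAYS`, the pattern `ABSOLUTE_DATE`, the two category names, and all function names are hypothetical, and the correction step is not modeled.

```python
import re
from datetime import date, timedelta

# Hypothetical lexicon (an assumption): relative-day words and their day offsets.
RELATIVE_DAYS = {"今天": 0, "明天": 1, "后天": 2, "昨天": -1, "前天": -2}

# Hypothetical pattern for absolute dates such as 2013年5月3日 (Arabic digits only).
ABSOLUTE_DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")

END = None  # trie key marking the end of a word; never equal to a character


def build_trie(words):
    """Build a character trie as nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def longest_trie_match(trie, text, start):
    """Return the longest lexicon word beginning at text[start], or None."""
    node, best = trie, None
    for ch in text[start:]:
        if ch not in node:
            break
        node = node[ch]
        best = node.get(END, best)
    return best


def recognize(text, trie):
    """Scan left to right and return (start, expression, category) tuples."""
    results, i = [], 0
    while i < len(text):
        m = ABSOLUTE_DATE.match(text, i)  # regex handles numeric patterns
        if m:
            results.append((i, m.group(0), "absolute"))
            i = m.end()
            continue
        word = longest_trie_match(trie, text, i)  # trie handles lexical phrases
        if word:
            results.append((i, word, "relative"))
            i += len(word)
            continue
        i += 1
    return results


def normalize(expr, category, reference):
    """Map a recognized expression to an ISO 8601 date string.

    Relative expressions are resolved against `reference`. An impossible
    absolute date (for example month 13) raises ValueError from `date`.
    """
    if category == "absolute":
        year, month, day = map(int, ABSOLUTE_DATE.match(expr).groups())
        return date(year, month, day).isoformat()
    return (reference + timedelta(days=RELATIVE_DAYS[expr])).isoformat()


trie = build_trie(RELATIVE_DAYS)
reference = date(2013, 5, 3)  # assumed document date used to anchor relative words
for start, expr, category in recognize("会议从2013年5月3日开始，明天继续。", trie):
    print(start, expr, category, normalize(expr, category, reference))
```

In this sketch the regex covers expressions with open-ended numeric content, while the trie gives longest-match lookup over a closed set of time words; each match carries a category label so that normalization can dispatch on it.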