Abstract:
Significant financial information can be retrieved from the vast amount of textual data in Chinese business accounting reports (annual reports). However, because this text is unstructured, the information is usually difficult to extract and analyze with traditional computing and database techniques. To address this issue, a set of unified domain-specific ontologies is presented which, combined with Chinese natural language processing (NLP), transforms accounting reports from unstructured text into a structured XBRL-based form along three dimensions: word attribute description, word relation organization, and related knowledge links.