Journal of Guangdong University of Technology ›› 2017, Vol. 34 ›› Issue (03): 89-95. doi: 10.12052/gdutxb.170029

• Special Topic on Basic Theory and Applications of Big Data •

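Research on Text Information Extraction from Online Financial Reports Based on Domain Ontology

Liang Zhuo-qian1,3, Wang Dong2, Zhu Hui2, Pan Ding1

  1. School of Management, Jinan University, Guangzhou 510632, Guangdong, China;
  2. School of Business Administration, Guangzhou University, Guangzhou 510006, Guangdong, China;
  3. School of Information, Jinan University, Guangzhou 510632, Guangdong, China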
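  • Received: 2017-02-17 Published: 2017-05-09 Online: 2017-05-09
  • Corresponding author: Wang Dong (born 1984), male, lecturer, Ph.D.; main research interests: information management and information systems. E-mail: wangdong@gzhu.edu.cn
  • About the author: Liang Zhuo-qian (born 1984), male, experimentalist, Ph.D. candidate; main research interests: information management and information systems.
  • Supported by:

    National Natural Science Foundation of China (71171097, 71671048); Fundamental Research Funds for the Central Universities (15JNLH005); Natural Science Foundation of Guangdong Province (2015A030310506)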

A Research on Text Information Extraction from Annual Report Based on Domain Ontology

Liang Zhuo-qian1,3, Wang Dong2, Zhu Hui2, Pan Ding1   

  1. School of Management, Jinan University, Guangzhou 510632, China;
  2. School of Business Administration, Guangzhou University, Guangzhou 510006, China;
  3. School of Information, Jinan University, Guangzhou 510632, China
  • Received: 2017-02-17 Online: 2017-05-09 Published: 2017-05-09

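Abstract (translated from the Chinese):

Corporate financial reports contain a large amount of unstructured text that carries important financial information. Such information is difficult for computers to recognize, analyze, and process, and difficult to manage with database technology. Combining ontology theory with natural language processing (NLP) techniques, this paper builds a domain ontology for financial reports along three dimensions: word attribute description, word relation organization, and related knowledge linking. NLP tools are then used to process the text of Chinese financial reports, converting the unstructured text into structured information represented in XBRL. This makes it possible, to a certain extent, to store the textual information in a database and to analyze and process it by computer.

Keywords: eXtensible Business Reporting Language (XBRL), domain ontology, financial report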

Abstract:

Significant financial information can be retrieved from the large volume of text in Chinese corporate financial reports (annual reports). Because this text is unstructured, however, it is difficult to extract and analyze with traditional computing and database techniques. To address this issue, a unified domain ontology is constructed along three dimensions, namely word attribute description, word relation organization, and related knowledge links. Combined with Chinese natural language processing (NLP), the ontology is used to transform the unstructured text of financial reports into a structured, XBRL-based form.

Key words: eXtensible Business Reporting Language (XBRL), domain ontology, financial report

CLC number:

  • TP391
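
The pipeline summarized in the abstract (an ontology organized along three dimensions, text processing, and XBRL-style output) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Term` fields, the two sample concepts, the sample sentence, and the `FY2016` context identifier are all hypothetical. A plain regular expression stands in for the Chinese NLP tools, and the output is a simplified XBRL-like fragment without the namespaces, contexts, and unit declarations that a valid XBRL instance requires.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Term:
    """One entry of a toy ontology; every field name here is hypothetical."""
    label: str                      # surface form expected in the report text
    concept: str                    # word attribute description: element name to emit
    unit: str                       # word attribute description: unit of the value
    broader: Optional[str] = None   # word relation organization: parent concept
    see_also: List[str] = field(default_factory=list)  # related knowledge links


# Hypothetical sample ontology, not taken from the paper.
ONTOLOGY = [
    Term("net profit", "NetProfit", "CNY", broader="Profit",
         see_also=["IncomeStatement"]),
    Term("operating revenue", "OperatingRevenue", "CNY", broader="Revenue"),
]

# Hypothetical sample sentence; English is used to keep the sketch stdlib-only.
SAMPLE = ("In 2016, net profit was 12,505,000.50 yuan; "
          "operating revenue reached 83,000,000 yuan.")


def extract_facts(text: str, ontology: List[Term]) -> List[Tuple[Term, str]]:
    """Find each ontology label followed closely by a number in the text."""
    facts = []
    for term in ontology:
        # Label, then at most 40 filler characters that are not digits,
        # periods, or semicolons, then a number with optional commas/decimals.
        pattern = re.escape(term.label) + r"[^\d.;]{0,40}?(\d[\d,]*(?:\.\d+)?)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            facts.append((term, match.group(1).replace(",", "")))
    return facts


def to_xbrl_like(facts: List[Tuple[Term, str]], context_id: str = "FY2016") -> str:
    """Serialize facts as a simplified XBRL-like fragment (not a valid instance)."""
    root = ET.Element("xbrl")
    for term, value in facts:
        fact = ET.SubElement(root, term.concept,
                             contextRef=context_id, unitRef=term.unit)
        fact.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xbrl_like(extract_facts(SAMPLE, ONTOLOGY)))
```

In this sketch the ontology drives both recognition (which labels to look for) and output (which element name and unit to emit), which is the role the abstract assigns to the domain ontology. The `broader` and `see_also` fields are carried but not used here; they only mark where relation and link information would live.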

