陈嘉鸿, 黄国恒, 谭喆
Chen Jia-hong, Huang Guo-heng, Tan Zhe
Abstract: Automatic medical report generation plays an important role in computer-aided diagnosis and can greatly reduce the workload of medical staff. With the continuing development of deep learning in medicine, automatic medical report generation has become a research hotspot in smart healthcare. Its main challenges are that lesion regions in an image are difficult for models to capture, and that a large semantic gap remains between visual and linguistic semantics, whose consistency is still not well resolved. This paper therefore proposes a cross-modal difference attention network to narrow the semantic distance between modalities. The network comprises a reverse attention module and a semantic consistency module: the reverse attention module explores the important regions of a medical image more comprehensively, while the semantic consistency module uses features from a large language model as a reference and steers the visual features toward these reference text features, so that visual semantics are converted into consistent linguistic semantics more accurately. Experiments show that the cross-modal difference attention network outperforms previous models on the two public datasets IU X-Ray and MIMIC-CXR, reaching BLEU-4 scores of 17.9% and 10.9%, respectively. Compared with the baseline model, the proposed model achieves a substantial performance improvement, demonstrating that it can generate accurate and fluent medical reports.
CLC number:
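The two modules described in the abstract can be illustrated with a minimal NumPy sketch. This is an interpretation under stated assumptions, not the paper's implementation: it assumes dot-product attention over region features, models "reverse attention" as re-attending with the complement of the first-pass weights (so regions suppressed by the primary pass are still explored), and models the semantic consistency objective as a cosine-distance loss pulling a visual feature toward a frozen reference text feature. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reverse_attention(query, regions):
    """Attend with weights A, then re-attend with normalized (1 - A),
    so regions missed by the primary pass (e.g. small lesions) still
    contribute to the pooled visual feature. `regions` is (N, d)."""
    scores = regions @ query / np.sqrt(query.shape[-1])   # (N,)
    attn = softmax(scores)                                # primary weights A
    rev = 1.0 - attn                                      # complement of A
    rev = rev / rev.sum()                                 # renormalize
    return attn @ regions + rev @ regions                 # combined (d,) feature

def semantic_consistency_loss(visual_feat, text_feat):
    """1 - cosine similarity: minimizing this pulls the visual feature
    toward the reference text feature (e.g. from a language model)."""
    v = visual_feat / np.linalg.norm(visual_feat)
    t = text_feat / np.linalg.norm(text_feat)
    return 1.0 - float(v @ t)
```

In training, the consistency loss would be added to the usual report-generation (cross-entropy) loss, with the text branch frozen so gradients only move the visual features toward the reference semantics.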