Medical Report Generation Based on Cross-modal Discrepancy Attention

陈嘉鸿; 黄国恒; 谭喆

doi:10.12052/gdutxb.240002

Cross-modal Discrepancy Attention Network for Medical Report Generation

Abstract

Automatic medical report generation plays an important role in assisting diagnosis and can greatly reduce the workload of medical staff. As deep learning continues to advance in medicine, automatic report generation has become a research hotspot in smart healthcare. Two main challenges remain: (1) lesion regions in images are difficult for models to capture, and (2) a large semantic gap separates the visual and language modalities, so their consistency is still not well addressed. To tackle these problems, this paper proposes a Cross-modal Discrepancy Attention Network (CDAN) that narrows the semantic distance between modalities. The network comprises a Reverse Attention (RA) module and a Semantic Consistency (SC) module: (1) the RA module explores the important regions of a medical image more comprehensively, and (2) the SC module uses features from a large language model as a reference and guides the visual features toward these reference text features, so that visual semantics are converted more accurately into consistent language semantics. Experiments show that CDAN outperforms previous models on the two public datasets IU X-Ray and MIMIC-CXR, reaching BLEU-4 scores of 17.9% and 10.9%, respectively. The sizable improvement over the baseline model indicates that the proposed model can generate accurate and fluent medical reports.
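The abstract gives no formulas for the two modules, so the following is only a minimal sketch of the general ideas, under stated assumptions. It assumes that reverse attention can be approximated by complementing softmax attention weights and renormalizing them, and that semantic consistency can be approximated by a cosine-distance loss between a visual feature vector and a reference text feature vector. The function names `reverse_attention` and `consistency_loss` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_attention(scores):
    """Assumed form of reverse attention (not the paper's definition).

    Complements the ordinary attention weights and renormalizes them,
    so regions that standard attention mostly ignores get more weight.
    """
    if len(scores) < 2:
        raise ValueError("need at least two regions to reverse attention")
    weights = softmax(scores)
    complement = [1.0 - w for w in weights]
    total = sum(complement)
    return [c / total for c in complement]


def consistency_loss(visual, reference):
    """Assumed consistency objective: 1 - cosine similarity.

    The loss is 0 when the visual feature points in the same direction
    as the reference text feature and grows as the two diverge.
    """
    dot = sum(v * r for v, r in zip(visual, reference))
    norm_v = math.sqrt(sum(v * v for v in visual))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_v == 0.0 or norm_r == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_v * norm_r)
```

In this sketch, minimizing `consistency_loss` pulls a visual feature toward its reference text feature, which mirrors the guiding role the abstract assigns to the SC module. The actual network presumably operates on learned, high-dimensional features and may use a different distance and a different reversal scheme.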
