A Vision-Language Model Framework for Pulmonary Lesion Segmentation and Automatic Report Generation

赵海沣; 闫玉光; 许柏炎; 刘百川; 蔡瑞初

doi:10.12052/gdutxb.260016

Abstract

With the rapid growth of medical imaging data, accurate lesion segmentation and automated diagnostic report generation from pulmonary images have become an important research direction for intelligent clinical decision support. Conventional methods generally model lesion segmentation and radiology report generation as independent tasks, which limits their ability to exploit the intrinsic correlations between visual and linguistic information. Advances in vision-language models (VLMs) offer a new paradigm for multimodal medical image understanding by enabling joint modeling of visual perception and language representation. This paper proposes a VLM-based model for pulmonary image segmentation and automatic report generation. Leveraging the cross-modal alignment and semantic understanding capabilities of VLMs, the model operates on lung CT images to segment key anatomical structures and lesion regions within a single unified framework. It then generates natural-language radiology reports that are semantically consistent with the segmentation results, automating the expression of lesion analysis and diagnostic information. Experimental results show that the model performs well on pulmonary nodule segmentation and report generation, achieving a segmentation Dice coefficient of 85.42% and, for the generated reports, accuracies of 84.12% in anatomical localization and 85.56% in nodule type classification. These results support the effectiveness of VLMs in integrating visual perception with medical semantic understanding.
