A Vision-Language Model Framework for Pulmonary Lesion Segmentation and Automatic Report Generation

Zhao Haifeng; Yan Yuguang; Xu Boyan; Liu Baichuan; Cai Ruichu

doi:10.12052/gdutxb.260016

Citation: Zhao Haifeng, Yan Yuguang, Xu Boyan, et al. A vision-language model framework for pulmonary lesion segmentation and automatic report generation[J]. Journal of Guangdong University of Technology. DOI: 10.12052/gdutxb.260016

Abstract

With the rapid expansion of medical imaging data, accurate lesion segmentation and automated diagnostic report generation from pulmonary images have become increasingly important for intelligent clinical decision support. Conventional methods generally treat lesion segmentation and radiology report generation as independent tasks, which limits their ability to exploit the intrinsic correlations between visual features and linguistic semantics. Recent advances in vision-language models (VLMs) offer a new paradigm for multimodal medical image understanding by jointly modeling visual perception and language representation. This work develops a unified VLM-based framework for pulmonary image segmentation and automatic report generation. Leveraging the cross-modal alignment and semantic reasoning capabilities of VLMs, the framework operates on lung CT images and segments key anatomical structures and lesion regions within a single model. It also generates natural-language radiology reports that are semantically consistent with the corresponding segmentation results, so that lesion characteristics and diagnostic information are expressed automatically and coherently. In experiments on pulmonary nodule segmentation and report generation, the proposed model achieves a segmentation Dice score of 85.42%, an anatomical localization accuracy of 84.12%, and a nodule type classification accuracy of 85.56%, supporting the effectiveness of VLMs in integrating visual perception with medical semantic understanding.
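
For readers unfamiliar with the segmentation metric quoted above, the Dice score between a predicted mask A and a reference mask B is commonly defined as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition for binary masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not describe the authors' exact evaluation protocol (for example, whether scores are computed per slice, per nodule, or per volume).

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape.

    Hypothetical helper for illustration only; not the authors' code.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Number of voxels marked as lesion in both masks.
    intersection = np.logical_and(pred, target).sum()
    # Total number of lesion voxels across the two masks.
    denom = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as a perfect match.
    if denom == 0:
        return 1.0
    return float(2.0 * intersection / denom)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_score(predicted, reference))
```

Under this definition, a reported value of 85.42% corresponds to a score of 0.8542 on a 0 to 1 scale.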
