Zhao Haifeng, Yan Yuguang, Xu Boyan, et al. A vision-language model framework for pulmonary lesion segmentation and automatic report generation[J]. Journal of Guangdong University of Technology. DOI: 10.12052/gdutxb.260016

    A Vision-Language Model Framework for Pulmonary Lesion Segmentation and Automatic Report Generation

    With the rapid growth of medical imaging data, accurate lesion segmentation and automated diagnostic report generation from pulmonary images have become increasingly important for intelligent clinical decision support. Conventional methods generally treat lesion segmentation and radiology report generation as independent tasks, which limits their ability to exploit the intrinsic correlations between visual features and linguistic semantics. Recent advances in vision-language models (VLMs) offer a new paradigm for multimodal medical image understanding by jointly modeling visual perception and language representation. In this work, a unified VLM-based framework is developed for pulmonary image segmentation and automatic report generation. Leveraging the cross-modal alignment and semantic reasoning capabilities of VLMs, the proposed framework operates on lung CT images and segments key anatomical structures and lesion regions within a single model. It also generates natural-language radiology reports that are semantically consistent with the corresponding segmentation results, so that lesion characteristics and diagnostic information are expressed automatically and coherently. Experimental results show that the proposed model performs well in pulmonary nodule segmentation and report generation, achieving a segmentation Dice score of 85.42% and accuracies of 84.12% and 85.56% for anatomical localization and nodule type classification, respectively. These results support the effectiveness of VLMs in integrating visual perception with medical semantic understanding.
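
    For readers unfamiliar with the segmentation metric reported above, the Dice score between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal illustrative sketch, not the authors' evaluation code: the function name `dice_coefficient`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values (an assumed format for this sketch).
    """
    # Flatten both masks and coerce every entry to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as lesion in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks: 3 predicted pixels, 3 reference pixels, 2 shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(f"Dice: {dice_coefficient(pred, target):.4f}")
```

    A score of 1.0 means the two masks coincide exactly and 0.0 means they share no pixels, so the reported 85.42% indicates substantial but not perfect overlap between predicted and reference nodule regions.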