Abstract:
Automated radiology report generation is crucial for reducing radiologist workload and minimizing diagnostic errors. Although existing studies have investigated lesion regions in depth, there remains room for improvement in generating detailed descriptions: current methods tend to be insensitive to the semantic information of visual lesions and weaken the critical association between visual and textual semantics. This paper introduces a novel Prior Prompt-Driven Semantic Consistency Model (PPD-SCM) to address these limitations. The Prompt-Lesion Enhancement module in the proposed model systematically integrates both normal and abnormal diagnostic descriptions of chest X-ray images to construct prior prompts. By employing a prompt attention mechanism that fuses visual features with textual prompts, this module strengthens the model's ability to perceive potential lesion features. Furthermore, we introduce a Visual-Textual Semantic Consistency (VTSC) module that employs contrastive learning to deeply align visual and textual semantics. By leveraging prompt tokens to guide the model in generating enriched contextual information, the VTSC module optimizes the subsequent report generation process, effectively reducing the semantic gap between medical images and the generated reports and thereby improving the accuracy and reliability of report generation. Extensive experiments on the IU X-Ray and MIMIC-MV datasets demonstrate that our method significantly outperforms existing approaches in generating high-quality radiology reports.