A Prompt-guided Mamba Network for Rectal Cancer Segmentation via Clinical Structured Information Fusion

杨晓君; 刘鸿杰; 杨世斌; 徐明; 李韵灵; 王毅敏

doi:10.12052/gdutxb.260023

Abstract: Accurate segmentation of rectal cancer lesions is important for preoperative planning and treatment strategy formulation. Existing rectal cancer magnetic resonance imaging (MRI) segmentation methods mostly take a single imaging modality as their main input and do not fully exploit the potential constraints that clinical structured information, such as serum tumor marker concentrations, can place on lesion segmentation. In addition, rectal cancer lesions often exhibit low contrast and blurred boundaries, and skip connections tend to introduce noise and redundant information when passing shallow details, which further increases segmentation difficulty. To address these issues, this paper proposes a prompt-guided Mamba rectal cancer segmentation network that fuses clinical structured information, termed Prompt-guided Clinical Mamba U-Net (PCM-UNet). The core of the method is a Promptable Feature Fusion Module (PFM), which uses the Contrastive Language–Image Pre-training (CLIP) model to encode clinical structured information into text embeddings. These embeddings are injected continuously throughout the multi-stage feature learning of the encoder–decoder, enabling joint modeling of imaging features and clinical prompts. A Mixed Parallel Attention Module (MPAM) is introduced into the skip connections to adaptively filter cross-level features, reducing the interference of noise and redundant responses with decoding. A lightweight Frequency Domain Enhancement Module (FEM) is also added on the encoder side to complement the representation of boundary and texture details. On the CRC-370 dataset, compiled from data provided by the First Affiliated Hospital of Sun Yat-sen University and the 940th Hospital of the Joint Logistics Support Force of the Chinese People's Liberation Army, the proposed method improves on the best competing baseline by 2.21 and 2.25 percentage points in Intersection over Union (IoU) and Dice coefficient, respectively. Ablation studies further indicate that PFM is the main source of the improvement, that MPAM complements cross-level feature transfer, and that introducing FEM at mid- and high-level features yields a modest gain. These results indicate that fusing patient-level clinical structured information into a Mamba-based segmentation framework helps improve lesion segmentation in rectal cancer MRI.
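
For readers less familiar with the two reported metrics, the following is a minimal sketch of how IoU and the Dice coefficient are commonly computed for a pair of binary masks. It is illustrative only: the function names, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions, and the abstract does not state the evaluation protocol actually used (for example, per-slice versus per-patient averaging).

```python
from typing import Sequence, Tuple

# Assumed representation: a binary mask is a list of rows of 0/1 pixel labels.
Mask = Sequence[Sequence[int]]


def overlap_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted-positive, target-positive) pixel counts."""
    inter = pred_pos = target_pos = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p:
                pred_pos += 1
            if t:
                target_pos += 1
    return inter, pred_pos, target_pos


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union: |P ∩ T| / |P ∪ T|."""
    inter, pred_pos, target_pos = overlap_counts(pred, target)
    union = pred_pos + target_pos - inter
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient: 2 |P ∩ T| / (|P| + |T|)."""
    inter, pred_pos, target_pos = overlap_counts(pred, target)
    total = pred_pos + target_pos
    # Same empty-mask convention as in iou().
    return 1.0 if total == 0 else 2 * inter / total


if __name__ == "__main__":
    # Toy masks (not data from the paper).
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(f"IoU  = {iou(pred, target):.4f}")
    print(f"Dice = {dice(pred, target):.4f}")
```

For a single pair of masks the two metrics are related by Dice = 2·IoU / (1 + IoU), so they rank individual predictions identically; averages over a dataset can still differ, which is one reason both are usually reported.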

A Prompt-guided Mamba Network for Rectal Cancer Segmentation via Clinical Structured Information Fusion