    Bimodal Iterative Cross-Attention Fusion Ensemble Framework

      Abstract: Alzheimer's disease (AD), a progressive neurodegenerative disorder, presents significant challenges for early diagnosis and clinical intervention. In medical imaging, structural magnetic resonance imaging (sMRI) captures brain atrophy and other structural alterations through high-resolution anatomical imaging, while fluorodeoxyglucose positron emission tomography (FDG-PET) reflects functional changes by monitoring cerebral glucose metabolism; the two modalities therefore offer complementary value for detecting AD-related pathological brain changes. However, existing multimodal AD classification models are limited by suboptimal feature fusion, insufficient inter-modal information interaction, and inconsistent feature distributions, which hinder their diagnostic utility. To address these issues, a Bimodal Iterative Cross-Attention Fusion Ensemble Framework (BICAFEF) is proposed. The framework comprises base classifiers and a meta-classifier. Each base classifier employs Residual Network (ResNet) modules to extract features from sMRI and FDG-PET image patches, and a Spatial Feature Shrinking (SFS) module, built from convolutional operations and adaptive aggregation pooling, is designed to reduce inter-modal redundancy and emphasize discriminative features. An iterative cross-attention mechanism then dynamically captures and reinforces global dependencies and complementary information across modalities over multiple rounds of iteration, addressing the difficulty of insufficiently exploiting inter-modal synergies and thereby improving AD classification performance. Finally, to improve whole-brain classification accuracy, the framework builds a meta-classifier that screens and ensembles the base classifiers, discarding those with accuracy below 75% and retaining the high-performing ones, which further improves robustness and accuracy. Visualization analyses confirm that the framework attends to critical brain regions, demonstrating its ability to identify AD-related pathological areas in both the sMRI and PET modalities. Experimental results show that, under five-fold cross-validation on the AD vs. healthy control (HC) task, the framework achieves an accuracy (ACC) of 94.3%, sensitivity (SEN) of 92.6%, specificity (SPE) of 96.3%, area under the ROC curve (AUC) of 97.5%, and Matthews correlation coefficient (MCC) of 88.7%, outperforming state-of-the-art multimodal frameworks.
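
      The SFS module is described only at the level of "convolutional operations plus adaptive aggregation pooling". The PyTorch sketch below shows one plausible reading of that description; the class name SpatialFeatureShrinking, the pooled grid size, and the choice of average- and max-pooling branches are illustrative assumptions, not the authors' exact design.

          import torch
          import torch.nn as nn

          class SpatialFeatureShrinking(nn.Module):
              """Plausible sketch of an SFS block (assumed design): a 3D
              convolution followed by adaptive pooling to a small fixed grid,
              with the pooled views aggregated by a 1x1x1 convolution, so
              redundant spatial detail is shrunk away."""

              def __init__(self, channels: int, reduced_size: int = 4):
                  super().__init__()
                  self.conv = nn.Sequential(
                      nn.Conv3d(channels, channels, kernel_size=3, padding=1),
                      nn.BatchNorm3d(channels),
                      nn.ReLU(inplace=True),
                  )
                  # "Adaptive aggregation pooling": average- and max-pool to a
                  # fixed grid, then aggregate the two views with a learned
                  # pointwise convolution.
                  self.avg_pool = nn.AdaptiveAvgPool3d(reduced_size)
                  self.max_pool = nn.AdaptiveMaxPool3d(reduced_size)
                  self.aggregate = nn.Conv3d(2 * channels, channels, kernel_size=1)

              def forward(self, x: torch.Tensor) -> torch.Tensor:
                  x = self.conv(x)
                  pooled = torch.cat([self.avg_pool(x), self.max_pool(x)], dim=1)
                  return self.aggregate(pooled)  # shrunken, aggregated feature map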

       

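      The iterative cross-attention mechanism is the framework's core fusion step: over several rounds, each modality's patch features query the other's, so global inter-modal dependencies accumulate iteratively. Below is a minimal PyTorch sketch, assuming token-shaped features of size (batch, tokens, dim); the use of nn.MultiheadAttention, the residual-plus-LayerNorm update, and the number of iterations are illustrative assumptions rather than the paper's exact formulation.

          import torch
          import torch.nn as nn

          class IterativeCrossAttention(nn.Module):
              """Sketch: sMRI and FDG-PET token sequences cross-attend for
              several rounds so complementary inter-modal information is
              captured and reinforced iteratively (details assumed)."""

              def __init__(self, dim: int, heads: int = 4, iterations: int = 3):
                  super().__init__()
                  self.iterations = iterations
                  self.mri_to_pet = nn.MultiheadAttention(dim, heads, batch_first=True)
                  self.pet_to_mri = nn.MultiheadAttention(dim, heads, batch_first=True)
                  self.norm_mri = nn.LayerNorm(dim)
                  self.norm_pet = nn.LayerNorm(dim)

              def forward(self, mri: torch.Tensor, pet: torch.Tensor) -> torch.Tensor:
                  # mri, pet: (batch, tokens, dim) patch features per modality.
                  for _ in range(self.iterations):
                      # sMRI queries attend to PET keys/values; PET then attends
                      # to the refined sMRI. Residual connections preserve the
                      # information accumulated in earlier rounds.
                      mri = self.norm_mri(mri + self.mri_to_pet(mri, pet, pet)[0])
                      pet = self.norm_pet(pet + self.pet_to_mri(pet, mri, mri)[0])
                  return torch.cat([mri, pet], dim=-1)  # fused bimodal representation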

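      The meta-classifier's screening rule is stated concretely (drop base classifiers whose accuracy falls below 75%), but how the survivors are combined is not specified in the abstract. The sketch below therefore stacks the retained classifiers' predicted probabilities with a logistic-regression meta-learner, a common stacking choice; screen_and_stack and its interface are hypothetical.

          import numpy as np
          from sklearn.linear_model import LogisticRegression

          def screen_and_stack(base_probs, base_accs, y_train, threshold=0.75):
              """Screen base classifiers by validation accuracy, then fit a
              stacking meta-classifier on the survivors' outputs.

              base_probs: (n_samples, n_classifiers) positive-class probabilities
                          from the patch-level base classifiers
              base_accs:  per-classifier validation accuracies in [0, 1]
              """
              base_probs = np.asarray(base_probs)
              # Keep only base classifiers meeting the 75% accuracy threshold.
              kept = [i for i, acc in enumerate(base_accs) if acc >= threshold]
              if not kept:
                  raise ValueError("no base classifier met the accuracy threshold")
              # Fit the meta-classifier on the retained classifiers' outputs.
              meta = LogisticRegression(max_iter=1000).fit(base_probs[:, kept], y_train)
              return meta, kept

      At test time the same kept indices select the corresponding columns of the test-set probability matrix before calling meta.predict, so only high-performing base classifiers contribute to the whole-brain decision.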