Abstract:
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that presents significant challenges for early diagnosis and clinical intervention. In medical imaging, structural magnetic resonance imaging (sMRI) captures brain atrophy and structural alterations through high-resolution anatomical imaging, while fluorodeoxyglucose positron emission tomography (FDG-PET) reflects functional changes by monitoring cerebral glucose metabolism. The two modalities therefore offer complementary value for detecting AD-related pathological brain changes. However, existing multimodal AD classification models are limited by suboptimal feature fusion, insufficient inter-modal information interaction, and feature distribution discrepancies, which hinder their diagnostic utility. To address these issues, a Bimodal Iterative Cross-Attention Fusion Ensemble Framework (BICAFEF) is proposed. The framework comprises base classifiers and a meta-classifier. The base classifiers employ ResNet modules to extract features from sMRI and FDG-PET image patches. A Spatial Feature Shrinking (SFS) module, which integrates convolutional operations with adaptive aggregation pooling, is designed to reduce inter-modal redundancy and emphasize discriminative features. In addition, an iterative cross-attention mechanism dynamically captures and reinforces global dependencies and complementary information across modalities over successive rounds, addressing the underexploitation of inter-modal synergies and improving AD classification performance. To further improve whole-brain classification accuracy, the framework incorporates a meta-classifier that screens and ensembles the base classifiers, discarding those with accuracy below 75% and retaining only high-performing classifiers to strengthen robustness and precision. Visualization analyses confirm that the framework attends to critical brain regions, demonstrating its ability to identify AD-related pathological areas in both sMRI and FDG-PET modalities. Experimental results show that, under five-fold cross-validation for AD vs. healthy control (HC) classification, the framework achieves an accuracy (ACC) of 94.3%, sensitivity (SEN) of 92.6%, specificity (SPE) of 96.3%, an AUC of 97.5%, and a Matthews correlation coefficient (MCC) of 88.7%, outperforming state-of-the-art multimodal frameworks.
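
To make the fusion step concrete, the following is a minimal sketch of an iterative cross-attention exchange between sMRI and FDG-PET patch features, assuming PyTorch and illustrative names (IterativeCrossAttention, embed_dim, num_iters); it is not the authors' implementation, only one plausible instantiation of the mechanism described above.

```python
# Hypothetical sketch of iterative cross-attention fusion between two modalities.
# Names, dimensions, and the number of rounds are illustrative assumptions.
import torch
import torch.nn as nn


class IterativeCrossAttention(nn.Module):
    """Fuse sMRI and FDG-PET patch features over several cross-attention rounds."""

    def __init__(self, embed_dim: int = 256, num_heads: int = 4, num_iters: int = 3):
        super().__init__()
        self.num_iters = num_iters
        # One cross-attention block per direction: MRI attends to PET and vice versa.
        self.mri_to_pet = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.pet_to_mri = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_mri = nn.LayerNorm(embed_dim)
        self.norm_pet = nn.LayerNorm(embed_dim)

    def forward(self, mri_tokens: torch.Tensor, pet_tokens: torch.Tensor) -> torch.Tensor:
        # mri_tokens, pet_tokens: (batch, num_tokens, embed_dim) feature sequences,
        # e.g. produced by modality-specific ResNet + SFS branches.
        for _ in range(self.num_iters):
            # MRI queries attend to PET keys/values to import metabolic cues.
            mri_upd, _ = self.mri_to_pet(mri_tokens, pet_tokens, pet_tokens)
            # PET queries attend to MRI keys/values to import structural cues.
            pet_upd, _ = self.pet_to_mri(pet_tokens, mri_tokens, mri_tokens)
            # Residual updates so each round refines, rather than replaces, the features.
            mri_tokens = self.norm_mri(mri_tokens + mri_upd)
            pet_tokens = self.norm_pet(pet_tokens + pet_upd)
        # Concatenate the fused modality features for a downstream classifier head.
        return torch.cat([mri_tokens, pet_tokens], dim=-1)


if __name__ == "__main__":
    fuser = IterativeCrossAttention()
    mri = torch.randn(2, 64, 256)   # 64 patch tokens per sMRI volume (illustrative)
    pet = torch.randn(2, 64, 256)   # matching FDG-PET patch tokens
    print(fuser(mri, pet).shape)    # torch.Size([2, 64, 512])
```

Repeating the bidirectional attention for several rounds is what distinguishes the iterative scheme from a single cross-attention pass: each round lets one modality re-query the other's already-refined representation, progressively reinforcing complementary cues before classification.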