Abstract:
To address the limitations of existing medical image diagnosis methods based on convolutional neural networks and Transformers, namely insufficient long-range dependency modeling and quadratic computational complexity, this paper proposes a 3D Positron Emission Tomography (PET) image classification framework named SSHCM (State Space Hybrid Convolutional Model). The framework is built upon a multi-scale channel-spatial perception Mamba architecture that integrates a linear state-space model with a multi-scale feature interaction mechanism, using stacked LMamba blocks to dynamically capture long-range dependencies in 3D voxel sequences. A layer-wise cross-scale channel attention fusion module is designed to achieve adaptive fusion of global contextual semantics. Additionally, a channel-spatial perception module is constructed by combining large-kernel convolutions with an inverted bottleneck structure, enhancing spatial feature fusion and improving lesion localization accuracy. Experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset of 1187 subjects show that the proposed model significantly outperforms ResNet, ViT, and Mamba variant models in terms of both accuracy and AUC. Specifically, the model achieves accuracy rates of 97.03% on Alzheimer's disease (AD) classification and 83.33% on the mild cognitive impairment (MCI) conversion prediction task.