ConvTriCA-FNet: A Convolutional Triple-level Cross-Attention Fusion Network for Tongue Image-Based Gastric Disease Diagnosis
Abstract
Gastric cancer is among the most prevalent malignant tumors worldwide, and early identification of and intervention in its precancerous lesions are essential to reducing the disease burden. Tongue diagnosis, a non-invasive and efficient screening method, offers unique value in the preliminary screening of gastric diseases. Existing deep learning-based tongue image analysis methods suffer from limited receptive fields, insufficient multi-level feature fusion, and inadequate use of clinical information. To address these issues, this study proposes ConvTriCA-FNet, a convolutional triple-level cross-attention fusion network for gastric disease classification. The model adopts a dual-path structure that processes tongue images and clinical data separately: the tongue image path integrates multi-level visual features through triple-level Transformer branches and cross-attention, while the clinical path employs a clinical representation learning module to extract discriminative information from structured data. By integrating local visual features, global semantic information, and clinical prior knowledge, the model classifies gastric conditions into three categories: normal, non-atrophic, and atrophic. Experiments on 820 clinical cases show that the proposed method attains an accuracy of 71.95% on this three-class task, outperforming existing mainstream approaches and providing a reliable technical pathway for early intelligent screening of gastric diseases.
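To make the dual-path idea concrete, the following is a minimal sketch of cross-attention fusion across three feature levels followed by concatenation with a clinical vector. It is not the authors' implementation: every function name, dimension, and weight here is a hypothetical choice for illustration, the inputs are random numbers instead of tongue-image features, the attention is single-head without learned projections, and the clinical representation learning module is replaced by a raw feature vector.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query_tokens, context_tokens):
    """Single-head scaled dot-product attention.

    Queries come from one feature level; keys and values come from the
    other levels. Learned projections are omitted (a simplification).
    """
    d = len(query_tokens[0])
    context_t = [list(col) for col in zip(*context_tokens)]
    scores = matmul(query_tokens, context_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context_tokens)

def mean_pool(tokens):
    """Average a token matrix over its token axis."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def fuse_and_classify(levels, clinical, weight):
    """Let each level attend to the other two, pool, append the clinical
    vector, and apply a linear classifier followed by softmax."""
    pooled = []
    for i, tokens in enumerate(levels):
        context = [t for j, other in enumerate(levels) if j != i for t in other]
        pooled.extend(mean_pool(cross_attention(tokens, context)))
    fused = pooled + clinical
    return softmax(matmul([fused], weight)[0])

rng = random.Random(0)
dim = 8                                   # hypothetical feature width
levels = [random_matrix(rng, n, dim) for n in (16, 8, 4)]  # three token sets
clinical = [rng.gauss(0.0, 1.0) for _ in range(5)]         # stand-in clinical vector
weight = random_matrix(rng, 3 * dim + len(clinical), 3)    # untrained classifier
probs = fuse_and_classify(levels, clinical, weight)        # normal / non-atrophic / atrophic
print(probs)
```

In this sketch each level queries the concatenated tokens of the other two levels, which is only one plausible reading of "triple-level cross-attention"; the abstract does not specify the exact pairing, so the choice should be treated as an assumption.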