SCA-YOLO: Real-Time Object Detection Based on Deformable Convolution and Context-Aware Attention


Abstract: YOLO (You Only Look Once)-based algorithms are widely used for real-time object detection and achieve promising performance. However, further improving this performance faces two challenges. First, standard convolutions with limited receptive fields struggle to capture global contextual features, which reduces detection accuracy on complex objects. Second, enlarging the convolution kernel strengthens feature extraction but significantly increases the computational cost. To address these issues, this paper investigates the SCA-YOLO model, which introduces an Alterable Channel-wise Fusion module (C2fAK) and a Context-Aware Attention++ (CAA++) module to improve performance. The C2fAK module combines deformable convolution with the Channel-wise Fusion (C2f) structure to enhance feature representation capability while balancing computational overhead. The CAA++ module captures long-range contextual information and reduces channel redundancy, further improving detection accuracy. Experimental results show that the proposed SCA-YOLO outperforms existing methods on multiple datasets, demonstrating its effectiveness and efficiency in object detection.
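To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' code: it assumes a YOLOv8-style C2f split/concat layout and uses torchvision's DeformConv2d for the deformable branch. The class names DeformBottleneck and StripContextAttention, and all kernel sizes and hyperparameters, are illustrative assumptions; the paper's exact C2fAK and CAA++ definitions are not given in this abstract.

```python
# Hypothetical sketch of a deformable C2f-style block plus a long-range
# strip-convolution attention gate, under the assumptions stated above.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformBottleneck(nn.Module):
    """Bottleneck whose second 3x3 conv is swapped for a deformable conv."""

    def __init__(self, c: int, shortcut: bool = True):
        super().__init__()
        self.cv1 = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1, bias=False), nn.BatchNorm2d(c), nn.SiLU()
        )
        # Offset branch: 2 offsets (dx, dy) per position of the 3x3 kernel.
        self.offset = nn.Conv2d(c, 2 * 3 * 3, 3, 1, 1)
        nn.init.zeros_(self.offset.weight)  # start as a plain conv for stable training
        nn.init.zeros_(self.offset.bias)
        self.dcn = DeformConv2d(c, c, 3, 1, 1, bias=False)
        self.bn = nn.BatchNorm2d(c)
        self.act = nn.SiLU()
        self.add = shortcut

    def forward(self, x):
        y = self.cv1(x)
        y = self.act(self.bn(self.dcn(y, self.offset(y))))
        return x + y if self.add else y


class C2fAK(nn.Module):
    """C2f-style split/stack/concat block built from deformable bottlenecks (hypothetical layout)."""

    def __init__(self, c1: int, c2: int, n: int = 1, e: float = 0.5):
        super().__init__()
        self.c = int(c2 * e)
        self.cv1 = nn.Conv2d(c1, 2 * self.c, 1, bias=False)
        self.cv2 = nn.Conv2d((2 + n) * self.c, c2, 1, bias=False)
        self.m = nn.ModuleList(DeformBottleneck(self.c) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))      # split channels into two halves
        y.extend(m(y[-1]) for m in self.m)     # stack deformable bottlenecks
        return self.cv2(torch.cat(y, 1))       # fuse all intermediate features


class StripContextAttention(nn.Module):
    """Hypothetical CAA-style gate: cheap long-range context via 1xk and kx1 depthwise convs."""

    def __init__(self, c: int, k: int = 11):
        super().__init__()
        self.pool = nn.AvgPool2d(7, 1, 3)
        self.h_conv = nn.Conv2d(c, c, (1, k), 1, (0, k // 2), groups=c)
        self.v_conv = nn.Conv2d(c, c, (k, 1), 1, (k // 2, 0), groups=c)
        self.proj = nn.Conv2d(c, c, 1)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        a = self.gate(self.proj(self.v_conv(self.h_conv(self.pool(x)))))
        return x * a                           # reweight features by long-range context


if __name__ == "__main__":
    x = torch.randn(1, 64, 40, 40)
    y = StripContextAttention(64)(C2fAK(64, 64, n=2)(x))
    print(y.shape)  # torch.Size([1, 64, 40, 40])
```

The deformable branch lets the 3x3 sampling grid shift toward informative locations instead of enlarging the kernel, while the strip-convolution gate approximates a large receptive field at depthwise cost, which is one plausible reading of how the abstract's accuracy/efficiency trade-off could be realized.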

       
