Journal of Guangdong University of Technology ›› 2024, Vol. 41 ›› Issue (02): 73-83. doi: 10.12052/gdutxb.230015

• Computer Science and Technology •

Local Orthogonal Feature Fusion for Few-Shot Image Classification

Tu Ze-liang, Cheng Liang-lun, Huang Guo-heng

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2023-02-10; Published: 2024-04-23
  • Corresponding author: Huang Guo-heng (b. 1985), male, associate professor, Ph.D.; his research interests include computer vision, machine learning, pattern recognition, and medical image processing. E-mail: kevinwong@gdut.edu.cn
  • About the author: Tu Ze-liang (b. 1994), male, master's student; his research interests include computer vision and few-shot learning. E-mail: 2356472797@qq.com
  • Funding:
    Key-Area Research and Development Program of Guangdong Province (2019B010153002)

Abstract: Existing metric-based few-shot image classification methods have difficulty fully extracting the important features of an image. To address this, a few-shot image classification method based on local orthogonal feature fusion is proposed. First, a feature extraction network simultaneously extracts shallow features rich in local detail and deep features with strong semantics. Then, a channel attention module and a multi-scale feature adaptive fusion module enhance the shallow features along the channel and spatial-scale dimensions, respectively, producing local features that are more salient and carry richer scale information. Finally, a local orthogonal feature fusion module performs local orthogonal feature extraction and attention fusion on the resulting multi-scale local features and the initial deep semantic features, making full use of both the local and the global feature information of the image to generate feature representations that better characterize the target categories. Experimental results on the three public datasets miniImageNet, tieredImageNet and CUB-200-2011 show that the proposed method achieves better classification performance: its accuracy on the 5-way 5-shot task reaches 81.69%, 85.36% and 89.78%, respectively, improvements of 5.23, 3.19 and 5.99 percentage points over the baseline model.

Key words: image classification, few-shot learning, multi-scale features, attention mechanism, feature fusion

CLC number:

  • TP391
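The local orthogonal decomposition at the heart of the final fusion step can be sketched concretely. The PyTorch snippet below is a minimal illustration of a DOLG-style orthogonal fusion of the kind the abstract describes: each local feature vector is split into the component parallel to the global semantic feature and the orthogonal remainder, and the remainder is fused with the global feature under a learned attention weight. The module name, the 1×1-convolution attention gate, and all tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of local orthogonal feature fusion, assuming a
# DOLG-style decomposition; names and shapes are hypothetical.
import torch
import torch.nn as nn


class LocalOrthogonalFusion(nn.Module):
    """Split local features into components parallel and orthogonal to the
    global feature, then attention-fuse the orthogonal part with it."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical attention gate over the concatenated branches.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_local: torch.Tensor, f_global: torch.Tensor) -> torch.Tensor:
        # f_local:  (B, C, H, W) multi-scale local features
        # f_global: (B, C) deep semantic feature (e.g. globally pooled)
        b, c, h, w = f_local.shape
        g = f_global.view(b, c, 1, 1)

        # Projection of each local vector onto the global direction:
        # proj = (<f_l, f_g> / ||f_g||^2) * f_g
        dot = (f_local * g).sum(dim=1, keepdim=True)         # (B, 1, H, W)
        g_norm_sq = (g * g).sum(dim=1, keepdim=True) + 1e-6  # (B, 1, 1, 1)
        proj = dot / g_norm_sq * g                           # (B, C, H, W)

        # Orthogonal remainder: local detail not already explained
        # by the global semantic feature.
        orth = f_local - proj

        # Attention fusion of the orthogonal local component with the
        # broadcast global feature (a stand-in for the paper's module).
        g_map = g.expand(b, c, h, w)
        weights = self.attn(torch.cat([orth, g_map], dim=1))
        return weights * orth + (1.0 - weights) * g_map


if __name__ == "__main__":
    fusion = LocalOrthogonalFusion(channels=640)
    f_l = torch.randn(4, 640, 10, 10)  # multi-scale local features
    f_g = torch.randn(4, 640)          # pooled deep semantic feature
    print(fusion(f_l, f_g).shape)      # torch.Size([4, 640, 10, 10])
```

Discarding the parallel component is the point of such a design: the orthogonal remainder carries only the local detail that the global feature does not already encode, so fusing the two branches avoids redundancy between local and global information.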