Journal of Guangdong University of Technology ›› 2024, Vol. 41 ›› Issue (02): 73-83. doi: 10.12052/gdutxb.230015

• Computer Science and Technology •

Local Orthogonal Feature Fusion for Few-Shot Image Classification

Tu Ze-liang, Cheng Liang-lun, Huang Guo-heng

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2023-02-10; Published: 2024-04-23
  • Corresponding author: Huang Guo-heng (b. 1985), male, associate professor, Ph.D.; his research interests include computer vision, machine learning, pattern recognition, and medical image processing. E-mail: kevinwong@gdut.edu.cn
  • About the author: Tu Ze-liang (b. 1994), male, master's student; his research interests include computer vision and few-shot learning. E-mail: 2356472797@qq.com
  • Funding:
    Key-Area Research and Development Program of Guangdong Province (2019B010153002)

Abstract: Existing metric-based few-shot image classification methods have difficulty fully extracting the important features of an image. To address this, a few-shot image classification method based on local orthogonal feature fusion is proposed. First, a feature extraction network simultaneously extracts shallow features rich in local detail and deep features with strong semantics. Then, a channel attention module and a multi-scale feature adaptive fusion module enhance the shallow features along the channel and spatial-scale dimensions, respectively, producing local features that are more salient and carry richer scale information. Finally, a local orthogonal feature fusion module performs local orthogonal feature extraction and attention fusion on the resulting multi-scale local features and the initial deep semantic features, making full use of both the local and the global feature information of the image to generate feature representations that better characterize the target categories. Experimental results on the three public datasets miniImageNet, tieredImageNet and CUB-200-2011 show that the proposed method achieves better classification performance: its accuracy on the 5-way 5-shot task reaches 81.69%, 85.36% and 89.78%, respectively, improvements of 5.23, 3.19 and 5.99 percentage points over the baseline model.

Key words: image classification, few-shot learning, multi-scale features, attention mechanism, feature fusion

CLC number:

  • TP391
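The local orthogonal decomposition at the heart of the final fusion step can be sketched concretely. The PyTorch snippet below is a minimal illustration of a DOLG-style orthogonal fusion of the kind the abstract describes: each local feature vector is split into the component parallel to the global semantic feature and the orthogonal remainder, and the remainder is fused with the global feature under a learned attention weight. The module name, the 1×1-convolution attention gate, and all tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of local orthogonal feature fusion, assuming a
# DOLG-style decomposition; names and shapes are hypothetical.
import torch
import torch.nn as nn


class LocalOrthogonalFusion(nn.Module):
    """Split local features into components parallel and orthogonal to the
    global feature, then attention-fuse the orthogonal part with it."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical attention gate over the concatenated branches.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_local: torch.Tensor, f_global: torch.Tensor) -> torch.Tensor:
        # f_local:  (B, C, H, W) multi-scale local features
        # f_global: (B, C) deep semantic feature (e.g. globally pooled)
        b, c, h, w = f_local.shape
        g = f_global.view(b, c, 1, 1)

        # Projection of each local vector onto the global direction:
        # proj = (<f_l, f_g> / ||f_g||^2) * f_g
        dot = (f_local * g).sum(dim=1, keepdim=True)         # (B, 1, H, W)
        g_norm_sq = (g * g).sum(dim=1, keepdim=True) + 1e-6  # (B, 1, 1, 1)
        proj = dot / g_norm_sq * g                           # (B, C, H, W)

        # Orthogonal remainder: local detail not already explained
        # by the global semantic feature.
        orth = f_local - proj

        # Attention fusion of the orthogonal local component with the
        # broadcast global feature (a stand-in for the paper's module).
        g_map = g.expand(b, c, h, w)
        weights = self.attn(torch.cat([orth, g_map], dim=1))
        return weights * orth + (1.0 - weights) * g_map


if __name__ == "__main__":
    fusion = LocalOrthogonalFusion(channels=640)
    f_l = torch.randn(4, 640, 10, 10)  # multi-scale local features
    f_g = torch.randn(4, 640)          # pooled deep semantic feature
    print(fusion(f_l, f_g).shape)      # torch.Size([4, 640, 10, 10])
```

Discarding the parallel component is the point of such a design: the orthogonal remainder carries only the local detail that the global feature does not already encode, so fusing the two branches avoids redundancy between local and global information.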