Journal of Guangdong University of Technology ›› 2024, Vol. 41 ›› Issue (02): 73-83. DOI: 10.12052/gdutxb.230015

• Computer Science and Technology •

Local Orthogonal Feature Fusion for Few-Shot Image Classification

Tu Ze-liang, Cheng Liang-lun, Huang Guo-heng

  School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2023-02-10; Published: 2024-04-23

Abstract: Extracting discriminative features is a difficulty for existing metric-based few-shot image classification models. To address this, a few-shot image classification method based on local orthogonal feature fusion is proposed. First, the feature extraction network simultaneously extracts shallow features rich in local detail and deep features with strong semantics. Then, a channel attention module and a multi-scale feature adaptive fusion module enhance the shallow features along the channel and scale dimensions, respectively, producing features with more salient local information and richer scale information. Finally, a local orthogonal feature fusion module extracts and fuses the resulting multi-scale local features with the initial deep semantic features through local orthogonal feature extraction and attention fusion, so that the local and global feature information of the image is fully exploited and a feature representation more representative of the target category is generated. Experimental results on the three public datasets miniImageNet, tieredImageNet and CUB-200-2011 show that the proposed method achieves better classification results: its accuracy on the 5-way 5-shot task reaches 81.69%, 85.36% and 89.78%, respectively, which is 5.23%, 3.19% and 5.99% higher than the baseline model.
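
The abstract describes the local orthogonal feature fusion step only at a high level. The following is a minimal PyTorch sketch of one common way to realize such an orthogonal decomposition of local and global features; it is not the authors' implementation, and the module name, tensor shapes and the fusion-by-concatenation step are assumptions made for illustration.

import torch
import torch.nn as nn

class OrthogonalFusion(nn.Module):
    # Hypothetical module: decomposes local feature maps into components
    # parallel and orthogonal to a global descriptor, keeps the orthogonal
    # part, and concatenates it with the broadcast global descriptor.
    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat: (B, C, H, W) shallow/multi-scale local features (assumed)
        # global_feat: (B, C) deep semantic descriptor (assumed)
        B, C, H, W = local_feat.shape
        local_flat = local_feat.view(B, C, H * W)                       # (B, C, HW)
        # Projection of every local vector onto the global descriptor
        dot = torch.bmm(global_feat.unsqueeze(1), local_flat)           # (B, 1, HW)
        norm_sq = (global_feat * global_feat).sum(dim=1, keepdim=True).unsqueeze(-1) + 1e-6
        proj = dot / norm_sq * global_feat.unsqueeze(-1)                # (B, C, HW)
        # Keep only the component orthogonal to the global descriptor
        orth = (local_flat - proj).view(B, C, H, W)
        # Broadcast the global descriptor spatially and fuse by concatenation
        global_map = global_feat.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, H, W)
        return torch.cat([orth, global_map], dim=1)                     # (B, 2C, H, W)

Usage with assumed shapes: calling OrthogonalFusion() on local features of shape (25, 640, 5, 5) and global descriptors of shape (25, 640) returns a (25, 1280, 5, 5) fused feature map, which could then be pooled into a class-representative embedding.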

Key words: image classification, few-shot learning, multi-scale features, attention mechanism, feature fusion

CLC Number: TP391