Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (04): 24-30, 36. doi: 10.12052/gdutxb.220018

• Computer Science and Technology •

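Knowledge Distillation Method Based on Incremental Class Activation Knowledge

Zhang Jia-yue, Zhang Ling

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
  • Received: 2022-01-30 Online: 2023-07-25 Published: 2023-08-02
  • Corresponding author: Zhang Ling (1968–), female, professor; research interests: intelligent optimization algorithms, deep learning and digital image processing, distributed computing and big data. E-mail: 1252875930@qq.com
  • About the first author: Zhang Jia-yue (1998–), male, master's student; research interest: deep learning
  • Funding:
    Science and Technology Project of the Department of Transport of Guangdong Province (No. 科技-2016-02-030)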

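Abstract: Existing knowledge distillation methods often neglect the distillation of samples' category knowledge, because sample features lack class discriminability and device resources are insufficient to support learning the category structure of samples. To address this problem, this paper proposes an Incremental Class Activation Knowledge Distillation (ICAKD) method. First, class activation gradient maps are used to extract class-discriminative sample features, and a class activation map constraint loss is proposed. Then, an incremental memory bank is built to store these class-discriminative features; it holds samples from multiple training batches and is updated iteratively. Finally, the class centroid of each class of samples in the memory bank is computed and used to construct category structure relations, and category knowledge distillation is performed according to the class activation map constraint and the category structure relations. Comparative experiments on the Cifar10, Cifar100, Tiny-ImageNet, and ImageNet datasets show that the proposed method improves accuracy by 0.4% to 1.21% over Category Structure Knowledge Distillation (CSKD), indicating that class-discriminative features and the incremental approach both benefit category knowledge distillation.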
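The abstract names the three ingredients of ICAKD without giving their formulas. The plain-Python sketch below shows one plausible reading under stated assumptions: the class activation map follows the standard Grad-CAM weighting of Selvaraju et al. [19] (each channel weight is the spatially averaged gradient, followed by a ReLU of the weighted sum of feature maps); the incremental memory bank is assumed to be a fixed-capacity FIFO queue spanning several batches; the category structure is assumed to be the cosine-similarity matrix between class centroids; and both distillation terms are assumed to be squared errors. The names `MemoryBank`, `cam_constraint_loss`, `category_structure_loss`, `total_distill_loss`, the capacity, and the λ weights are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict, deque


def grad_cam(feature_maps, grads):
    """Standard Grad-CAM for one image and one target class c.

    feature_maps, grads: K x H x W nested lists holding A^k and dy^c/dA^k.
    alpha_k is the spatial mean of the gradient; the map is ReLU(sum_k alpha_k * A^k).
    """
    K = len(feature_maps)
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    alphas = [sum(sum(row) for row in grads[k]) / (H * W) for k in range(K)]
    return [[max(0.0, sum(alphas[k] * feature_maps[k][i][j] for k in range(K)))
             for j in range(W)]
            for i in range(H)]


def _l2_normalize(values):
    norm = math.sqrt(sum(v * v for v in values)) or 1.0  # avoid dividing by zero
    return [v / norm for v in values]


def cam_constraint_loss(cam_teacher, cam_student):
    """Hypothetical CAM constraint: squared distance between L2-normalized maps."""
    t = _l2_normalize([v for row in cam_teacher for v in row])
    s = _l2_normalize([v for row in cam_student for v in row])
    return sum((a - b) ** 2 for a, b in zip(t, s))


class MemoryBank:
    """Hypothetical incremental memory bank: a FIFO queue spanning several batches."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # the oldest features are evicted first

    def update(self, features, labels):
        for feature, label in zip(features, labels):
            self.items.append((list(feature), label))

    def centroids(self):
        """Mean feature (class centroid) of every class currently in the bank."""
        sums, counts = {}, defaultdict(int)
        for feature, label in self.items:
            if label not in sums:
                sums[label] = [0.0] * len(feature)
            sums[label] = [a + b for a, b in zip(sums[label], feature)]
            counts[label] += 1
        return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def _cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def category_structure_loss(bank_teacher, bank_student):
    """Mean squared gap between teacher and student centroid-similarity matrices."""
    ct, cs = bank_teacher.centroids(), bank_student.centroids()
    classes = sorted(set(ct) & set(cs))  # only classes present in both banks
    n = len(classes)
    gaps = [(_cosine(ct[a], ct[b]) - _cosine(cs[a], cs[b])) ** 2
            for a in classes for b in classes]
    return sum(gaps) / max(n * n, 1)


def total_distill_loss(ce_loss, cam_loss, struct_loss, lam_cam=1.0, lam_struct=1.0):
    """Hypothetical weighted sum; the abstract does not state the weights."""
    return ce_loss + lam_cam * cam_loss + lam_struct * struct_loss


# Example: a 2-D teacher bank and a 3-D student bank with the same class geometry.
teacher_bank, student_bank = MemoryBank(capacity=256), MemoryBank(capacity=256)
teacher_bank.update([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1])
student_bank.update([[1.0, 0.0, 0.0], [0.0, 0.0, 5.0]], [0, 1])
print(category_structure_loss(teacher_bank, student_bank))  # 0.0: identical structure
```

Because the structure term compares C×C similarity matrices rather than raw features, a teacher and a student with different feature widths can be matched directly, which is why the example pairs a 2-D teacher bank with a 3-D student bank. A real training loop would also need an autograd framework to obtain the gradients that `grad_cam` takes as input.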

Key words: knowledge distillation, class activation knowledge, incremental memory bank, category structure

CLC number: TP391

References:
[1] ZAGORUYKO S, KOMODAKIS N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer[EB/OL]. arXiv: 1612.03928(2017-02-12)[2022-01-30]. https://doi.org/10.48550/arXiv.1612.03928.
[2] CHEN Z, ZHENG X, SHEN H, et al. Improving knowledge distillation via category structure[C]//16th European Conference on Computer Vision. Glasgow: Springer, 2020: 205-219.
[3] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. arXiv: 1503.02531(2015-03-09)[2022-01-30]. https://doi.org/10.48550/arXiv.1503.02531.
[4] MÜLLER R, KORNBLITH S, HINTON G. When does label smoothing help?[C]//Annual Conference on Neural Information Processing Systems 2019. Vancouver: MIT Press, 2019: 4694-4703.
[5] DING Q, WU S, SUN H, et al. Adaptive regularization of labels[EB/OL]. arXiv: 1908.05474(2019-08-15)[2022-01-30]. https://doi.org/10.48550/arXiv.1908.05474.
[6] ROMERO A, BALLAS N, KAHOU S E, et al. FitNets: hints for thin deep nets[EB/OL]. arXiv: 1412.6550(2015-03-27)[2022-01-30]. https://doi.org/10.48550/arXiv.1412.6550.
[7] JANG Y, LEE H, HWANG S J, et al. Learning what and where to transfer[C]//Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019: 3030-3039.
[8] HUANG Z, WANG N. Like what you like: knowledge distill via neuron selectivity transfer[EB/OL]. arXiv: 1707.01219(2017-12-18)[2022-01-30]. https://doi.org/10.48550/arXiv.1707.01219.
[9] WANG K, GAO X, ZHAO Y, et al. Pay attention to features, transfer learn faster CNNs[EB/OL]. (2019-09-26)[2022-01-30]. https://openreview.net/forum?id=ryxyCeHtPB.
[10] PARK W, KIM D, LU Y, et al. Relational knowledge distillation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 3967-3976.
[11] PENG B, JIN X, LIU J, et al. Correlation congruence for knowledge distillation[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul: IEEE, 2019: 5007-5016.
[12] LIU Y, CAO J, LI B, et al. Knowledge distillation via instance relationship graph[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 7096-7104.
[13] LI X, WU J, FANG H, et al. Local correlation consistency for knowledge distillation[C]//16th European Conference on Computer Vision. Glasgow: Springer, 2020: 18-33.
[14] TIAN Y, KRISHNAN D, ISOLA P. Contrastive representation distillation[EB/OL]. arXiv: 1910.10699(2022-01-24)[2022-01-30]. https://doi.org/10.48550/arXiv.1910.10699.
[15] CHEN D, MEI J P, ZHANG Y, et al. Cross-layer distillation with semantic calibration[C]//Thirty-Fifth AAAI Conference on Artificial Intelligence. Virtual Event: AAAI, 2021, 35(8): 7028-7036.
[16] YUN S, PARK J, LEE K, et al. Regularizing class-wise predictions via self-knowledge distillation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 13873-13882.
[17] ZEILER M D, TAYLOR G W, FERGUS R. Adaptive deconvolutional networks for mid and high level feature learning[C]//2011 International Conference on Computer Vision. Barcelona: IEEE, 2011: 2018-2025.
[18] ZHOU B, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 2921-2929.
[19] SELVARAJU R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359.
[20] LI K, WU Z, PENG K C, et al. Tell me where to look: guided attention inference network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City: IEEE, 2018: 9215-9223.
[21] VAN DEN OORD A, KALCHBRENNER N, KAVUKCUOGLU K. Pixel recurrent neural networks[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: JMLR.org, 2016: 1747-1756.