Journal of Guangdong University of Technology ›› 2023, Vol. 40 ›› Issue (04): 24-30, 36. DOI: 10.12052/gdutxb.220018
Zhang Jia-yue, Zhang Ling
Abstract: Because sample features often lack class-discriminative power, and because device resources are insufficient to support learning the class structure of samples, existing knowledge distillation methods tend to neglect the distillation of class-level knowledge. To address this problem, this paper proposes an Incremental Class Activation Knowledge Distillation (ICAKD) method. First, class activation gradient maps are used to extract class-discriminative sample features, and a class-activation-map constraint loss is proposed. Then, an incremental memory bank is built to store these class-discriminative features; it retains samples from multiple training batches and is updated iteratively. Finally, the class centroid of each class in the memory bank is computed and category structure relations are constructed, and class-level knowledge distillation is performed under the class-activation-map constraint and the category structure relations. Comparative experiments on the Cifar10, Cifar100, Tiny-ImageNet, and ImageNet datasets show that the proposed method improves accuracy by 0.4% to 1.21% over the Category Structure Knowledge Distillation (CSKD) method, indicating that class-discriminative features and the incremental strategy both promote class-level knowledge distillation.
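The pipeline summarized in the abstract — class activation maps with a constraint loss, an incremental per-class memory bank spanning several batches, and a centroid-based category structure relation — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: all names (`class_activation_map`, `IncrementalMemoryBank`, `category_structure_loss`) are hypothetical, and for simplicity the activation maps use the classifier-weight (CAM-style) form rather than the paper's gradient-based maps.

```python
import torch
import torch.nn.functional as F
from collections import defaultdict, deque

def class_activation_map(feat, fc_weight, labels):
    """CAM-style map for each sample's ground-truth class.

    feat: (B, C, H, W) last-conv feature maps; fc_weight: (K, C) classifier weights.
    Returns (B, H, W) maps normalized to [0, 1] so teacher and student are comparable.
    """
    w = fc_weight[labels]                          # (B, C) weights of the true class
    cam = F.relu(torch.einsum('bchw,bc->bhw', feat, w))
    peak = cam.flatten(1).max(dim=1).values.view(-1, 1, 1)
    return cam / (peak + 1e-8)

def cam_constraint_loss(student_cam, teacher_cam):
    # Constrain the student's activation map to match the teacher's.
    return F.mse_loss(student_cam, teacher_cam)

class IncrementalMemoryBank:
    """FIFO per-class store of class-discriminative feature vectors.

    Updated every batch; old entries are evicted, so the bank always covers
    the most recent training batches (the "incremental" part of ICAKD).
    """
    def __init__(self, capacity_per_class=64):
        self.bank = defaultdict(lambda: deque(maxlen=capacity_per_class))

    def update(self, feats, labels):
        for f, y in zip(feats.detach(), labels):
            self.bank[int(y)].append(f)

    def centroids(self):
        # Class centroid = mean of the stored features of each class.
        keys = sorted(self.bank)
        cents = torch.stack([torch.stack(list(self.bank[k])).mean(0) for k in keys])
        return keys, cents

def category_structure_loss(s_centroids, t_centroids):
    # Category structure relation: pairwise cosine similarity between class
    # centroids; the student's relation matrix is matched to the teacher's.
    s = F.normalize(s_centroids, dim=1)
    t = F.normalize(t_centroids, dim=1)
    return F.mse_loss(s @ s.T, t @ t.T)
```

In a full training loop, the two losses would be weighted and added to the usual cross-entropy and logit-distillation terms; the weighting scheme here is left open since the abstract does not specify it.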
CLC number:
[1] ZAGORUYKO S, KOMODAKIS N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer[EB/OL]. arXiv: 1612.03928 (2017-02-12) [2022-01-30]. https://doi.org/10.48550/arXiv.1612.03928.
[2] CHEN Z, ZHENG X, SHEN H, et al. Improving knowledge distillation via category structure[C]//16th European Conference on Computer Vision. Glasgow: Springer, 2020: 205-219.
[3] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. arXiv: 1503.02531 (2015-03-09) [2022-01-30]. https://doi.org/10.48550/arXiv.1503.02531.
[4] MÜLLER R, KORNBLITH S, HINTON G. When does label smoothing help?[C]//Annual Conference on Neural Information Processing Systems 2019. Vancouver: MIT Press, 2019: 4694-4703.
[5] DING Q, WU S, SUN H, et al. Adaptive regularization of labels[EB/OL]. arXiv: 1908.05474 (2019-08-15) [2022-01-30]. https://doi.org/10.48550/arXiv.1908.05474.
[6] ROMERO A, BALLAS N, KAHOU S E, et al. FitNets: hints for thin deep nets[EB/OL]. arXiv: 1412.6550 (2015-03-27) [2022-01-30]. https://doi.org/10.48550/arXiv.1412.6550.
[7] JANG Y, LEE H, HWANG S J, et al. Learning what and where to transfer[C]//Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019: 3030-3039.
[8] HUANG Z, WANG N. Like what you like: knowledge distill via neuron selectivity transfer[EB/OL]. arXiv: 1707.01219 (2017-12-18) [2022-01-30]. https://doi.org/10.48550/arXiv.1707.01219.
[9] WANG K, GAO X, ZHAO Y, et al. Pay attention to features, transfer learn faster CNNs[EB/OL]. (2019-09-26) [2022-01-30]. https://openreview.net/forum?id=ryxyCeHtPB.
[10] PARK W, KIM D, LU Y, et al. Relational knowledge distillation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 3967-3976.
[11] PENG B, JIN X, LIU J, et al. Correlation congruence for knowledge distillation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 5007-5016.
[12] LIU Y, CAO J, LI B, et al. Knowledge distillation via instance relationship graph[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 7096-7104.
[13] LI X, WU J, FANG H, et al. Local correlation consistency for knowledge distillation[C]//16th European Conference on Computer Vision. Glasgow: Springer, 2020: 18-33.
[14] TIAN Y, KRISHNAN D, ISOLA P. Contrastive representation distillation[EB/OL]. arXiv: 1910.10699 (2022-01-24) [2022-01-30]. https://doi.org/10.48550/arXiv.1910.10699.
[15] CHEN D, MEI J P, ZHANG Y, et al. Cross-layer distillation with semantic calibration[C]//Thirty-Fifth AAAI Conference on Artificial Intelligence. Virtual Event: AAAI, 2021, 35(8): 7028-7036.
[16] YUN S, PARK J, LEE K, et al. Regularizing class-wise predictions via self-knowledge distillation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 13873-13882.
[17] ZEILER M D, TAYLOR G W, FERGUS R. Adaptive deconvolutional networks for mid and high level feature learning[C]//2011 International Conference on Computer Vision. Barcelona: IEEE, 2011: 2018-2025.
[18] ZHOU B, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 2921-2929.
[19] SELVARAJU R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359.
[20] LI K, WU Z, PENG K C, et al. Tell me where to look: guided attention inference network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 9215-9223.
[21] VAN DEN OORD A, KALCHBRENNER N, KAVUKCUOGLU K. Pixel recurrent neural networks[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: JMLR.org, 2016: 1747-1756.