Journal of Guangdong University of Technology ›› 2022, Vol. 39 ›› Issue (05): 1-8. doi: 10.12052/gdutxb.220092


A Review and Thinking of Deep Learning with a Restricted Number of Samples

Zhang Yun, Wang Xiao-dong

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2022-05-13  Online: 2022-09-10  Published: 2022-07-18
  • Corresponding author: Wang Xiao-dong (b. 1992), male, Ph.D. candidate; his main research interests are image processing and sparse network optimization. E-mail: wdecen@foxmail.com
  • About the author: Zhang Yun (b. 1963), male, professor, Ph.D., doctoral supervisor; his main research interests include complex system modeling and control, image processing, and pattern recognition
  • Supported by: National Natural Science Foundation of China (U1501251, 61802070, 62103115)


Abstract: Deep learning has made great progress by relying on big data and powerful computing, but its performance with a restricted number of samples remains unsatisfactory. The main difficulties lie in constructing the function space (family) and in designing algorithms under dataset constraints. Accordingly, this paper presents a categorized review of deep learning with restricted samples. In addition, current research on the brain indicates that human cognition is divided among different brain regions, that each region undertakes a different function, and that the process of learning each region's function should therefore also differ. On this basis, a "functional evolution" style of deep learning is proposed: the idea is to construct a network structure composed of multiple functional modules, partitioned by region and layer, and to study progressive, stage-by-stage methods for training these functional modules, in the hope of exploring a new path toward "humanoid learning".

Key words: deep learning method, convolutional neural network, restricted sample learning, functional evolution

CLC number: TP183
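
As a rough illustration of the "functional evolution" idea described in the abstract (a network assembled from distinct functional modules, trained stage by stage rather than end to end), the following minimal PyTorch sketch trains one module at a time while freezing the others. The module names, layer sizes, staging order, and toy data are illustrative assumptions, not the authors' actual method.

# Minimal sketch of staged, module-by-module training. Everything here
# (module names, sizes, the freeze-then-train schedule, toy data) is an
# illustrative assumption, not the method proposed in the paper.
import torch
import torch.nn as nn

class ModularNet(nn.Module):
    """A network split into distinct functional modules."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Low-level "perception" module (edge/texture-like features).
        self.perception = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # Mid-level "abstraction" module (pooled global features).
        self.abstraction = nn.Sequential(
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        # Task-specific "decision" module (classifier head).
        self.decision = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.abstraction(self.perception(x))
        return self.decision(h.flatten(1))

def train_stage(model, trainable_modules, loader, epochs=1):
    """Optimize only the listed modules; all other parameters stay frozen."""
    for p in model.parameters():
        p.requires_grad = False
    params = [p for m in trainable_modules for p in m.parameters()]
    for p in params:
        p.requires_grad = True
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Progressive ("evolution"-style) schedule: each stage adds one functional
# module on top of what earlier stages have already learned.
model = ModularNet()
loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]  # toy batch
train_stage(model, [model.perception, model.decision], loader)   # stage 1
train_stage(model, [model.abstraction, model.decision], loader)  # stage 2
train_stage(model, [model.decision], loader)                     # stage 3

Each train_stage call builds on the modules fixed in earlier stages; a real implementation would replace the toy loader with actual data and would derive the module partition and training schedule from the region-by-region analogy outlined in the paper.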