Journal of Guangdong University of Technology ›› 2021, Vol. 38 ›› Issue (03): 17-21. doi: 10.12052/gdutxb.200124


A Small Sample Data Prediction Method Based on Global Data Shuffling

Lai Jun, Liu Zhen-yu, Liu Sheng-hai

  1. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2020-09-22 Online: 2021-05-10 Published: 2021-03-12
  • About the author: Lai Jun (born 1979), male, lecturer, Ph.D.; main research interests include deep learning, network traffic processing, and video processing. E-mail: laijun@gdut.edu.cn
  • Funding:
    Guangzhou Science and Technology Plan Project (201907010003)

Abstract: Using the Guangzhou license plate auction price data set as the data source, this paper studies prediction on small-sample data sets with linear regression combined with k-fold cross-validation. Locally atypical data in a small sample can inflate the validation error; to address this, a strategy of shuffling the data globally before validation is proposed. Experiments confirm that this strategy markedly reduces the validation error. On that basis, several groups of experiments are used to determine suitable parameters, and the results show that the overall average accuracy of the final predictions reaches 95%.

Key words: linear regression, k-fold cross-validation, stochastic gradient descent, data shuffling, deep learning

CLC number:

  • TP183
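
The strategy described in the abstract, shuffling the whole data set once before cutting it into k folds, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (not the Guangzhou auction data), the helper names `fit_line` and `k_fold_mse` are hypothetical, and a closed-form least-squares fit on a single feature stands in for the stochastic gradient descent training named in the key words.

```python
import random


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = w * x + b for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x


def k_fold_mse(xs, ys, k, shuffle=False, seed=0):
    """Average validation MSE over k contiguous folds.

    With shuffle=True the (x, y) pairs are permuted once, globally,
    before the folds are cut, so no fold is made up of one local
    stretch of the original ordering.
    """
    pairs = list(zip(xs, ys))
    if shuffle:
        random.Random(seed).shuffle(pairs)
    fold_size = len(pairs) // k
    fold_errors = []
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder.
        stop = start + fold_size if i < k - 1 else len(pairs)
        valid = pairs[start:stop]
        train = pairs[:start] + pairs[stop:]
        w, b = fit_line([x for x, _ in train], [y for _, y in train])
        mse = sum((w * x + b - y) ** 2 for x, y in valid) / len(valid)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Synthetic small sample (assumption): a linear trend whose last ten
# points are shifted upward, mimicking locally atypical data.
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + (20.0 if x >= 30 else 0.0) for x in xs]

mse_ordered = k_fold_mse(xs, ys, k=4)
mse_shuffled = k_fold_mse(xs, ys, k=4, shuffle=True)
print("validation MSE without shuffling:", mse_ordered)
print("validation MSE with global shuffling:", mse_shuffled)
```

Without shuffling, the last fold contains only the shifted points, so a model trained on the other folds never sees them and the error on that fold dominates the average. After a global shuffle the atypical points are spread across folds, which is the effect the abstract attributes to the proposed strategy.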