Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (03): 17-24. doi: 10.12052/gdutxb.210157
Jin Yu-kai, Li Zhi-sheng, Ou Yao-chun, Zhang Hua-gang, Zeng Jiang-yi, Chen Bo-chao
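Abstract: This paper proposes a deep neural network (DNN) prediction model based on multi-stage clustering for multi-step prediction of PM2.5 mass concentration. The proposed model consists of a clustering part and a prediction part. In the clustering part, the first stage applies HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise, HDB) density clustering to remove noise points, and the second stage then clusters the remaining data. Four algorithms are used for the second stage: K-means, agglomerative clustering, Gaussian mixture, and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies). In the prediction part, a DNN serves as the predictor. Hourly data for the whole of 2015 from 11 air quality monitoring stations in Shenzhen were used to validate the model. The experimental results show that the multi-stage clustering prediction model is suitable for high-accuracy multi-step prediction of PM2.5 mass concentration and outperforms both the prediction model without clustering and the single-stage clustering prediction models.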
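The two-stage clustering idea described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and several pieces are assumptions: stage 1 uses a simplified DBSCAN-style core-point test as a stand-in for HDBSCAN (scikit-learn 1.3+ provides the real `sklearn.cluster.HDBSCAN`, which labels noise points `-1`), stage 2 uses plain K-means (one of the four second-stage options) with a deterministic farthest-first initialisation, and the feature vectors, the `eps` and `min_samples` values, and the step of routing a new sample to its nearest cluster (and hence to that cluster's predictor) are hypothetical, since the abstract does not specify them.

```python
import math


def density_filter(points, eps, min_samples):
    """Stage 1, a simplified stand-in for HDBSCAN (DBSCAN-style test).

    A point is a core point if at least `min_samples` points (itself included)
    lie within distance `eps`. Points that are neither core points nor within
    `eps` of a core point are treated as noise. Returns the indices to keep.
    """
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbours]
    return [i for i in range(n) if core[i] or any(core[j] for j in neighbours[i])]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=100):
    """Stage 2: Lloyd's K-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # New centroid = mean of its members; keep the old one if it is empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]  # match the final centroids
    return labels, centroids


def two_stage_cluster(points, k, eps, min_samples):
    """Drop noise with stage 1, then cluster the remaining points with stage 2."""
    kept = density_filter(points, eps, min_samples)
    labels, centroids = kmeans([points[i] for i in kept], k)
    return kept, labels, centroids


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors: two regimes plus one isolated outlier.
    points = [[10, 11], [11, 10], [10, 10], [11, 11],
              [60, 61], [61, 60], [60, 60], [61, 61],
              [200, 5]]
    kept, labels, centroids = two_stage_cluster(points, k=2, eps=2.0, min_samples=3)
    print(kept)        # [0, 1, 2, 3, 4, 5, 6, 7]  (the outlier is dropped)
    print(labels)      # [0, 0, 0, 0, 1, 1, 1, 1]
    print(centroids)   # [[10.5, 10.5], [60.5, 60.5]]
    # Hypothetical routing: a new sample goes to its nearest cluster's predictor.
    print(nearest([58, 62], centroids))  # 1
```

Running the same `kmeans` on all nine points without stage 1 places the second seed on the outlier, and the two real regimes end up merged into a single cluster. This shows why filtering noise before the second stage can matter for outlier-sensitive algorithms such as K-means.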