Prediction and Comparative Study of PM2.5 Mass Concentration Based on Multi-stage Clustering

doi:10.12052/gdutxb.210157

Abstract

Abstract: This paper proposes a deep neural network (DNN) prediction model based on multi-stage clustering for multi-step prediction of PM_2.5 mass concentration. The proposed model comprises decomposition, clustering and prediction. In the clustering part, the first stage uses HDBSCAN (Hierarchical Density-based Spatial Clustering of Applications with Noise, abbreviated HDB) density clustering to remove noise points, and the second stage then clusters the remaining data. Four clustering algorithms are used in the second stage: K-means, agglomerative clustering, Gaussian mixture, and BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies). In the prediction part, a DNN serves as the predictor, and hourly data covering the whole of 2015 from 11 air quality monitoring stations in Shenzhen are used to verify the effectiveness of the model. The experimental results show that the prediction model based on multi-stage clustering is suitable for multi-step, high-precision prediction of PM_2.5 mass concentration, and that it outperforms both the prediction model without clustering (DNN alone) and the single-stage clustering prediction model.

Key words: PM_2.5 prediction, clustering, deep learning, comparative study

Chinese Library Classification (CLC) number:

X513

Jin Yu-kai, Li Zhi-sheng, Ou Yao-chun, Zhang Hua-gang, Zeng Jiang-yi, Chen Bo-chao. Prediction and Comparative Study of PM_2.5 Concentration Based on Multi-stage Clustering[J]. Journal of Guangdong University of Technology, 2023, 40(03): 17-24.

