Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (03): 17-24. doi: 10.12052/gdutxb.210157

Prediction and Comparative Study of PM2.5 Concentration Based on Multi-stage Clustering

Jin Yu-kai, Li Zhi-sheng, Ou Yao-chun, Zhang Hua-gang, Zeng Jiang-yi, Chen Bo-chao

  1. School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2021-10-25 Online: 2023-05-25 Published: 2023-06-08
  • Corresponding author: Li Zhi-sheng (1972-), male, associate professor, Ph.D.; research interests: built environment and indoor pollution prevention and control. E-mail: chinagzlzs@126.com
  • About the author: Jin Yu-kai (1998-), male, master's student; research interests: machine learning and indoor air quality
  • Funding:
    Natural Science Foundation of Guangdong Province (S2011040003755); Natural Science Foundation of Guangdong Province (2016A030313711)

Abstract: A deep neural network (DNN) prediction model based on multi-stage clustering is proposed for multi-step PM2.5 concentration prediction. The proposed model includes decomposition, clustering and prediction. In the clustering part, the first stage applies HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) density clustering to eliminate noise points, and the second stage then clusters the remaining data with four algorithms: K-means, agglomerative clustering, Gaussian mixture, and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies). In the prediction part, a DNN is used as the predictor, and hourly data for the whole of 2015 from 11 air quality monitoring stations in Shenzhen are used to verify the effectiveness of the model. The experimental results show that the prediction model based on multi-stage clustering is suitable for multi-step, high-precision prediction of PM2.5 concentration, and that it outperforms both the DNN model without clustering and the single-stage clustering prediction models.

Key words: PM2.5 prediction, clustering, deep learning, comparative study

CLC number: X513