Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (03): 17-24. doi: 10.12052/gdutxb.210157


Prediction and Comparative Study of PM2.5 Concentration Based on Multi-stage Clustering

Jin Yu-kai, Li Zhi-sheng, Ou Yao-chun, Zhang Hua-gang, Zeng Jiang-yi, Chen Bo-chao   

  1. School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2021-10-25 Online: 2023-05-25 Published: 2023-06-08

Abstract: A deep neural network (DNN) model based on multi-stage clustering is proposed for multi-step prediction of PM2.5 concentration. The model consists of three parts: decomposition, clustering, and prediction. Clustering proceeds in two stages. The first stage applies HDBSCAN density clustering to remove noise. The second stage then clusters the remaining data using the K-means, agglomerative hierarchical, Gaussian mixture, and BIRCH clustering algorithms. In the prediction part, a DNN serves as the predictor. Hourly data from 11 air quality monitoring stations in Shenzhen are used to verify the effectiveness of the model. The experimental results show that the prediction model based on multi-stage clustering is suitable for high-precision multi-step prediction of PM2.5 concentration, and that it outperforms both the plain DNN model and the single-stage clustering prediction models.

Key words: PM2.5 prediction, clustering, deep learning, comparative study

CLC Number: 

  • X513
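
The two-stage clustering idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Stage 1 uses a simple neighbor-count density filter as a stand-in for HDBSCAN, and stage 2 uses plain K-means with a deterministic farthest-first initialization. The function names (`density_filter`, `kmeans`), the parameters (`radius`, `min_neighbors`), and the sample values are all hypothetical and chosen only for illustration. The decomposition step and the DNN predictor are omitted.

```python
import math


def density_filter(points, radius, min_neighbors):
    """Stage 1 (stand-in for HDBSCAN): split points into dense points and noise.

    A point is kept if at least `min_neighbors` other points lie within `radius`.
    """
    kept, noise = [], []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        (kept if neighbors >= min_neighbors else noise).append(p)
    return kept, noise


def kmeans(points, k, iterations=50):
    """Stage 2: plain K-means (Lloyd's algorithm); returns (centroids, labels)."""
    # Farthest-first initialization: deterministic, so no random seed is needed.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Synthetic two-feature samples (made up, not data from the paper).
samples = [
    (10.0, 12.0), (11.0, 13.0), (12.0, 11.0), (10.5, 12.5),  # low regime
    (80.0, 82.0), (81.0, 79.0), (79.0, 81.0), (80.5, 80.5),  # high regime
    (300.0, 5.0),                                            # isolated outlier
]

kept, noise = density_filter(samples, radius=5.0, min_neighbors=2)
centroids, labels = kmeans(kept, k=2)

print("noise:", noise)
print("labels:", labels)
print("centroids:", centroids)
```

In a fuller pipeline, the samples in each stage-2 cluster would presumably be passed on to the predictor. How the clusters feed the DNN is not stated in the abstract, so that step is left out here.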