Journal of Guangdong University of Technology, 2022, Vol. 39, Issue (05): 21-28. doi: 10.12052/gdutxb.220029
Yuan Jun1, Zhang Yun1, Zhang Gui-dong1, Li Zhong2, Chen Zhe3, Yu Sheng-long4
Abstract: Adaptive Dynamic Programming (ADP), a research hotspot in the field of optimal control, is widely applied to Energy Management Systems (EMS). The ADP algorithm achieves optimal control by adaptively adjusting the control policy based on system input-output data, and it is particularly powerful for solving optimal control problems of complex nonlinear systems. This paper reviews the research progress of ADP and its applications in EMS, analyzes the state of the art and the algorithm implementations for both discrete-time and continuous-time EMS, and introduces the Real-time Adaptive Dynamic Programming (RT-ADP) algorithm and the feasibility of its applications.
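To make the value-iteration idea behind many ADP schemes concrete, the sketch below iterates a quadratic value-function parameter for a hypothetical scalar linear system with quadratic cost. The model x_{k+1} = a·x_k + b·u_k and all numerical constants are illustrative assumptions, not taken from this paper; practical ADP implementations typically approximate the value function with a critic neural network and learn from input-output data, whereas here the model is assumed known so the recursion stays in closed form.

```python
# A minimal value-iteration ADP sketch for a hypothetical scalar
# linear-quadratic problem (all constants are illustrative assumptions).
# System: x_{k+1} = a*x_k + b*u_k, stage cost: q*x_k^2 + r*u_k^2.

a, b = 0.95, 0.10   # assumed state and input gains
q, r = 1.0, 0.5     # assumed state and control cost weights

# With V_i(x) = p_i * x^2, minimizing q*x^2 + r*u^2 + V_i(a*x + b*u)
# over u in closed form yields the recursion below; starting from
# p_0 = 0, p_i converges to the discrete algebraic Riccati solution.
p = 0.0
for i in range(1000):
    p_next = q + p * a**2 - (p * a * b)**2 / (r + p * b**2)
    if abs(p_next - p) < 1e-12:
        break
    p = p_next

k = p * a * b / (r + p * b**2)  # optimal feedback gain, u_k = -k * x_k
print(f"iterations: {i}, value parameter p: {p:.6f}, gain k: {k:.6f}")
```

Starting from a zero initial value function, this recursion is the classical value-iteration form of the discrete Riccati equation, mirroring the convergence behavior established for discrete-time ADP in the optimal control literature.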