广东工业大学学报 (Journal of Guangdong University of Technology), 2021, Vol. 38, Issue (06): 29-34. doi: 10.12052/gdutxb.210105
陈辞1,2, 谢立华3
Chen Ci1,2, Xie Li-hua3
Abstract: This paper studies the robust tracking design problem for linear discrete-time systems with a prescribed convergence rate. The tracking control problem is first formulated using robust output regulation theory; system data and reinforcement learning are then combined to achieve tracking control with the prescribed convergence rate. The learned control scheme not only guarantees that the tracking error converges asymptotically to zero, but is also robust to uncertain system dynamics. The prescribed-convergence-rate design presented here does not rely on the system evolution time or an exact system model, and is therefore data-driven.
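As background for the approach summarized above, the following is a minimal sketch of the standard discrete-time output regulation setup; the plant and exosystem notation ($A$, $B$, $D$, $E$, $C$, $F$) and the solution pair $(X, U)$ are illustrative assumptions, not taken from this paper. For a plant $x_{k+1} = A x_k + B u_k + D w_k$ driven by an exosystem $w_{k+1} = E w_k$, with tracking error $e_k = C x_k + F w_k$, the regulator (Francis) equations
\[
XE = AX + BU + D, \qquad 0 = CX + F
\]
determine a feedforward pair $(X, U)$. A stabilizing gain $K$ then yields the controller $u_k = U w_k + K(x_k - X w_k)$, under which $e_k \to 0$ and the spectrum of $A + BK$ governs the convergence rate; a data-driven design of the kind described here would estimate such a controller from measured trajectory data rather than from the exact model.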