广东工业大学学报 (Journal of Guangdong University of Technology), 2021, Vol. 38, Issue (06): 29-34. doi: 10.12052/gdutxb.210105
陈辞1,2, 谢立华3
Chen Ci1,2, Xie Li-hua3
Abstract: This paper studies the robust tracking design problem for linear discrete-time systems with a prescribed convergence rate. The tracking control problem is first formulated using robust output regulation theory; system data and reinforcement learning are then combined to achieve tracking control with the prescribed convergence rate. The learned control scheme not only guarantees that the tracking error converges asymptotically to zero, but is also robust to uncertain system dynamics. The prescribed-convergence-rate design presented here does not rely on the system evolution time or an exact system model, and is therefore data-driven.
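As background for the approach summarized above, the following is a minimal sketch of the standard discrete-time output regulation setup; the plant and exosystem notation ($A$, $B$, $D$, $E$, $C$, $F$) and the solution pair $(X, U)$ are illustrative assumptions, not taken from this paper. For a plant $x_{k+1} = A x_k + B u_k + D w_k$ driven by an exosystem $w_{k+1} = E w_k$, with tracking error $e_k = C x_k + F w_k$, the regulator (Francis) equations
\[
XE = AX + BU + D, \qquad 0 = CX + F
\]
determine a feedforward pair $(X, U)$. A stabilizing gain $K$ then yields the controller $u_k = U w_k + K(x_k - X w_k)$, under which $e_k \to 0$ and the spectrum of $A + BK$ governs the convergence rate; a data-driven design of the kind described here would estimate such a controller from measured trajectory data rather than from the exact model.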