Journal of Guangdong University of Technology ›› 2017, Vol. 34 ›› Issue (05): 10-14. doi: 10.12052/gdutxb.170081

• Comprehensive Research •

Convergence Condition of Value-iteration Based Adaptive Dynamic Programming

Liu Yi, Zhang Yun

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2017-04-11  Online: 2017-09-09  Published: 2017-07-10
  • Corresponding author: Zhang Yun (b. 1963), male, professor; his research interests include optimal control, nonlinear control, and multi-agent techniques. E-mail: yun@gdut.edu.cn
  • About the first author: Liu Yi (b. 1979), male, Ph.D. candidate; his research interests include intelligent control and optimal control.
  • Funding: National Natural Science Foundation of China (U1501251, 51307025); Specialized Research Fund for the Doctoral Program of Higher Education (20124420130001)

Abstract: The convergence condition of value-iteration based adaptive dynamic programming applied to discrete-time non-affine nonlinear systems is studied. It is shown, with proof, that initializing the iterative performance index function as a positive semi-definite function guarantees that value iteration converges to the optimal performance index function.

Key words: adaptive dynamic programming, value iteration, convergence
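
The value iteration analyzed in the paper follows the standard adaptive-dynamic-programming recursion V_{i+1}(x) = min_u [ U(x, u) + V_i(F(x, u)) ], started from a positive semi-definite V_0 (for example, V_0 ≡ 0). Below is a minimal numerical sketch of that recursion on a grid; the plant F, the utility U, the grids, and the stopping tolerance are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of value-iteration ADP for a scalar discrete-time
# non-affine nonlinear system. F, U, the grids, and the tolerance are
# hypothetical; only the recursion itself mirrors the setting of the paper.
import numpy as np

def F(x, u):
    # Hypothetical plant x_{k+1} = F(x_k, u_k); note it is non-affine in u.
    return 0.8 * np.sin(x) + 0.5 * x * u + 0.1 * u**3

def U(x, u):
    # Positive semi-definite utility (stage cost).
    return x**2 + u**2

xs = np.linspace(-2.0, 2.0, 201)            # state grid
us = np.linspace(-1.0, 1.0, 101)            # control grid
X, Uu = np.meshgrid(xs, us, indexing="ij")  # all (state, control) pairs

V = np.zeros_like(xs)  # V_0 ≡ 0: a positive semi-definite initialization

for i in range(200):
    # V_i evaluated at the successor states F(x, u); np.interp clips
    # successors outside the grid to the boundary values, a crude
    # approximation that suffices for this illustration.
    V_at_successor = np.interp(F(X, Uu), xs, V)
    Q = U(X, Uu) + V_at_successor           # Q_i(x, u)
    V_new = Q.min(axis=1)                   # V_{i+1}(x) = min_u Q_i(x, u)
    if np.max(np.abs(V_new - V)) < 1e-8:    # updates have become negligible
        break
    V = V_new
```

With V_0 positive semi-definite, the iterates V_i should converge toward the optimal performance index function, which is the condition the abstract asserts; the greedy control at any grid state can then be read off as the minimizing u in the final Q array.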

CLC number: TP273