Journal of Guangdong University of Technology, 2017, Vol. 34, Issue (05): 10-14. doi: 10.12052/gdutxb.170081

Convergence Condition of Value-iteration Based Adaptive Dynamic Programming

Liu Yi, Zhang Yun   

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2017-04-11 Online: 2017-09-09 Published: 2017-07-10

Abstract: This paper studies the convergence condition of value-iteration-based adaptive dynamic programming applied to discrete-time nonlinear non-affine systems, and proves that the algorithm converges. The proof shows that value iteration converges to the optimum when the initial iterative performance index function is positive semi-definite.
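
The convergence claim can be illustrated with a minimal sketch in a much simpler setting than the paper's. The example below assumes a scalar linear system x_{k+1} = a*x_k + b*u_k with quadratic stage cost q*x^2 + r*u^2, so that each iterate V_i(x) = p_i*x^2 is described by a single number p_i and the initial function is positive semi-definite exactly when p_0 >= 0. The system, the cost, and all names and numeric values are assumptions chosen for illustration; they are not taken from the paper, which treats nonlinear non-affine systems.

```python
def value_iteration_lq(a, b, q, r, p0, iters=200):
    """Value iteration for x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2.

    The iterate V_i(x) = p_i * x^2 is represented by the scalar p_i.
    Hypothetical helper for illustration only.
    """
    p = p0
    for _ in range(iters):
        # Minimizing q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u gives
        # u = -(a*b*p / (r + b^2*p)) * x; substituting back yields the update:
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


# Assumed example values (an open-loop unstable system, since |a| > 1).
a, b, q, r = 1.2, 1.0, 1.0, 1.0

# Three positive semi-definite initial functions V_0(x) = p0 * x^2.
limits = [value_iteration_lq(a, b, q, r, p0) for p0 in (0.0, 1.0, 50.0)]
print(limits)
```

In this special case every starting value p_0 >= 0 leads to the same limit, the positive solution of the scalar algebraic Riccati equation, which is consistent with the condition stated in the abstract.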

Key words: adaptive dynamic programming, value iteration, convergence

CLC Number: TP273