Journal of Guangdong University of Technology ›› 2019, Vol. 36 ›› Issue (01): 42-50. DOI: 10.12052/gdutxb.180029

Trajectory Tracking and Dynamic Obstacle Avoidance of Mobile Robot Based on Deep Reinforcement Learning

Wu Yun-xiong, Zeng Bi   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2018-03-08    Online: 2019-01-25    Published: 2018-12-05

Abstract: A visual perception and decision-making method based on deep reinforcement learning is proposed to address the failures and instability that arise in trajectory tracking and dynamic obstacle avoidance of mobile robots operating in partially observable, nonlinear dynamic environments. The method combines, in a general form, the perceptual ability of a convolutional neural network (CNN) with the decision-making ability of reinforcement learning. Through end-to-end learning, visual perception of the environment is mapped directly to control actions, so that environment perception and decision-making control form a closed loop. The optimal decision-making policy is obtained by maximizing the cumulative reward from the robot's interaction with the dynamic environment. Simulation results show that the method meets the requirements of multi-task intelligent perception and decision making, and overcomes shortcomings of traditional algorithms such as falling into local optima, oscillating between similar obstacles without finding a path, wavering in narrow passages, and failing to reach targets close to obstacles. The method greatly improves the real-time performance and adaptability of robot trajectory tracking and dynamic obstacle avoidance.
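To make the end-to-end pipeline concrete, the sketch below shows a minimal DQN-style perception-decision network in PyTorch: a CNN encodes the visual observation, a fully connected head outputs one Q-value per discrete action, and a single temporal-difference update pushes Q(s, a) toward the observed reward plus the discounted value of the next state. This is an illustrative assumption, not the authors' implementation; the names (PerceptionDecisionNet, shaped_reward, select_action, td_update), the 84×84 four-frame input, and the reward terms are all hypothetical.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class PerceptionDecisionNet(nn.Module):
    """CNN perception front end followed by a Q-value decision head."""

    def __init__(self, in_channels: int = 4, n_actions: int = 5):
        super().__init__()
        # Perception: convolutional feature extractor over stacked frames.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Decision: one Q-value per discrete action (84x84 input assumed).
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(obs))


def shaped_reward(dist_to_path: float, dist_to_obstacle: float,
                  reached_goal: bool) -> float:
    """Hypothetical shaped reward: reward staying on the reference
    trajectory, penalize closing on obstacles, bonus for the goal."""
    r = -0.1 * dist_to_path          # tracking-error penalty
    if dist_to_obstacle < 0.5:       # assumed safety radius (m)
        r -= 1.0
    if reached_goal:
        r += 10.0
    return r


def select_action(net: PerceptionDecisionNet, obs: torch.Tensor,
                  epsilon: float, n_actions: int) -> int:
    """Epsilon-greedy choice over the network's Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(net(obs.unsqueeze(0)).argmax(dim=1))


def td_update(net, target_net, optimizer, batch, gamma: float = 0.99) -> float:
    """One Q-learning step: regress Q(s,a) toward r + gamma * max_a' Q'(s',a')."""
    obs, actions, rewards, next_obs, done = batch
    q = net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_obs).max(dim=1).values * (1.0 - done)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

Because the CNN features and the Q-value head are trained jointly against the reward signal, perception and control form the closed loop described above, rather than relying on a hand-engineered intermediate representation.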

Key words: deep reinforcement learning, mobile robot, trajectory tracking, dynamic obstacle avoidance

CLC Number: TP242.6