广东工业大学学报 (Journal of Guangdong University of Technology), 2019, Vol. 36, Issue (01): 42-50. doi: 10.12052/gdutxb.180029
吴运雄, 曾碧
Wu Yun-xiong, Zeng Bi
Abstract: To address the error-prone and unstable behavior of mobile robots performing trajectory tracking and dynamic obstacle avoidance in partially observable, nonlinear dynamic environments, a visual perception and decision-making method based on deep reinforcement learning is proposed. In a general form, the method combines the perception capability of convolutional neural networks with the decision-making capability of reinforcement learning: through end-to-end learning it maps the visual perception input of the environment directly to output action control, so that environment perception and decision control form a closed loop, and the optimal decision policy is learned by maximizing the cumulative reward obtained from the robot's interaction with the dynamic environment. Simulation results show that the method meets the requirements of multi-task intelligent perception and decision making, largely resolves the problems of traditional algorithms such as easily falling into local optima, oscillating among closely spaced obstacle clusters without finding a path, swinging in narrow passages, and unreachable goals near obstacles, and greatly improves the real-time performance and adaptability of robot trajectory tracking and dynamic obstacle avoidance.
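The abstract describes an end-to-end coupling of CNN-based perception with reinforcement-learning decision making, trained by maximizing cumulative reward from interaction with the environment. As a rough illustration of that architecture (not the authors' implementation: the network shape, 84×84 RGB observation, discrete action space, and DQN-style temporal-difference update below are all assumptions made for the sketch), a minimal PyTorch version might look like this:

```python
# Minimal sketch, NOT the paper's method: a CNN maps raw image observations
# to per-action values, and a DQN-style temporal-difference update pushes the
# policy toward maximizing the cumulative discounted reward. Layer sizes,
# action space and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualQNetwork(nn.Module):
    """CNN perception front-end plus fully connected decision head."""
    def __init__(self, n_actions: int):
        super().__init__()
        # Perception: convolutional layers extract features from an 84x84 RGB input.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Decision: fully connected layers output one Q-value per action.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # End-to-end: image observation in, action values out.
        return self.head(self.conv(obs))

def td_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One temporal-difference step: regress Q(s,a) toward r + gamma * max_a' Q_target(s',a')."""
    obs, actions, rewards, next_obs, done = batch
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The greedy action argmax over the network's output then serves as the robot's control command, closing the perception-decision loop the abstract refers to.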