广东工业大学学报 (Journal of Guangdong University of Technology), 2019, Vol. 36, Issue (01): 42-50. doi: 10.12052/gdutxb.180029
吴运雄, 曾碧
Wu Yun-xiong, Zeng Bi
Abstract: To address the error-prone and unstable behavior of mobile robots performing trajectory tracking and dynamic obstacle avoidance in partially observable, nonlinear dynamic environments, a visual perception and decision-making method based on deep reinforcement learning is proposed. In a general form, the method combines the perception capability of convolutional neural networks with the decision-making capability of reinforcement learning: through end-to-end learning it maps the visual perception input of the environment directly to output action control, so that environment perception and decision control form a closed loop, and the optimal decision policy is learned by maximizing the cumulative reward obtained from the robot's interaction with the dynamic environment. Simulation results show that the method meets the requirements of multi-task intelligent perception and decision making, largely resolves the problems of traditional algorithms such as easily falling into local optima, oscillating among closely spaced obstacle clusters without finding a path, swinging in narrow passages, and unreachable goals near obstacles, and greatly improves the real-time performance and adaptability of robot trajectory tracking and dynamic obstacle avoidance.
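The abstract describes an end-to-end coupling of CNN-based perception with reinforcement-learning decision making, trained by maximizing cumulative reward from interaction with the environment. As a rough illustration of that architecture (not the authors' implementation: the network shape, 84×84 RGB observation, discrete action space, and DQN-style temporal-difference update below are all assumptions made for the sketch), a minimal PyTorch version might look like this:

```python
# Minimal sketch, NOT the paper's method: a CNN maps raw image observations
# to per-action values, and a DQN-style temporal-difference update pushes the
# policy toward maximizing the cumulative discounted reward. Layer sizes,
# action space and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualQNetwork(nn.Module):
    """CNN perception front-end plus fully connected decision head."""
    def __init__(self, n_actions: int):
        super().__init__()
        # Perception: convolutional layers extract features from an 84x84 RGB input.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Decision: fully connected layers output one Q-value per action.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # End-to-end: image observation in, action values out.
        return self.head(self.conv(obs))

def td_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One temporal-difference step: regress Q(s,a) toward r + gamma * max_a' Q_target(s',a')."""
    obs, actions, rewards, next_obs, done = batch
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The greedy action argmax over the network's output then serves as the robot's control command, closing the perception-decision loop the abstract refers to.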