Journal of Guangdong University of Technology, 2020, Vol. 37, Issue (05): 46-50. DOI: 10.12052/gdutxb.200009

• Comprehensive Studies •

A Research on a Training Model to Improve the Development Efficiency of Robot Reinforcement Learning

Ye Wei-jie, Gao Jun-li, Jiang Feng, Guo Jing

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2020-01-09  Online: 2020-09-17  Published: 2020-09-17
  • About the author: Ye Wei-jie (born 1994), male, master's student; his main research interests are deep reinforcement learning and path planning. E-mail: 380400483@qq.com
  • Supported by: National Natural Science Foundation of China (61803103); China Scholarship Council Program (201908440537)

Abstract: Deep reinforcement learning (DRL), which combines reinforcement learning with deep learning, is now widely used in robot control. Robot reinforcement learning trains its models in a 3D simulation environment; however, in the absence of prior knowledge of the environment, trial-and-error learning in 3D leads to long training cycles and high development costs. To address this problem, a training mode that runs from 2D through to 3D is proposed: the computationally intensive, time-consuming work is carried out in a 2D environment, and the resulting algorithm is then transferred to a 3D environment for testing. Experiments show that this training mode improves the development efficiency of robot reinforcement learning on a personal computer by roughly a factor of five.

Key words: deep reinforcement learning, robot control, training mode, development efficiency

CLC number: TP242.6
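
The 2D-to-3D training mode summarized in the abstract can be illustrated with a small, self-contained sketch. Everything below is an editorial assumption rather than the paper's code: the grid-world environments, the tabular Q-learning learner, and the warm-start transfer (copying the 2D value table into every height level of a 3D table) merely stand in for the paper's DRL algorithm and its full 3D robot simulator.

# Hypothetical sketch of the 2D-to-3D training mode (not the paper's code).
# Expensive trial-and-error learning runs in a cheap 2D grid world; the
# learned values are then transferred to a 3D grid world for testing.

import random
from collections import defaultdict

def q_learning(step, reset, actions, episodes, q=None,
               alpha=0.5, gamma=0.95, eps=0.1):
    """Generic tabular Q-learning; pass `q` to warm-start from a prior stage."""
    q = q if q is not None else defaultdict(float)
    for _ in range(episodes):
        s, done = reset(), False
        while not done:
            # Epsilon-greedy action selection.
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a2: q[(s, a2)]))
            s2, r, done = step(s, a)
            target = r + gamma * max(q[(s2, a2)] for a2 in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

# Stage 1: do the heavy learning in a cheap 2D planar navigation task.
GOAL2 = (4, 4)
ACTS2 = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step2(s, a):
    s2 = (min(max(s[0] + a[0], 0), 4), min(max(s[1] + a[1], 0), 4))
    return s2, (1.0 if s2 == GOAL2 else -0.01), s2 == GOAL2

q2 = q_learning(step2, lambda: (0, 0), ACTS2, episodes=2000)

# Stage 2: transfer to 3D by seeding every height level with the 2D values,
# then run only a short test/fine-tune phase instead of learning from scratch.
GOAL3 = (4, 4, 0)
ACTS3 = [(dx, dy, 0) for dx, dy in ACTS2] + [(0, 0, 1), (0, 0, -1)]

def step3(s, a):
    s2 = (min(max(s[0] + a[0], 0), 4),
          min(max(s[1] + a[1], 0), 4),
          min(max(s[2] + a[2], 0), 2))
    return s2, (1.0 if s2 == GOAL3 else -0.01), s2 == GOAL3

q3 = defaultdict(float)
for ((x, y), (dx, dy)), v in list(q2.items()):
    for z in range(3):
        q3[((x, y, z), (dx, dy, 0))] = v

q3 = q_learning(step3, lambda: (0, 0, 2), ACTS3, episodes=200, q=q3)

Under this reading, the 3D stage needs far fewer episodes because it starts from a near-solved policy; the paper may equally well transfer network weights or tuned hyperparameters rather than value tables.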