Journal of Guangdong University of Technology, 2020, Vol. 37, Issue (05): 46-50. DOI: 10.12052/gdutxb.200009


Research on a Training Mode to Improve the Development Efficiency of Robot Reinforcement Learning

Ye Wei-jie, Gao Jun-li, Jiang Feng, Guo Jing   

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  Received: 2020-01-09    Online: 2020-09-17    Published: 2020-09-17

Abstract: Deep reinforcement learning (DRL), which combines reinforcement learning with deep learning, is now widely used in robot control. Robot reinforcement learning typically requires training the model in a 3D simulation environment; without prior knowledge of the environment, however, trial-and-error learning in 3D leads to long training cycles and high development costs. To address this problem, a 2D-to-3D training mode is proposed: the time-consuming, computationally intensive work is completed in a 2D environment, and the resulting model is transferred to a 3D environment for testing. Experiments show that this training mode improves development efficiency by roughly a factor of five, making robot reinforcement learning research feasible even on personal computers.

Key words: deep reinforcement learning, robot control, training mode, development efficiency

CLC Number: TP242.6
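
To make the proposed training mode concrete, below is a minimal Python/PyTorch sketch (not the authors' code) of the 2D-to-3D pipeline. The ToyEnv class, its dynamics, and the one-step REINFORCE-style update are hypothetical placeholders standing in for a real 2D simulator, a 3D engine such as Gazebo, and a full algorithm such as PPO; only the structure (long, cheap pretraining in 2D, weight transfer, then short fine-tuning in 3D) reflects the idea described in the abstract. It also assumes the 2D and 3D stages share an observation and action interface, which in a real system requires a common state representation.

```python
# Sketch of the 2D-to-3D training mode: pretrain where steps are cheap,
# then transfer the weights and fine-tune briefly where steps are expensive.
import torch
import torch.nn as nn
import torch.optim as optim

class ToyEnv:
    """Hypothetical stand-in environment; a real setup would wrap a 2D
    simulator for stage 1 and a 3D engine (e.g. Gazebo) for stage 2."""
    def __init__(self, step_cost):
        self.step_cost = step_cost  # illustrative proxy for per-step expense
        self.state = torch.zeros(4)

    def reset(self):
        self.state = torch.randn(4)
        return self.state

    def step(self, action):
        # Toy dynamics: reward action 0 when state[0] is positive.
        reward = 1.0 if (action == 0) == (self.state[0].item() > 0) else -1.0
        self.state = torch.randn(4)
        return self.state, reward

def train(env, policy, episodes, lr=1e-2):
    """One-step REINFORCE-style updates; a stand-in for PPO/TD3 training."""
    opt = optim.Adam(policy.parameters(), lr=lr)
    for _ in range(episodes):
        state = env.reset()
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        _, reward = env.step(action.item())
        loss = -dist.log_prob(action) * reward  # policy-gradient loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

# Stage 1: many cheap episodes in the 2D stand-in.
train(ToyEnv(step_cost=1), policy, episodes=2000)

# Stage 2: transfer the learned weights and fine-tune briefly in the
# expensive 3D stand-in instead of training there from scratch.
policy_3d = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
policy_3d.load_state_dict(policy.state_dict())
train(ToyEnv(step_cost=50), policy_3d, episodes=200)
```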