Journal of Guangdong University of Technology ›› 2020, Vol. 37 ›› Issue (05): 46-50. doi: 10.12052/gdutxb.200009
Ye Wei-jie, Gao Jun-li, Jiang Feng, Guo Jing
Abstract: Deep reinforcement learning (DRL), which combines reinforcement learning with deep learning, is now widely applied to robot control. Reinforcement learning for robots typically requires training the model in a 3D simulation environment; without prior knowledge of the environment, however, trial-and-error learning in 3D leads to long training cycles and high development cost. This paper therefore proposes a 2D-to-3D training scheme for robot reinforcement learning: the computation-heavy, time-consuming work is carried out in a 2D environment, and the resulting policy is then transferred to a 3D environment for testing. Experiments show that this scheme improves the development efficiency of PC-based robot reinforcement learning by a factor of about five.
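The 2D-to-3D pattern described above can be sketched with a toy example: train a policy cheaply in a 2D environment, then run the same policy in a 3D environment that shares the planar state abstraction. The grid environments, tabular Q-learning, hyper-parameters, and the (x, y, z) → (x, y) projection below are illustrative assumptions for this sketch only; the paper itself uses DRL in a full 3D simulator.

```python
import random

random.seed(0)  # make the illustration deterministic


class Grid2D:
    """Cheap 2D environment: walk from (0, 0) to (size-1, size-1)."""

    def __init__(self, size=4):
        self.size = size
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        # actions: 0 = move right (+x), 1 = move down (+y); moves are clipped
        x, y = self.pos
        if action == 0:
            x = min(x + 1, self.size - 1)
        else:
            y = min(y + 1, self.size - 1)
        self.pos = (x, y)
        done = self.pos == (self.size - 1, self.size - 1)
        reward = 1.0 if done else -0.01  # small step cost, goal bonus
        return self.pos, reward, done


class Grid3D(Grid2D):
    """'3D' environment sharing the planar abstraction; the extra z
    coordinate is state that the transferred 2D policy simply ignores."""

    def reset(self):
        self.z = 1.0
        return super().reset() + (self.z,)

    def step(self, action):
        pos, reward, done = super().step(action)
        return pos + (self.z,), reward, done


def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning: the compute-heavy work done in the 2D environment."""
    q = {}
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:
                a = random.choice((0, 1))
            else:
                a = max((0, 1), key=lambda act: q.get((s, act), 0.0))
            s2, r, done = env.step(a)
            target = r + gamma * max(q.get((s2, 0), 0.0), q.get((s2, 1), 0.0))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            s = s2
    return q


def evaluate_transfer(q, env3d, max_steps=20):
    """Run the 2D-trained greedy policy in 3D by projecting each
    3D observation onto the shared 2D state abstraction."""
    s = env3d.reset()
    for t in range(max_steps):
        s2d = (s[0], s[1])  # project (x, y, z) -> (x, y)
        a = max((0, 1), key=lambda act: q.get((s2d, act), 0.0))
        s, _, done = env3d.step(a)
        if done:
            return t + 1  # steps taken to reach the goal
    return None  # policy failed to transfer within the step budget


q_table = q_learning(Grid2D())          # train in 2D
steps = evaluate_transfer(q_table, Grid3D())  # test in "3D"
print("goal reached in", steps, "steps")
```

The point of the sketch is the division of labor: all trial-and-error happens in `Grid2D`, and `Grid3D` is only used for evaluation, mirroring the train-in-2D, test-in-3D workflow the abstract describes.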
References:
[1] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484.
[2] SUTTON R S, BARTO A G. Introduction to reinforcement learning[M]. Cambridge: MIT Press, 1998: 4-6.
[3] IRPAN A. Deep reinforcement learning doesn't work yet[EB/OL]. (2018-02-14)[2018-02-14]. https://www.alexirpan.com/2018/02/14/rl-hard.html.
[4] LEVINE S, PASTOR P, KRIZHEVSKY A, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection[J]. The International Journal of Robotics Research, 2018, 37(4-5): 421-436.
[5] HESSEL M, MODAYIL J, VAN HASSELT H, et al. Rainbow: combining improvements in deep reinforcement learning[C]//Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, Louisiana: AAAI, 2018.
[6] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.
[7] WANG J, HU J, MIN G, et al. Computation offloading in multi-access edge computing using a deep sequential model based on reinforcement learning[J]. IEEE Communications Magazine, 2019, 57(5): 64-69.
[8] WU Y X, ZENG B. Trajectory tracking and dynamic obstacle avoidance of mobile robot based on deep reinforcement learning[J]. Journal of Guangdong University of Technology, 2018, 36(1): 42-50.
[9] STOOKE A, ABBEEL P. Accelerated methods for deep reinforcement learning[J]. arXiv preprint arXiv:1803.02811, 2018.
[10] HENDERSON P, CHANG W D, SHKURTI F, et al. Benchmark environments for multitask learning in continuous domains[J]. arXiv preprint arXiv:1708.04352, 2017.
[11] TAI L, PAOLO G, LIU M. Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation[C]//2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). [S.l.]: IEEE, 2017: 31-36.
[12] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[J]. arXiv preprint arXiv:1802.09477, 2018.
[13] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. arXiv preprint arXiv:1606.01540, 2016.
[14] TAKAYA K, ASAI T, KROUMOV V, et al. Simulation environment for mobile robots testing using ROS and Gazebo[C]//2016 20th International Conference on System Theory, Control and Computing (ICSTCC). Sinaia: IEEE, 2016: 96-101.
[15] BUŞONIU L, BABUŠKA R, DE SCHUTTER B. Multi-agent reinforcement learning: an overview[M]. Berlin Heidelberg: Springer, 2010: 183-221.