Meta Reinforcement Learning Based on Skill Discovery

      Abstract: In complex robot control environments, Meta Reinforcement Learning (Meta-RL) has emerged as a pivotal component: it leverages prior experience to tackle unseen, long-horizon, sparse-reward tasks. Skill-based Meta-RL methods aim to extract useful skills from task contexts, helping agents adapt quickly to new environments during meta-testing. However, the skills learned by existing methods lack generality and adaptability, which limits their performance on meta-test task sets. To overcome this limitation, this paper proposes a Skill Discovery Meta-RL (SDMRL) method that learns more useful skills in the absence of a reward function. The objective is formalized as maximizing an information-theoretic objective with maximum-entropy policies; this simple yet effective exploration objective lets the agent learn useful skills and skill priors from unstructured data in an unsupervised manner. Experiments on continuous control tasks such as Maze Navigation show that SDMRL is more effective than previous meta reinforcement learning methods and that the learned skills can solve long-horizon, complex, sparse-reward tasks.
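
      As a rough sketch only, and not the formulation stated in the paper: information-theoretic skill-discovery objectives of this kind are commonly written in terms of a latent skill variable Z, states S, and actions A (the symbols below are illustrative, not taken from the paper), maximizing

      \[
      \mathcal{F}(\theta) \;=\; I(S; Z) + \mathcal{H}[A \mid S, Z]
      \;\ge\; \mathbb{E}_{z \sim p(z),\; s \sim \pi_\theta}\big[\log q_\phi(z \mid s) - \log p(z)\big] + \mathcal{H}[A \mid S, Z],
      \]

      where \(q_\phi(z \mid s)\) is a learned skill discriminator approximating the intractable posterior \(p(z \mid s)\). The variational bound turns \(\log q_\phi(z \mid s) - \log p(z)\) into an intrinsic reward, while the entropy term \(\mathcal{H}[A \mid S, Z]\) is handled by a maximum-entropy RL algorithm, so distinguishable skills can be trained without any task reward function.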

       
