Abstract:
In complex robot control environments, meta reinforcement learning (Meta-RL) has emerged as a pivotal tool, leveraging prior experience to tackle unseen, long-horizon, sparse-reward tasks. Skill-based Meta-RL methods aim to extract useful skills from task contexts, helping agents adapt quickly to new environments during meta-testing. However, the skills learned by existing methods lack generality and adaptability, which limits their performance on meta-testing task sets. To address this limitation, this paper proposes a Skill Discovery Meta-RL (SDMRL) approach that learns more useful skills in the absence of reward functions. SDMRL is formalized as the maximization of an information-theoretic objective under maximum-entropy policies, enabling agents to learn valuable skills and skill priors from unstructured data in an unsupervised manner. Experimental results on continuous control tasks such as Maze Navigation demonstrate the effectiveness of SDMRL over previous Meta-RL methods, and the learned skills proficiently solve long-horizon, complex, sparse-reward tasks.
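The abstract does not state the objective in closed form; as a hedged illustration, information-theoretic skill-discovery objectives of this kind (e.g., the DIAYN objective of Eysenbach et al.) are commonly written as below, where $z \sim p(z)$ is a latent skill, $\pi_\theta(a \mid s, z)$ is a maximum-entropy skill policy, and $q_\phi(z \mid s)$ is a learned skill discriminator; the paper's exact formulation may differ.

```latex
% Hedged sketch of a DIAYN-style information-theoretic objective
% consistent with the abstract's description (exact form assumed):
% maximize mutual information between states S and skills Z while
% keeping the skill-conditioned policy maximally entropic.
\begin{align}
  \mathcal{F}(\theta)
    &= I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S) \\
    &= \mathcal{H}[Z] - \mathcal{H}[Z \mid S] + \mathcal{H}[A \mid S, Z] \\
    &\ge \mathbb{E}_{z \sim p(z),\, s \sim \pi_\theta}
         \bigl[ \log q_\phi(z \mid s) - \log p(z) \bigr]
       + \mathcal{H}[A \mid S, Z].
\end{align}
```

In this sketch, the variational lower bound replaces the intractable posterior $p(z \mid s)$ with the discriminator $q_\phi(z \mid s)$, so that $\log q_\phi(z \mid s) - \log p(z)$ acts as an intrinsic reward that a maximum-entropy RL algorithm can optimize without any task reward.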