Opposite Q-Learning Robot Path Planning Based on Virtual Sub-Targets in Unknown Environments

    • Abstract: Q-learning updates its Q-values slowly in complex unknown environments and is prone to the curse of dimensionality. To address these problems, an opposite Q-learning path planning algorithm for robots in unknown environments, based on virtual sub-targets, is proposed. From the state trajectory explored by the mobile robot, the algorithm builds two state chains, one recording state-action pairs and the other recording state-reverse-action pairs. Within each chain, the Q-value of the current state is fed back in turn to the Q-value of the preceding state, until the head of the chain is reached. In addition, searching for the optimal virtual sub-target within the local detection domain addresses the curse of dimensionality that Q-learning tends to suffer from in large-scale environments. Experimental results show that, in complex unknown environments, the algorithm effectively accelerates learning convergence, improves learning efficiency, and completes the robot navigation task along a better path.
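The chained feedback described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update formula, so this sketch assumes the standard one-step Q-learning update and applies it along a recorded chain from tail to head. The names `ACTIONS` and `backpropagate_chain`, the tuple layout of a chain entry, and the values of `alpha` and `gamma` are all hypothetical and are not taken from the paper. How rewards are assigned on the state-reverse-action chain is also not stated, so only a single chain is shown.

```python
from collections import defaultdict

# Assumed discrete action set for a grid world (not specified in the abstract).
ACTIONS = ("up", "down", "left", "right")


def backpropagate_chain(Q, chain, alpha=0.5, gamma=0.9):
    """Sweep one state chain from its tail back to its head.

    chain: list of (state, action, reward, next_state) tuples in visiting order.
    Each entry receives one standard Q-learning update (an assumption), so a
    freshly updated Q-value immediately influences the entry before it.
    """
    for state, action, reward, next_state in reversed(chain):
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        td_target = reward + gamma * best_next
        Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q


# Hypothetical three-step trajectory that ends with a reward of 1.
Q = defaultdict(float)
chain = [
    ((0, 0), "right", 0.0, (0, 1)),
    ((0, 1), "right", 0.0, (0, 2)),
    ((0, 2), "right", 1.0, (0, 3)),
]
backpropagate_chain(Q, chain)
```

With zero-initialized Q-values, updating the same transitions in forward order would leave only the last pair with a non-zero value after one pass. The reverse sweep carries the reward back to the head of the chain in a single pass, which is the intuition behind the faster Q-value updates claimed in the abstract.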

       
