Deep Reinforcement Learning Based Dialogue Policy Network

Abstract: Dialogue systems are an indispensable part of human-computer interaction. Their goal is a natural, intelligent, and fluent conversational experience. Traditional dialogue systems, however, struggle to understand user intent, generate reasonable responses, and keep the conversation coherent. The core cause is a mismatch between how the dialogue policy network is trained and how it is evaluated: the network is trained with supervision, and that objective does not cover the key metrics the system is finally judged on. Some recent work has begun to explore reinforcement-learning-oriented training. Existing approaches, however, still rely on manually defined templates or need too many dialogue turns. To address these problems, this study builds a word-level deep reinforcement learning training procedure on a Transformer-based dialogue policy network. It constructs the reward by combining several dialogue-system metrics, which improves the system's interactive performance. Experimental results show that, compared with conventional supervised training, the proposed method improves the dialogue performance metrics BLEU and Comb while remaining comparable on the other metrics. The contribution of this study is to apply reinforcement learning to the training of dialogue policy networks, and experiments verify its effectiveness.
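The abstract names two mechanisms without giving formulas: a reward built by combining several dialogue metrics, and word-level reinforcement learning, where each generated token is treated as an action. The following minimal Python sketch illustrates both ideas. It rests on several assumptions and is not the authors' implementation. All function names are hypothetical. The reward weights copy the combined score commonly reported on MultiWOZ (BLEU + 0.5 × (Inform + Success)), which the abstract does not confirm. The learning rule is plain REINFORCE with a baseline, which is also an assumption.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with add-one smoothing (hypothetical helper)."""
    if not candidate:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(len(candidate) - n + 1, 0)
        # Add-one smoothing keeps short responses from taking log(0).
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_prec)


def combined_reward(candidate, reference, inform, success, w_bleu=1.0, w_task=0.5):
    """Hypothetical multi-metric reward: fluency (BLEU) plus task metrics in [0, 1].

    The default weights mimic the MultiWOZ-style combined score; the paper's
    actual reward terms and weights are an assumption here.
    """
    return w_bleu * sentence_bleu(candidate, reference) + w_task * (inform + success)


def log_softmax(logits):
    """Numerically stable log-softmax over one step's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reinforce_loss(step_logits, actions, reward, baseline=0.0):
    """Word-level REINFORCE surrogate loss for one sampled response.

    Every generated token is an action. With a single end-of-response reward
    and no discounting, all tokens share the advantage (reward - baseline).
    """
    advantage = reward - baseline
    return -advantage * sum(log_softmax(z)[a] for z, a in zip(step_logits, actions))


def reinforce_grad_logits(step_logits, actions, reward, baseline=0.0):
    """Gradient of reinforce_loss w.r.t. each step's logits: -A * (onehot(a) - softmax(z))."""
    advantage = reward - baseline
    grads = []
    for z, a in zip(step_logits, actions):
        probs = [math.exp(lp) for lp in log_softmax(z)]
        grads.append([-advantage * ((1.0 if i == a else 0.0) - p) for i, p in enumerate(probs)])
    return grads


if __name__ == "__main__":
    ref = "the hotel is in the north".split()
    hyp = "the hotel is in the north".split()
    r = combined_reward(hyp, ref, inform=1.0, success=1.0)
    # Toy two-token response over a two-word vocabulary, baseline assumed to be 1.0.
    print(r, reinforce_loss([[0.0, 0.0], [0.0, 0.0]], [0, 1], reward=r, baseline=1.0))
```

In a real system, `step_logits` would come from the Transformer decoder, and an autograd framework would compute the gradient. `reinforce_grad_logits` only makes the policy-gradient direction explicit. A positive advantage raises the probability of the sampled tokens, and a negative advantage lowers it. This is how a metric-based reward can steer generation toward the evaluation criteria that supervised training does not optimize directly.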

       
