Abstract:
Dialogue systems are an integral part of human-computer interaction, and their goal is a natural, intelligent, and fluent conversational experience. However, traditional dialogue systems struggle to understand user intent, generate reasonable responses, and maintain dialogue coherence. A core reason is the mismatch between the supervised training objective of the dialogue policy network and the metrics used for final evaluation: the objective function fails to cover the critical metrics on which the system is judged. Recent work has begun to explore reinforcement-learning-based training, but existing approaches still rely on hand-crafted templates and involve too many dialogue turns. To address these problems, this study uses a Transformer-based dialogue policy network and builds a word-level deep reinforcement learning training process whose reward combines multiple dialogue system metrics, with the aim of improving the system's interactive performance. Experimental results show that, relative to traditional supervised training, the policy network trained with deep reinforcement learning improves on the dialogue performance metrics BLEU and Comb while maintaining comparable performance on the other metrics. The contribution of this study is the application of reinforcement learning to the training of dialogue policy networks, with its effectiveness and potential verified experimentally.