Reinforcement Learning-driven Multi-round Data Augmentation for Web Service Classification

He Huimin (何慧敏); Chen Chong (陈翀); Wang Tao (王涛); Cheng Lianglun (程良伦)

doi:10.12052/gdutxb.250184

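Abstract: To address the difficulty of recognizing tail classes caused by long-tailed distributions in Web service classification, this paper proposes a method that combines reinforcement learning-driven multi-round data augmentation with adaptive loss optimization. With a large language model as the core semantic generator, the method builds a reinforcement learning agent that, in each iteration, adaptively adjusts the augmentation ratio and filtering threshold of each class according to the environment state, driving a multi-round closed loop of “generation-evaluation-filtering-feedback.” A chain-of-thought reasoning mechanism evaluates generated samples along multiple dimensions, including novelty, consistency, and reasoning quality, filtering out templated and semantically drifting samples so that the training data distribution improves round by round. In the classifier training stage, class weights are computed dynamically from the class frequencies of the training set as augmented in the current round, and a Top-k Near-Miss Focal Loss is designed to jointly model long-tailed classes and near-miss samples, penalizing boundary semantics more heavily and thereby achieving adaptive loss optimization for long-tailed classes and hard examples. Experiments on a real-world long-tailed Web service dataset and the PMTD (Productive Math Tutoring Dialogue) instructional dialogue dataset show that the method outperforms baseline models such as NCAL (Neural-Collapse-Advanced personalized Learning), RGPT, SRaSLR (Social Relation Aware Service Label Recommendation Model), and LLMEmbed on multiple evaluation metrics, with especially strong results on tail classes: across several lightweight models on the Web service dataset, Macro-$\mathrm{F}_1$ improves by up to 5 percentage points and Weighted-$\mathrm{F}_1$ by about 2 to 3 percentage points. These findings show that the method effectively alleviates the class bias caused by long-tailed data distributions and offers a feasible path toward intelligent recognition and classification of semantically sparse services in open environments.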
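The “generation-evaluation-filtering-feedback” loop can be pictured with a small Python sketch. It is not the authors' implementation, and several parts are assumptions: the LLM generator and the chain-of-thought judge are replaced by a hypothetical `fake_generate` stub that emits random novelty, consistency, and reasoning scores; the scoring weights in `quality`, the action set `ACTIONS`, and the reward (how much the max/min class-count ratio shrinks) are invented for illustration; and a per-class epsilon-greedy bandit (`BanditAgent`) stands in for the reinforcement learning agent, whose state, action, and reward design the abstract does not specify.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical action set: (change in augmentation ratio, change in filtering threshold).
# "Keep" comes first so that ties in the bandit's value table default to no change.
ACTIONS: List[Tuple[float, float]] = [(0.0, 0.0), (0.2, -0.05), (-0.2, 0.05)]


def fake_generate(n: int, rng: random.Random) -> List[Dict[str, float]]:
    """Stub for LLM generation plus CoT judging: each candidate carries three random scores in [0, 1)."""
    return [{"novelty": rng.random(), "consistency": rng.random(), "reasoning": rng.random()}
            for _ in range(n)]


def quality(sample: Dict[str, float], w=(0.3, 0.4, 0.3)) -> float:
    """Weighted combination of the three evaluation dimensions (weights are assumed)."""
    return w[0] * sample["novelty"] + w[1] * sample["consistency"] + w[2] * sample["reasoning"]


def imbalance(counts: Dict[str, int]) -> float:
    """Ratio of the largest to the smallest class size."""
    return max(counts.values()) / min(counts.values())


class BanditAgent:
    """Per-class epsilon-greedy bandit over ACTIONS; a simple stand-in for the paper's RL agent."""

    def __init__(self, classes, epsilon: float = 0.2, seed: int = 0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.q = {c: [0.0] * len(ACTIONS) for c in classes}  # estimated value of each action
        self.n = {c: [0] * len(ACTIONS) for c in classes}    # times each action was tried

    def act(self, c: str) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))  # explore
        return max(range(len(ACTIONS)), key=lambda a: self.q[c][a])  # exploit

    def update(self, c: str, a: int, reward: float) -> None:
        self.n[c][a] += 1
        self.q[c][a] += (reward - self.q[c][a]) / self.n[c][a]  # incremental sample mean


def run_rounds(initial: Dict[str, int], rounds: int = 5, seed: int = 0):
    rng = random.Random(seed)
    counts = dict(initial)  # copy so the caller's dict is not mutated
    agent = BanditAgent(counts, seed=seed)
    ratio = {c: 0.5 for c in counts}
    thresh = {c: 0.5 for c in counts}
    head = max(counts.values())
    for _ in range(rounds):
        before = imbalance(counts)
        chosen = {}
        for c in counts:  # 1. agent picks per-class ratio/threshold adjustments
            a = agent.act(c)
            chosen[c] = a
            d_ratio, d_thresh = ACTIONS[a]
            ratio[c] = min(2.0, max(0.0, ratio[c] + d_ratio))
            thresh[c] = min(0.9, max(0.1, thresh[c] + d_thresh))
        for c in counts:
            n_gen = max(0, int(ratio[c] * (head - counts[c])))  # 2. generate toward the head size
            kept = [s for s in fake_generate(n_gen, rng) if quality(s) >= thresh[c]]  # 3. evaluate and filter
            counts[c] += len(kept)  # 4. feed accepted samples back into the training set
        reward = before - imbalance(counts)  # shared reward: how much the imbalance shrank
        for c, a in chosen.items():
            agent.update(c, a, reward)
    return counts, ratio, thresh
```

For example, `run_rounds({"head": 100, "mid": 10, "tail": 2})` never adds samples to the head class, because its gap to the head size is zero, while the smaller classes are topped up each round by however many candidates clear their current thresholds. The single shared reward is the crudest possible credit assignment; a per-class reward derived from validation metrics would be a more natural choice, but the abstract does not state what the authors used.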

Abstract: To address the difficulty of recognizing tail classes in Web service classification caused by long-tailed data distributions, a reinforcement learning-enhanced framework is proposed that integrates multi-round data augmentation with adaptive loss optimization. A large language model (LLM) serves as the core semantic generator, and a reinforcement learning agent adaptively adjusts class-specific augmentation ratios and filtering thresholds at each iteration based on the observed environment state, driving a closed-loop multi-round process of “generation-evaluation-filtering-feedback.” In parallel, a Chain-of-Thought-based reasoning mechanism evaluates generated samples along multiple dimensions, including novelty, semantic consistency, and reasoning quality, thereby filtering out template-like and semantically drifting instances and progressively improving the training data distribution. During classifier training, class weights are recomputed at each iteration from the class frequencies of the augmented training set. A Top-k Near-Miss Focal Loss is further designed to jointly emphasize long-tailed classes and near-miss boundary samples, penalizing ambiguous semantic regions more heavily and enabling adaptive loss optimization tailored to long-tailed classes and hard examples. Experiments on a real-world long-tailed Web service dataset and the PMTD (Productive Math Tutoring Dialogue) instructional dialogue dataset show that the proposed method outperforms mainstream baselines such as NCAL (Neural-Collapse-Advanced personalized Learning), RGPT, SRaSLR (Social Relation Aware Service Label Recommendation Model), and LLMEmbed across multiple evaluation metrics. The gains are most pronounced for tail-class recognition: on several lightweight models, Macro-$\mathrm{F}_1$ on the Web service dataset improves by up to 5 percentage points, and Weighted-$\mathrm{F}_1$ by approximately 2 to 3 percentage points. These results confirm that the approach mitigates the bias introduced by long-tailed distributions and provide a practical solution for the intelligent recognition and classification of semantically sparse services in open environments.
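One plausible reading of the frequency-based class weights and the Top-k Near-Miss Focal Loss is sketched below in plain Python. The abstract gives no formulas, so the exact form is an assumption: the weight of class $c$ is taken as $n_c^{-\beta}$ rescaled to average 1 over the classes present, the focal term is the standard $-(1-p_y)^\gamma \log p_y$, and “near-miss” is modeled as a hinge penalty whenever the target logit fails to beat one of its $k$ strongest rival logits by at least `margin`. The names `class_weights`, `topk_near_miss_focal_loss`, `batch_loss`, `beta`, `margin`, and `lam` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def class_weights(labels: Sequence[int], num_classes: int, beta: float = 0.5) -> List[float]:
    """Hypothetical frequency-based weights: w_c proportional to n_c^(-beta), averaged to 1 over present classes."""
    counts = Counter(labels)
    raw = [counts[c] ** -beta if counts[c] > 0 else 0.0 for c in range(num_classes)]
    present = [r for r in raw if r > 0]
    mean = sum(present) / len(present)
    return [r / mean for r in raw]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_near_miss_focal_loss(logits, target, weights, gamma=2.0, k=2, margin=1.0, lam=0.5):
    """Weighted focal loss plus a hinge penalty against the k strongest rival classes (assumed form)."""
    probs = softmax(logits)
    p_t = probs[target]
    focal = -((1.0 - p_t) ** gamma) * math.log(p_t)
    # The k highest-scoring non-target classes are the likeliest confusions.
    rivals = sorted((j for j in range(len(logits)) if j != target),
                    key=lambda j: logits[j], reverse=True)[:k]
    # Penalize every rival that the target logit does not beat by at least `margin`.
    near_miss = sum(max(0.0, margin - (logits[target] - logits[j])) for j in rivals) / len(rivals)
    return weights[target] * (focal + lam * near_miss)


def batch_loss(batch_logits, targets, weights, **kwargs) -> float:
    """Mean loss over a batch."""
    losses = [topk_near_miss_focal_loss(z, t, weights, **kwargs)
              for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 9 + [2] * 1  # head, middle, and tail class
    w = class_weights(labels, num_classes=3)
    print([round(x, 3) for x in w])
    print(topk_near_miss_focal_loss([1.0, 0.8, -2.0], target=0, weights=w))
```

Because the weights depend only on the current class counts, calling `class_weights` again after each augmentation round lets the reweighting relax as tail classes grow, in the spirit of the per-round weight updates the abstract describes. Setting `beta=0` and `lam=0` reduces the sketch to ordinary unweighted focal loss, which makes it easy to ablate the two added components separately.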

Reinforcement Learning-driven Multi-round Data Augmentation for Web Service Classification