Deep Reinforcement Learning-based Spectrum Allocation in Vehicular Networks

    • Abstract: To ease the tension between the growing number of vehicle users and the limited spectrum resources in vehicular networks, a multi-agent deep reinforcement learning algorithm that combines monotonic value function factorization with a self-attention mechanism is proposed. The algorithm learns a joint policy for channel selection and power control of vehicle-to-vehicle (V2V) links, aiming to maximize the sum channel capacity of vehicle-to-infrastructure (V2I) links while satisfying the latency and reliability requirements of the V2V links. Because the dynamic vehicular environment prevents V2V links from collecting complete channel state information in real time, a deep recurrent Q-network is built for each V2V link, enabling each link to make decisions autonomously from its own local observations. To keep the optimization direction of each V2V link's local policy aligned with the global objective, a global mixing network with a monotonicity constraint is designed to guide training. In addition, an information interaction module based on the self-attention mechanism further improves cooperation among V2V links. Simulation results show that, compared with baseline algorithms, the proposed algorithm increases the sum channel capacity of V2I links by 1.44 to 8.24 percentage points and reduces the transmission delay of V2V links by 1.93 to 15.04 percentage points; it effectively guides V2V links to adapt their channel selection and power control decisions to environmental changes, thereby ensuring link communication quality.
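    The abstract describes a QMIX-style architecture with three components: a deep recurrent Q-network (DRQN) per V2V link for decision-making under partial observability, a self-attention module for inter-link information interaction, and a mixing network whose non-negative weights enforce monotonicity between each per-agent Q-value and the joint Q-value (∂Q_tot/∂Q_i ≥ 0). The PyTorch sketch below is a minimal illustration of these ideas under assumed layer sizes and class names; in particular, the way the attention output conditions the mixer is this sketch's own assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DRQNAgent(nn.Module):
    """Per-link recurrent Q-network acting on local observations only."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.fc_in = nn.Linear(obs_dim, hidden)
        self.gru = nn.GRUCell(hidden, hidden)  # memory compensates for partial observability
        self.fc_out = nn.Linear(hidden, n_actions)

    def forward(self, obs, h):
        x = F.relu(self.fc_in(obs))
        h = self.gru(x, h)                     # carry history across time steps
        return self.fc_out(h), h               # per-action Q-values, new hidden state


class MonotonicAttentionMixer(nn.Module):
    """QMIX-style mixer with a self-attention interaction module (illustrative).

    Self-attention over the agents' hidden states lets V2V links exchange
    information; hypernetworks conditioned on the global state (plus the
    pooled attention output) emit non-negative weights, so the joint Q-value
    is monotonic in every per-agent Q-value.
    """

    def __init__(self, n_agents, state_dim, agent_hidden=64, embed=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(agent_hidden, num_heads=4, batch_first=True)
        in_dim = state_dim + agent_hidden      # global state + pooled attention features
        self.hyper_w1 = nn.Linear(in_dim, n_agents * embed)
        self.hyper_b1 = nn.Linear(in_dim, embed)
        self.hyper_w2 = nn.Linear(in_dim, embed)
        self.hyper_b2 = nn.Linear(in_dim, 1)
        self.n_agents, self.embed = n_agents, embed

    def forward(self, agent_qs, agent_h, state):
        # agent_qs: (B, N) chosen-action Q-values; agent_h: (B, N, H); state: (B, S)
        msg, _ = self.attn(agent_h, agent_h, agent_h)        # inter-link interaction
        cond = torch.cat([state, msg.mean(dim=1)], dim=-1)   # conditioning vector
        w1 = torch.abs(self.hyper_w1(cond)).view(-1, self.n_agents, self.embed)
        b1 = self.hyper_b1(cond).view(-1, 1, self.embed)
        w2 = torch.abs(self.hyper_w2(cond)).view(-1, self.embed, 1)
        b2 = self.hyper_b2(cond).view(-1, 1, 1)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        return (torch.bmm(hidden, w2) + b2).view(-1)         # joint Q_tot, shape (B,)


if __name__ == "__main__":
    B, N, OBS, ACT, S, H = 8, 4, 10, 12, 20, 64  # illustrative sizes only
    agents = [DRQNAgent(OBS, ACT, H) for _ in range(N)]
    mixer = MonotonicAttentionMixer(N, S, H)
    hidden = [torch.zeros(B, H) for _ in range(N)]
    qs, hs = [], []
    for i, agent in enumerate(agents):
        q, hidden[i] = agent(torch.randn(B, OBS), hidden[i])
        qs.append(q.max(dim=1).values)         # greedy (channel, power) action value
        hs.append(hidden[i])
    q_tot = mixer(torch.stack(qs, 1), torch.stack(hs, 1), torch.randn(B, S))
    print(q_tot.shape)                         # torch.Size([8])
```

    Under the centralized-training, decentralized-execution paradigm the abstract implies, the mixer and the global state are used only during training; at run time each V2V link selects its channel and transmit power greedily from its own recurrent Q-network using local observations alone.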

       
