Abstract:
A dynamic spectrum allocation scheme is proposed to address the growing number of vehicles and the limited spectrum resources in vehicular networks. It integrates a self-attention mechanism and monotonic value function factorization into deep multi-agent reinforcement learning to optimize channel selection and transmit power levels for vehicle-to-vehicle (V2V) links. The global objective is to maximize the sum throughput of vehicle-to-infrastructure (V2I) links while meeting the latency and reliability constraints of V2V links. To handle incomplete real-time channel state information caused by the dynamic environment, a deep recurrent Q-network is deployed for each V2V link, enabling autonomous decision-making based on local observations. To align the local strategy optimization of each V2V link with the global objective, a global mixing network with monotonicity constraints is designed to guide training. Additionally, an information interaction model based on the self-attention mechanism further improves collaboration between V2V links. Compared with the baselines, the proposed algorithm increases the sum throughput of V2I links by 1.44 to 8.24 percentage points and reduces the transmission delay of V2V links by 1.93 to 15.04 percentage points. These results confirm its effectiveness in optimizing channel selection and power levels for improved communication quality.
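The monotonicity constraint on the mixing network can be illustrated with a minimal sketch. It assumes a QMIX-style two-layer mixer in which taking the absolute value of the mixing weights keeps the global value non-decreasing in every per-link Q-value. The function names, layer sizes, and the fixed random weights are hypothetical; in a full implementation the weights would typically be generated from the global state, and the abstract does not specify the exact architecture.

```python
import math
import random


def elu(x):
    """ELU activation; monotonically increasing, so it preserves ordering."""
    return x if x > 0 else math.exp(x) - 1.0


def make_params(n_agents, n_hidden, seed=0):
    """Hypothetical fixed mixing parameters (stand-in for a state-conditioned hypernetwork)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_agents)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    return w1, b1, w2, b2


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """Combine per-link Q-values into one global value Q_tot.

    abs() forces every mixing weight to be non-negative, so raising any
    single link's Q-value can never lower Q_tot. Biases are unconstrained.
    """
    n_hidden = len(b1)
    hidden = []
    for j in range(n_hidden):
        s = sum(abs(w1[i][j]) * q for i, q in enumerate(agent_qs)) + b1[j]
        hidden.append(elu(s))
    return sum(abs(w2[j]) * hidden[j] for j in range(n_hidden)) + b2


if __name__ == "__main__":
    params = make_params(n_agents=4, n_hidden=8, seed=1)
    local_qs = [0.5, -1.0, 2.0, 0.0]  # one chosen-action Q-value per V2V link
    print("Q_tot:", monotonic_mix(local_qs, *params))
```

Because Q_tot is non-decreasing in each local Q-value, each V2V link greedily maximizing its own Q-value is consistent with maximizing the global value, which is what allows decentralized execution after centralized training.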