Journal of Guangdong University of Technology ›› 2023, Vol. 40 ›› Issue (04): 85-93. DOI: 10.12052/gdutxb.220159

• Computer Science and Technology •

An Improved Double Deep Q Network for Multi-user Dynamic Spectrum Access

He Yi-shan, Wang Yong-hua, Wan Pin, Wang Lei, Wu Wen-tao

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2022-10-19  Online: 2023-07-25  Published: 2023-08-02
  • Corresponding author: Wang Yong-hua (1979–), male, associate professor, Ph.D.; his research interests include cognitive radio networks and machine learning. E-mail: wangyonghua@gdut.edu.cn
  • About the first author: He Yi-shan (1998–), male, master's degree candidate; his research interests include cognitive radio networks and deep reinforcement learning.
  • Supported by: National Natural Science Foundation of China (61971147)

Abstract: With the rapid development of mobile communication technology, the contradiction between limited spectrum resources and the large demand for spectrum access has become increasingly acute, and new intelligent methods are needed to improve spectrum utilization. This paper proposes a multi-user dynamic spectrum access method that combines a distributed prioritized experience replay buffer with a double deep Q network. With this method, secondary users in a dynamically changing cognitive radio network environment can learn continuously from their own sensing information and select idle channels to complete spectrum access tasks, thereby improving spectrum utilization. The method adopts a distributed reinforcement learning framework in which each secondary user is treated as an agent, and each agent learns with a standard single-agent reinforcement learning algorithm to reduce the underlying computational overhead. In addition, prioritized sampling is added to the neural network training, which improves training efficiency and helps the secondary users select the optimal policy. Simulation results show that the method increases the channel access success rate, reduces the collision rate, and improves the communication rate.

Key words: dynamic spectrum access, distributed reinforcement learning, prioritized experience replay, deep reinforcement learning

CLC number: TN929.5
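
The abstract describes each secondary user as an independent double deep Q network learner that samples from a prioritized experience replay buffer. As a rough illustration only, and not the authors' implementation, the sketch below (Python/NumPy) replaces the paper's neural networks with small Q-value arrays, omits channel-state inputs, and uses a toy collision-based reward; every name and parameter here (NUM_CHANNELS, PrioritizedBuffer, the 0.3 primary-user occupancy probability, and so on) is an assumption made for the example. The two ideas it is meant to surface are proportional priority sampling (probability proportional to |TD error|^alpha) and the double-Q target y = r + gamma * Q_target(argmax_a Q_online(a)), where the online estimator selects the action and the target estimator evaluates it.

# Illustrative sketch of distributed double-Q learning with prioritized replay
# for multi-user channel selection. Not the authors' code; all names and
# parameters are assumptions, and Q arrays stand in for the paper's networks.
import numpy as np

NUM_CHANNELS = 4                      # assumed number of licensed channels
NUM_USERS = 2                         # assumed number of secondary users
GAMMA, ALPHA, LR, EPS = 0.9, 0.6, 0.1, 0.1

rng = np.random.default_rng(0)

class PrioritizedBuffer:
    """Proportional prioritized experience replay (illustrative)."""
    def __init__(self, capacity=1000):
        self.data, self.prio, self.capacity = [], [], capacity

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prio.pop(0)
        self.data.append(transition); self.prio.append(priority)

    def sample(self, batch_size):
        # Sampling probability proportional to priority^ALPHA.
        p = np.array(self.prio) ** ALPHA
        p /= p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # Priorities are refreshed with the absolute TD errors.
        for i, e in zip(idx, td_errors):
            self.prio[i] = abs(e) + 1e-6

class Agent:
    """One secondary user; two Q arrays stand in for online/target networks."""
    def __init__(self):
        self.q_online = np.zeros(NUM_CHANNELS)
        self.q_target = np.zeros(NUM_CHANNELS)
        self.buffer = PrioritizedBuffer()

    def act(self):
        # Epsilon-greedy channel selection.
        if rng.random() < EPS:
            return int(rng.integers(NUM_CHANNELS))
        return int(np.argmax(self.q_online))

    def learn(self, batch_size=8):
        if len(self.buffer.data) < batch_size:
            return
        idx, batch = self.buffer.sample(batch_size)
        errors = []
        for a, r in batch:
            # Double-Q target: online estimator selects, target estimator evaluates.
            a_star = int(np.argmax(self.q_online))
            y = r + GAMMA * self.q_target[a_star]
            td = y - self.q_online[a]
            self.q_online[a] += LR * td
            errors.append(td)
        self.buffer.update(idx, errors)
        self.q_target = self.q_online.copy()   # simplified target sync; the paper would do this periodically

agents = [Agent() for _ in range(NUM_USERS)]
for step in range(500):
    busy = rng.random(NUM_CHANNELS) < 0.3      # assumed primary-user occupancy model
    picks = [ag.act() for ag in agents]
    for ag, ch in zip(agents, picks):
        # A transmission fails if the channel is occupied or two users collide.
        collision = bool(busy[ch]) or picks.count(ch) > 1
        reward = -1.0 if collision else 1.0
        ag.buffer.add((ch, reward))
        ag.learn()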