Journal of Guangdong University of Technology ›› 2023, Vol. 40 ›› Issue (04): 85-93. doi: 10.12052/gdutxb.220159
He Yi-shan, Wang Yong-hua, Wan Pin, Wang Lei, Wu Wen-tao
Abstract: With the rapid development of mobile communication technology, the conflict between limited spectrum resources and the large demand for spectrum-based communication grows ever sharper, and new intelligent methods are needed to improve spectrum utilization. This paper proposes a multi-user dynamic spectrum access method based on a distributed prioritized experience replay buffer combined with a double deep Q-network (DDQN). With this method, secondary users in a dynamically changing cognitive radio network can learn continually from their own sensing information and select idle channels to complete spectrum access, thereby improving spectrum utilization. The method adopts a distributed reinforcement learning framework in which each secondary user is treated as an agent, and every agent learns with a standard single-agent reinforcement learning algorithm to reduce the underlying computational overhead. In addition, prioritized sampling is incorporated into neural network training, which improves training efficiency and helps secondary users converge on the optimal policy. Simulation results show that the method increases the channel access success rate, reduces the collision rate, and raises the communication rate.
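The abstract names three ingredients: a double deep Q-network, a prioritized experience replay buffer, and a distributed framework in which each secondary user learns as an independent single agent. The sketch below is an illustration of how one such agent's DDQN update with proportional prioritized sampling could look, not the authors' implementation; the channel count, network sizes, and hyperparameters are assumptions, and importance-sampling weight correction is omitted for brevity.

```python
import numpy as np
import torch
import torch.nn as nn

NUM_CHANNELS = 4    # hypothetical number of channels a secondary user senses
GAMMA = 0.9         # discount factor (assumed)

class QNet(nn.Module):
    """Small Q-network mapping sensed channel occupancy to per-channel Q-values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_CHANNELS, 64), nn.ReLU(),
            nn.Linear(64, NUM_CHANNELS))

    def forward(self, x):
        return self.net(x)

class PrioritizedReplay:
    """Proportional prioritized replay: P(i) ~ |TD error|_i ** alpha."""
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prio = [], []

    def push(self, transition):
        if len(self.data) >= self.capacity:            # drop the oldest transition
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new samples get max priority

    def sample(self, batch):
        p = np.asarray(self.prio) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_err, eps=1e-3):
        for i, e in zip(idx, np.abs(td_err)):
            self.prio[i] = float(e) + eps              # keep priorities positive

def train_step(online, target, buf, opt, batch=32):
    """One double-DQN gradient step for a single agent (secondary user)."""
    idx, trans = buf.sample(batch)
    s, a, r, s2 = zip(*trans)
    s = torch.tensor(np.array(s), dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(np.array(s2), dtype=torch.float32)
    q = online(s).gather(1, a).squeeze(1)              # Q(s, a) from the online net
    with torch.no_grad():
        a2 = online(s2).argmax(dim=1, keepdim=True)    # action chosen by online net
        y = r + GAMMA * target(s2).gather(1, a2).squeeze(1)  # valued by target net
    td = y - q
    loss = (td ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    buf.update(idx, td.detach().numpy())               # refresh sampled priorities
```

Under this reading, each secondary user would own one QNet pair (online and target) and one PrioritizedReplay instance, which is what keeps the per-agent computation at single-agent scale as the abstract claims.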