Journal of Guangdong University of Technology ›› 2023, Vol. 40 ›› Issue (04): 85-93. doi: 10.12052/gdutxb.220159
He Yi-shan, Wang Yong-hua, Wan Pin, Wang Lei, Wu Wen-tao
Abstract: With the rapid development of mobile communication technology, the conflict between limited spectrum resources and the large demand for spectrum-based communication grows ever sharper, and new intelligent methods are needed to improve spectrum utilization. This paper proposes a multi-user dynamic spectrum access method based on a distributed prioritized experience replay buffer combined with a double deep Q-network (DDQN). With this method, secondary users in a dynamically changing cognitive radio network can learn continually from their own sensing information and select idle channels to complete spectrum access, thereby improving spectrum utilization. The method adopts a distributed reinforcement learning framework in which each secondary user is treated as an agent, and every agent learns with a standard single-agent reinforcement learning algorithm to reduce the underlying computational overhead. In addition, prioritized sampling is incorporated into neural network training, which improves training efficiency and helps secondary users converge on the optimal policy. Simulation results show that the method increases the channel access success rate, reduces the collision rate, and raises the communication rate.
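The abstract names three ingredients: a double deep Q-network, a prioritized experience replay buffer, and a distributed framework in which each secondary user learns as an independent single agent. The sketch below is an illustration of how one such agent's DDQN update with proportional prioritized sampling could look, not the authors' implementation; the channel count, network sizes, and hyperparameters are assumptions, and importance-sampling weight correction is omitted for brevity.

```python
import numpy as np
import torch
import torch.nn as nn

NUM_CHANNELS = 4    # hypothetical number of channels a secondary user senses
GAMMA = 0.9         # discount factor (assumed)

class QNet(nn.Module):
    """Small Q-network mapping sensed channel occupancy to per-channel Q-values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_CHANNELS, 64), nn.ReLU(),
            nn.Linear(64, NUM_CHANNELS))

    def forward(self, x):
        return self.net(x)

class PrioritizedReplay:
    """Proportional prioritized replay: P(i) ~ |TD error|_i ** alpha."""
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prio = [], []

    def push(self, transition):
        if len(self.data) >= self.capacity:            # drop the oldest transition
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new samples get max priority

    def sample(self, batch):
        p = np.asarray(self.prio) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_err, eps=1e-3):
        for i, e in zip(idx, np.abs(td_err)):
            self.prio[i] = float(e) + eps              # keep priorities positive

def train_step(online, target, buf, opt, batch=32):
    """One double-DQN gradient step for a single agent (secondary user)."""
    idx, trans = buf.sample(batch)
    s, a, r, s2 = zip(*trans)
    s = torch.tensor(np.array(s), dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(np.array(s2), dtype=torch.float32)
    q = online(s).gather(1, a).squeeze(1)              # Q(s, a) from the online net
    with torch.no_grad():
        a2 = online(s2).argmax(dim=1, keepdim=True)    # action chosen by online net
        y = r + GAMMA * target(s2).gather(1, a2).squeeze(1)  # valued by target net
    td = y - q
    loss = (td ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    buf.update(idx, td.detach().numpy())               # refresh sampled priorities
```

Under this reading, each secondary user would own one QNet pair (online and target) and one PrioritizedReplay instance, which is what keeps the per-agent computation at single-agent scale as the abstract claims.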