Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (04): 77-84. DOI: 10.12052/gdutxb.220090
苏天赐, 何梓楠, 崔苗, 张广驰
Su Tian-ci, He Zi-nan, Cui Miao, Zhang Guang-chi
Abstract: Unmanned aerial vehicles (UAVs), being highly flexible, small, and lightweight, are widely used for data collection in wireless sensor networks. This paper considers a wireless sensor network whose users are randomly distributed and mobile, and studies how to plan the flight paths of multiple UAVs to collect the users' data effectively. The trajectories of the multiple UAVs are optimized to maximize the average data-collection throughput in a dynamic environment where user locations cannot be predicted, subject to constraints on each UAV's minimum flight time and flight range, its start and end points, the communication distance, user communication, and collision avoidance between UAVs. Existing optimization-based decision methods solve this problem only at high computational complexity and have difficulty finding the global optimum. This paper therefore proposes a deep reinforcement learning algorithm based on the Dueling Double Deep Q-network (Dueling-DDQN). The dueling architecture strengthens the algorithm's learning ability and improves the robustness and convergence speed of training, while the Double DQN (DDQN) component avoids the suboptimal UAV trajectory policies that result from overestimated $Q$ values. Simulation results show that the proposed algorithm optimizes the UAVs' flight paths efficiently and achieves better convergence and robustness than existing baseline algorithms.
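To make the abstract's two algorithmic ingredients concrete, the following is a minimal PyTorch sketch, not the authors' implementation: all names, layer sizes, and hyperparameters are illustrative assumptions. The dueling head decomposes Q(s, a) into a state value V(s) plus a mean-centered advantage A(s, a), and the Double-DQN target lets the online network select the next action while the target network evaluates it, which curbs the Q-value overestimation the abstract warns about.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared feature layer; the size is an illustrative assumption.
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Centering the advantages makes the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def ddqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double-DQN target: the online net picks the next action, the target
    net evaluates it, mitigating vanilla DQN's Q-value overestimation."""
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In the paper's multi-UAV setting, the state would encode UAV and observed user positions and the discrete actions would be flight headings; those specifics follow the paper itself, not this sketch.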