Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (04): 77-84. doi: 10.12052/gdutxb.220090

• Computer Science and Technology •

Intelligent Path Planning Algorithm for Multi-UAV-assisted Data Collection Systems

Su Tian-ci, He Zi-nan, Cui Miao, Zhang Guang-chi

  1. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2022-05-22  Online: 2023-07-25  Published: 2023-08-02
  • Corresponding author: Zhang Guang-chi (1982–), male, professor; his main research interest is new-generation wireless communication technology. E-mail: gczhang@gdut.edu.cn
  • About the author: Su Tian-ci (1998–), male, master's student; his main research interests include UAV-assisted wireless communication and deep reinforcement learning
  • Funding:
    Science and Technology Program of Guangdong Province (2022A0505050023, 2022A0505020008); Marine Economy Development Project of Guangdong Province (粤自然资合[2023]24号); Special Support Program of Guangdong Province (2019TQ05X409); Open Fund of the Jiangxi Key Laboratory of Military-Civil Integration BeiDou Navigation (2022JXRH0004)



Abstract: With the advantages of high flexibility, small size, and light weight, unmanned aerial vehicles (UAVs) have been widely used for data collection in wireless sensor networks. For a wireless sensor network whose users are randomly distributed and mobile, this paper studies how to plan the flight paths of multiple UAVs so that they collect data from the users effectively. The UAVs' flight paths are optimized to maximize the average throughput of data collection in a dynamic environment where the user locations cannot be predicted, subject to the minimum flight time and flight range constraints of the UAVs, the start- and end-point constraints of the UAVs, the communication distance constraints, the user communication constraints, and the UAV collision avoidance constraints. Solving this problem with existing optimization-based decision methods incurs high computational complexity, and the globally optimal solution is difficult to obtain. To address this, this paper proposes a deep reinforcement learning algorithm based on the Dueling Double Deep Q-network (Dueling-DDQN). The proposed algorithm adopts the Dueling network architecture, which enhances its learning ability and improves the robustness and convergence speed of the training process; it also incorporates the advantages of Double DQN (DDQN), which prevents the learned UAV trajectory policy from becoming suboptimal due to over-estimation of the $ Q $ value. Simulation results show that the proposed algorithm can efficiently optimize the flight paths of the UAVs under all constraints, and that it achieves better convergence and robustness than the existing benchmark algorithms.

Key words: UAV communication, data collection, path planning, deep reinforcement learning

CLC number: TN929.5
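
The abstract combines two standard refinements of deep Q-learning: a Dueling network head, which separates the state value from per-action advantages, and the Double DQN target, which decouples action selection from action evaluation to curb $ Q $-value over-estimation. The following PyTorch sketch is only an illustration of how these two ideas fit together, not the paper's implementation; the network sizes and the names `state_dim`, `n_actions`, and `gamma` are our assumptions.

```python
# Minimal sketch of the two ingredients of Dueling-DDQN (illustrative only).
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: separate state-value V(s) and advantage A(s,a)
    streams, recombined as Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def ddqn_target(online: DuelingQNet, target: DuelingQNet,
                reward: torch.Tensor, next_state: torch.Tensor,
                done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN target: the online net selects the next action, the target
    net evaluates it, which mitigates over-estimation of Q values."""
    with torch.no_grad():
        next_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In training, the online network would be regressed toward `ddqn_target(...)` (for example with a smooth L1 loss) and the target network periodically synchronized with the online one. How the UAV states, actions, and rewards are encoded under the constraints listed in the abstract is specified in the full paper and is not reproduced here.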