Journal of Guangdong University of Technology


Research on Cooperative Driving Decisions for Highway On-ramp Merging in Mixed Traffic

Xie Guang-qiang, Ning Kai-xin, Li Yang

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2024-03-05 Online: 2024-09-27 Published: 2024-09-27
  • Corresponding author: Li Yang (1980–), female, professor, Ph.D.; main research interests: multi-agent systems and differential privacy protection. E-mail: liyang@gdut.edu.cn
  • About the author: Xie Guang-qiang (1979–), male, professor, Ph.D.; main research interests: multi-agent systems and agent control. E-mail: xiegq@gdut.edu.cn
  • Funding:
    National Natural Science Foundation of China (62006047, 618760439); Key Research and Development Program of Guangdong Province (2021B0101220004)

Abstract: In mixed traffic where Connected Autonomous Vehicles (CAVs) and Human Driving Vehicles (HDVs) coexist, highway on-ramp merging is a challenging problem. Vehicles of different types contend for road positions through behaviors such as merging and lane changing, which affects traffic flow. CAVs have difficulty accurately predicting and adapting to this behavior, which raises the risk of merging, lowers traffic efficiency, and causes congestion. Traditional reinforcement learning algorithms struggle to find good policies in complex environments and easily become trapped in local optima, so they cannot cope with complex traffic situations and produce imprecise merging decisions. To address these problems, the Evolutionary Soft Actor-Critic for Discrete Action Settings (ESACD) algorithm is proposed, in which CAVs cooperate to adapt to HDV strategies so as to maximize traffic throughput. First, a parent selection and crossover method based on rank selection is proposed to model the interaction population. Second, an elastic training population based on multiple populations is designed to improve the adaptability of CAVs to dynamically changing traffic flow. Finally, a secondary assessment mechanism based on fitness evaluation is proposed. Simulation experiments under two different traffic densities show that, compared with the traditional Soft Actor-Critic (SAC) algorithm, the proposed algorithm completes the on-ramp merging task for connected vehicles more efficiently, with a significant overall improvement rate. This verifies that the algorithm improves training efficiency and increases traffic throughput.

Key words: deep evolutionary reinforcement learning (DERL), connected autonomous vehicle (CAV), on-ramp merging, mixed traffic
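
The abstract names rank-selection-based parent selection and crossover but does not give their details. The following is a minimal sketch of the generic technique only, assuming each individual is a flat list of policy parameters, linear ranking probabilities, and single-point crossover. The function names and these choices are illustrative assumptions, not the ESACD implementation.

```python
import random
from typing import List, Sequence, Tuple


def rank_selection_probs(fitness: Sequence[float]) -> List[float]:
    """Linear rank-based selection probabilities.

    The worst individual gets rank 1 and the best gets rank n, so the
    probability depends on fitness order only, not on its magnitude.
    """
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # ascending fitness
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = n * (n + 1) / 2
    return [r / total for r in ranks]


def select_parents(
    population: Sequence[List[float]],
    fitness: Sequence[float],
    rng: random.Random,
) -> Tuple[List[float], List[float]]:
    """Draw two distinct parents according to their rank probabilities."""
    if len(population) < 2:
        raise ValueError("need at least two individuals")
    probs = rank_selection_probs(fitness)
    indices = list(range(len(population)))
    first = rng.choices(indices, weights=probs, k=1)[0]
    rest = [i for i in indices if i != first]
    second = rng.choices(rest, weights=[probs[i] for i in rest], k=1)[0]
    return population[first], population[second]


def single_point_crossover(
    a: List[float], b: List[float], rng: random.Random
) -> List[float]:
    """Child takes the head of parent a and the tail of parent b.

    Single-point crossover is an assumption made for this sketch.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must share a length of at least 2")
    cut = rng.randint(1, len(a) - 1)  # both parents always contribute
    return a[:cut] + b[cut:]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: four parameter vectors with episode returns.
    population = [[float(i)] * 6 for i in range(4)]
    fitness = [12.0, 48.5, 30.1, 7.3]
    parent_a, parent_b = select_parents(population, fitness, rng)
    print(single_point_crossover(parent_a, parent_b, rng))
```

Ranking rather than using raw fitness keeps selection pressure stable when episode returns differ by orders of magnitude between individuals, which is one common reason to prefer it in evolutionary reinforcement learning.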

CLC Number: 

  • TP391