广东工业大学学报 ›› 2022, Vol. 39 ›› Issue (04): 1-8.doi: 10.12052/gdutxb.220030

• •    下一篇

基于蒙特卡洛树搜索的众包概率规划

饶东宁, 易善桢   

  1. 广东工业大学 计算机学院, 广东 广州 510006
  • 收稿日期:2022-02-22 出版日期:2022-07-10 发布日期:2022-06-29
  • 通信作者: 易善桢(1997–),男,硕士研究生,主要研究方向为智能规划,E-mail:2111905038@mail2.gdut.edu.cn
  • 作者简介:饶东宁(1977–),男,副教授,博士,主要研究方向为智能规划
  • 基金资助:
    广东省自然科学基金资助面上项目(2021A1515012556)

Crowdsourcing Probabilistic Planning Based on Monte-Carlo Tree Search

Rao Dong-ning, Yi Shan-zhen   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received:2022-02-22 Online:2022-07-10 Published:2022-06-29

摘要: 概率规划问题描述的是一个马尔科夫决策过程,其中的动作具有并行性和不确定性,从而导致概率规划问题的状态空间产生组合爆炸。过大的状态空间会降低规划器的效率,同时也会提高求解的难度。基于蒙特卡洛树搜索的众包概率规划可以将规划任务动态分配给多个规划器,由多个规划器共同对规划问题进行求解;同时使用蒙特卡洛树搜索算法构建前瞻树,通过前瞻树评估不同规划器返回的动作的质量。实验结果表明,随着时间限制放宽,该方法所求得的解的质量呈上升趋势;即使在相同条件下,该方法在求解效率和标准差上都有优势。

关键词: 概率规划, 马尔科夫决策过程, 蒙特卡洛树搜索, 前瞻树, 众包

Abstract: Probabilistic planning is a Markov decision process (MDP) in which actions are parallel and probabilistic. However, both state and action spaces of probabilistic planning problems are exponential, and excessive state space can reduce the efficiency of the planner and increase the difficulty of solving problems. Therefore, a crowdsourcing probabilistic planning is proposed based on Monte-Carlo tree search, which allocates planning tasks to multiple planners to solve probabilistic planning problems. The Monte-Carlo tree search algorithm is used to build a lookahead tree and evaluate the quality of the actions returned by different planners through the lookahead tree. Experimental results show that, with the relaxation of resource restrictions, the quality of the solution obtained by the algorithm continues to improve. Even in the same condition, the proposed method has better performance in solution efficiency and standard deviations.

Key words: probabilistic planning, Markov decision process (MDP), Monte-Carlo tree search, lookahead tree, crowdsourcing

中图分类号: 

  • TP182
[1] KLößNER T, HOFFMANN J, STEINMETZ M, et al. Pattern databases for goal-probability maximization in probabilistic planning[C]//Proceedings of the International Conference on Automated Planning and Scheduling. Palo Alto, CA: AAAI Press, 2021: 201-209.
[2] KUSHMERICK N, HANKS S, WELD D S. An algorithm for probabilistic planning [J]. Artificial Intelligence, 1995, 76(1-2): 239-286.
[3] 饶东宁, 杨锦鹏, 刘越畅. 时态规划综述及研究现状[J]. 广东工业大学学报, 2021, 38(3): 9-16.
RAO D N, YANG J P, LIU Y C. A survey of temporal planning [J]. Journal of Guangdong University of Technology, 2021, 38(3): 9-16.
[4] 饶东宁, 郭海峰, 蒋志华. 基于并行概率规划的股票指数模拟[J]. 计算机学报, 2019(6): 1334-1350.
RAO D N, GUO H F, JIANG Z H. Stock index simulation based on parallel probabilistic planning [J]. Chinese Journal of Computers, 2019(6): 1334-1350.
[5] LITTMAN M L, GOLDSMITH J, MUNDHENK M. The computational complexity of probabilistic planning [J]. Journal of Artificial Intelligence Research, 1998, 9: 1-36.
[6] GELLY S, SILVER D. Monte-Carlo tree search and rapid action value estimation in computer Go [J]. Artificial Intelligence, 2011, 175(11): 1856-1875.
[7] BROWNE C B, POWLEY E, WHITEHOUSE D, et al. A survey of monte carlo tree search methods [J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1): 1-43.
[8] SILVER D, VENESS J. Monte-Carlo planning in large POMDPs[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. La Jolla, California: NIPS, 2010: 2164-2172.
[9] SRINIVASAN S, TALVITIE E, BOWLING M. Improving exploration in UCT using local manifolds[C]//Twenty-Ninth AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2015: 3386-3392.
[10] PATRA S, MASON J, GHALLAB M, et al. Deliberative acting, planning and learning with hierarchical operational models [J]. Artificial Intelligence, 2021, 299: 103523.
[11] CUI H, KELLER T, KHARDON R. Stochastic planning with lifted symbolic trajectory optimization[C]//Proceedings of the International Conference on Automated Planning and Scheduling. Palo Alto, CA: AAAI Press, 2019: 119-127.
[12] RAO D, HU G, JIANG Z. PRobPlan: a framework of integrating probabilistic planning into ROS [J]. IEEE Access, 2020, 8: 106516-106530.
[13] KAZEMI L, SHAHABI C. Geocrowd: enabling query answering with spatial crowdsourcing[C]// Proceedings of the 20th International Conference on Advances in Geographic Information Systems. New York: ACM, 2012: 189-198.
[14] JEONG J, JAGGI P, SANNER S. Symbolic dynamic programming for continuous state MDPs with linear program transitions[C]//Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. Amsterdam: Elsevier, 2021: 4083-4089.
[15] YOUNES H L S, LITTMAN M L. PPDDL1.0: an extension to PDDL for expressing planning domains with probabilistic effects[J/OL]. CMU-CS, 2004 [2022-2-20]http://reports-archive.adm.cs.cmu.edu/anon/anon/home/ftp/usr0/ftp/2004/CMU-CS-04-167.pdf
[16] BELLMAN R E. A markov decision process [J]. Journal of Mathematical Mechanics, 1957, 6: 679-684.
[17] KELLER T, EYERICH P. PROST: Probabilistic planning based on UCT[C]//Twenty-Second International Conference on Automated Planning and Scheduling. Palo Alto, CA: AAAI Press, 2012: 119-127.
[18] GUESTRIN C, KOLLER D, PARR R, et al. Efficient solution algorithms for factored MDPs [J]. Journal of Artificial Intelligence Research, 2003, 19: 399-468.
[19] COLES A, COLES A, OLAYA A G, et al. A survey of the seventh international planning competition [J]. AI Magazine, 2012, 33(1): 83-88.
[20] CANAL G, KRIVIĆ S, LUFF P, et al. PlanVerb: domain-independent verbalization and summary of task plans[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2022.
[21] YOUNES H L S, LITTMAN M L, WEISSMAN D, et al. The first probabilistic track of the international planning competition [J]. Journal of Artificial Intelligence Research, 2005, 24: 851-887.
[22] KAUR J, CHATTERJEE I, LIKHACHEV M. Speeding up search-based motion planning using expansion delay heuristics[C]//Proceedings of the International Conference on Automated Planning and Scheduling. Palo Alto, CA: AAAI Press, 2021: 528-532.
[23] SINGH A, LIPOVETZKY N, RAMIREZ M, et al. Approximate novelty search[C]//Proceedings of the International Conference on Automated Planning and Scheduling. Palo Alto, CA: AAAI Press, 2021: 349-357.
[24] VALLATI M, CHRPA L, GRZEŚ M, et al. The 2014 international planning competition: Progress and trends [J]. AI Magazine, 2015, 36(3): 90-98.
[25] GEIßER F, SPECK D, KELLER T. An analysis of the probabilistic track of the IPC 2018[C]//ICAPS 2019 Workshop on the International Planning Competition (WIPC). Palo Alto, CA: AAAI Press, 2019: 27-35.
[1] 胡斌, 周颖慧, 陶小梅. 情感智能与心理生理计算[J]. 广东工业大学学报, 2021, 38(04): 1-8.
[2] 汪培庄, 曾繁慧, 孙慧, 李兴森, 郭建威, 孟祥福, 何静. 知识图谱的拓展及其智能拓展库[J]. 广东工业大学学报, 2021, 38(04): 9-16.
[3] 饶东宁, 杨锦鹏, 刘越畅. 时态规划综述及研究现状[J]. 广东工业大学学报, 2021, 38(03): 9-16.
[4] 高红, 郗常清, 刘巍. 可拓分析与决策的应用研究:以高校招生体系为例[J]. 广东工业大学学报, 2021, 38(01): 13-20.
[5] 饶东宁, 林卓毅, 魏来. n-度中心度与k-压力中心度及其并行算法[J]. 广东工业大学学报, 2020, 37(03): 36-41.
[6] 饶东宁, 王军星, 魏来, 王雅丽. 并行最小割算法及其在金融社交网络中的应用[J]. 广东工业大学学报, 2018, 35(02): 46-50.
[7] 赵锐. 复杂矛盾问题中传导矛盾问题的分类及成因[J]. 广东工业大学学报, 2017, 34(04): 12-16.
[8] 饶东宁, 温远丽, 魏来, 王雅丽. 基于Spark平台的社交网络在不同文化环境中的中心度加权算法[J]. 广东工业大学学报, 2017, 34(03): 15-20.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!