Journal of Guangdong University of Technology, 2021, Vol. 38, Issue (06): 62-69. doi: 10.12052/gdutxb.210112


Multi-Agent Reinforcement Learning for Secure Data Sharing in Blockchain-Empowered Vehicular Networks

Li Ming-lei1, Zhang Yang1,2, Kang Jia-wen3, Xu Min-rui4, Dusit Niyato4

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430000, China;
    2. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
    3. School of Automation, Guangdong University of Technology, Guangzhou 510006, China;
    4. School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
  • Received: 2021-07-12 Online: 2021-11-10 Published: 2021-11-09
  • Corresponding author: Kang Jia-wen (1989–), male, associate professor, Ph.D.; main research interests include blockchain, artificial intelligence, and the Internet of Things. E-mail: kavinkang@ntu.edu.sg
  • About the author: Li Ming-lei (1995–), male, master's student; main research interests include reinforcement learning and blockchain
  • Supported by:
    National Natural Science Foundation of China (62071343)

Abstract: This paper addresses the security and reliability of block verification in blockchain-empowered vehicular networks that use the Delegated Proof-of-Stake (DPoS) consensus algorithm. Miners recruit nearby light nodes (e.g., edge nodes such as smartphones) to jointly verify new blocks, which improves the security and reliability of block verification. To encourage miners to actively recruit light nodes, a Stackelberg game is used to model the interaction between blockchain users and miners, maximizing both the utility of blockchain users and the individual profits of miners. The blockchain user acts as the leader and sets the optimal transaction fee for block verification, while the miners act as followers and decide the optimal number of verifiers (i.e., light nodes) to recruit. To find the Nash equilibrium of the proposed Stackelberg game, a multi-agent reinforcement learning algorithm is designed to search for a near-optimal strategy. Numerical results show that the proposed scheme maximizes the benefits of both blockchain users and miners while ensuring the security and reliability of block verification.

Key words: block verification, Delegated Proof-of-Stake, game theory, multi-agent reinforcement learning

CLC number:

  • TP393
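
The leader-follower structure described in the abstract can be illustrated with a small toy example. The sketch below is not the authors' model or algorithm: the profit and utility functions, the parameter names (`unit_cost`, `security_weight`, `max_verifiers`), and all numbers are hypothetical, and an exhaustive grid search stands in for the multi-agent reinforcement learning search used in the paper. It only shows the order of decisions: each miner picks its verifier count as a best response to the fee, and the user picks the fee while anticipating those responses.

```python
import math


def miner_best_response(fee, unit_cost, max_verifiers=50):
    """Follower step: verifier count that maximizes a miner's toy profit.

    Assumed (hypothetical) profit: fee * ln(1 + n) - unit_cost * n.
    """
    def profit(n):
        return fee * math.log1p(n) - unit_cost * n

    return max(range(max_verifiers + 1), key=profit)


def user_utility(fee, unit_costs, security_weight=60.0):
    """Leader payoff for a given fee, with every miner best-responding.

    Assumed (hypothetical) utility: a concave security benefit in the
    total number of verifiers, minus the total payment to the miners.
    """
    counts = [miner_best_response(fee, c) for c in unit_costs]
    benefit = security_weight * math.log1p(sum(counts))
    payment = sum(fee * math.log1p(n) for n in counts)
    return benefit - payment, counts


def solve_stackelberg(unit_costs, fee_grid):
    """Leader step: choose the fee on the grid that maximizes user utility."""
    best_fee = max(fee_grid, key=lambda f: user_utility(f, unit_costs)[0])
    utility, counts = user_utility(best_fee, unit_costs)
    return best_fee, counts, utility


if __name__ == "__main__":
    costs = [1.0, 1.5, 2.0]                  # hypothetical per-verifier costs
    fees = [0.5 * k for k in range(1, 41)]   # candidate transaction fees
    fee, counts, utility = solve_stackelberg(costs, fees)
    print("fee:", fee, "verifiers per miner:", counts, "utility:", round(utility, 3))
```

A grid search is only practical here because the toy game has one scalar leader action and small integer follower actions. A learning-based search such as the one the abstract describes would be the natural choice when the action spaces are larger or the payoff functions are not known in closed form.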