Journal of Guangdong University of Technology ›› 2021, Vol. 38 ›› Issue (06): 35-46. doi: 10.12052/gdutxb.210107

A Novel Watchdog Fault-Detection Protocol for Compute First Networking

Liang Hong1, Feng Li1, Xu Fang-xin1, Li Guang-cheng1, Zhou Guo-xu2,3

  1. Faculty of Information Technology, Macau University of Science and Technology, Macao 999078;
  2. School of Automation, Guangdong University of Technology, Guangzhou 510006, China;
  3. Guangdong-Hong Kong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510006, China
  • Received: 2021-07-12 Online: 2021-11-10 Published: 2021-11-09
  • Corresponding author: Feng Li (1976–), female, associate professor, Ph.D.; research interests: wireless and mobile networks, energy saving, SDN, and network performance analysis. E-mail: lfeng@must.edu.mo
  • About the author: Liang Hong (1993–), male, Ph.D. candidate; research interests: cloud computing, blockchain, and wireless networks
  • Supported by:
    National Natural Science Foundation of China (61872451, 61872452); Science and Technology Development Fund of Macao (0098/2018/A3, 0037/2020/A1, 0062/2020/A2)

Abstract: Compute first networking (CFN) is a recent distributed framework that intelligently allocates computing resources for edge computing according to computing load and network status. It requires real-time knowledge of the availability of local and remote computing resources. To the best of our knowledge, this paper is the first to propose a centralized fault-detection protocol, CFN-Watchdog (Watchdog for short), which meets this CFN requirement and promptly reclaims resources occupied by faults. The impact of various parameters (e.g., detection threshold, task processing time, and network delay) on Watchdog performance is then analyzed theoretically. Extensive simulations verify the effectiveness of the proposed protocol and the accuracy of the theoretical model. This study helps to optimize parameter configurations for edge computing and to design better fault-detection protocols.

Key words: edge computing, compute first networking, watchdog, fault detection
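
To make the role of the detection threshold concrete, the following is a minimal Python sketch of a generic centralized watchdog timer. It is not the CFN-Watchdog protocol itself, whose messages and state machine are not given on this page. The class `Watchdog`, its methods (`register`, `kick`, `complete`, `sweep`), and the task identifiers are all hypothetical names chosen for illustration, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Hypothetical central monitor: a task that stays silent longer
    than `threshold` seconds is presumed faulty and is reclaimed."""
    threshold: float                               # detection threshold (assumed unit: seconds)
    last_seen: dict = field(default_factory=dict)  # task id -> time of last report

    def register(self, task_id, now):
        # Start watching a task when it is dispatched.
        self.last_seen[task_id] = now

    def kick(self, task_id, now):
        # A progress report resets the timer; reports from tasks
        # that were already reclaimed are ignored.
        if task_id in self.last_seen:
            self.last_seen[task_id] = now

    def complete(self, task_id):
        # A task that finishes normally is no longer watched.
        self.last_seen.pop(task_id, None)

    def sweep(self, now):
        # Return tasks whose silence exceeds the threshold and stop
        # watching them, which models reclaiming their resources.
        expired = [t for t, seen in self.last_seen.items()
                   if now - seen > self.threshold]
        for t in expired:
            del self.last_seen[t]
        return expired


wd = Watchdog(threshold=3.0)
wd.register("task-a", now=0.0)
wd.register("task-b", now=0.0)
wd.kick("task-a", now=2.5)   # task-a reports in time
print(wd.sweep(now=4.0))     # ['task-b']
```

In a sketch like this, a threshold shorter than the usual interval between reports plus the network delay would flag healthy tasks, while a long threshold delays the reclaiming of resources. That trade-off is the kind of parameter effect the abstract says the paper analyzes.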

CLC Number:

  • TP399