Non-blocking minimum processes coordinated checkpointing for hierarchical computational grid

Document Type: Original Article

Authors

Electrical Engineering Department, Assiut University, Egypt.

Abstract

Fault tolerance is an important property in grid computing because the dependability of individual grid resources cannot always be guaranteed. In distributed systems, fault tolerance is commonly achieved through checkpoint recovery, message logging combined with checkpointing, or task replication on alternative resources when a system outage occurs. In this paper, we present a mailbox-based, non-blocking, minimum-process coordinated checkpointing protocol for hierarchical grids. In our grid model, processes on different processors communicate indirectly: messages are sent over the network to a per-process mailbox hosted at a shared node. The mailbox of each process can also serve as an event logger, since it records the messages sent to that process in strict FIFO order. The main advantages of our approach are greater parallelism and suitability for highly dynamic environments in which processes frequently migrate from one node to another.
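The two ideas in the abstract, a mailbox that doubles as a FIFO event log and a checkpoint round that involves only the minimum set of processes, can be illustrated with a small single-threaded sketch. This is not the authors' protocol. The names `Mailbox`, `Process` and `minimum_checkpoint_set`, the integer "state", and the dependency rule (a process depends on every sender whose message it consumed since its last checkpoint, in the style of classic minimum-process schemes) are hypothetical simplifications. The sketch shows that, because each mailbox keeps every delivered message in order, a process rolled back to a checkpoint needs only its saved read position to replay later messages, and that only processes the initiator transitively depends on are asked to checkpoint. It does not model networking, concurrency (so not the non-blocking aspect), or process migration.

```python
class Mailbox:
    """Hypothetical per-process mailbox hosted on a shared node.

    Messages addressed to the owner are appended in arrival order,
    so the mailbox also acts as a FIFO event log.
    """

    def __init__(self, owner):
        self.owner = owner
        self.log = []        # every delivered (sender, payload), in FIFO order
        self.next_read = 0   # index of the next message the owner will consume

    def deposit(self, sender, payload):
        self.log.append((sender, payload))

    def receive(self):
        if self.next_read >= len(self.log):
            return None
        msg = self.log[self.next_read]
        self.next_read += 1
        return msg


class Process:
    """Toy process whose state is the running sum of consumed payloads."""

    def __init__(self, pid, mailbox):
        self.pid = pid
        self.mailbox = mailbox
        self.state = 0
        self.depends_on = set()   # senders consumed from since last checkpoint
        self.checkpoint = (0, 0)  # (saved state, saved mailbox read position)

    def step(self):
        msg = self.mailbox.receive()
        if msg is None:
            return False
        sender, payload = msg
        self.state += payload
        self.depends_on.add(sender)
        return True

    def take_checkpoint(self):
        # Saving the read position is enough: later messages stay in the log.
        self.checkpoint = (self.state, self.mailbox.next_read)
        self.depends_on.clear()

    def recover(self):
        # Roll back; messages after the saved position are replayed in FIFO order.
        self.state, self.mailbox.next_read = self.checkpoint
        self.depends_on.clear()


def minimum_checkpoint_set(initiator, processes):
    """Initiator plus every process it transitively depends on."""
    needed = {initiator}
    frontier = [initiator]
    while frontier:
        pid = frontier.pop()
        for dep in processes[pid].depends_on:
            if dep not in needed:
                needed.add(dep)
                frontier.append(dep)
    return needed


mailboxes = {p: Mailbox(p) for p in "ABC"}
procs = {p: Process(p, mailboxes[p]) for p in "ABC"}

mailboxes["A"].deposit("B", 5)
mailboxes["B"].deposit("C", 2)
procs["A"].step()  # A consumes B's message, so A depends on B
procs["B"].step()  # B consumes C's message, so B depends on C

print(sorted(minimum_checkpoint_set("A", procs)))  # ['A', 'B', 'C']
print(sorted(minimum_checkpoint_set("C", procs)))  # ['C']

for pid in minimum_checkpoint_set("A", procs):
    procs[pid].take_checkpoint()
mailboxes["A"].deposit("C", 7)
procs["A"].step()     # state becomes 12
procs["A"].recover()  # back to state 5 and read position 1
procs["A"].step()     # replays C's message from the mailbox log
print(procs["A"].state)  # 12
```

A real protocol would also have to handle messages in transit during the checkpoint round and mailbox relocation when a process migrates; the sketch only captures why FIFO mailbox logging makes replay after rollback straightforward.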

Keywords