Rescheduling and Checkpointing as Strategies to Run Synchronous Parallel Programs on P2P Desktop Grids

Abstract

Today, BSP (Bulk-Synchronous Parallel) is one of the most widely used models for writing tightly coupled parallel programs. BSP applications commonly run on clusters and, occasionally, on computational grids. In this context, we investigate the use of collaborative computing and idle resources to meet this kind of demand. We propose a model named BSPonP2P to answer the following question: How can we develop an efficient and viable model to run BSP applications on P2P Desktop Grids? We answer it by providing both process rescheduling and checkpointing to deal with resource heterogeneity and with dynamism at the application and infrastructure levels. We evaluate a prototype that ran over a subset of Grid5000, with encouraging results for the use of collaboration and volatile resources in HPC.

Publication
30th Annual ACM Symposium on Applied Computing, pages 501-504, Salamanca, Spain, April