
Fault Tolerance

Fault tolerance could be incorporated into the farm daemon to handle the case where a node running a worker task goes down. PVM provides constructs that notify a task (if it has asked to be notified) when another task exits PVM or when the node on which that task is running halts due to some error [7]. This would be useful when the farmer task needs "at least one" reply from a worker task for every work packet that is sent out.

The farm daemon is a natural place for this feature because it can maintain a list of the work packets that are currently being processed. It need not keep a list of all work packets: when a worker task requests a new work packet, its earlier one can be removed from the list. When PVM notifies the farm daemon that a worker task has exited or that a node has malfunctioned, the farm daemon could remove the entries for the dead task(s) from its lists and then either start the task on another node or resend the affected work packet to the remaining worker tasks on the existing nodes.
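The bookkeeping described above can be sketched in a few lines. This is only an illustrative model, not the farm daemon's code: the class `FarmDaemon`, its methods, and the worker and packet names are all hypothetical, and the exit notification that PVM would deliver is simulated by a plain method call. The sketch takes the second recovery option (resending the packet to an existing worker) and assumes one in-flight packet per worker.

```python
from collections import deque


class FarmDaemon:
    """Toy model of the farm daemon's in-flight list (all names hypothetical)."""

    def __init__(self, packets):
        self.pending = deque(packets)  # packets not yet handed to any worker
        self.in_flight = {}            # worker id -> packet it is processing

    def request_work(self, worker):
        """A worker asks for a new packet, so its earlier one can be forgotten."""
        self.in_flight.pop(worker, None)
        if not self.pending:
            return None
        packet = self.pending.popleft()
        self.in_flight[worker] = packet
        return packet

    def worker_exited(self, worker):
        """Stand-in for the exit notification: requeue the dead worker's packet."""
        packet = self.in_flight.pop(worker, None)
        if packet is not None:
            self.pending.appendleft(packet)  # resend it before untouched packets
        return packet


daemon = FarmDaemon(["p0", "p1", "p2"])
daemon.request_work("w1")           # w1 receives p0
daemon.request_work("w2")           # w2 receives p1
daemon.worker_exited("w1")          # w1 dies while holding p0
resent = daemon.request_work("w2")  # w2 finishes p1 and is handed p0
```

Because only packets that are currently being processed are tracked, the list stays no larger than the number of worker tasks. Note that a requeued packet may in principle be answered twice if the original worker was merely slow to be declared dead, which is consistent with the "at least one" reply requirement.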


Sameer Shende