|
![]() |
This documentation is a quick overview of how to add a new fault-tolerance protocol within ProActive. A more complete version should be released with the version 3.1. If you wish to get more informations, please feel free to send a mail to proactive@objectweb.org.
Fault-tolerance mechanism in ProActive is mainly based on the
org.objectweb.proactive.core.body.ft.protocols.FTManager
class. This class
contains several hooks that are called before and after the main actions of an
active object, e.g. sending or receiving a message, serving a request, etc.
For example, with the Pessimistic Message Logging protocol (PML), messages are
logged just before the delivery of the message to the active
object. Main methods for the FTManager of the PML protocol are then:
/** * Message must be synchronously logged before being delivered. * The LatestRcvdIndex table is updated * @see org.objectweb.proactive.core.body.ft.protocols.FTManager#onDeliverReply(org.objectweb.proactive.core.body.reply.Reply) */ public int onDeliverReply(Reply reply) { // if the ao is recovering, message are not logged if (!this.isRecovering) { try { // log the message this.storage.storeReply(this.ownerID, reply); // update latestIndex table this.updateLatestRvdIndexTable(reply); } catch (RemoteException e) { e.printStackTrace(); } } return 0; } /** * Message must be synchronously logged before being delivered. * The LatestRcvdIndex table is updated * @see org.objectweb.proactive.core.body.ft.protocols.FTManager#onReceiveRequest(org.objectweb.proactive.core.body.request.Request) */ public int onDeliverRequest(Request request) { // if the ao is recovering, message are not logged if (!this.isRecovering) { try { // log the message this.storage.storeRequest(this.ownerID, request); // update latestIndex table this.updateLatestRvdIndexTable(request); } catch (RemoteException e) { e.printStackTrace(); } } return 0; }
The local variable this.storage
is remote reference to the
checkpoint server. The FTManager class contains a reference to each
fault-tolerance server: fault-detector, checkpoint storage and localization
server. Those reference are initialized during the creation of the active
object.
A FTManager must define also a beforeRestartAfterRecovery()
method, which is called when an active object is recovered. This method usually
restore the state of the active object so as to be consistent with the others
active objects of the application.
For example, with the PML protocol, all the messages logged before the failure
must be delivered to the active object. The method
beforeRestartAfterRecovery()
thus looks like:
/** * Message logs are contained in the checkpoint info structure. */ public int beforeRestartAfterRecovery(CheckpointInfo ci, int inc) { // recovery mode: received message no longer logged this.isRecovering = true; //get messages List replies = ci.getReplyLog(); List request = ci.getRequestLog(); // add messages in the body context Iterator itRequest = request.iterator(); BlockingRequestQueue queue = owner.getRequestQueue(); // requests while (itRequest.hasNext()) { queue.add((Request) (itRequest.next())); } // replies Iterator itReplies = replies.iterator(); FuturePool fp = owner.getFuturePool(); try { while (itReplies.hasNext()) { Reply current = (Reply) (itReplies.next()); fp.receiveFutureValue(current.getSequenceNumber(), current.getSourceBodyID(), current.getResult(), current); } } catch (IOException e) { e.printStackTrace(); } // normal mode this.isRecovering = false; // enable communication this.owner.acceptCommunication(); try { // update servers this.location.updateLocation(ownerID, owner.getRemoteAdapter()); this.recovery.updateState(ownerID, RecoveryProcess.RUNNING); } catch (RemoteException e) { logger.error("Unable to connect with location server"); e.printStackTrace(); } return 0; }
The parameter ci
is a
org.objectweb.proactive.core.body.ft.checkpointing.CheckpointInfo
.
This object contains all the informations linked to the checkpoint used for
recovering the active object, and is used to restore its state. The programmer
might defines his own class implementing CheckpointInfo
, to add
needed informations, depending on the protocol.
ProActive include a global server that provide fault detection, active object
localization, resource service and checkpoint storage. For developing a new
fault-tolerance protocol, the programmer might specify the behavior of the
checkpoint storage by extending the class
org.objectweb.proactive.core.body.ft.servers.storage.CheckpointServerImpl
.
For example, only for the PML protocol and not for the CIC protocol, the
checkpoint server must be able to log synchronously messages. The other parts
of the server can be used directly.
To specify the recovery algorithm, the programmer must extends the
org.objectweb.proactive.core.body.ft.servers.recovery.RecoveryProcessImpl
.
In the case of the CIC protocol, all the active object of the application must
recover after one failure, while only the faulty process must restart with the
PML protocol; this specific behavior is coded in the recovery process.