

Setting Consistent Global Breakpoint and Replay

Event-based analysis had already shown that some node that was not compressed at level two of the tree was subsequently compressed at level three. It therefore made sense to stop the program immediately after level two was processed. With Ariadne, this was done by selecting the second CompressLevel event from the match tree and issuing the command:

break after (select);
This generated the file ave.trace.bpoint, which had to be converted to the pC++ trace format. This was done with ar2pcxx:
geos [~/ts/newbic]% ar2pcxx ave.trace.bpoint
geos [~/ts/newbic]% ls pcxx*
pcxx_aa0.trace   pcxx_aa2.trace   pcxx_aa4.trace   pcxx_aa6.trace
pcxx_aa1.trace   pcxx_aa3.trace   pcxx_aa5.trace   pcxx_aa7.trace
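Before replaying, it can be worth confirming that the conversion produced one trace per process. The following is a minimal sketch, not part of Ariadne or TAU: the helper name missing_trace_indices is hypothetical, and it assumes the pcxx_aaN.trace naming seen in the listing above, with N running from 0 to one less than the process count. Larger runs may use a different naming scheme.

```python
import re

def missing_trace_indices(filenames, expected_count):
    """Return the process indices that have no pcxx_aaN.trace file.

    Assumes the naming pattern shown in the directory listing;
    this is an illustration, not an Ariadne or TAU utility.
    """
    pattern = re.compile(r"^pcxx_aa(\d+)\.trace$")
    found = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            found.add(int(match.group(1)))
    # Any index in 0..expected_count-1 that was not seen is missing.
    return sorted(set(range(expected_count)) - found)

# The eight files from the listing above.
listing = [f"pcxx_aa{i}.trace" for i in range(8)]
print(missing_trace_indices(listing, 8))
```

An empty result means every process index has a trace file; in practice the list of names could come from os.listdir on the working directory.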
To replay the program to the consistent state using the above traces, and to examine the program state with Sneezy, the state-based programmable debugger from the TAU tools, we built the version of the program with both replay and Sneezy support. (Note: snzrpl-mpi has support for Sneezy and replay, snz-mpi has Sneezy support only, and rpl-mpi supports replay without any Sneezy support.)
geos [~/ts/newbic]% build snz-rpl-mpi
build snz-rpl-mpi
        /home/csi/cuny/sameer/rs/sage/bin/sgi8k/pc++  -O -DSNEEZY -D_AA_REPLAY -sneezy -w \
                -o snzrpl-temp.C -I. -I/home/csi/cuny/sameer/rs/sage/tulip/include -I/home/csi/cuny/sameer/rs/sage/pcxxrts/include \
                -D__MPI__ newbic.pc
        CC  -g -I. -I/home/csi/cuny/sameer/rs/sage/tulip/include -I/home/csi/cuny/sameer/rs/sage/pcxxrts/include \
                -D__MPI__ -DSNEEZY -D_AA_REPLAY -c snzrpl-temp.C
        CC  -g  -o snzrpl-mpi snzrpl-temp.o \
                 /home/csi/cuny/sameer/rs/sage/tulip/mpi/lib/sgi8k/libsnzrpl-tulip.a /home/csi/cuny/sameer/rs/sage/lib/sgi8k/libproxy-sock.a /home/csi/cuny/sameer/rs/sage/lib/sgi8k/libserver.a /home/csi/cuny/sameer/rs/sage/pcxxrts/lib/sgi8k/kernelsupport.o  -lm -lmpi

The replay system in Ariadne uses the trace files and stops the computation in a globally consistent state. But what is a globally consistent state in a parallel program?
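As a rough illustration of the idea, the sketch below uses the common textbook definition of a consistent cut: a stopping point on each process such that no message has been received before the cut unless it was also sent before the cut. This is an assumption made for illustration, and Ariadne's own definition and implementation may differ in detail. The names Message and is_consistent_cut are hypothetical.

```python
from typing import NamedTuple

class Message(NamedTuple):
    sender: int     # index of the sending process
    send_pos: int   # position of the send in the sender's local event order
    receiver: int   # index of the receiving process
    recv_pos: int   # position of the receive in the receiver's local event order

def is_consistent_cut(cut, messages):
    """Check the textbook consistent-cut condition.

    cut[p] is the number of local events on process p that lie
    before the stopping point.
    """
    for m in messages:
        received = m.recv_pos < cut[m.receiver]
        sent = m.send_pos < cut[m.sender]
        # A message received before the cut must also have been sent before it.
        if received and not sent:
            return False
    return True

# One message: process 0 sends at local position 1, process 1 receives at position 2.
messages = [Message(sender=0, send_pos=1, receiver=1, recv_pos=2)]
print(is_consistent_cut([2, 3], messages))  # send and receive both before the cut
print(is_consistent_cut([1, 3], messages))  # receive before the cut, send after it
```

The first cut is consistent and the second is not, because stopping there would leave process 1 holding a message that process 0 has not yet sent.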


Sameer Shende <sameer@cs.uoregon.edu>