We had already used event-based analysis to show that some node that was not compressed at level two of the tree was subsequently compressed at level three. It therefore makes sense to stop the program immediately after level two is processed. With Ariadne, this was done by selecting the second CompressLevel event from the match tree and issuing the command
    break after (select);

This generated the file ave.trace.bpoint, which had to be filtered into the pC++ format. This was done with:
    geos [~/ts/newbic]% ar2pcxx ave.trace.bpoint
    geos [~/ts/newbic]% ls pcxx*
    pcxx_aa0.trace  pcxx_aa2.trace  pcxx_aa4.trace  pcxx_aa6.trace
    pcxx_aa1.trace  pcxx_aa3.trace  pcxx_aa5.trace  pcxx_aa7.trace

To replay the program to a consistent state using the above traces, and to examine the program state using Sneaky, the state-based programmable debugger from the TAU Tools, we built the replay sneaky version of the program. (Note: snzrpl-mpi has support for both sneezy and replay, snz-mpi has only sneezy support, and rpl-mpi supports replay only, without sneezy.)
    geos [~/ts/newbic]% build snz-rpl-mpi
    build snz-rpl-mpi
    /home/csi/cuny/sameer/rs/sage/bin/sgi8k/pc++ -O -DSNEEZY -D_AA_REPLAY -sneezy -w \
    -o snzrpl-temp.C -I. -I/home/csi/cuny/sameer/rs/sage/tulip/include -I/home/csi/cuny/sameer/rs/sage/pcxxrts/include \
    -D__MPI__ newbic.pc
    CC -g -I. -I/home/csi/cuny/sameer/rs/sage/tulip/include -I/home/csi/cuny/sameer/rs/sage/pcxxrts/include \
    -D__MPI__ -DSNEEZY -D_AA_REPLAY -c snzrpl-temp.C
    CC -g -o snzrpl-mpi snzrpl-temp.o \
    /home/csi/cuny/sameer/rs/sage/tulip/mpi/lib/sgi8k/libsnzrpl-tulip.a /home/csi/cuny/sameer/rs/sage/lib/sgi8k/libproxy-sock.a /home/csi/cuny/sameer/rs/sage/lib/sgi8k/libserver.a /home/csi/cuny/sameer/rs/sage/pcxxrts/lib/sgi8k/kernelsupport.o -lm -lmpi
The replay system in Ariadne uses the trace files and stops the computation in a globally consistent state. But what is a globally consistent state in a parallel program?
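One common way to make the idea concrete is sketched below. It is an illustration of the standard "consistent cut" condition, not a description of Ariadne's actual algorithm. A set of per-process stopping points is consistent if every message received before the stopping points was also sent before them. The event representation and the function name is_consistent_cut are hypothetical and are not part of Ariadne or pC++.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event record: (kind, message_id), where kind is "send",
# "recv", or "local". message_id is None for local events.
Event = Tuple[str, Optional[str]]


def is_consistent_cut(traces: Dict[int, List[Event]], cut: Dict[int, int]) -> bool:
    """Return True if stopping each process p after its first cut[p]
    events yields a globally consistent state.

    The cut is consistent when every message received inside the cut
    was also sent inside the cut.
    """
    sent = set()
    received = set()
    for proc, events in traces.items():
        for kind, msg in events[: cut[proc]]:
            if kind == "send":
                sent.add(msg)
            elif kind == "recv":
                received.add(msg)
    # No message may be received without its send also being in the cut.
    return received <= sent


# Two processes: process 0 sends message "m" as its second event,
# process 1 receives it as its first event.
traces = {
    0: [("local", None), ("send", "m")],
    1: [("recv", "m"), ("local", None)],
}

print(is_consistent_cut(traces, {0: 2, 1: 1}))  # send and receive both inside the cut
print(is_consistent_cut(traces, {0: 1, 1: 1}))  # receive inside the cut, send outside
print(is_consistent_cut(traces, {0: 2, 1: 0}))  # message still in flight
```

In the second call, process 1 has already received a message that process 0 has not yet sent, so that combination of stopping points could never occur in a real execution. The third call is consistent: a message that has been sent but not yet received is simply in flight.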