Profiling the MPI Game of Life Code with TAU

  1. Here is the tarball from last week, in case you lost it.

  2. Try profiling this application with TAU on Vesta.
    Purple means put it in a file.
    Blue means run it on the command line (these lines start with "$").


    • Vesta:
      • 1) Environment setup:

      • Start with a ~/.soft file that only contains:

      • +mpiwrapper-xl
      • @default
      • Manually add Tau2 to your path:

      • $ export PATH=/soft/perftools/tau/tau2/bgq/bin:$PATH

      • This points to the latest tau2 git repository (the default softenv libraries are out of date at the moment)

      • 2) Set some TAU environment variables:

      • $ export TAUROOT=/soft/perftools/tau/tau2/bgq

      • What TAU Makefiles are available?
      • $ ls $TAUROOT/lib/Makefile*

      • Try the MPI-PAPI one:
      • $ export TAU_MAKEFILE=$TAUROOT/lib/Makefile.tau-bgqtimers-papi-mpi-pdt
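A mistyped TAU_MAKEFILE path is an easy mistake to make here, and it only shows up later when the build fails. The sketch below checks that the variable points at a readable file before you build. The function name check_tau_makefile is a hypothetical helper for illustration, not something TAU provides.

```shell
# Sketch: verify that TAU_MAKEFILE points at a readable file before building.
# check_tau_makefile is a hypothetical helper, not part of TAU.
check_tau_makefile() {
    # Use the first argument if given, otherwise fall back to $TAU_MAKEFILE.
    makefile="${1:-${TAU_MAKEFILE:-}}"
    if [ -z "$makefile" ]; then
        echo "TAU_MAKEFILE is not set" >&2
        return 1
    fi
    if [ ! -r "$makefile" ]; then
        echo "TAU makefile not readable: $makefile" >&2
        return 1
    fi
    echo "Using TAU makefile: $makefile"
}
```

After the export above, running check_tau_makefile with no arguments should report the makefile in use, or explain what is wrong.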


      • 3) Build the application:

      • $ cd <location of your "Game of Life" directory>
      • $ ./configure
      • Open the "Makefile" and change MPICC to "tau_cc.sh":

      • MPICC = tau_cc.sh

      • Build it:
      • $ make

      • You should see a bunch of output from the TAU wrapper...
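The Makefile edit in step 3 can also be scripted, which helps if you re-run ./configure and it regenerates the Makefile. This is a minimal sketch that assumes the Makefile contains a line starting with "MPICC" followed by "="; set_mpicc_wrapper is a hypothetical helper name.

```shell
# Sketch: point MPICC at the TAU compiler wrapper without opening an editor.
# set_mpicc_wrapper is a hypothetical helper; it assumes the Makefile has a
# line beginning with "MPICC" followed by "=".
set_mpicc_wrapper() {
    makefile="$1"
    wrapper="${2:-tau_cc.sh}"
    # Write to a temporary file and move it into place, which avoids relying
    # on the non-portable "sed -i" option.
    sed "s|^MPICC[[:space:]]*=.*|MPICC = ${wrapper}|" "$makefile" > "$makefile.tmp" &&
        mv "$makefile.tmp" "$makefile"
}
```

Usage would be "set_mpicc_wrapper Makefile" followed by "make". All other lines in the Makefile are left untouched.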


      • 4) Launch the application:

      • Here's a reasonable jobscript.sh file:

      • #!/bin/bash
      • runjob -p 16 --np 64 --env-all --block $COBALT_PARTNAME : mlife2d -x 1024 -y 1024 -i 100
      • echo End of jobscript.sh
      • exit 0
      • Launch it:

      • $ qsub -A ATPESC2013 -q Q.ATPESC2013 -t 10 -n 4 --mode script jobscript.sh

      • It should only take a minute or so to finish...
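The numbers in the job script and the qsub line are related: qsub asks for 4 nodes (-n 4), runjob places 16 ranks on each node (-p 16), and the total rank count (--np 64) is their product. If you change one of them, the others normally need to follow. The sketch below just does that arithmetic; total_ranks is a hypothetical helper for illustration.

```shell
# Sketch: derive the total rank count from node count and ranks per node.
# total_ranks is a hypothetical helper for illustration.
total_ranks() {
    nodes="$1"
    ranks_per_node="$2"
    echo $((nodes * ranks_per_node))
}

# Values from the job script above: 4 nodes, 16 ranks per node.
total_ranks 4 16   # prints 64
```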


      • 5) Take a look at the profile:

      • You might want to do this somewhere other than the Vesta head node (for example, with /home/dozog/tau2/x86_64/bin/paraprof on Tukey):

      • $ paraprof
      • Here's what it might look like: [ParaProf screenshot]

      • Here's the communication matrix: [ParaProf communication matrix screenshot]

      • To enable communication monitoring you need to set TAU_COMM_MATRIX=1 in the job's environment:

      • runjob -p 16 --np 64 --envs TAU_COMM_MATRIX=1 --block $COBALT_PARTNAME : ...

      • Side Note:

      • You might consider intercepting and timing the MPI calls in the mpi4py version of Game of Life:

      • https://github.com/ahmadia/atpesc-2013

      • With something like this:

      • $ mpiexec -n 4 tau_exec -T mpi4py python bench_life.py
      • Here's some info on profiling the Python code itself:

      • TAU / Python

      • Note: you will need to use a TAU configured with "-python" to do this.

Try profiling a CUDA application with TAU/CUPTI

  • Keeneland:
    • 1) Environment setup:

    • Manually add Tau2 to your path:

    • $ export PATH=/nics/d/home/std0015/usr/src/tau2/x86_64/bin:$PATH

    • This points to the latest tau2 git repository (the default "module load tau" is out of date at the moment)

    • 2) Set some TAU environment variables:

    • $ export TAUROOT=/nics/d/home/std0015/usr/src/tau2/x86_64

    • What TAU Makefiles are available?
    • $ ls $TAUROOT/lib/Makefile*

    • Try the CUPTI-MPI one:
    • $ export TAU_MAKEFILE=$TAUROOT/lib/Makefile.tau-cupti-mpi-pdt


    • 3) Build a CUDA application:

    • I tried the CuPoisson example, but its performance characteristics are extremely boring.


    • This time, you might try profiling with dynamic instrumentation (i.e., building with g++ and nvcc, then launching with tau_exec)

    • Note: If using the CuPoisson example, you'll need to point to the appropriate CUDA include directory in the Makefile:

    • ...
    • CFLAGS = -Wall -O3 -I /sw/kfs/cuda/4.2/linux_binary/include
    • ...
    • Build it:
    • $ make


    • 4) Launch the application:

    • Here's an example jobscript.sh file:

    • #!/bin/bash
    • #PBS -A UT-NTNLEDU
    • cd $PBS_O_WORKDIR
    • export PATH=/nics/d/home/std0015/usr/src/tau2/x86_64/bin:$PATH
    • tau_exec -T cupti -cupti ./cupoisson
    • Launch it:

    • $ qsub jobscript.sh
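Because the job script sets PATH itself, a typo there only surfaces once the job has waited in the queue and failed. A small guard at the top of the script can make that failure obvious in the job output. This is a sketch only; require_tool is a hypothetical helper, not part of TAU or PBS.

```shell
# Sketch: fail early in a job script if a required tool is not on PATH.
# require_tool is a hypothetical helper, not part of TAU.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "required tool not found on PATH: $1" >&2
        return 1
    fi
}
```

In the job script, a line such as "require_tool tau_exec || exit 1" placed after the PATH export and before the tau_exec line would stop the job with a clear message.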


    • 5) Take a look at the profile. Again, it might be slightly rude to do this on the head node...

    • $ paraprof
    • If you're having trouble with X-forwarding, you can always use the command-line version:

    • $ pprof
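If paraprof or pprof reports that it has nothing to show, it is worth checking that the run actually wrote profiles. The sketch below assumes TAU's default output of one profile.<node>.<context>.<thread> file per thread of execution in the directory the job ran in; count_profiles is a hypothetical helper name.

```shell
# Sketch: count TAU profile files in a directory.
# Assumes TAU's default profile output, one profile.<node>.<context>.<thread>
# file per thread of execution; count_profiles is a hypothetical helper.
count_profiles() {
    dir="${1:-.}"
    count=0
    for f in "$dir"/profile.*.*.*; do
        # The glob stays literal when nothing matches, so test for existence.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Running count_profiles in the job's working directory gives a quick sanity check before launching the viewer; a count of 0 suggests the instrumented run did not complete or wrote its profiles elsewhere.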

Try profiling the OpenMP Examples using TAU/Opari

  1. Here is the zip from last Thursday in case you lost it.

  2. If you're on Vesta or Keeneland, find a TAU_MAKEFILE that was built with "openmp-opari"

  3. You're on your own this time. Have fun!
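For step 2, one way to narrow down the makefile listing is to filter it by feature tags. The sketch below assumes the tags appear in the makefile names, as they do in the names used earlier on this page; find_tau_makefiles is a hypothetical helper, and whether an openmp-opari build exists on a given system is something you will have to check.

```shell
# Sketch: list TAU makefiles whose names contain every given feature tag.
# find_tau_makefiles is a hypothetical helper; it assumes the feature tags
# appear in the makefile names.
find_tau_makefiles() {
    libdir="$1"
    shift
    for mk in "$libdir"/Makefile.tau*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$mk" ] || continue
        name="${mk##*/}"   # match against the file name, not the directory
        match=1
        for tag in "$@"; do
            case "$name" in
                *"$tag"*) ;;
                *) match=0 ;;
            esac
        done
        if [ "$match" -eq 1 ]; then
            echo "$mk"
        fi
    done
}
```

For example, find_tau_makefiles "$TAUROOT/lib" openmp opari prints the candidates, and you can export one of them as TAU_MAKEFILE.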

Please sir, I want some more...

  • Try downloading the TAU source and building it yourself to create a custom wrapper.

  • There's much, much more to TAU than what's exposed here. Take a look at the docs to learn more.