
TAU FAQ - Frequently Asked Questions about TAU

    General Questions

  1. Who do I contact for help (after reading this page)?
  2. What is TAU?
  3. What are the pre-requisites / dependencies for TAU?
  4. How do I get TAU?
  5. How do I build TAU?
  6. What systems are supported?
  7. What languages / compilers are supported?
  8. Can you give me a simple example with GCC and OpenMP?

    Instrumentation Questions

  10. What is instrumentation?
  11. Why should I instrument my code?
  12. How should I instrument my code?
  13. How do I configure and use PDT?
  14. Why SHOULDN'T I instrument my code?
  15. What about binary instrumentation? Does TAU support that?

    Sampling Questions

  17. What is sampling?
  18. Why should I sample my code?
  19. How should I sample my code?
  20. Can I instrument and sample my code? How would I do that?

    Multiprocess Questions

  22. Which MPI implementations are supported?
  23. How do I build TAU for MPI?

    Multithread Questions

  25. How do I build TAU for OpenMP?
  26. How do I build TAU for pthreads?

    Runtime Questions

  28. What are the environment variables for controlling TAU at runtime?
  29. How do I collect a profile?
  30. How do I collect a trace?

    Other Questions

  32. I can't use -bfd=download (no network access). How can I configure binutils for TAU?

    General Questions

  1. Who do I contact for help (after reading this page)?

    When in doubt, consult the TAU manuals. If you can't find what you are looking for, please send an email to .
  2. What is TAU?

    Tuning and Analysis Utilities (TAU) is a project of the Performance Research Laboratory in the Computer and Information Science Department at the University of Oregon. TAU is an instrumentation API and measurement library for performance analysis, particularly of large-scale parallel (HPC) applications. However, TAU can be used to measure nearly any application, including those written in C, C++, Fortran, Python, Java, etc. TAU includes source-to-source compilers and binary tools for automatic instrumentation and/or periodic sampling, as well as analysis and visualization tools, among many other features. For more information, see the TAU web site (http://tau.uoregon.edu).
  3. What are the pre-requisites / dependencies for TAU?

    The only hard dependency for TAU is a working C++ compiler. Depending on the additional features you want to use, you may have other dependencies. Common dependencies include:
    • Low overhead instrumentation: PDT (HIGHLY recommended for instrumentation)
    • Program counter lookup: Binutils (included with TAU if not preinstalled; recommended for compiler-based instrumentation, i.e. without PDT, and also for OpenMP and/or sampling measurement). See question 32 for instructions on building binutils yourself, if necessary.
    • Hardware counters: PAPI
    • Callstack unwinding: Libunwind (included with TAU if not preinstalled, recommended for sampling measurement)
    • MPI (probably pre-installed on your system)
    • CUDA/CUPTI (probably pre-installed on your system)
    Unusual dependencies include:
    • SHMEM (probably pre-installed on your system)
    • Tracing: VTF, OTF, or EPILOG library
    • I/O profiling: Darshan
    • Score-P
    • Java: a working JDK
    • Python
  4. How do I get TAU?

    TAU is available for download from http://tau.uoregon.edu. Follow the instructions on the "Downloads" page, or download it directly using wget from a terminal:
    wget http://tau.uoregon.edu/tau.tgz
    ...expand the tarball...
    tar -xzf tau.tgz
    ...and change to the TAU directory. The release number will vary.
    cd tau-[major_version].[minor_version].[point_version]
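    If you script these steps, you may not want to hard-code the release number in the cd command. One option, sketched below, is to read the top-level directory name from the tarball itself. tarball_topdir is a hypothetical helper, not part of TAU, and it assumes the archive unpacks into a single top-level directory (if the first entry were something like ./tau-x.y.z/, it would print "." instead).

```shell
#!/bin/sh
# Print the name of the top-level directory stored in a gzipped tarball.
# Assumes the archive unpacks into a single directory, so the leading
# path component of the first entry is that directory.
tarball_topdir() {
    tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Example (tau.tgz downloaded as above):
#   tar -xzf tau.tgz && cd "$(tarball_topdir tau.tgz)"
```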
  5. How do I build TAU?

    TAU is typically configured with:
    ./configure
    make install
    ...and then you need to put TAU in your shell command path:
    export PATH=$PATH:`pwd`/`arch`/bin    # bash, run from the TAU directory
    set path = ($path `pwd`/`arch`/bin)   # csh, run from the TAU directory
    ...although this default build only supports measuring single-threaded, single-process applications, with no hardware counter support or other optional features. If you are using TAU, you are probably interested in those features (see below). For a full list of options, try:
    ./configure -help
    ...or:
    ./configure -fullhelp
    ...although you will likely end up back on this page. The most common configuration option is the choice of compiler (other than the autodetected default). Here is an example using the Intel compilers:
    ./configure -cc=icc -c++=icpc -fortran=intel ... 
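    If you add the PATH line to a shell startup file, sourcing that file repeatedly appends the same directory again each time. The sketch below appends a directory only if it is not already present. path_append is a hypothetical helper, and the example assumes, as in question 8, that TAU's binaries live in `pwd`/`arch`/bin under the TAU directory; on some systems TAU's architecture directory name may differ from the output of arch.

```shell
#!/bin/sh
# Append a directory to PATH unless it is already listed, so repeated
# sourcing of a startup file does not keep growing PATH.
# path_append is a hypothetical helper, not part of TAU.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present: nothing to do
        *) PATH="${PATH:+$PATH:}$1" ;;    # append, avoiding a leading ":"
    esac
    export PATH
}

# Example, run from the top-level TAU directory after "make install":
#   path_append "$(pwd)/$(arch)/bin"
```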
  6. What systems are supported?

    TAU has been ported to all major systems. The simplified (and not exhaustive) list includes:
    • any/all versions of *nix, Linux clusters
    • Cray Systems (XT3, CNL, XMT, etc.)
    • IBM systems (BG/Q, BG/P, BG/L, AIX, IBM Linux)
    • Sun
    • SGI
    • Windows
    • OSX
    • GPGPU systems
    • ...and many more.
  7. What languages / compilers are supported?

    If you are instrumenting your code, TAU supports many languages and compilers. The list includes:
    • C: cc, gcc, clang, bgclang, gcc4, scgcc, KCC, pgcc, guidec, *xlc*, ecc, pathcc, orcc
    • C++: CC, KCC, g++, *xlC*, cxx, pgCC, pgcpp, FCC, guidec++, aCC, c++, ecpc, clang++, bgclang++, g++4, icpc, scgcc, scpathCC, pathCC, orCC
    • Fortran: gnu, sgi, ibm, ibm64, hp, cray, pgi, absoft, fujitsu, sun, compaq, g95, open64, kai, nec, hitachi, intel, lahey, nagware, pathscale, gfortran, gfortran4
    • Unified Parallel C: upc/gcc (GNU UPC), upcc (Berkeley UPC), cc (Cray CCE UPC)
    • Python
    • Java
    ...If you are NOT instrumenting your code, you can use TAU sampling to measure any executable.
  8. Can you give me a simple example with GCC and OpenMP?

    Sure!
    $ ./configure -openmp                   # configures TAU
    $ make install                          # builds TAU
    $ export PATH=$PATH:`pwd`/`arch`/bin    # puts TAU utilities in your execution path
    $ export TAU_MAKEFILE=`pwd`/`arch`/lib/Makefile.tau-openmp   # tells tau_cc.sh what settings to use 
    That's it! GCC is the default compiler, and if it is in your path, you are ready to go. For example, if you have a simple program in one C file, you would instrument and build it with TAU like this:
    tau_cc.sh test.c -fopenmp -o test
    ...will parse, instrument, and compile your code. When you run it with N OpenMP threads, you will get N profiles (one per thread), profile.0.0.0 ... profile.0.0.N-1:
    OMP_NUM_THREADS=2 ./test
    ...and to see a summary of the profile data, run the pprof program:
    pprof
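    To confirm that the run above produced one profile per thread, you can count the profile.* files. The function below is a hypothetical helper, not a TAU tool, and assumes the profiles were written to the directory you pass (the current directory by default).

```shell
#!/bin/sh
# Count TAU profile files (profile.*) in a directory (default: current).
# With the OpenMP build above you would expect one file per thread.
# count_profiles is a hypothetical helper, not a TAU tool.
count_profiles() {
    set -- "${1:-.}"/profile.*
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0          # the glob matched nothing
    fi
}

# Example:
#   OMP_NUM_THREADS=2 ./test && echo "$(count_profiles) profiles written"
```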
    The examples directory is full of useful examples for different TAU configurations. The simplest example is in the examples/mm directory. The full set of instructions for building TAU and running that example is:
    $ ./configure -openmp
    $ make install
    $ export PATH=$PATH:`pwd`/`arch`/bin
    $ export TAU_MAKEFILE=`pwd`/`arch`/lib/Makefile.tau-openmp
    $ cd examples/mm
    $ make clean
    $ make
    $ export OMP_NUM_THREADS=2
    $ ./matmult
    $ pprof
    
    ...which will generate output something like this:
    pprof
    Reading Profile files in profile.*
    
    NODE 0;CONTEXT 0;THREAD 0:
    ---------------------------------------------------------------------------------------
    %Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
                  msec   total msec                          usec/call 
    ---------------------------------------------------------------------------------------
    100.0            4           36           1           1      36522 .TAU application
     89.0        0.086           32           1           1      32502 main 
     88.8        0.332           32           1          11      32416 do_work 
     48.4        0.149           17           1           1      17668 compute 
     48.0           17           17           1           1      17519 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
     31.8        0.049           11           1           1      11610 compute_interchange 
     31.7           11           11           1           1      11561 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
      5.1            1            1           3           3        619 initialize 
      2.4         0.88         0.88           3           0        293 allocateMatrix 
      1.2        0.409        0.422           3           3        141 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 
      1.1        0.405        0.405           1           0        405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1 
      0.7        0.251        0.251           1           0        251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0 
      0.2        0.069        0.069           3           0         23 freeMatrix 
      0.0        0.013        0.013           3           0          4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0 
    
    NODE 0;CONTEXT 0;THREAD 1:
    ---------------------------------------------------------------------------------------
    %Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
                  msec   total msec                          usec/call 
    ---------------------------------------------------------------------------------------
    100.0        0.539           29           1           5      29793 .TAU application
     58.8           17           17           1           0      17509 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
     38.8           11           11           1           0      11551 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
      0.7        0.194        0.194           3           0         65 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 
    
    FUNCTION SUMMARY (total):
    ---------------------------------------------------------------------------------------
    %Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
                  msec   total msec                          usec/call 
    ---------------------------------------------------------------------------------------
    100.0            4           66           2           6      33158 .TAU application
     52.8           34           35           2           1      17514 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
     49.0        0.086           32           1           1      32502 main 
     48.9        0.332           32           1          11      32416 do_work 
     34.9           22           23           2           1      11556 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
     26.6        0.149           17           1           1      17668 compute 
     17.5        0.049           11           1           1      11610 compute_interchange 
      2.8            1            1           3           3        619 initialize 
      1.3         0.88         0.88           3           0        293 allocateMatrix 
      0.9        0.603        0.616           6           3        103 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 
      0.6        0.405        0.405           1           0        405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1 
      0.4        0.251        0.251           1           0        251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0 
      0.1        0.069        0.069           3           0         23 freeMatrix 
      0.0        0.013        0.013           3           0          4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0 
    
    FUNCTION SUMMARY (mean):
    ---------------------------------------------------------------------------------------
    %Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
                  msec   total msec                          usec/call 
    ---------------------------------------------------------------------------------------
    100.0            2           33           1           3      33158 .TAU application
     52.8           17           17           1         0.5      17514 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
     49.0        0.043           16         0.5         0.5      32502 main 
     48.9        0.166           16         0.5         5.5      32416 do_work 
     34.9           11           11           1         0.5      11556 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
     26.6       0.0745            8         0.5         0.5      17668 compute 
     17.5       0.0245            5         0.5         0.5      11610 compute_interchange 
      2.8        0.718        0.928         1.5         1.5        619 initialize 
      1.3         0.44         0.44         1.5           0        293 allocateMatrix 
      0.9        0.301        0.308           3         1.5        103 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 
      0.6        0.203        0.203         0.5           0        405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1 
      0.4        0.126        0.126         0.5           0        251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0 
      0.1       0.0345       0.0345         1.5           0         23 freeMatrix 
      0.0       0.0065       0.0065         1.5           0          4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0 
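    For larger programs the pprof listing gets long. The sketch below pulls one section out of pprof's text output and sorts its rows by exclusive time, largest first. It assumes the column layout shown above (%Time, Exclusive, Inclusive, #Call, #Subrs, Inclusive usec/call, Name) and plain numeric values in the Exclusive column; top_exclusive is a hypothetical helper, and other pprof versions may format their output differently.

```shell
#!/bin/sh
# Read pprof text output on stdin and print "exclusive_msec<TAB>name" for
# each row of the section whose header contains $1, largest first.
# Assumes the column layout shown in the sample output above.
# top_exclusive is a hypothetical helper, not a TAU tool.
top_exclusive() {
    awk -v want="$1" '
        index($0, want) > 0 { inside = 1; next }   # found the section header
        inside && NF == 0   { inside = 0 }         # a blank line ends it
        inside && $1 ~ /^[0-9.]+$/ {               # data rows start with %Time
            name = $7                              # names may contain spaces
            for (i = 8; i <= NF; i++) name = name " " $i
            print $2 "\t" name
        }
    ' | sort -rn
}

# Example:
#   pprof | top_exclusive "FUNCTION SUMMARY (mean):" | head -n 5
```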
    
    

    Instrumentation Questions

  10. What is instrumentation?

    Instrumentation is the process of inserting measurement code into a program. It can be done at the source-code level, at the binary level, or, in the case of Java and Python, by hooking into the language runtime. TAU instrumentation includes timers, counters, and other specialized measurements.
  11. Why should I instrument my code?

    Instrumentation is the most portable way to observe the behavior of an application. In the case of C/C++ and Fortran, TAU timers and counters are simple C function calls, and are linked in just like any other library. Instrumentation is not the only method of measurement, and sometimes it is not advised (see below).
  12. How should I instrument my code?

    There are two methods for source-based instrumentation. The first, and recommended, method is to use PDT. PDT parses the source code, finds function (and optionally outer loop) boundaries, inserts TAU timers, and passes the instrumented code to the regular compiler. The second method is compiler-based instrumentation, which is just what it sounds like: the compiler itself inserts the TAU calls during compilation. We strongly recommend using PDT whenever possible, as the overhead of compiler-based instrumentation is typically much higher.
    Regardless of the method, once TAU is configured and built you can use the TAU compiler wrappers (tau_cc.sh for C, tau_cxx.sh for C++, tau_f90.sh for Fortran) like any other compiler:
    tau_cc.sh -O3 -g -c test.c -o test.o
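    If your build already uses make variables such as CC, you can switch between instrumented and uninstrumented builds without editing the Makefile. In this sketch, USE_TAU and pick_cc are hypothetical names; the only TAU-specific piece is the tau_cc.sh wrapper.

```shell
#!/bin/sh
# Print the C compiler to use: TAU's wrapper when USE_TAU=1, otherwise
# $CC (or plain "cc"). USE_TAU and pick_cc are hypothetical names.
pick_cc() {
    if [ "${USE_TAU:-0}" = 1 ]; then
        echo tau_cc.sh
    else
        echo "${CC:-cc}"
    fi
}

# Example (assumes the Makefile honors CC):
#   export USE_TAU=1
#   make CC="$(pick_cc)"
```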
  13. How do I configure and use PDT?

    PDT is available for download from http://tau.uoregon.edu/pdt. Follow the instructions on the "Downloads" page, or download it directly using wget from a terminal:
    wget http://tau.uoregon.edu/pdt.tgz
    ...expand the tarball...
    tar -xzf pdt.tgz
    ...and change to the PDT directory. The release number will vary.
    cd pdtoolkit-[major_version].[minor_version]
    Configure and build PDT (optionally provide an installation prefix, otherwise the installation is in-place). The following example will build PDT with GCC and install it in /path/to/pdt/installation:
    ./configure -GNU -prefix=/path/to/pdt/installation
    make install
    Then, when configuring TAU, use the -pdt option:
    cd tau-[major_version].[minor_version].[point_version]
    ./configure -pdt=/path/to/pdt/installation
    make install
    ...for a full list of configuration options, do this:
    ./configure -help
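    A small guard can catch a mistyped PDT path before TAU's configure runs. pdt_flag is a hypothetical helper; it only checks that the directory exists, not that it contains a working PDT build.

```shell
#!/bin/sh
# Print the -pdt= option for TAU's configure, but only if the directory
# exists, so a mistyped path fails here instead of later in the build.
# pdt_flag is a hypothetical helper, not part of TAU or PDT.
pdt_flag() {
    if [ -d "$1" ]; then
        printf '%s\n' "-pdt=$1"
    else
        echo "PDT directory not found: $1" >&2
        return 1
    fi
}

# Example:
#   flag=$(pdt_flag /path/to/pdt/installation) && ./configure "$flag"
```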
  14. Why SHOULDN'T I instrument my code?

    The main reason not to instrument your source code is that you don't have access to it; in that case, you could try binary instrumentation. The main reason not to instrument your code AT ALL is that it has many lightweight functions that are called millions of times, so instrumentation would introduce too much overhead. A common example is a C++ program with many getter and setter functions. In those cases you should try sampling.
  15. What about binary instrumentation? Does TAU support that?

    Yes, TAU supports binary instrumentation with MAQAO, DyninstAPI, or PEBiL; which one is used depends on how TAU was configured. After compiling your program normally, use the tau_rewrite program to instrument it:
    tau_rewrite ./myprogram

    Sampling Questions

  17. What is sampling?

    Sampling is when the program is periodically interrupted and its running state (for example, the current function or call stack) is examined. The samples are aggregated into a histogram of where the program spends its time. Because samples are taken at regular intervals, functions that are executed more often and/or for longer durations accumulate proportionally more samples.
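    The aggregation step can be illustrated with standard tools: given one function name per sample, counting identical lines yields the histogram. This is only a sketch of the idea with made-up input; it is not how TAU records or stores samples.

```shell
#!/bin/sh
# Build a flat histogram from raw samples: read one function name per
# line (one line per sample) and print "count name", most-sampled first.
# Illustration only; TAU uses its own sample format.
sample_histogram() {
    sort | uniq -c | sort -rn | awk '{ c = $1; $1 = ""; sub(/^ /, ""); print c, $0 }'
}

# Example:
#   printf '%s\n' compute main compute | sample_histogram
```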
  18. Why should I sample my code?

    The first reason to sample your program is that you don't have access to the source code. The second is that it has many lightweight functions that are called millions of times and would introduce too much overhead if instrumented. A common example is a C++ program with many getter and setter functions.
  19. How should I sample my code?

    To sample a program that is not instrumented by TAU, run it with the tau_exec program. For example, a program that uses MPI is executed with:
    tau_exec -T mpi -ebs ./myprogram
    Similarly, a program with only OpenMP or only pthread concurrency is executed with one of:
    tau_exec -T serial,openmp -ebs ./myprogram
    tau_exec -T serial,pthread -ebs ./myprogram
    ...where the "serial" configuration tells TAU not to use the MPI configuration. As a final example, a program with both MPI and OpenMP is executed this way:
    tau_exec -T mpi,openmp -ebs ./myprogram
    ...or (because MPI is the default):
    tau_exec -T openmp -ebs ./myprogram
    For more information, run tau_exec with no parameters to get help.
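    The rules above for choosing -T can be written down as a small function. tau_exec_tags is a hypothetical helper that only encodes the cases shown on this page (MPI or not, plus an optional openmp or pthread model); run tau_exec with no parameters to see the configurations your installation actually supports.

```shell
#!/bin/sh
# Print the -T list for tau_exec from the parallel models a program uses:
# "serial" when there is no MPI (MPI is the default), plus an optional
# threading model. tau_exec_tags is a hypothetical helper.
# Usage: tau_exec_tags <mpi|nompi> [openmp|pthread]
tau_exec_tags() {
    case "$1" in
        mpi)   tags=mpi ;;
        nompi) tags=serial ;;
        *)     echo "first argument must be mpi or nompi" >&2; return 1 ;;
    esac
    if [ -n "$2" ]; then
        tags="$tags,$2"
    fi
    printf '%s\n' "$tags"
}

# Example:
#   tau_exec -T "$(tau_exec_tags nompi openmp)" -ebs ./myprogram
```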
  20. Can I instrument and sample my code? How would I do that?

    Yes. Instrument your program (manually or automatically, as described above), and then set the TAU_SAMPLING environment variable before executing it:
    export TAU_SAMPLING=1 #bash
    setenv TAU_SAMPLING 1 #csh
    ./myprogram

    Multiprocess Questions

  22. Which MPI implementations are supported?

    Nearly all of them. In particular:
    • MPICH
    • MVAPICH
    • LAM/MPI
    • Open MPI
    • IBM MPI
    • Cray MPI
    • HP MPI
    • ...
    TAU uses the PMPI profiling interface to measure all MPI calls. Because PMPI is part of the MPI standard, any standards-compliant implementation should be supported.
  23. How do I build TAU for MPI?

    First, make sure MPI (mpicc, mpiCC, mpif77, etc.) is in your path. Then, configure TAU with:
    ./configure -mpi
    If your MPI installation has predictable locations for include and lib directories, the configure process should find them. For example, if your mpicc compiler is in /home/user/mpi/bin, configure will use /home/user/mpi/include and /home/user/mpi/lib. If that is NOT the case, you also need to tell TAU the path to those directories:
    ./configure -mpi -mpiinc=/some/path/to/include -mpilib=/some/other/path/to/lib
    If your MPI implementation has an unusual library name(s) or additional library dependencies, you can also specify the library name(s):
    ./configure -mpi -mpiinc=/some/path/to/include -mpilib=/some/other/path/to/lib  \
    -mpilibrary="-lmy_mpi -lmy_mpi2 -L/path/to/dependency/lib -lmpi_dependency"
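    The directory guess described above (mpicc in <prefix>/bin implies <prefix>/include and <prefix>/lib) is easy to reproduce, which can help you decide whether you need -mpiinc and -mpilib at all. mpi_dirs_from_cc is a hypothetical helper that mirrors that conventional layout; it does not replicate everything TAU's configure checks, and the unquoted use in the example assumes paths without spaces.

```shell
#!/bin/sh
# Given the path to mpicc, print the include/lib options implied by the
# conventional <prefix>/bin, <prefix>/include, <prefix>/lib layout.
# mpi_dirs_from_cc is a hypothetical helper, not part of TAU.
mpi_dirs_from_cc() {
    bindir=$(dirname "$1")        # e.g. /home/user/mpi/bin
    prefix=$(dirname "$bindir")   # e.g. /home/user/mpi
    printf '%s\n' "-mpiinc=$prefix/include -mpilib=$prefix/lib"
}

# Example:
#   ./configure -mpi $(mpi_dirs_from_cc "$(command -v mpicc)")
```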

    Multithread Questions

  25. How do I build TAU for OpenMP?

    Building TAU for OpenMP is straightforward:
    ./configure -openmp
    For the best results with GCC, Open64, or OpenUH, we recommend configuring with the following options, which allow the most flexibility in data collection:
    ./configure -openmp -bfd=download -unwind=download
    If you are using Intel compilers, you should configure like this (OMPT is an interface for OpenMP runtime introspection. For more information, see http://openmp.org/mp-documents/ompt-tr2.pdf or http://link.springer.com/chapter/10.1007%2F978-3-642-40698-0_13):
    ./configure -openmp -bfd=download -unwind=download -ompt=download
    If you are using some other compiler vendor and you want to use OPARI to instrument OpenMP regions, you should configure with -opari (see above for information on configuring PDT):
    ./configure -openmp -opari -pdt=/path/to/pdt/installation
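    The recommendations in this answer can be collected into one function that prints the configure options for a given compiler family. The labels gnu, open64, openuh and intel, and the function name, are hypothetical choices for this sketch; the options themselves are the ones listed above.

```shell
#!/bin/sh
# Print TAU configure options for an OpenMP build, following the
# recommendations above. The compiler labels and the function name are
# hypothetical. Other compilers fall back to OPARI, which needs PDT.
# Usage: openmp_configure_flags <gnu|open64|openuh|intel|other> [pdt_dir]
openmp_configure_flags() {
    case "$1" in
        gnu|open64|openuh)
            printf '%s\n' "-openmp -bfd=download -unwind=download" ;;
        intel)
            printf '%s\n' "-openmp -bfd=download -unwind=download -ompt=download" ;;
        *)
            if [ -z "$2" ]; then
                echo "usage: openmp_configure_flags <compiler> [pdt_dir]" >&2
                return 1
            fi
            printf '%s\n' "-openmp -opari -pdt=$2" ;;
    esac
}

# Example:
#   ./configure $(openmp_configure_flags gnu)
```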
  26. How do I build TAU for pthreads?

    Building TAU for pthreads is straightforward:
    ./configure -pthread

    Runtime Questions

  28. What are the environment variables for controlling TAU at runtime?

    There are several. A complete list is in the TAU documentation. Common variables include:
    TAU_PROFILE
    Set to 1 to have TAU profile your code
    TAU_TRACE
    Set to 1 to have TAU trace your code
    TAU_METRICS
    Colon delimited list of TAU/PAPI metrics to profile
    PROFILEDIR
    Specifies the directory where profile files are to be stored.
    TAU_CALLPATH
    When set to 1 TAU will generate call-path data. Use with TAU_CALLPATH_DEPTH.
    TAU_CALLPATH_DEPTH
    Sets the depth of the callpath profiling. Use with TAU_CALLPATH environment variable.
    TAU_TRACK_MESSAGE
    Track MPI message statistics (profiling), messages lines (tracing).
    TAU_COMM_MATRIX
    Generate MPI communication matrix data.
    TAU_THROTTLE
    Enables the runtime throttling of events that are lightweight. See Section 1.3, “Selectively Profiling an Application”
    TAU_THROTTLE_NUMCALLS
    Set the number of calls after which TAU checks whether a function should be throttled, when TAU_THROTTLE is enabled. See Section 1.3, “Selectively Profiling an Application”
    TAU_THROTTLE_PERCALL
    Set the minimum inclusive time per call (in microseconds) a function must have to remain instrumented when TAU_THROTTLE is enabled. See Section 1.3, “Selectively Profiling an Application”
    TAU_TRACEFILE
    Specifies the name of the Vampir trace file. Use with the -TRACE TAU configuration option. See Section 3.1, “Generating Event Traces”
    TRACEDIR
    Specifies the directory where trace files are to be stored. See Section 3.1, “Generating Event Traces”
    TAU_VERBOSE
    When set, TAU prints information about its configuration when running an instrumented application.
    TAU_PROFILE_FORMAT
    When set to snapshot, TAU generates condensed snapshot profiles instead of the default kind (different metrics are merged together, so there is only one file per node). When set to merged, TAU pre-computes the mean and standard deviation at the end of execution.
    TAU_SYNCHRONIZE_CLOCKS
    When set, TAU corrects for time discrepancies between nodes caused by differences in their clocks. This should produce more reliable trace data.
    TAU_SAMPLING
    Default value is 0 (off). When TAU_SAMPLING is set, TAU collects additional profile or trace information (depending on whether TAU_PROFILE or TAU_TRACE is set, respectively) via periodic sampling at runtime. The metric used for sampling and the sampling period are controlled by the TAU_EBS_SOURCE and TAU_EBS_PERIOD variables, respectively. The TAU_EBS_UNWIND variable determines whether callstack unwinding is enabled at each sample.
    TAU_EBS_PERIOD
    Default value is 1,000. This variable sets the period between samples; its units depend on the sample source selected with TAU_EBS_SOURCE.
    TAU_SUMMARY
    Set this variable to 1 to generate only min/max/stddev/mean statistics instead of per-node data. Use paraprof -dumpsummary and then pprof -f profile.Max/Min to see the data.
    TAU_CUPTI_API
    Default: runtime; options: runtime, driver, both. Controls which layer of CUDA is tracked within the CUPTI measurement system. See for example: tau_exec -T serial,cupti -cupti ./matmult. This option should be set based on which layer the CUDA program uses: runtime when the program uses the CUDA runtime API, driver when it uses the driver API. NOTE: Both the PGI accelerator and the HMPP compilers use the driver API.
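    Before a run, it can help to see exactly which of these variables are set in your environment. The sketch below lists variables starting with TAU_ plus PROFILEDIR and TRACEDIR; show_tau_env is a hypothetical helper, not a TAU command.

```shell
#!/bin/sh
# List TAU-related variables in the current environment, including
# PROFILEDIR and TRACEDIR, which lack the TAU_ prefix.
# show_tau_env is a hypothetical helper, not a TAU command.
show_tau_env() {
    env | grep -E '^(TAU_[A-Za-z0-9_]*|PROFILEDIR|TRACEDIR)=' | sort
}

# Example:
#   export TAU_TRACE=1; show_tau_env
```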
  29. How do I collect a profile?

    Profiling is TAU's default mode: whether you instrument or sample, TAU collects profiles unless told otherwise.
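    When comparing several runs, it is convenient to keep each run's profiles apart. TAU reads the PROFILEDIR environment variable to decide where profile files are written, so a wrapper can create a directory per experiment and set PROFILEDIR for a single command. profile_into is a hypothetical helper; the directory names in the example are arbitrary.

```shell
#!/bin/sh
# Run one command with PROFILEDIR pointing at a per-experiment directory
# (created if needed), leaving the calling shell's environment unchanged.
# profile_into is a hypothetical helper, not part of TAU.
profile_into() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    PROFILEDIR=$dir "$@"
}

# Example:
#   profile_into runs/omp-2threads env OMP_NUM_THREADS=2 ./matmult
#   (cd runs/omp-2threads && pprof)
```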
  30. How do I collect a trace?

    To collect a trace, set the appropriate environment variable before executing your program:
    export TAU_TRACE=1 #bash
    setenv TAU_TRACE 1 #csh

    Other Questions

  32. I can't use -bfd=download (no network access). How can I configure binutils for TAU?

    Here are the steps to build binutils 2.23.2 for TAU ($installation_dir is the location where you want to install binutils):
    Get the tarball
    wget http://www.cs.uoregon.edu/research/paracomp/tau/tauprofile/dist/binutils-2.23.2.tar.gz
    Expand the tarball
    tar -xvzf binutils-2.23.2.tar.gz
    Change to the source directory
    cd binutils-2.23.2
    Configure
    ./configure CFLAGS=-fPIC CXXFLAGS=-fPIC --prefix=$installation_dir --disable-nls --disable-werror
    Build and install
    make
    make install
    Copy additional resources required by TAU to the installation directory
    cp bfd/*.h $installation_dir/include/.
    cp -r include/* $installation_dir/include/.
    cp libiberty/libiberty.a $installation_dir/lib
    cp opcodes/libopcodes.a $installation_dir/lib
    Edit the bfd.h header so that it does not require config.h
    sed -e 's/#if !defined PACKAGE && !defined PACKAGE_VERSION/#if 0/' $installation_dir/include/bfd.h > /tmp/bfd.h
    mv /tmp/bfd.h $installation_dir/include
    done!
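    Before pointing TAU at the result, you can check that the headers and static libraries from the steps above are actually in place. check_binutils_install is a hypothetical helper; it assumes make install placed libbfd.a in $installation_dir/lib, which is typical for a native build but may depend on your binutils version and configure options.

```shell
#!/bin/sh
# Check that a binutils installation prepared as above has the header and
# static libraries TAU needs; report each missing file on stderr.
# check_binutils_install is a hypothetical helper.
check_binutils_install() {
    missing=0
    for f in include/bfd.h lib/libbfd.a lib/libiberty.a lib/libopcodes.a; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_binutils_install "$installation_dir" && echo "binutils looks ready"
```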