Publications: |
-
L. Li, J. P. Kenny, M. Wu, K. Huck, A. Gaenko, M. S. Gordon, C. L. Janssen, L. Curfman McInnes, H. Mori, H. M. Netzloff, B. Norris, and T. L. Windus
Adaptive Application Composition in Quantum Chemistry.
The 5th International Conference on the Quality of Software Architectures (QoSA 2009), East Stroudsburg University, Pennsylvania, USA, June 22-26, 2009
Abstract:
Component interfaces, as advanced by the Common Component Architecture
(CCA), enable easy access to complex software packages for
high-performance scientific computing. A recent focus has been
incorporating support for computational quality of service (CQoS), or
the automatic composition, substitution, and dynamic reconfiguration of
component applications. Several leading quantum chemistry packages
have achieved interoperability by adopting CCA components. Running
these computations on diverse computing platforms requires selection
among many algorithmic and hardware configuration parameters; typical
educated guesses or trial and error can result in unexpectedly low
performance. Motivated by the need for faster runtimes and increased
productivity for chemists, we present a flexible CQoS approach for
quantum chemistry that uses a generic CQoS database component to create
a training database with timing results and metadata for a range of
calculations. The database then interacts with a chemistry CQoS
component and other infrastructure to facilitate adaptive application
composition for new calculations.
-
Kevin A. Huck, Oscar Hernandez, Van Bui, Sunita Chandrasekaran, Barbara Chapman, Allen D. Malony, Lois Curfman McInnes, and Boyana Norris
Capturing Performance Knowledge for Automated Analysis.
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 2008
Abstract:
Automating the process of parallel performance experimentation, analysis, and
problem diagnosis can enhance environments for performance-directed application
development, compilation, and execution. This is especially true when
parametric studies, modeling, and optimization strategies require large amounts
of data to be collected and processed for knowledge synthesis and reuse. This
paper describes the integration of the PerfExplorer performance data mining
framework with the OpenUH compiler infrastructure. OpenUH provides
auto-instrumentation of source code for performance experimentation and
PerfExplorer provides automated and reusable analysis of the performance
data through a scripting interface. More importantly, PerfExplorer inference
rules have been developed to recognize and diagnose performance characteristics
important for optimization strategies and modeling. Three case studies are
presented which show our success with automation in OpenMP and MPI code tuning,
parametric characterization, and power modeling. The paper discusses how the
integration supports performance knowledge engineering across applications and
feedback-based compiler optimization in general.
-
Allen D. Malony, Sameer Shende, Alan Morris, Scott Biersdorff, Wyatt Spear, Kevin A. Huck, and Aroon Nataraj
Evolution of a Parallel Performance System.
2nd International Workshop on Tools for High Performance Computing, 2008
Abstract:
The TAU Performance System(R) is an integrated suite of tools for instrumentation,
measurement, and analysis of parallel programs targeting large-scale,
high-performance computing (HPC) platforms. Representing over fifteen
calendar years and fifty person years of research and development effort,
TAU's driving concerns have been portability, flexibility, interoperability,
and scalability. The result is a performance system which has evolved into a
leading framework for parallel performance evaluation and problem solving. This
paper presents the current state of TAU, overviews the design and function of
TAU's main features, discusses best practices of TAU use, and outlines future development.
-
Kevin A. Huck, Wyatt Spear, Allen D. Malony, Sameer Shende, and Alan Morris
Parametric Studies in Eclipse with TAU and PerfExplorer.
Proceedings of Workshop on Productivity and Performance (PROPER 2008) at EuroPar 2008 (Las Palmas de Gran Canaria, Spain), 2008.
Abstract:
With support for C/C++, Fortran, MPI, OpenMP, and performance tools, the
Eclipse integrated development environment (IDE) is a serious contender as
a programming environment for parallel applications. There is interest in
adding capabilities in Eclipse for conducting workflows where an
application is executed under different scenarios and its outputs are
processed. For instance, parametric studies are a requirement in many
benchmarking and performance tuning efforts, yet there was no experiment
management support available for the Eclipse IDE. In this paper, we
describe an extension of the Parallel Tools Platform (PTP) plugin for the
Eclipse IDE. The extension provides a graphical user interface for
selecting experiment parameters, launches build and run jobs, manages the
performance data, and launches an analysis application to process the data.
We describe our implementation, and discuss three experiment examples which
demonstrate the experiment management support.
-
Van Bui, Boyana Norris, Kevin Huck, Lois Curfman McInnes, Li Li, Oscar Hernandez, and Barbara Chapman
A Component Infrastructure for Performance and Power Modeling of Parallel Scientific Applications.
Component-Based High Performance Computing (CBHPC 2008), 2008
Abstract:
Characterizing the performance of scientific applications is essential for effective code optimization, both by compilers and by high-level adaptive numerical algorithms. While maximizing power efficiency is becoming increasingly important in current high-performance architectures, little or no hardware or software support exists for detailed power measurements. Hardware counter-based power models are a promising method for guiding software-based techniques for reducing power. We present a component-based infrastructure for performance and power modeling of parallel scientific applications. The power model leverages on-chip performance hardware counters and is designed to model power consumption for modern multiprocessor and multicore systems. Our tool infrastructure includes application components as well as performance and power measurement and analysis components. We collect performance data using the TAU performance component and apply the power model in the performance and power analysis of a PETSc-based parallel fluid dynamics application by using the PerfExplorer component.
-
Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris
Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0.
Large-Scale Programming Tools and Environments, special issue of Scientific Programming. (to appear, email for copies)
Abstract:
The integration of scalable performance analysis in parallel development tools
is difficult. The potential size of data sets and the need to compare results
from multiple experiments presents a challenge to manage and process the
information. Simply to characterize the performance of parallel applications
running on potentially hundreds of thousands of processor cores requires new
scalable analysis techniques. Furthermore, many exploratory analysis processes
are repeatable and could be automated, but are now implemented as manual
procedures. In this paper, we will discuss the current version of
PerfExplorer, a performance analysis framework which provides dimension
reduction, clustering and correlation analysis of individual trials of large
dimensions, and can perform relative performance analysis between multiple
application executions. PerfExplorer analysis processes can be captured in the
form of Python scripts, automating what would otherwise be time-consuming
tasks. We will give examples of large-scale analysis results, and discuss the
future development of the framework, including the encoding and processing of
expert performance rules, and the increasing use of performance metadata.
-
Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris
Scalable, Automated Performance Analysis with TAU and PerfExplorer.
Proceedings of Parallel Computing 2007, Aachen, Germany, 2007.
Abstract:
Scalable performance analysis is a challenge for parallel development tools.
The potential size of data sets and the need to compare results from multiple
experiments presents a challenge to manage and process the information, and to
characterize the performance of parallel applications running on potentially
hundreds of thousands of processor cores. In addition, many exploratory
analysis processes represent potentially repeatable processes which can and
should be automated.
In this paper, we will discuss the current version of PerfExplorer, a
performance analysis framework which provides dimension reduction, clustering
and correlation analysis of individual trials of large dimensions, and can
perform relative performance analysis between multiple application executions.
PerfExplorer analysis processes can be captured in the form of Python scripts,
automating what would otherwise be time-consuming tasks. We will give examples
of large-scale analysis results, and discuss the future development of the
framework, including the encoding and processing of expert performance rules,
and the increasing use of performance metadata.
-
D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang.
Performance database technology for SciDAC applications.
Journal of Physics: Conference Series, Vol. 78, 24-28 June 2007, Boston, Massachusetts, USA.
Abstract:
As part of the Performance Engineering Research Institute (PERI) effort, the
Performance Database Working Group, which involves PERI researchers as well as
outside researchers at the University of Oregon, Portland State University, and Texas
A&M University, has developed technology for storing performance data collected by a
number of performance measurement and analysis tools, including TAU, PerfTrack,
Prophesy, and SvPablo. In addition to the performance data, metadata capturing the
experimental setup and conditions (e.g., source code version; input data; platform,
compiler, library, and operating system versions and configurations; runtime
environment) are exported to a common metadata schema, along with some basic
performance information. The exported information can be viewed from a common web
interface, and a link or contact information is provided for accessing the original
performance data in its home database. Analysis tools provided by the individual
databases support tasks such as parallel profile browsing and analysis, cross-experiment
analysis, and scalability studies. Performance data are currently being collected and
analyzed for the GTC and MILC SciDAC applications. The tools are being installed on
machines used by SciDAC researchers so that they can easily collect data and upload it to
an associated performance database. Work on a deeper level of interoperability that will
allow exchange of actual performance data between databases is underway.
-
Y. Zhang, R. Fowler, K. Huck, A. Malony, A. Porterfield, D. Reed, S. Shende, V. Taylor, and X. Wu.
US QCD Computational Performance Studies with PERI.
Journal of Physics: Conference Series, Vol. 78, 24-28 June 2007, Boston, Massachusetts, USA.
Abstract:
We report on some of the interactions between two SciDAC projects: The National Computational Infrastructure for Lattice Gauge Theory (USQCD), and the Performance Engineering Research Institute (PERI). Many modern scientific programs consistently report the need for faster computational resources to maintain global competitiveness. However, as the size and complexity of emerging high end computing (HEC) systems continue to rise, achieving good performance on such systems is becoming ever more challenging. In order to take full advantage of the resources, it is crucial to understand the characteristics of relevant scientific applications and the systems these applications are running on. Using tools developed under PERI and by other performance measurement researchers, we studied the performance of two applications, MILC and Chroma, on several high performance computing systems at DOE laboratories. In the case of Chroma, we discuss how the use of C++ and modern software engineering and programming methods are driving the evolution of performance tools.
-
Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris.
TAUg: Runtime Global Performance Data Access Using MPI.
EuroPVM/MPI, pp. 313-321, Bonn, Germany, 2006.
Abstract:
To enable a scalable parallel application to view its global performance state,
we designed and developed TAUg, a portable runtime framework layered on the TAU
parallel performance system. TAUg leverages the MPI library to communicate
between application processes, creating an abstraction of a global performance
space from which profile views can be retrieved. We describe the TAUg design
and implementation and show its use on two test benchmarks up to 512
processors. Overhead evaluation for the use of TAUg is included in our
analysis. Future directions for improvement are discussed.
-
Li Li, Allen D. Malony and Kevin Huck.
Model-Based Relative Performance Diagnosis of Wavefront Parallel Computations.
Euro-Par 2006 Parallel Processing Conference September 2006 (LNCS 4128). Pages 35-46.
Abstract:
Parallel performance diagnosis can be improved with the use of performance
knowledge about parallel computation models. The Hercule diagnosis system
applies model-based methods to automate performance diagnosis processes and
explain performance problems from high-level computation semantics. However,
Hercule is limited by a single experiment view. Here we introduce the concept
of relative performance diagnosis and show how it can be integrated in a
model-based diagnosis framework. The paper demonstrates the effectiveness of
Hercule's approach to relative diagnosis of the well-known Sweep3D application
based on a Wavefront model. Relative diagnoses of Sweep3D performance anomalies
in strong and weak scaling cases are given.
-
Kevin Huck and Allen D. Malony.
PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing.
SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005. Seattle, Washington, USA.
Abstract:
Parallel applications running on high-end computer systems manifest a
complexity of performance phenomena. Tools to observe parallel performance
attempt to capture these phenomena in measurement datasets rich with
information relating multiple performance metrics to execution dynamics and
parameters specific to the application-system experiment. However, the
potential size of datasets and the need to assimilate results from multiple
experiments makes it a daunting challenge to not only process the information,
but discover and understand performance insights. In this paper, we present
PerfExplorer, a framework for parallel performance data mining and knowledge
discovery. The framework architecture enables the development and integration
of data mining operations that will be applied to large-scale parallel
performance profiles. PerfExplorer operates as a client-server system and is
built on a robust parallel performance database (PerfDMF) to access the
parallel profiles and save its analysis results. Examples are given
demonstrating these techniques for performance analysis of ASCI applications.
-
Karen L. Karavanic, John May, Kathryn Mohror, Brian Miller, Kevin Huck, Rashawn Knapp, Brian Pugh.
Integrating Database Technology with Comparison-based Parallel Performance Diagnosis: The PerfTrack Performance Experiment Management Tool.
SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005. Seattle, Washington, USA.
Abstract:
PerfTrack is a data store and interface for managing performance data from
large-scale parallel applications. Data collected in different locations and
formats can be compared and viewed in a single performance analysis session.
The underlying data store used in PerfTrack is implemented with a database
management system (DBMS). PerfTrack includes interfaces to the data store and
scripts for automatically collecting data describing each experiment, such as
build and platform details. We have implemented a prototype of PerfTrack that
can use Oracle or PostgreSQL for the data store. We demonstrate the prototype's
functionality with three case studies: one is a comparative study of an ASC
Purple benchmark on high-end Linux and AIX platforms; the second is a parameter
study conducted at Lawrence Livermore National Laboratory (LLNL) on two high
end platforms, a 128 node cluster of IBM Power 4 processors and BlueGene/L; the
third demonstrates incorporating performance data from the Paradyn Parallel
Performance Tool into an existing PerfTrack data store.
-
P Worley, J Candy, L Carrington, K Huck, T Kaiser, G Mahinthakumar, A Malony, S Moore, D Reed, P Roth, H Shan, S Shende, A Snavely, S Sreepathi, F Wolf, Y Zhang
Performance Analysis of GYRO: a tool evaluation.
Journal of Physics: Conference Series, vol. 16, pp. 551-555, 2005.
Abstract:
The performance of the Eulerian gyrokinetic-Maxwell solver code GYRO is
analyzed on five high performance computing systems. First, a manual approach
is taken, using custom scripts to analyze the output of embedded wallclock
timers, floating point operation counts collected using hardware performance
counters, and traces of user and communication events collected using the
profiling interface to Message Passing Interface (MPI) libraries. Parts of the
analysis are then repeated or extended using a number of sophisticated
performance analysis tools: IPM, KOJAK, SvPablo, TAU, and the PMaC modeling
tool suite. The paper briefly discusses what has been discovered via this
manual analysis process, what performance analyses are inconvenient or
infeasible to attempt manually, and to what extent the tools show promise in
accelerating or significantly extending the manual performance analyses.
-
Kevin Huck, Allen D. Malony, Robert Bell and Alan Morris.
Design and Implementation of a Parallel Performance Data Management Framework.
(Winner: The Chuan-lin Wu Best Paper Award),
Proceedings of the 2005 International Conference on Parallel Processing.
June 14-17, 2005. Oslo, Norway.
Abstract:
Empirical performance evaluation of parallel systems and applications can
generate significant amounts of performance data and analysis results from
multiple experiments as performance is investigated and problems diagnosed.
Hence, the management of performance information is a core component of
performance analysis tools. To better support tool integration, portability,
and reuse, there is a strong motivation to develop performance data management
technology that can provide a common foundation for performance data storage,
access, merging, and analysis. This paper presents the design and
implementation of the Performance Data Management Framework (PerfDMF). PerfDMF
addresses objectives of performance tool integration, interoperation, and reuse
by providing common data storage, access, and analysis infrastructure for
parallel performance profiles. PerfDMF includes an extensible parallel profile
data schema and relational database schema, a profile query and analysis
programming interface, and an extendible toolkit for profile import/export and
standard analysis. We describe the PerfDMF objectives and architecture, give
detailed explanation of the major components, and show examples of PerfDMF
application.
|
Posters: |
-
Joseph Kenny (Sandia National Laboratories), Kevin Huck (University of Oregon), Li Li (Argonne National Laboratory), Lois Curfman McInnes (Argonne National Laboratory), Heather Netzloff (Ames Laboratory), Boyana Norris (Argonne National Laboratory), Meng-Shiou Wu (Ames Laboratory)
Computational Quality of Service in Quantum Chemistry.
Poster, SC'08. November, 2008.
Abstract:
Component interfaces, as advanced by the Common Component Architecture (CCA) Forum, enable easy access to software packages. A recent focus of the CCA Forum has been adding support for computational quality of service (CQoS): automatic composition, substitution and dynamic reconfiguration. Several quantum chemistry developers (GAMESS, MPQC and NWChem) have adopted CCA components, creating shared capabilities and infrastructure. These computations require many algorithmic and hardware configuration options, including the configuration of processing elements (nodes, processors/sockets and cores); typical educated guesses or trial and error result in erratic performance and efficiency. This situation is driving the development of a flexible CQoS approach for quantum chemistry applications. Our approach uses a general CQoS database component to create a training database containing timing results and metadata for a range of calculations. Once this database is populated, the chemistry CQoS component uses general CQoS infrastructure analysis capabilities to provide appropriate configuration for a new calculation.
-
D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang.
Performance Database Technology for SciDAC Applications.
Poster, SciDAC. June, 2007.
Abstract:
As part of the Performance Engineering Research Institute (PERI) effort, the Performance Database Working Group, which involves PERI researchers as well as outside researchers at the University of Oregon, Portland State University, and Texas A&M University, has developed technology for storing performance data collected by a number of performance measurement and analysis tools, including TAU, PerfTrack, Prophesy, and SvPablo. In addition to the performance data, metadata capturing the experimental setup and conditions (e.g., source code version; input data; platform, compiler, library, and operating system versions and configurations; runtime environment) are exported to a common metadata schema, along with some basic performance information. The exported information can be viewed from a common web interface, and a link or contact information is provided for accessing the original performance data in its home database. Analysis tools provided by the individual databases support tasks such as parallel profile browsing and analysis, cross-experiment analysis, and scalability studies. Performance data are currently being collected and analyzed for the GTC and MILC SciDAC applications. The tools are being installed on machines used by SciDAC researchers so that they can easily collect data and upload it to an associated performance database.
-
R. Fowler, Y. Zhang, A. Porterfield, D. Reed, J. Mellor-Crummey, N. Tallent, K. Huck, A. Malony, S. Shende, V. Taylor, and X. Wu.
PERI and USQCD Computational Performance Studies.
Poster, SciDAC. June, 2007.
Abstract:
USQCD encompasses a SciDAC collaboration of US scientists developing and using large-scale computers for calculations in lattice quantum chromodynamics. Software Emphasis: improved scientific productivity through modular, reusable, cross-platform, high-performance libraries. PERI is a SciDAC Institute focused on delivering petascale performance to complex scientific applications running on Leadership Class computing systems. Emphasis: improved productivity through automation of measurement, analysis, and tuning of HPC applications.
-
Kevin Huck, Kathryn Mohror, John May, Brian Miller, Karen Karavanic.
PerfTrack: Performance Database & Analysis Tool.
Poster, Lawrence Livermore National Laboratory, UCRL-POST-205871. September, 2004.
Introduction:
Our goal is to create a tool which will help scientific programmers answer
difficult questions about application performance as the source code, build
parameters, runtime environment and hardware vary over time. We are developing
PerfTrack to explore technologies in parallel performance measurement,
modeling, analysis and prediction. We are storing performance data and the
associated environment data in a relational database. This database provides a
foundation to build analysis tools, scalable to large numbers of threads (over
1024) and capable of comparing multiple executions. The tools we develop will
be automated to gather, store and analyze data, in order to encourage their use
in the software development cycle.
|