High Performance Computing (HPC)
Working in the Performance Research Lab at the University of Oregon, I am the primary developer of the Scalable Observation System for Scientific Workflows (SOSflow) targeting current and future-scale HPC architectures.
The goal of this effort is to provide robust software infrastructure for in situ analysis of workflows, intelligent real-time workflow adaptation, and data collation for comprehensive performance profiling. SOS is being co-designed as both a general-purpose system and a collection of specific modules that support well-established HPC codes.
- Unified global views with minimal latency and overhead
- Observation and control from both a systems perspective and logical workflow perspectives
- Detailed attribution of the root causes underlying observed performance characteristics
- Information collection, archival, and efficient real-time sharing through a flexible publish / subscribe model
- Event detection and event-triggered behaviors
- Intelligent management of data sampling across every source: hardware, OS, libraries, applications, etc.
- Many-[layer/component/channel/epoch] instrumentation and perspective-integration
- Meta-analysis of performance across many workflow sessions and software versions
- ...and much more, all designed especially for exascale (10^18 operations per second) systems.
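To make the publish / subscribe and event-triggered items above more concrete, here is a minimal in-process sketch in Python. It is not the SOSflow API: `ObservationBus`, `make_threshold_trigger`, and the topic name `node0/mem_used_pct` are all hypothetical names chosen for illustration, and a real deployment would route observations between processes and nodes rather than through a local dictionary.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# A handler receives the topic name and the observed value.
Handler = Callable[[str, float], None]


class ObservationBus:
    """Toy publish/subscribe hub (hypothetical; not the SOSflow API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every value published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, value: float) -> None:
        """Deliver one observation to all handlers subscribed to topic."""
        for handler in list(self._subscribers.get(topic, [])):
            handler(topic, value)


def make_threshold_trigger(limit: float, action: Handler) -> Handler:
    """Wrap an action so it fires only when a value exceeds limit."""
    def handler(topic: str, value: float) -> None:
        if value > limit:
            action(topic, value)
    return handler


bus = ObservationBus()
alerts: List[Tuple[str, float]] = []

# Event-triggered behavior: record an alert when memory use passes 90%.
bus.subscribe(
    "node0/mem_used_pct",
    make_threshold_trigger(90.0, lambda topic, value: alerts.append((topic, value))),
)

for sample in (42.0, 95.5, 71.3):
    bus.publish("node0/mem_used_pct", sample)

print(alerts)  # [('node0/mem_used_pct', 95.5)]
```

In this sketch the publisher knows nothing about who consumes its data, which is the property that lets monitoring, analysis, and adaptation components be added or removed independently.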
Get In Touch
You are welcome to contact me with any questions or comments!