Phase1FExploringAxes

Hi Everyone,

The focus for Thursday's call will be exploring our 6 axes in more depth, as well as discussing Joseph Cottam's proposal. The basic summary of each axis can be found here: http://ix.cs.uoregon.edu/~hank/axes.pdf

The challenge will be to flesh out the individual axes. If you cannot make the call, you are encouraged to enter your comments ahead of time in the appropriate sections below:

Integration Type

  • (SZ) I can enumerate several discrete integration types, based on Hank's axis summary. The left terms approximate the terms from the axis summary; the right terms are a different categorization:
    • "Direct Internal" or "Dedicated Internal": Custom viz routines embedded into the simulation.
    • "Direct External" or "Dedicated External API": An external library, module, template header, etc. dedicated to in situ viz (Catalyst, Libsim).
    • "Indirect External" or "Multipurpose External API": An external library, module, middleware framework, etc. where the component could be using the simulation data for purposes in addition to in situ viz (ADIOS, GLEAN).
    • "Indirect Intercept" or "External Interception": interposition where functions already used in the simulation are replaced by functions that do in situ viz without the knowledge of the simulation.
    • "Indirect Inspection" or "External Inspection": obtaining the data from the simulation's memory space without its knowledge, e.g., in the same manner as a debugger. This approach is why I questioned the use of the direct/indirect terminology (and proposed an alternative) because one can say that it "directly" obtains the data from the simulation memory.

Proximity

Access

  • Shallow vs. deep copies as subcategories of shared access (see the sketch below)
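
A minimal sketch of the shallow/deep distinction, with hypothetical names (Field, viz_attach_shallow, viz_copy_deep): a shallow copy aliases the simulation's own buffer (zero copy, but the simulation must not mutate it while the viz reads it), while a deep copy snapshots the data (safe for asynchronous use, at a memory and copy cost).

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical handle for field data shared with a viz component. */
    typedef struct {
        double *values;
        size_t  n;
    } Field;

    /* Shallow: alias the simulation's memory directly (zero copy). */
    Field viz_attach_shallow(double *sim_data, size_t n)
    {
        Field f = { sim_data, n };
        return f;
    }

    /* Deep: snapshot the data so sim and viz proceed independently. */
    Field viz_copy_deep(const double *sim_data, size_t n)
    {
        Field f = { malloc(n * sizeof(double)), n };
        if (f.values)
            memcpy(f.values, sim_data, n * sizeof(double));
        return f;
    }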

Synchronization

  • Time-sharing vs. space-sharing? (See the sketch below.)
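
One concrete reading of space-sharing, sketched below under assumptions (run_simulation and run_visualization are placeholders): a subset of MPI ranks is dedicated to viz by splitting the communicator, while the rest run the simulation. Time-sharing would instead have every rank alternate between solving and visualizing within the same loop.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Space-sharing: reserve, e.g., the last quarter of ranks for viz. */
        int is_viz = rank >= (3 * size) / 4;
        MPI_Comm role_comm;
        MPI_Comm_split(MPI_COMM_WORLD, is_viz, rank, &role_comm);

        if (is_viz) {
            /* run_visualization(role_comm);  -- receive data and render */
        } else {
            /* run_simulation(role_comm);     -- solve, ship data to viz */
        }

        MPI_Comm_free(&role_comm);
        MPI_Finalize();
        return 0;
    }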

Operation Controls

Output Type

  • Better characterization needed; more than explorable vs. non-explorable
  • (HT) I think output is important because
    • one of the main motivations for "going in situ" is to save I/O bandwidth. The output axis determines in large part how much can be saved, i.e., it is key to quantifying the return on investment of integrating in situ vis;
    • of symmetry: we have (at least) two axes touching the input side (access and integration). Maybe the discussion should make this link explicit;
    • the desired output type defines the set of algorithms that generate said output. Those algorithms (or classes thereof) typically come with their own constraints and challenges, specifically in terms of parallelization/scalability. For general-purpose systems featuring in situ capabilities - which include loads of different algorithms and have to cover most of them - this aspect might not be as relevant as for custom-tailored solutions for specific use cases.
  • (HT) Explorable vs. non-explorable is indeed a somewhat sketchy distinction, IMHO, largely due to all sorts of "image-based data exploration". I'd like to propose a continuous dimension where we classify techniques according to their "level of abstraction". Essentially, in situ vis can (still) be modelled as a vis pipeline. Colloquially speaking, my intuition of "abstraction" (there may be a better word for it) relates to the relative position of algorithms in a (conceptual) vis pipeline: the farther down the pipeline a technique is typically located, the more "abstract" its output. Along these lines we could avoid the potential contradiction induced by any kind of "explorable images". Assuming that results are written out at some point in the pipeline, the part before the cut would be "concurrent/in situ" and the part after it "post hoc" (see the sketch after this list). Hence, the point of the break also determines in large part which parameters are modifiable after the fact and which are not. All of these relate to "output", not necessarily "output type"; so maybe we should think about the wording here once more.
  • (RRS) Someone mentioned on the call the "role of in situ", and I think this axis should speak to that. If we discuss the benefits of performing tasks in situ and look at the extremes, such as saving no data to disk, there is an obvious risk/reward or cost/benefit scale that may dictate design or describe the benefit of part of another axis, like operation controls. I think we should also differentiate between cases where moving an analysis in situ offers improved performance and cases where it offers improved capabilities, i.e., benefit vs. necessity. Discussing that which is only possible in situ (or will be) is also a nice and future-looking context for why we are thinking about all of this anyway.
  • (SZ) I agree with HT that this axis is a bit more continuous than we've expressed so far. I also like the idea of representing it as a pipeline. What has been referred to as "extraction" in some publications is simply a termination of the pipeline at a particular operation, but with the addition of a "writer" to store the result. However, I would point out one somewhat discrete portion at the end of the pipeline. I think there is some sort of output gathering/reduction that has to happen: either a writer that gathers and produces a result/file, or a renderer that reduces and produces an image (which technically is also a file). This final step may be considered discrete from the rest of the visualization pipeline, but within it things are becoming somewhat continuous. For example, consider this continuum that borrows from Cinema: a static RGB image of typical viewing size; an enormous static RGB image (e.g., 40Kx40K) that facilitates zooming; an enormous static image rendered in floating point to allow future color mapping; a multi-view database of enormous float images for moving/rotating the viewpoint; a multi-view, multi-pipeline database of enormous float images that allows swapping objects in/out of the pipelines; etc.
  • (SZ) Taking the "continuous/pipeline" output type concept even further, I think this is where we represent the concepts of "analysis vs. visualization". Analysis could also be represented as a pipeline, even if only conceptually (it may not be implemented that way). I'm thinking it involves the same operation + gather/reduction steps. The result is likely different from that of visualization, such as a small set of statistics values, a database of extracted features, etc.
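
To make HT's "cut point" idea and SZ's gather/reduce endpoint concrete, here is a minimal sketch under assumptions: the stage names (filter, extract, render) and the functions run_in_situ and gather_and_write are hypothetical, not any particular system's API. Stages before the cut run in situ; whatever reaches the cut is gathered and written out; the remaining stages would run post hoc on the stored intermediate, which is exactly what keeps their parameters modifiable after the fact.

    #include <stdio.h>

    typedef struct {        /* stand-in for whatever flows through the pipeline */
        const char *description;
    } Data;

    typedef Data (*Stage)(Data);

    static Data filter(Data d)  { d.description = "filtered field";    return d; }
    static Data extract(Data d) { d.description = "extracted surface"; return d; }
    static Data render(Data d)  { d.description = "rendered image";    return d; }

    /* SZ's point: whatever the cut, a terminal gather/reduce + write follows. */
    static void gather_and_write(Data d)
    {
        printf("writing intermediate: %s\n", d.description);
    }

    static void run_in_situ(Data sim_output, Stage *stages, int n_stages, int cut)
    {
        Data d = sim_output;
        for (int i = 0; i < cut; i++)   /* stages before the cut: in situ */
            d = stages[i](d);
        gather_and_write(d);
        /* stages[cut..n_stages) run post hoc on the stored intermediate. */
        (void)n_stages;
    }

    int main(void)
    {
        Stage pipeline[] = { filter, extract, render };
        Data raw = { "raw simulation field" };
        /* Cut after "extract": the surface is saved in situ; rendering
         * happens post hoc, so camera, color map, etc. stay modifiable. */
        run_in_situ(raw, pipeline, 3, 2);
        return 0;
    }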

Feedback on Cottam breakdown

Link to Joseph's breakdown

  • (RRS): I've been thinking about this in terms of one categorization vs. another, and I like both. I suppose it's all the damn agreeability that's made me confused (which I have been). For me, the problem was the assumption that they are directly comparable, but the more I think about them, the more it seems they are sets of categories with different purposes. I think this process-oriented approach leads to an elegant set of "boxes" that are arguably sufficient to describe an instance of an in situ deployment. However, the process-oriented approach also asks questions about things like mechanisms and APIs, so not all of these boxes strike me as helpful in differentiating between instances of in situ deployments (i.e., there are too many possibilities). And so three points: 1) this highlights the difficulties of creating categorization axes, suggests that we can't do so cleanly, and provides another metric for successfully doing so; 2) if this describes instances, the box with the holes should definitely be rectified somehow; 3) I now really want *each* of us to create this kind of document and generate another paper automatically :)

Participants

Log

  • Hank Childs
  • (HT) Bernd Hentschel
  • (SZ) Sean Ziegeler