insituterminology | Phase1D / Phase1DSurveyInput

Overview

This page will eventually be converted into a PDF, which will be emailed to all participants. I will also set up a SurveyMonkey. The first half of this page is background and is for reference for the larger team. The second half of the page is for the questions we will ask.

Your tasks:

Review background (first half). Add comments about readability. Also add comments about unnecessary points or about missing points.
Review questions (second half). Add comments about readability of questions. Also add comments about missing questions.

Background

We continue to have 6 axes:

Access
Proximity
Synchronous/asynchronous
Output type
Operation Controls
Integration Type

1. Access

There are two options:

Direct access: visualization/analysis runs in the same logical address/memory space as the simulation code
Indirect access: visualization/analysis runs in a distinct logical address/memory space from the simulation code

Direct access

Matthieu Dorier has suggested this option be called "shared access", which may be a good choice. (We'll sort that out in a future phase.)
Janine Bennett has introduced the notion of hazards, which is a meaningful concept. That notion may not be a category, but I think it is directly associated with this option. Further, I (Hank) think that the most likely hazard is "WAR" (Write after read). With this hazard, the data you want to read got overwritten by something else before you read it. As in, you want to read simulation data, but the simulation writes over it before you can.
There are also likely options for:
- direct access: read only
- direct access: read+write (for steering use cases)
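
To make the WAR hazard concrete, here is a minimal sketch in Python. It is an illustration only, not code from any in situ system: the names `advance`, `shared_view`, and `snapshot` are hypothetical, and a plain list stands in for simulation memory. The analysis that holds a reference to the live buffer ends up reading the next step's data, while the analysis that took a copy still sees the step it intended to read.

```python
def advance(field, step):
    """Hypothetical simulation step: overwrite the field in place."""
    for i in range(len(field)):
        field[i] = float(step)


def mean(values):
    """Stand-in for an analysis routine that reads the whole field."""
    return sum(values) / len(values)


# Simulation state at step 0.
field = [0.0, 0.0, 0.0, 0.0]

# Direct (shared) access: the analysis holds a reference to the live buffer.
shared_view = field

# Defensive alternative: copy the data before the simulation moves on.
snapshot = list(field)

# The simulation writes step 1 before the analysis has read step 0.
advance(field, 1)

hazard_result = mean(shared_view)  # reads step-1 values, not step 0
safe_result = mean(snapshot)       # still reads step-0 values
```

The copy avoids the hazard at the cost of extra memory; a read-only agreement or explicit synchronization between simulation and analysis would be other ways to address it.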

2. Proximity

This axis was introduced on the Feb 5th telecon. It was previously "Location", but location did not offer fine enough granularity.

As such, the options for proximity are not very well thought out yet.

Ken Moreland offered a strawman attempt:

Different facilities spread across the world.
Different systems in the same facility (separate computation and vis clusters).
Different nodes in the same system.
Different subsystems in the same node (CPU and GPU).
Dedicated processors in different sockets on the same node.
Dedicated cores in the same processor.
Time sharing or dynamic allocation on the same cores.

Valerio commented: same core, same processor, same node, same supercomputer, same city, etc, but also distinctions about where in the memory hierarchy. Needs to encompass all of this.

(Note: this axis will need a lot of work going forward.)

3. Synchronous vs Asynchronous

We believe there is a meaningful distinction between synchronous/asynchronous and blocking/non-blocking. With this axis, we intend to convey "when things are running".

We believe that synchronous execution, i.e., the sim and vis/analysis code take turns executing, does imply "blocking". However, asynchronous execution, i.e., the sim and vis/analysis code are executing concurrently, has the potential for non-blocking behavior, but not a guarantee. For example, the vis/analysis code may take 5 seconds to operate on a time slice, and the simulation produces data every 3 seconds. Depending on the policy for this case, the decision may be to have the simulation block.
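
The 5-second / 3-second example can be worked through with a small deterministic model. This is a sketch under stated assumptions, not a description of any real system: the function `simulate`, the bounded hand-off buffer, its `capacity`, and the two policies ("block" and "drop") are all hypothetical names introduced here for illustration. The model uses a logical clock rather than real threads, so the outcome is reproducible.

```python
from collections import deque


def simulate(num_slices, produce_time=3, analyze_time=5,
             capacity=1, policy="block"):
    """Model an asynchronous sim/analysis pair with a bounded buffer.

    Returns (time the simulation spent blocked, slices dropped).
    """
    sim_clock = 0          # logical time of the simulation
    analysis_free_at = 0   # time the analysis finishes its current slice
    waiting = deque()      # analysis start times of slices still buffered
    blocked = 0
    dropped = 0

    for _ in range(num_slices):
        sim_clock += produce_time  # the simulation computes one slice

        # Slices whose analysis has already started have left the buffer.
        while waiting and waiting[0] <= sim_clock:
            waiting.popleft()

        if len(waiting) >= capacity:
            if policy == "drop":
                dropped += 1       # discard this slice, keep simulating
                continue
            # policy == "block": wait until the oldest buffered slice
            # starts being analyzed, which frees one buffer slot.
            wait_until = waiting.popleft()
            blocked += wait_until - sim_clock
            sim_clock = wait_until

        start = max(sim_clock, analysis_free_at)
        analysis_free_at = start + analyze_time
        waiting.append(start)

    return blocked, dropped


block_result = simulate(6, policy="block")
drop_result = simulate(6, policy="drop")
fast_analysis_result = simulate(6, analyze_time=2)
```

In this model the execution is asynchronous in every run, yet the simulation only avoids blocking when the analysis keeps up or when the policy allows slices to be dropped. That is the sense in which asynchronous execution permits, but does not guarantee, non-blocking behavior.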

4. Output Type

At this point, we want to decide if we need this axis. If so, we can worry more about what the options are.

Strawman:

explorable
- images (e.g., Cinema)
- transformed data (e.g., Lagrangian basis flows, explorable images, topological descriptions)
- compressed data (e.g., wavelets, run length encoding, Peter Lindstrom's compression, Fout & Ma's compression, etc.)
not-explorable
- Images
- Movies
- data analysis (e.g., a synthetic diagnostic)

5. Operation controls

interactive: operations being performed can be changed by the user
batch: operations being performed are fixed at the beginning, i.e., they cannot be changed by the user

Note: some earlier iterations put in ideas of steering and also synchronous/asynchronous.

6. Integration Type

Direct source code integration
Protocol-based integration (likely through a middleware framework, e.g., Conduit, ADIOS, GLEAN)
Function interposition
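
As a rough illustration of the third option, the sketch below shows function interposition in Python. It is an analogue only: in compiled simulation codes, interposition is typically done at link or load time (for example with `LD_PRELOAD` or the MPI profiling interface), not by patching objects. The `Simulation` class, its `write_output` method, and the `interpose` helper are hypothetical names invented for this example.

```python
import functools


class Simulation:
    """Hypothetical stand-in for a simulation code we do not modify."""

    def __init__(self):
        self.written = []

    def write_output(self, step, data):
        # Pretend this writes a time slice to disk.
        self.written.append((step, list(data)))


def interpose(obj, method_name, hook):
    """Replace obj.method_name with a wrapper that calls hook first."""
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        hook(*args, **kwargs)              # in situ analysis sees the data
        return original(*args, **kwargs)   # original behavior is preserved

    setattr(obj, method_name, wrapper)


maxima = []


def record_max(step, data):
    """Hypothetical analysis hook: record the per-step maximum."""
    maxima.append((step, max(data)))


sim = Simulation()
interpose(sim, "write_output", record_max)

# The simulation calls its usual output routine, unaware of the hook.
sim.write_output(0, [1.0, 4.0, 2.0])
sim.write_output(1, [3.0, 0.5, 9.0])
```

The point of the sketch is that the simulation source is untouched: the analysis attaches itself to a call the simulation already makes, which is what distinguishes this option from direct source code integration.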

Comments on Background

HT - Regarding "Output Type", I still think it's a meaningful axis but I would really like to see it labeled in a more or less continuous fashion rather than the partition into explorable/non-explorable. I believe that the type of output supported by an in situ system has significant design impact. At the same time, I have trouble understanding the binary distinction being made here. For example, Jim Ahrens showed a nice little example at the LDAV'15 panel where he demonstrated how (many) images can be fused into an interactive data exploration; so the case might be made that these images were an explorable result.
KM - Where in these axes is it covered if the visualization unit can control the execution of the simulation? This is necessary for the fairly important use case of visual debugging. Should this be under Access (whether the visualization has access to control the execution)? Should this be under Synchronous vs Asynchronous (whether the visualization can arbitrarily cause the simulation to block)?
KM - Should the Integration Type axis include shared files at the end of its list? This has been used as an interface in the past, and I hear a lot of thoughts of using a burst buffer in essentially this manner.
JC - To support KM's first comment: that could fit on the "Operations Control" axis at the "interactive" level. Debugging can be seen as an extreme form of controlling what the program is doing.
JC - Supporting HT...I'm also confused as to the distinction between explorable and not.

Questions:

Question 1: Should Access and Proximity be separate axes?

Discussion: there has been considerable discussion so far on whether these two axes are separate. The argument for merging the two axes has been that direct access implies close proximity. The argument against merging is that counter-examples exist ... vis/analysis can run very proximately, but have only indirect access to data. Similarly, vis/analysis can run far away, but have direct access, e.g. via PGAS.

Options:

Keep Access and Proximity separate, i.e., two axes
Merge Access and Proximity to one axis

Question 2: Should Output Type be an axis?

Discussion: the motivation behind removing this axis is that the underlying system is the same regardless of what the vis/analysis is producing. The contingent in favor of keeping this axis agrees with that point, but believes that the output type is still a meaningful property of the system.

Options:

Remove this axis
Keep this axis

Question 3: Should we include analysis?

Discussion: visualization and analysis are distinct topics, but we often treat them together. Our group clearly wants to say something about in situ visualization. Should we include analysis in our discussion (likely lumped together with vis)? Or should we focus our message on visualization only?

Options:

Final document should only discuss visualization
Final document should discuss both visualization and analysis

Question 4: Should we stick with "in situ" as the main term?

Discussion: Wes Bethel made this comment: "in situ is not the opposite of post hoc; we are chasing the wrong terminology here. :) Instead, the opposite of post hoc processing is concurrent processing. In situ is one way concurrent processing is carried out (shallow share) vs in transit (deep share, which means data movement). Both may be done synchronously or asynchronously. There is plenty of historical precedent that reinforces this idea. In the 1990s, this kind of work was commonly referred to as concurrent processing or coprocessing (not In Situ processing). See, e.g., http://sda.iu.edu/docs/CoprocSurvey.pdf (Aug 1998), Concurrent Distributed Visualization and Simulation Steering, Robert Haimes, In Parallel Computational Fluid Dynamics: Implementations and Results Using Parallel Computers, A. Ecer, J. Periaux, N. Satofuka, and S. Taylor (eds), 1995, Elsevier (google for “avs visualization system co-processing”)" Others have pointed out additional terminology; for example, Kwan-Liu used "runtime" in some early papers.

Options:

The term "in situ" has too much inertia to reverse course. We should continue to describe the overall idea as "in situ". That said, we should make sure to acknowledge earlier work in the final document, and discuss that the blanket term of "in situ" may be slightly mismatched based on original concepts.
We should reduce the presence of the term "in situ" in the final document and instead focus on a more appropriate term (e.g., concurrent)
Other (please add comments)

Question 5: Do you believe the current 6 axes are sufficient to describe in situ systems?

Options:

Yes
No (please add comments)

Question 6: What are we missing? Are there any important aspects of in situ that have not been represented in the discussion so far?

Yes (please add comments)
No

Question 7: What is your name?

Comments on questions

HT - ad Q3: A question that immediately arises is where we would draw the line between the two. I have to admit that I use a very wide definition of visualization - along the lines of "a histogram printed in a paper is an image, too." That said, I guess from a systems perspective, many of the underlying design questions are similar if not identical regardless of whether you run a visualization or analysis algorithm in situ. We could focus the paper on typical vis problems, yet mention the applicability to / relevance for data analysis wherever appropriate.
(SZ) Q3: I think it helps to think about whether including analysis changes the axes, taxonomy, etc. Perhaps I'm not exercising my imagination enough, but I can't think of a use case where analysis significantly changes how we would describe the system. If I'm right, then including analysis is effectively "free." If I'm missing some major points, and analysis is an entirely different ball of wax in terms of an in situ system, then perhaps we leave it out. It occurs to me that my comment isn't a comment on the question itself, so I'm somewhat missing the point. I do think it's a great question as is, and propose no changes. Yet, I don't recall discussing it before now, so maybe some more comments are in order before it goes to a vote?
KM - Q6 should be rephrased as something that is not a yes/no question (which is already covered in Q5). I suggest wording it in such a way that openly invites participants to beat on what we have. Something like, "What are we missing? What in situ visualization --- past, present, or future --- is not characterized well?"
JC - Given questions about the details on some of the axes, I think it should be made clear that keeping an axis does not necessarily mean keeping the current sub-details. "Proximity", "Output Type" and "Operations Control" all have active issues; the sub-values are clearly not yet settled.
FS - I agree with HT regarding Q3. It's very difficult to draw a line between what is considered visualization or analysis. Perhaps the question could be expanded/adjusted to ask for examples of typical vis/analysis problems people want to see included (and whether they consider it vis, analysis, or both). Or perhaps (as SZ suggests) some more discussion is needed to come to a better understanding of what we mean when we say vis vs. analysis before people vote.

Log

Hank Childs
Bernd Hentschel (HT)
Berk Geveci
Sean Ziegeler (SZ)
Ken Moreland (KM)
Joseph Cottam (JC)
Franz Sauer (FS)
Andy Bauer (AB)
Tom Peterka (TP)