insituterminology | Phase1E / Phase1ESurveyResultsAndNextSteps

The survey had 46 responses.

As a quick reference, the writeup for the 6 axes is here: http://ix.cs.uoregon.edu/~hank/axes.pdf

Participants in Feb 24 telecon

Question 1

Should Access and Proximity be separate axes?

Discussion: there has been considerable discussion so far on whether these two axes are separate. The argument for merging the two axes has been that direct access implies close proximity. The argument against merging is that counter-examples exist: vis/analysis can run in close proximity but have only indirect access to the data. Similarly, vis/analysis can run far away but have direct access, e.g. via PGAS.
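The two counter-examples can be made concrete with a small sketch. This is only an illustration, not part of the axes writeup: the class names and the close/far and direct/indirect values are assumptions chosen to mirror the wording of the discussion.

```python
from dataclasses import dataclass
from enum import Enum


class Proximity(Enum):
    # Illustrative values only; the axes writeup may define finer levels.
    CLOSE = "close"
    FAR = "far"


class Access(Enum):
    # Illustrative values only, following the direct/indirect wording above.
    DIRECT = "direct"
    INDIRECT = "indirect"


@dataclass(frozen=True)
class Configuration:
    """One system described on two independent axes."""
    proximity: Proximity
    access: Access


# The two counter-examples named in the discussion.
close_but_indirect = Configuration(Proximity.CLOSE, Access.INDIRECT)
far_but_direct = Configuration(Proximity.FAR, Access.DIRECT)  # e.g. via PGAS

# Assumption: a merged axis would only offer the two "aligned" combinations.
MERGED_AXIS = {
    (Proximity.CLOSE, Access.DIRECT),
    (Proximity.FAR, Access.INDIRECT),
}


def merged_axis_can_represent(config: Configuration) -> bool:
    """Return True if a single merged axis could describe this system."""
    return (config.proximity, config.access) in MERGED_AXIS
```

Under these assumptions, neither counter-example is representable on a merged axis, which is the argument for keeping the two axes separate.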

Results:

Yes: 40
No: 6

Question 2

Should Output Type be an axis?

Discussion: the motivation behind removing this axis is that the underlying system is the same regardless of what the vis/analysis is producing. The contingent in favor of keeping this axis agrees with that point, but believes that the output type is still a meaningful property of the system.

Results:

Yes: 33
No: 12
Skipped: 1

Question 3

Should we include analysis?

Discussion: visualization and analysis are distinct topics, but we often treat them together. Our group clearly wants to say something about in situ visualization. Should we include analysis in our discussion (likely lumped together with vis)? Or should we focus our message on visualization only? Note: some discussion about this point focused on the blurred line between vis & analysis. The goal of this question is to decide whether we state that we are talking about in situ infrastructure for "vis & analysis" or whether we don't mention analysis. (I.e., if we choose to mention analysis, we won't distinguish between the two.)

Results:

Final document should only discuss visualization: 5
Final document should discuss both visualization and analysis: 41

Question 4

Should we stick with "in situ" as the main term?

Discussion: Wes Bethel made this comment: "in situ is not the opposite of post hoc; we are chasing the wrong terminology here. :) Instead, the opposite of post hoc processing is concurrent processing. In situ is one way concurrent processing is carried out (shallow share) vs in transit (deep share, which means data movement). Both may be done synchronously or asynchronously. there is plenty of historical precedent that reinforces this idea. In the 1990s, this kind of work was commonly referred to as concurrent processing or coprocessing (not In Situ processing)." Others have pointed out additional terminology, for example Kwan-Liu used "runtime" in some early papers.

Results:

29 votes: The term "in situ" has too much inertia to reverse course. We should continue to describe the overall idea as "in situ". That said, we should make sure to acknowledge earlier work in the final document, and discuss that the blanket term "in situ" may be slightly mismatched with the original concepts.
12 votes: We should reduce the presence of the term "in situ" in the final document and instead focus on a more appropriate term (e.g., concurrent).
5 votes: other (see subsection below for comments). Note that 2 of the 5 were essentially stating that in situ has too much inertia.

Question 5

Do you believe the current 6 axes are sufficient to describe in situ systems?

Results:

Yes: 41
No: 5

Question 6

What are we missing? Are there any important aspects of in situ that have not been represented in the discussion so far?

Results:

No: 29
Yes: 17
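
For the telecon it may help to see the tallies as shares of the 46 responses. The following is a minimal sketch: the counts are copied from the results sections above, while the short question keys and option labels are shorthand introduced here for illustration.

```python
TOTAL_RESPONSES = 46

# Counts copied from the results above; keys and labels are shorthand.
results = {
    "Q1": {"Yes": 40, "No": 6},
    "Q2": {"Yes": 33, "No": 12, "Skipped": 1},
    "Q3": {"Visualization only": 5, "Visualization and analysis": 41},
    "Q4": {"Keep 'in situ'": 29, "Prefer another term": 12, "Other": 5},
    "Q5": {"Yes": 41, "No": 5},
    "Q6": {"No": 29, "Yes": 17},
}


def shares(counts):
    """Convert raw vote counts into percentages of that question's total."""
    total = sum(counts.values())
    return {option: 100.0 * n / total for option, n in counts.items()}


for question, counts in results.items():
    # Every question should account for all 46 responses.
    assert sum(counts.values()) == TOTAL_RESPONSES
    summary = ", ".join(
        f"{option}: {pct:.0f}%" for option, pct in shares(counts).items()
    )
    print(f"{question} -> {summary}")
```

The assertion inside the loop doubles as a consistency check that each question's options add up to the number of responses.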

Next steps

Here are some topics for the telecon:

Discussing survey results
Digging into categories, especially proximity
Joseph Cottam proposed an alternative way to categorize techniques.

Comments in response to questions

(Hank is not expecting people to read this for the telecons)

Comments on question 4

Reminder of the question: Should we stick with "in situ" as the main term?

I'm of two minds. I think 'in situ' has a lot of momentum and that it is likely too late to reverse that course, at least in one blow. One contribution that an EGPGV paper can make (and I think we should aim for it) is to make this duality explicit: 'in situ' narrowly conceived means running vis (and analysis) tightly coupled with a simulation in the same memory space (contrast with 'co-processing' which means concurrent but loosely coupled in different memory spaces). 'in situ' broadly conceived is an abstract handle which means 'not post-hoc'. It's nearing 'big data' levels of abstraction, but it's something that funding agencies and tangent communities have latched onto. For example, it was used in the forward-looking goals of the NSF SI^2 program officer talk, and I've seen it in other NSF and DOE contexts where it almost certainly was used in the 'broadly conceived' sense.
As maybe a slight variation on Wes's comment: there appear to be four related terms: synchronous, asynchronous, concurrent, and post-hoc. The shallow- vs deep-share distinction is taken care of by the proximity axis. To me, a-/synchronous are two sub-cases of concurrent. If we want all four terms to survive, we could go concurrent / post-hoc, with concurrent split further. Especially for a taxonomy, this hierarchical split feels a bit messy. Instead, I think one could make a good argument for post-hoc being simply an extreme case of asynchronous, especially since deep shares might well involve disk-like resources such as NVRAM, so the extension to "true" disk is not that far. So functionally I think a-/synchronous are most descriptive of what I think about when describing a system.
I think in-situ is the term that those outside the community are familiar with. If we abandon this term, we will spend a whole lot of time and effort re-branding our efforts. Not worth it.
I prefer to think of "in situ" as a subset/category of concurrent (or runtime, or perhaps simulation time) vis/analysis. And while I think one of these other terms would be more appropriate, I suspect that I may be in the minority. It would also require us to come to consensus on what that other term would be. And, unfortunately, I think that there likely is too much inertia behind the use of "in situ" in the broader sense, and it may be too late to change course now.
If we are discussing concurrent processing, we should say that. Wes is correct in his comment. We should be precise in what we are discussing and be sure to make the distinction that in-situ is a method to perform concurrent processing and not an all-encompassing term.

Comments on question 5

Reminder of the question: Do you believe the current 6 axes are sufficient to describe in situ systems?

Comment on 'synchronous vs. asynchronous axis': "Time-shared" vs. "space-shared" could be a better categorization. Time-shared sim-vis is always synchronous, i.e. the simulation stalls while visualization is done. In the space-shared case, they may run synchronously, asynchronously, or partly both. For example, consider sim on 10 nodes and viz on a different set of 5 nodes (same or different location). The simulation process may stall during data transfer from the sim nodes to the viz nodes (e.g. due to memory constraints). Hence sim-viz is not completely asynchronous. P.S. Many papers in the literature also refer to this category as tightly-coupled vs. loosely-coupled.
Different kinds of in situ processing have different goals (or different constraints to overcome), which drive the choice of in situ analysis and visualization methods. For example, some applications only need pictures, but other applications may need to extract/preserve geometric or statistical features. Don't know if we should separate the goals into a different section.
Joseph Cottam entered a long response, which he has since improved. Listing that separately.
I'm answering no, but it's sort of yes. I think the current 5 (don't like output) do a good job of describing the in situ part of systems that I'm aware of. There are interesting additional axes that I could think of adding -- for instance the nature of collective communication is related to but distinct from "access"... but I think those don't necessarily have to do with the "in situ" part of the system.
4 axes would be sufficient: access and proximity can be combined and output type can be removed.

Comments on question 6

Reminder of the question: What are we missing? Are there any important aspects of in situ that have not been represented in the discussion so far?

I think the axes themselves are very descriptive and good. However, it is my guess that the community will move towards prescribing "labels" for common axes configurations, i.e. we may still see people using terms like in transit, for example. After this exercise they will be able to more precisely describe what they mean by in transit, but a set of labels IMO will likely evolve as shortcuts to either points or ranges of points within this 6-d description space. I think it would be a good idea to see if others in the group anticipate this, and if so, we should proactively talk about what those labels should be called. If people agree that we should do this labeling exercise, the outcome of the discussion could end up changing people's response to question 4 above.
Data model (possibly own axis, though may be related to access): provide own data model (VTK) vs imposing no data model (adios?)
It is not clear how hybrid methods factor into the Integration axis. Or perhaps an AND/OR characterization is possible. Hybrid integrations have significant advantages, and need a description in the system.
In case this document discusses both visualization and analysis, it should address the following points: (1) What kind of analysis are we referring to? There are many analysis computations that may not require visualization, e.g., statistical analysis. (2) There should be some discussion about the frequency of analysis/visualization. For many analyses (like statistical analysis and temporal analysis), the frequency is an important factor.
Each axis must be broken down further into sub-categories. Moreover, instead of using axes, a tree metaphor seems more appropriate, e.g.:
    Co-processing
    |-- Integration Type
    |-- Proximity
    |-- Access
    |-- Synchronization
    `-- Control
    with a further level of sub-categories below (Embedded, Direct, Indirect).
Scope, as pointed out before. Let's keep this focused on what is in-situ, and not encompass all of computing.
Please consider No. 5
Proximity and Access are really about memories. Latency and bandwidth can be assessed based on proximity and access, but latency and bandwidth are the more fundamental properties that need to be understood to design or categorize coprocessing workflows.
I think we have our bases covered well, but I would still like to see examples of published work recast according to the new axes to check if they help clarify cases where things were ambiguous prior to our effort.
It may be too late to bring this up, but if we're including output type then maybe we should also include input type/specification, or something along the lines of how the user specifies what should be output. Both input specification and output type are capability aspects. To get a bit more specific about input specification: how would a user get a simulation code to output iso-surfaces of density at values x, y and z from the simulation? What if the user wanted images of that, or explorable images? Also, what's the frequency of the output, when should it start, and where should it be "put" (e.g. to a file, or doing something in transit with data extracts)?
If "in situ" is the blanket term, we may need to explicitly differentiate between in-transit and coprocessing, or to decide that they are the same concept.
I'm sure that there's stuff we're missing, but I think (1) we have sufficient coverage to make a good effort at characterizing the space and (2) we would gain more through that exercise, and through the external review of that effort, than through continuing to churn on phrasing, etc., internally.
These comments are regarding the debate about the Output axis and visualization/analysis. The discussion about output type separates them into "explorable" and "non-explorable" without giving a definition for either. An example of a static image is given as a "non-explorable" type and a wavelet field compressor as an "explorable" datatype. I argue that these distinctions are almost entirely useless for our discussion. How a user (or later algorithm) processes output that comes from an in situ workflow is irrelevant to the process of generating the output, which is the purview of our discussion about in situ techniques; whether an output is "explorable" or not has to do with the interaction method of something so far down the pipeline that it falls outside of "in situ". (And the concept of "explorable" vs. "non-explorable" seems to be open to significant differences of opinion. I consider static images to be explorable while something like a hierarchy of wavelets not to be.) More fundamentally, this question of output type is really at the heart of the difference between visualization and analysis. From a high-level point of view, visualization is data processing whose final result is an image, while analysis is data processing whose final result is not an image (a number, a spreadsheet, a regression, a probability distribution function, etc.). Since the terminology and architectures necessary to enable in situ visualization are essentially identical to those necessary to enable more general data analysis, there's no real advantage to our separating them. We should be considering "analysis" as part of our discussion. (This point lends weight to the argument that consideration of output type is mostly a distraction from our goals; it serves to fragment the discussion to no real end.)
Since I don't want to waste my vote in earlier questions just to add comments there (i.e. I'd have to choose "other" instead of my actual preference), I'll instead add my comments here: 1) The "output type" axis question is really into diminishing returns. I think we've seen other axes that are flexible within a given system, such as "proximity" and "operational controls", and so while I think "output type" is probably the one that is the most flexible (i.e. almost any system is flexible enough to output either images or data), I imagine some scenarios like Cinema where the output type really is integral to the system. I am NOT strongly in favor of keeping "output type", though (i.e. I'm okay with jettisoning it in the name of added simplicity), and I'm wary of adding any more axes at this point unless they are highly compelling. 2) For terminology and use of the word "in situ" I think in our community there is a LOT of inertia for "in situ", and I worry that we can't effect a change here. HOWEVER, I think there's little downside to TRYING to change it. Frankly, I don't like the idea of having to add a qualifying statement about "in situ" not being the right word every time I use "in situ", so I'd prefer others start using an agreed-upon term. After all, we're trying to get the terminology within this concept correct, we might as well get the terminology about the concept itself correct as well.
Here are two important use cases of in situ visualization that should be covered, but it is not clear to me how they fit in the axes: (1) Memory sharing. I'm not sure the axes express well whether data is shared or copied. This is implied by the axes but not necessarily specified. Perhaps it could be made explicit in the proximity axis. Someone suggested that proximity be (or include) the level of the memory hierarchy being shared. (2) Execution flow. I don't think the axes express the execution flow well. Does the simulation control the execution, or the visualization? Is there a shared framework to control it? What about the use case where the simulation controls execution but the visualization can modify it? For example, the visualization might pause the simulation for debugging purposes. As another example, simulation steering could modify the execution (or at least the parameters of the execution).
Deep memory / shared memory. Hazards. The special case where proximity is close and access is shared. Reusability of software to many simulation codes.
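
One comment above anticipates that the community will adopt "labels" as shortcuts for points or ranges of points in the axis space. A minimal sketch of that idea follows. The axis names are taken from the tree proposed in another comment; the axis values and both label definitions are purely hypothetical placeholders, not definitions the group has agreed on.

```python
# Axis names follow the tree proposed in the comments; everything else
# (axis values, label names, label definitions) is a hypothetical example.
AXES = ("integration_type", "proximity", "access", "synchronization", "control")

# A label constrains only some axes. Axes it does not mention may take any
# value, so a label can name a single point or a whole range of points.
LABELS = {
    "in transit (assumed definition)": {
        "proximity": {"far"},
        "access": {"indirect"},
    },
    "tightly coupled (assumed definition)": {
        "proximity": {"close"},
        "access": {"direct"},
        "synchronization": {"synchronous"},
    },
}


def labels_for(system):
    """Return, sorted by name, every label whose constraints the system meets."""
    missing = [axis for axis in AXES if axis not in system]
    if missing:
        raise ValueError(f"system description is missing axes: {missing}")
    return sorted(
        name
        for name, constraints in LABELS.items()
        if all(system[axis] in allowed for axis, allowed in constraints.items())
    )


# Hypothetical system description, one value per axis.
example_system = {
    "integration_type": "embedded",
    "proximity": "far",
    "access": "indirect",
    "synchronization": "asynchronous",
    "control": "simulation",
}
print(labels_for(example_system))
```

Writing labels as partial constraints keeps the axes as the precise description while letting shorthand terms such as "in transit" be checked against them.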