An Alternative Way To Categorize Techniques

The basic idea is to take a process-oriented approach: what does the in-situ system need to accomplish, and what will it use to accomplish those tasks? This leads to a 2x4 matrix. Axis 1 is the stages of a pipeline (Acquire, Transform*, Output, Control). Axis 2 provides details on those pipeline stages (Method and Resources). The taxonomy is essentially defined by a series of questions that correspond to the eight boxes in the matrix.

Matrix Taxonomy (and guiding questions)

	Acquire	Transform*	Output	Control
Method	What programmatic mechanisms provide access to the data?	What algorithms are used to transform the raw data?	What form does the final output take? When is that output provided?	What runtime controls are offered? Who/what exercises that control?
Resources	What are the computational costs of access?	What are the computational costs of transformation?	What mechanisms are used to return results?	By what APIs is control exercised? How are control decisions communicated?
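
The matrix can also be written down as a small lookup structure, which makes it easy to walk the questions one stage at a time. The following is only a sketch: the names `Stage`, `Detail`, `QUESTIONS` and `questions_for` are made up for illustration, and the question text is copied from the table above.

```python
from enum import Enum


class Stage(Enum):
    ACQUIRE = "Acquire"
    TRANSFORM = "Transform*"
    OUTPUT = "Output"
    CONTROL = "Control"


class Detail(Enum):
    METHOD = "Method"
    RESOURCES = "Resources"


# Guiding questions from the matrix, keyed by (detail, stage).
QUESTIONS = {
    (Detail.METHOD, Stage.ACQUIRE): [
        "What programmatic mechanisms provide access to the data?",
    ],
    (Detail.METHOD, Stage.TRANSFORM): [
        "What algorithms are used to transform the raw data?",
    ],
    (Detail.METHOD, Stage.OUTPUT): [
        "What form does the final output take?",
        "When is that output provided?",
    ],
    (Detail.METHOD, Stage.CONTROL): [
        "What runtime controls are offered?",
        "Who/what exercises that control?",
    ],
    (Detail.RESOURCES, Stage.ACQUIRE): [
        "What are the computational costs of access?",
    ],
    (Detail.RESOURCES, Stage.TRANSFORM): [
        "What are the computational costs of transformation?",
    ],
    (Detail.RESOURCES, Stage.OUTPUT): [
        "What mechanisms are used to return results?",
    ],
    (Detail.RESOURCES, Stage.CONTROL): [
        "By what APIs is control exercised?",
        "How are control decisions communicated?",
    ],
}


def questions_for(stage):
    """Return the Method and Resources questions for one pipeline stage."""
    return {detail: QUESTIONS[(detail, stage)] for detail in Detail}


if __name__ == "__main__":
    for stage in Stage:
        print(stage.value)
        for detail, questions in questions_for(stage).items():
            for question in questions:
                print(f"  {detail.value}: {question}")
```

Describing a particular in-situ system then amounts to answering each question in turn for that system.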

This alternative arrangement started as a response to the question "Are we missing anything?" Its strengths include the following:

It divides and names different contexts (resolving several of the "it depends on context" discussions);
It naturally groups several of the existing categories that are inter-related (resolving some of those discussions);
It is focused on high-level questions, leaving space for future technologies/configurations;
It captures the whole lifecycle of a visualization clearly.

Here is how our existing categories fit into this model. The blank cells are areas that the current taxonomy leaves unanswered.

	Acquire	Transform*	Output	Control
Method	Integration Type, Access		Output Type	Operations Control
Resources	Proximity, Synchronization	Proximity, Synchronization
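
The gaps can be found mechanically by listing the cells that have no existing category. This is a rough sketch with hypothetical names (`EXISTING`, `uncovered_cells`). It assumes that the Acquire/Method cell holds two separate categories (Integration Type and Access) and that each Resources cell holds Proximity and Synchronization as separate categories.

```python
STAGES = ["Acquire", "Transform*", "Output", "Control"]
DETAILS = ["Method", "Resources"]

# Existing categories placed in the matrix, keyed by (detail, stage).
# Assumption: multi-word cells are read as two categories each.
EXISTING = {
    ("Method", "Acquire"): ["Integration Type", "Access"],
    ("Method", "Output"): ["Output Type"],
    ("Method", "Control"): ["Operations Control"],
    ("Resources", "Acquire"): ["Proximity", "Synchronization"],
    ("Resources", "Transform*"): ["Proximity", "Synchronization"],
}


def uncovered_cells():
    """List the (detail, stage) cells with no existing category."""
    return [
        (detail, stage)
        for detail in DETAILS
        for stage in STAGES
        if not EXISTING.get((detail, stage))
    ]


if __name__ == "__main__":
    for detail, stage in uncovered_cells():
        print(f"{stage} / {detail} is not covered by the current taxonomy")
```

Under those assumptions the uncovered cells are Transform*/Method, Output/Resources and Control/Resources.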

Details

Acquire is "How do I get data?" Our prior discussions cover this column fairly well. The Method slot describes what programmatic tools are used, while the Resources slot describes the computational effects of the access. Thus an in-situ system might be integrated by joining the same process (an Integration Type). It may run as a separate thread but still require blocking access (so Proximity is same node, Synchronization is synchronous). In this context "computational costs" should be taken broadly and may include network traffic, memory resources, or processor time.

Transform* is "What do I do with data that I have?" Method asks what algorithms will be applied. (This was mentioned on the first teleconference, but was not a part of the first PDF draft.) Resources mirrors the resources for Acquire, for similar reasons. Since there may be multiple (or perhaps zero) transform stages, Transform is tagged with a Kleene-like star (*). For example, you may run multiple analyses on the data, some sharing resources with the acquisition but others on completely separate nodes. It's also possible that the thing you're trying to visualize just happens to provide 3D arrays of numbers as its intermediate stage and that is your output, thus no transforms. Similarly, the resources and proximity used for the acquire stage might not be used for transformations. An in-transit analysis (say a reduction tree) makes this very clear. The resources may be different for different phases of the transformation. In a multi-phase transformation, the resources used to communicate between phases must be counted as well.
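
To make the star concrete, a pipeline description can hold a list of transform stages that may be empty, each with its own resources. This is a sketch only. The class names and the two example systems are hypothetical, chosen to mirror the "no transforms" and "reduction tree" cases above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class StageDescription:
    """Answers to the Method and Resources questions for one stage."""
    method: str
    resources: str


@dataclass
class PipelineSketch:
    acquire: StageDescription
    # Transform*: zero or more stages, each with its own resources.
    transforms: List[StageDescription] = field(default_factory=list)

    def transform_resources(self) -> Set[str]:
        """Distinct resources used across all transform stages."""
        return {t.resources for t in self.transforms}


# Hypothetical system whose raw 3D arrays are already the output: no transforms.
passthrough = PipelineSketch(
    acquire=StageDescription("shared-memory read of 3D arrays", "simulation node"),
)

# Hypothetical in-transit reduction tree: two phases on different resources.
reduction = PipelineSketch(
    acquire=StageDescription("shared-memory read of 3D arrays", "simulation node"),
    transforms=[
        StageDescription("local partial reduction", "simulation node"),
        StageDescription("tree-wise merge of partial results",
                         "staging nodes + interconnect"),
    ],
)

if __name__ == "__main__":
    print(len(passthrough.transforms), passthrough.transform_resources())
    print(len(reduction.transforms), reduction.transform_resources())
```

The second example shows why Resources has to be answered per transform phase: the first phase shares the acquisition node, while the second needs separate nodes plus the network between them.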

Output is "How do I get the results of transform out of the computer?" Method is answered with both what is returned and when it will arrive. What is returned is the essence of the Output Type discussion we've been having. The timing has been alluded to as part of the discussion about Operations Control, but I've moved it here because I think it fits better. Timing options include real-time, in batches during execution (how often do you return a batch?), or only at the end of the run. Resources describes how the communication actually occurs. Options include through the file system, on shared network resources, or on dedicated network resources.
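
The Output column could be recorded as one small record per system, with the timing and transport options listed above as enumerations. This is a sketch under assumptions: the names `Timing`, `Transport`, `OutputDescription` and `batch_interval_steps` are invented here, and measuring the batch interval in simulation steps is only one possible choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Timing(Enum):
    REAL_TIME = "real-time"
    BATCHED = "in batches during execution"
    END_OF_RUN = "only at the end of the run"


class Transport(Enum):
    FILE_SYSTEM = "file system"
    SHARED_NETWORK = "shared network resources"
    DEDICATED_NETWORK = "dedicated network resources"


@dataclass(frozen=True)
class OutputDescription:
    output_type: str          # Method: what is returned
    timing: Timing            # Method: when it arrives
    transport: Transport      # Resources: how it is communicated
    batch_interval_steps: Optional[int] = None  # assumed unit: simulation steps

    def __post_init__(self):
        # Batched output raises the "how often?" question; other timings do not.
        if self.timing is Timing.BATCHED and self.batch_interval_steps is None:
            raise ValueError("batched output needs a batch interval")
        if self.timing is not Timing.BATCHED and self.batch_interval_steps is not None:
            raise ValueError("a batch interval only applies to batched output")


if __name__ == "__main__":
    images = OutputDescription(
        output_type="rendered images",
        timing=Timing.BATCHED,
        transport=Transport.FILE_SYSTEM,
        batch_interval_steps=100,
    )
    print(images)
```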

Control captures possible runtime changes to an in-situ system. It sits at the end of the pipeline because it may be a response to a visualization. Control may change what is acquired, transformed, or output. The Method and Resources entries combine to cover "operations control" from the draft PDF, but this breakdown clarifies what details are needed. Method describes who or what makes the control decisions and when those decisions are made. Options for "Who/what" include nobody, the visualization system, the "science" system, and a person live during execution. Options for "When" include at compile time, in response to pre-defined trigger events, in response to output, and based on complex analysis of the current vis/sim/resource state. Resources describes how control is exercised. Does the control affect only the visualization system, or does it also modify the science program (i.e., computational steering)? If it affects the science program, then some science program API is a necessary resource. For human-in-the-loop interactive control, resources for message dissemination are also important. If control is determined algorithmically, it may require communication between nodes as well (and thus network resources).
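
The link between the Method answers and the Resources answers in this column can be sketched as a small rule set. All names below are hypothetical, and treating "algorithmic" as meaning a decision made by the visualization or science system is an assumption for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Decider(Enum):
    NOBODY = "nobody"
    VIS_SYSTEM = "the visualization system"
    SCIENCE_SYSTEM = "the science system"
    PERSON = "a person, live during execution"


class DecisionTime(Enum):
    COMPILE_TIME = "at compile time"
    TRIGGER_EVENT = "in response to pre-defined trigger events"
    RESPONSE_TO_OUTPUT = "in response to output"
    STATE_ANALYSIS = "based on analysis of the vis/sim/resource state"


@dataclass(frozen=True)
class ControlDescription:
    decider: Decider                        # Method: who/what decides
    when: DecisionTime                      # Method: when decisions are made
    modifies_science_program: bool = False  # True means computational steering


def likely_resources(control: ControlDescription) -> List[str]:
    """Resources the text above associates with a given control method."""
    needed = []
    if control.modifies_science_program:
        needed.append("science program API")
    if control.decider is Decider.PERSON:
        needed.append("message dissemination")
    # Assumption: algorithmic control is a decision by either system.
    if control.decider in (Decider.VIS_SYSTEM, Decider.SCIENCE_SYSTEM):
        needed.append("inter-node communication (possibly)")
    return needed


if __name__ == "__main__":
    steering = ControlDescription(
        Decider.PERSON, DecisionTime.RESPONSE_TO_OUTPUT, modifies_science_program=True
    )
    print(likely_resources(steering))
```

A system with no runtime control (nobody decides, everything is fixed at compile time) yields an empty list, which matches the idea that the Control column can be trivially answered for some systems.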

Discussion/Limitations

The high-level questions are a two-edged sword. They make it easy to wiggle around in the boxes. The row/column labels are similarly very high-level. As presented here, there is not enough specificity to claim this taxonomy is "done". For example, I can imagine a system with blocking interactive control, but I have not listed synchronization as an element of output or control. Should it be?

Several boxes have two questions in them. Is the Method/Resources matrix correctly sized? Maybe it should be Method, Resources, Timing, Synchronization. I can see sensible answers to all four of these in all four of the process slots. In the current version 'Timing' is essentially a method and 'Synchronization' is a resource.

Pipeline boundaries are rarely so distinct. If you do a reduce as part of your acquisition, it is concurrently transforming and acquiring.

"Control" is not part of a pipeline in the same way that the other three boxes are. Acquire is getting values into the system. Transform* is whatever work the system actually does to the input. Output is getting the values out of the system. That's a fairly clean pipeline. Control may modify actions on any of those stages. That makes it the odd-man-out in some ways.

Several of these limitations may be fixed by better labeling, clearer questions... basically more eyes on the idea. With that, I lay it open for your comments.

Comments from Teleconference (24Feb2016)

Provides good questions to ask about our axes
Provides a clear statement of resources
Ties to existing visualization models
Diverges from existing discussion (What are we currently doing?) and expands the scope
May go too general (beyond in-situ)