Phase1BProposedInSituCategorizations

Initial take on digesting the discussion from Phase 1A

From the discussion, it appears there should be 6 axes:

  1. access
  2. location
  3. synchronous vs asynchronous
  4. output type
  5. operation controls
  6. interface

It is not clear that there is consensus that all of these axes are necessary to describe the system. We will determine that soon. It is also very possible that the 6 axes (and/or their sub-options) are poorly named. We'll figure that one out soon too.

For each axis, there is a comments subsection with the following questions: "Is this a meaningful axis? Do we have options covered? Is it non-orthogonal with other axes?" Please insert comments there. It is highly likely that this digested form does not reflect all comments made in the first pass ... apologies about this, but likely inevitable. The hope is that we converge after several iterations.

Access

Access refers to the mechanism for the visualization program to access data from the simulation code.

There are two options:

  1. Direct access to the simulation code's memory (i.e., the memory used by the simulation)
  2. Indirect access to the simulation code's memory

Direct Access

In the direct access case, we often make assumptions about location, i.e., that the visualization code is running in the same location. However, many folks pointed out that access and location are distinct. For example, with a PGAS model, visualization may be running on a remote resource, but still be able to access the simulation's memory. Chuck Hansen mentioned memory mapping ... I think that could be another example.

Indirect access

With indirect access, the data from the simulation code is accessible in a way other than direct access. Examples:

  1. send data over a network
  2. write data to NVRAM
  3. (AC) copy data from GPU to CPU memory (or vice versa)
  4. more??
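As a toy illustration of the two options (all names invented; an in-process view and copy stand in for real mechanisms like shared memory, NVRAM, or a network send):

```python
import array

sim_field = array.array("d", [0.0] * 8)   # memory owned by the "simulation"

def advance(step):
    # stand-in for a simulation time step that overwrites the field in place
    for i in range(len(sim_field)):
        sim_field[i] = step + i * 0.1

# Direct access: a zero-copy view of the simulation's own memory.
direct_view = memoryview(sim_field)

# Indirect access: the vis side works on a copy it can re-layout freely
# (a stand-in for shared memory, NVRAM, or a network transfer).
def snapshot():
    return array.array("d", sim_field)

advance(1)
copy_at_step1 = snapshot()
advance(2)

# The view tracks the simulation; the copy is frozen at step 1.
print(direct_view[0], copy_at_step1[0])   # prints: 2.0 1.0
```

The sketch also hints at ptb's point below: the copy (indirect) can be re-laid-out at will, while the view (direct) is pinned to the layout the simulation chose.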

Comments

Thoughts about the above. Is this a meaningful axis? Do we have options covered? Is it non-orthogonal with other axes?

  1. (ptb) I think the key difference between direct and indirect is a question of whether your data layout is fixed by the simulation or not. Direct access to memory implies that I have to use the same number of cores, in the same node mapping, etc., as the simulation, which is very likely to be highly sub-optimal for the analysis and vis. Indirect access means I can manipulate the layout (most likely by making a copy). Whether I make my copy locally on the node into NVRAM or globally through PGAS or some other transport layer does not really matter for this axis.
  2. (MD) It is indeed unclear what can be qualified as direct and indirect. If the simulation makes a copy of its data in a shared memory segment for an analysis process running on the same node to read, is it direct access or indirect access? There are cases where direct access is obvious: LibSim, for example, does a direct access to the simulation's data. In Damaris, when you use dedicated cores, you can either have the simulation make a copy of the data in shared memory (what I would call "indirect" access), or replace the allocation calls in the sim so that they are done directly in shared memory, ready to be used by the dedicated cores (what I would call "direct" access). The reason I tend to call the fact of copying data "indirect" is because we can also use message passing APIs for the same purpose: if we do an MPI_Send of the simulation's data to an analysis process in the same node, the MPI implementation is likely to use a shared-memory-based implementation that would be equivalent to making a copy in shared memory ourselves.
  3. (MD) Related to my previous comment, maybe direct = shared (i.e., requiring some sort of agreement/synchronization between sim and viz processes to know when the data can or cannot be accessed in read/write by either of the processes) while indirect = not requiring such extra mechanism (which means the data has been copied or sent over the network, or written to files, in all cases we are not working on a piece of data that is still under access by the simulation).
  4. (DW) Certainly, the "access" category is quite important (and we might even want to subsume the below "location" category here, merging both categories). However, I wonder if we couldn't rely more on existing descriptions of memory and communication for parallel computing in general? And then maybe add what's specific to in situ vis?
  5. (BG) This doesn't seem to take memory hierarchy into account. If the simulation keeps part of the data in an on-package fast memory (or GPU memory) and we have to move it to main memory, is this direct access? I prefer to think in terms of the distance of the in situ code to the simulation data. You can extend this analogy to the Location section in theory - over the network is just extra distance to the simulation data.
  6. (PAN) Access seems to be more a question of whether the simulation exposes its internal data structures directly (either through direct shared-memory access or through other means, see location axis) or if the vis must acquire data indirectly through a simulation API (distinct from the vis interface axis below). As Chuck pointed out, one can have 'direct' access to remote memory via mechanisms like PGAS (though one must be aware of the performance impact of that).
  7. (KM) Under the list of indirect access, I propose adding writing data to disk (and perhaps even local disk and networked disk). This seems counter to our thoughts of in situ (particularly today), but there is precedent for coupling codes this way.
  8. (KM) So far this discussion has focused on memory access, but there are other types of access as well. One type is access to the execution of the simulation. One function of in situ visualization is for simulation debugging. This typically has the visualization pausing the simulation on a user's request. Another type of access is write access to the simulation state. Any use case of simulation steering is going to need to modify the data and/or parameters of the simulation and might perform other tricks like roll back the simulation. Another type of access is to the metadata of the simulation. Although related to what we have been calling "memory" access, this is in practice different. For example, the simulation might manage neighborhood information separate from its data model. Access to such information could be beneficial to visualization.
  9. (JK) I too agree that access and location are intertwined, but I think keeping location as a separate axis in our description is important for fully describing the in situ type. I think that taking into account the distance that data travels is very important. If the data is moved or copied just for the in situ routines to process the data, that would seem to be indirect access, regardless of whether that was still the simulation's working copy. So akin to what was mentioned above, direct = shared. Second note: I agree with Ken that disk should be listed, it just has a much larger "distance" factor.
  10. (AC) Since access and location are about copying the data, could we use Data (including transformation) as an orthogonal axis?
    1. Data
      1. Source
      2. Copy (Memory, NVRAM, Network, etc a.k.a location)
  11. (JCB) There are some aspects of “access” that are currently missing in the discussion above. Following up a bit on KM’s comment, one can consider simulation and analysis/vis tasks as coarse-grained tasks operating on data. The specifics of a scientist’s desired workflow (i.e., whether or not they want to steer the computation) determines what parallel hazards exist (e.g. read-after-write (RAW) or write-after-read (WAR)) in the full workflow. These hazards place a limit on how much asynchrony is even possible within a particular workflow. For example RAW hazards cannot be avoided – if a task A depends on the output of task B, then task A cannot be started until task B has completed. However, there is more flexibility with WAR hazards (in this case writing data too soon would cause a parallel read to use the wrong value). WAR hazards can be removed through copies of data, providing an opportunity for increased asynchrony/parallelism. In short: the workflow/algorithms place limitations on how much asynchrony you can introduce from a “correctness” perspective (this is increasingly relevant as workflows become more complex). From a performance perspective there will be tradeoffs as to whether or not introducing the asynchrony or the use of data copies is desirable. These performance factors are going to be highly dependent on how the data and computation are mapped to the hardware.
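JCB's WAR example can be sketched in a few lines. This is a hypothetical illustration (names invented): a Python thread stands in for an asynchronous analysis task, and an explicit copy removes the write-after-read hazard so simulation and analysis can overlap:

```python
import threading

field = [1.0, 2.0, 3.0]     # the simulation's working data
results = []

def analysis(data):
    # reads only its own copy, so the simulation may overwrite 'field' freely
    results.append(sum(data))

# Without a copy, the WAR hazard forces synchrony:
#     analysis(field)                  # read ...
#     field = [x * 2 for x in field]   # ... the write must wait for the read
# Copying removes the hazard and lets the two tasks overlap:
snapshot = list(field)
t = threading.Thread(target=analysis, args=(snapshot,))
t.start()
field = [x * 2 for x in field]   # simulation advances concurrently
t.join()

print(results[0])   # 6.0, computed from the pre-advance snapshot
```

As the comment notes, the copy buys asynchrony at the cost of memory; whether that trade is worthwhile depends on how data and computation map to the hardware.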

Location

Location refers to where the visualization program runs, in relation to the simulation code. Based on the active discussion on this topic, it is likely that enumerating all possible types of location will be difficult. As such, we may ultimately want to avoid identifying all possible locations. But, for now, let's try:

  1. same resources as the simulation code
  2. distinct resources from the simulation code

As commented by several, this is not an either-or proposition. In situ processing may first use the same resources as the simulation code, and then transfer results to distinct resources from the simulation code.

Same resources as the simulation code

The visualization program runs on the same compute resources as the simulation program. (Is this clear and unambiguous?)

Distinct resources from the simulation code

There are levels of granularity here:

  1. data accessible by network
  2. data accessible by accelerated mechanism
    1. burst buffer
    2. local file system
    3. dedicated connections (e.g., PCI between CPU and GPU, NVLink between GPUs)
    4. more?...

Note: there were comments about "distance to access". That idea fits here, right?

Comments

Thoughts about the above. Is this a meaningful axis? Do we have options covered? Is it non-orthogonal with other axes?

  1. (ptb) I think this is just a re-phrasing of the direct vs. indirect access. If you have "direct" access you have to run on exactly the same partition, layout, etc. as the simulation, which is what I would call the "same" resources. If you allow yourself an intermediate layer to buffer, re-arrange, copy, consolidate, etc. data in some way, then effectively you are not using the same resources. It would be the job of the transport layer to isolate you from where you run. It might be physically the same resources, it might be a completely different machine. But from the perspective of the analysis code you are separated from the simulation and thus no longer using the "same" resources. Note that I by no means want to trivialize the difficulty for the transport layer to move data within the partition or off the machine. But for this axis I see mainly "same" or "not-the-same", in which case, to me, this feels redundant to direct vs. indirect.
  2. (MD) If you run the simulation and the visualization processes on the same node, using shared memory for data transfer, you may indeed rely on a transport layer that makes both processes unaware of each other, but both processes still use the same local memory, and the simulation can run out of memory because the visualization process uses too much of it. This is independent from considering how the data is transferred from one process to another. I think we could look at the question of "same resources" in terms of whether running out of such a resource can impact (i.e., slow down or even crash) the simulation. Running on the same node will make the visualization process use local memory, a resource that the simulation also needs. If we consider cores as resources, then running the visualization process on a dedicated core also has an impact on the simulation (slowing it down).
  3. (DW) This category might be merged with the above "access".
  4. (BG) I also vote for merging with Access.
  5. (PAN) I disagree, I vote for this to remain a distinct axis. I view this more from a logical integration with the simulation, i.e. does the vis share the same MPI communicator as the simulation. If the vis is running in the same communicator, it might still use indirect data access (i.e. access via API). If the vis runs in a distinct MPI environment, it might still access the data structures of the simulation directly via memory pointer. This also suggests the nuance of running in the same / different logical control environment (MPI comm) vs. the same /different physical resources. Are we considering co-processing / "in transit" vis as part of the broad concept of "in situ" or are we strictly dealing with vis closely-coupled with a simulation?
  6. (KM) I also feel that access and location should be separate axes. "Location" refers to the allocation of physical resources and the physical location these things are running on. "Access" refers to the allocation of software resources: data, state, metadata, and execution management. Are these axes orthogonal? Probably not. The physical location affects what access is possible and how that access is achieved. But all of these axes (with the possible exception of "output type") are intertwined. Having these two axes separate allows us to describe in situ much more clearly and precisely. For example, on-node location with direct data access is different than on-node location with data-passing access, which is different than off-node location with data-passing access.
  7. (KM) Multicore and hyperthreading muddy the definition of "same resources" a bit. If a processor has 12 cores, the simulation runs on 11 threads, and the visualization runs on 1 thread, is that the same resources? If the same processor has 2x hyperthreading, the simulation runs on 23 threads, and the visualization runs on 1 thread, is that the same resources?
  8. (SZ) Getting less fine grained, is running the simulation on CPU and viz on GPU within the same node considered the same resources? So, perhaps a helpful way of thinking about it is, "does the viz _take_ resources away from the simulation?" Perhaps the simulation wasn't going to use a GPU anyway. Back to Ken's example: if I dedicate 1 thread to viz on a 12 core processor, I'm likely taking away a core. Yet if a simulation code's threading can't quite scale up to all cores on a KNL processor, then dedicating a few threads to viz doesn't take anything away. Am I taking away memory? Also, that logic may break down in some cases; e.g., I'm not sure I'd say that loosely-coupled viz within the same system that took nodes away from the simulation is using the "same resources." So maybe that line of thought is more helpful when considering resources within a node.
  9. (HT) The above discussion shows that the aspect of "location" as discussed here is probably closely related to the underlying hardware architecture. At the same time, hardware setups are changing rather rapidly these days and along the way platforms are becoming more heterogeneous/diverse/complex, ... In my opinion, we have to make sure that the concept of in situ we are trying to frame here does not get outdated with a sudden change in hardware. We certainly cannot anticipate all possible developments, yet we should try not to link our definitions too closely to specific assumptions about the underlying hardware. Regarding the term "on the same resources", I guess it's a matter of level of abstraction: will it be "on the same resource" if we mean the entire HPC system? Most certainly. Will it be the "same resource" if we mean everything within the same node? Probably yes. For me, things get interesting within the node: are sockets, CPU cores, GPUs resources in their own right? Or are they just part of the node where everything runs on? These might be the categories along a location axis. BTW, the above serves to say that I think location is an important axis in its own right.
  10. (KM) I think this discussion of "on the same resource" is getting us nowhere. The idea of being on the same resource is both ambiguous (we can't agree what it means) and not expressive enough (if it's not the same resource, then where is it). In some sense, in situ is always an economic decision on resource sharing whether you are making the decision on the node level or the cycle level. I propose instead to think about this as a proximity metric to describe how close the simulation and visualization are. Here is a quick list, off the top of my head, of possible proximities between simulation and visualization.
    1. Different facilities spread across the world.
    2. Different systems in the same facility (separate computation and vis clusters).
    3. Different nodes in the same system.
    4. Different subsystems in the same node (CPU and GPU).
    5. Dedicated processors in different socket on the same node.
    6. Dedicated cores in the same processor.
    7. Time sharing or dynamic allocation on the same cores.
  11. (JCB) I agree that the general notion of location deserves its own axis. I also agree that the breakdown into categories of "uses same" or "uses different" resources is sub-optimal. To me, the location axis is more useful if it is considered to be "resource mapping" of both computation and data onto hardware abstractions such as "execution spaces" and "memory spaces". The conversation above points out how muddied "uses same resources" or "uses different resources" gets, especially as architectures continue to evolve. Using abstractions such as execution and memory spaces 1) allows for flexibility when it comes to changes in hardware, and 2) provides the benefit of enabling the expression of a broad class of "concurrent" workflows (to use Wes' terminology below), even those we haven't anticipated yet. KM's notion of proximities can be used to annotate different resource mappings.

Synchronous vs asynchronous

  • asynchronous: visualization/analysis & simulation occur concurrently, by sharing compute resources
    • Examples:
      • 31 nodes on supercomputer for sim, 1 node for vis/analysis
      • 31 cores on one node of supercomputer for sim, 1 core for vis/analysis
      • sim running on GPU, vis runs on CPU (or vice versa)
  • synchronous (not asynchronous): processing power devoted exclusively to simulation OR visualization/analysis

Note that the examples here somewhat overlap with the location axis, but the intent is to distinguish *when* the visualization/analysis occurs with respect to the simulation code.
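One way to make the *when* distinction concrete is a toy sketch (hypothetical, not any particular framework): synchronous means the next time step waits for vis/analysis to finish; asynchronous means it does not:

```python
import threading
import time

log = []

def vis(step):
    time.sleep(0.01)            # pretend the analysis takes a while
    log.append(("vis", step))

def run(synchronous, steps=2):
    log.clear()
    pending = []
    for s in range(steps):
        log.append(("sim", s))  # one simulation time step
        if synchronous:
            vis(s)              # the next step waits for vis to finish
        else:
            t = threading.Thread(target=vis, args=(s,))
            t.start()           # the next step proceeds immediately
            pending.append(t)
    for t in pending:
        t.join()
    return list(log)

sync_log = run(synchronous=True)    # strict interleave: sim, vis, sim, vis
async_log = run(synchronous=False)  # sim steps are not delayed by vis
```

Note that the asynchronous run does the same work; only the ordering changes, which is exactly the point of keeping this axis separate from location.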

Comments

  1. Tom Peterka had some interesting thoughts on "time division" and "space division". This is an interesting concept. Others made comments about whether this fit the synchronous/asynchronous model to a tee, and gave counter-examples. Do folks think time division / space division has a role here?
  2. (ptb) I think the way synchronous is phrased right now is too restrictive. To me, synchronous just means the simulation has to wait, not that I am using all resources. I might schedule five different operations, all running after the time step ends, all using part of the machine. Synchronous just means the next time step has to wait for the last of these to finish.
  3. (jsm) agreed with both the prior two points. The description as written sounds more like "time/space division" with the sync/async nature being a (highly-correlated but not 100% so) byproduct.
  4. (BG) I agree with ptb
  5. (PAN) agree
  6. (KM) I also agree. Synchronicity is really about when things are running, not where or how (although those could be implied). I think it helps to describe synchronous vs. asynchronous in terms of Gantt-like charts: there it is clear that synchronous simply means that the simulation and visualization do not run at the same time. Asynchronous is more interesting, as there are different patterns of overlap that are indicative of different in situ behaviors.
  7. (JCB) agree with PTB -- asynchronous simply should imply that simulation does not block while waiting for analysis to complete.
  8. (JC) I agree with PTB. A clear distinction can be made about waiting, without bringing in discussion about why the wait is incurred (e.g., resource use).

Output type

  • explorable
    • images (e.g., ParaView Cinema)
    • transformed data (e.g., Lagrangian basis flows, explorable images, topological descriptions)
    • compressed data (e.g., wavelets, run length encoding, Peter Lindstrom's compression, Fout & Ma's compression, etc.)
  • not-explorable
    • Images
    • Movies
    • data analysis (e.g., a synthetic diagnostic)

Comments

  1. [EPD] Under explorable, extracts are another type. With Libsim you can create lines and slices, output these to FieldView XDB files, and then load the XDB into FieldView for interactive viewing; this enables a blend of in situ + post-processing, since the extracts are at the fidelity of the source grid and can contain primitive or derived solution scalar and vector data.
  2. Comments on Phase 1A were split.
    1. Some folks believed the system can be described without needing to know what it produces
    2. Others felt that stating what it is producing may be meaningful to our stakeholders
    3. This topic will likely take more discussion as we go, and possibly a vote
  3. (DW) I like to have "output type". While it may not directly affect fundamental aspects (like "access"), I find it important to classify the visualization technique (in fact, that's where insitu vis carries aspects that are not just an application of underlying parallel or distributed computing). Also, it nicely relates to the below "Operation controls".
  4. (DW) More of a challenge for future research than what's in the current literature: I see the important question of whether and how insitu vis can be better linked to incorporate the analyst, in the sense of user-driven data reduction. This might be controlled explicitly or implicitly (as in implicit user interaction). This topic reaches into the below "operation controls".
  5. (AB) I'm having trouble with the category names here. There are only examples of each category, no definitions, and images appear in both categories. Until someone comes up with good definitions to delineate the categories, I'm leaning towards not needing to know what is produced.
  6. (PO) Outputs include images, derived data, charts, geometry of visualization objects, transformed data, compressed data, reduced data, and full data. I think the distinction between explorable and non-explorable implies whether or not further ad hoc post-processing is possible. This category could imply (1) an increase/decrease in output size (image vs explorable image), (2) an increase/decrease in computational resource requirements, and (3) an increase/decrease in the amount of postponed processing. I think 'raw' vs 'processed' might be more appropriate with an infinite set of hybrids in between.
  7. (FS) I agree that "raw" vs "processed" might be a better classification. Just to expand, I think that one could argue that further post-processing would always be possible, even from something that is considered non-explorable. For example, one could still use a set of static images/movie to track feature evolution in screen space.
  8. (CW) For completeness, it makes sense to me to consider "output type". To be clear, if we say "output", we actually mean saving (intermediate or final) results to storage device, right? So, we do not consider showing the rendered images directly on the screen (not saving to disk) as a way of output because they are transient? For transformed data under the "explorable" category, similar to topological descriptions, you may extract some graphs encoding the data relationships as the output along with some features (which could be graph nodes) for further exploration. I agree that "raw" and "processed" may be a more flexible way of categorization.
  9. (BG) As useful as this section is, I wonder why we need it to categorize the in situ capability? I expect that many of the in situ codes will use a few of these output types - some will be able to use all.
  10. (PAN) I like output as an axis, as it has implications for both the intermediate representations needed as well as the potential I/O impact of the vis against the overall simulation. I could see simulation teams making implementation decisions based on the performance implications of this axis.
  11. (SZ) To me, the output type axis seems so distinct because it may be the only truly orthogonal axis in the bunch. I will concede that we do not need this axis to characterize how the in situ capability works. Yet, the purpose of this effort as a whole is to unify "in situ terminology" and improve the communication of these concepts in our field. I submit that in situ output type is a part of that. Moreover, since many of the output types are relatively new literature-wise, we have an opportunity to set the terminology early (rather than having to fight inaccurate yet ingrained terminology such as "in situ" itself).
  12. (HT) For me, the output type is closely related to the level of abstraction that the product of the in situ part of the overall workflow saves to disk. One of the main motivations for in situ is to reduce I/O. Higher levels of abstraction in the output data typically serve to reduce the amount of data required to be written. Thus, the ability to gauge the potential benefit of a given (in situ) technique is closely linked to its output's level of abstraction. This in turn makes this a relevant factor for a terminology covering in situ. In a nutshell, the output axis is - for me - about assessing the (potential) benefit of a technique. That said, this might not be a bi-partite decision (explorable vs non-explorable) but rather a continuous axis; essentially, one can grab (processed) data at any point of the vis pipeline and write it out for post-hoc analysis.
  13. (SF) In my opinion, this is a crucial axis, as it contains one of the most important decisions to be made when designing an in-situ approach, namely where to position in the explorability/flexibility vs. space trade-off. As there are many different solutions to resolve this trade-off (could be parameter settings or whole techniques), I also think that this axis needs to be a continuous (or at least more finely categorized than the two basic choices at the moment). In the end, even plain images could be seen as explorable when taken from different view points, and a movie could be seen as a representation that can be explored in time.
  14. (RS) A practical argument for the inclusion of output: the fact that it is divisive among us suggests it's not common, something to at least point to as valuable arising from this collaboration. More importantly though, some of the arguments regarding it not being necessary, such as its independence as an axis, may prove invaluable in later stages of creating a taxonomy. Having somewhere to start making a clear division may provide just the view we need to reconcile some of the clutter in more difficult axes, or even point out an unexplored category that is obviously (or not) deployable.
  15. (MR) I agree with the fact that the output requested is likely to drive the in-situ/coprocessing/covisualization workflow in several cases. The goal of in-situ is to extract some meaningful data for the user from raw simulation data. The user is then likely to select the fastest or least penalizing in-situ technique that fits his needs based on this axis. For instance, if we consider the usual example of a non-scalable data filter such as streamtraces, then synchronous in-situ configurations should be avoided. If the selected data filter scales well but leads to an extra peak in terms of memory consumption (many iso-contours, for instance), then an in-situ configuration with dedicated and adequate memory could be the only solution.

Operation controls

  • interactive: operations being performed can be changed by the user
    • blocking: the simulation waits while the user edits the operations being performed
    • non-blocking: the simulation proceeds while the user edits the operations being performed. At some point these changes are integrated into the operations being performed
  • batch: operations being performed are fixed at the beginning, i.e., they cannot be changed by the user

Other considerations:

  • do the results feed back to the simulation code?
    • if yes, then "steering"
  • regular vs adaptive resolution of in situ
    • example: Janine Bennett presented work at ISAV with "triggers" that cause in situ analysis to happen more frequently after an event occurs
    • this could be accomplished with both interactive and batch
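The trigger idea can be sketched as a toy time loop (all names invented): analysis runs at a base stride until an event fires, then more frequently:

```python
def run_sim(steps, base_stride, event_step, fast_stride=1):
    """Toy time loop: analyze every base_stride steps until the trigger
    fires at event_step, then analyze every fast_stride steps."""
    analyzed = []
    stride = base_stride
    for step in range(steps):
        if step == event_step:
            stride = fast_stride        # trigger: an event was detected
        if step % stride == 0:
            analyzed.append(step)       # run the in situ analysis this step
    return analyzed

# An event at step 5 raises the analysis frequency for the rest of the run.
print(run_sim(steps=10, base_stride=4, event_step=5))
# prints: [0, 4, 5, 6, 7, 8, 9]
```

As noted above, this adaptive schedule is orthogonal to the interactive/batch split: the trigger condition could be fixed up front (batch) or edited mid-run (interactive).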

Comments

  1. (ptb) How many users have "we" (the group participating here) asked about the interactive steering part? I agree with Janine that automatic triggers could be very useful. But it still seems to me that if a human is in the loop, the user would simply run the simulation in smaller chunks. Meaning, I would expect instead of a 1-week run, 7 24-hour runs, where each morning I fiddle with analysis parameters. It seems the reason we don't do this now is because you would lose so much time in the batch queue. But that is a facilities issue, and I see no reason why, say, NERSC couldn't schedule a 1-week job with specific breaks every k hours. For that matter, you could also just hardcode this into the simulation. Meaning, stop the main time loop if your next time step wouldn't finish before 8am local time, with a stop that can only be removed by a user resetting it. Still, that interaction would "just" be traditional post-processing, just done very quickly and with the ability to very quickly continue.
  2. (CW) There are different levels of "steering". Minimally, in situ visualization provides visual feedback (e.g., one can tell if the simulation goes right or wrong) but there is no control over the simulation, except terminating it. The next level would be adjusting parameters based on the analysis of visualization results (e.g., using a larger value for a certain parameter). The next level would be resilience: the simulation would self-recover based on the visualization feedback so that it can get back on track if it goes wrong. Humans can certainly be involved in the decision making, or a "smart" or resilient system could do it with minimal human intervention.
  3. (PAN) I don't know that there has been enough in situ capability available for us to know exactly how folks might use it. Regardless of preferred usages, we should strive to categorize the total space, and if particular modes become more popular, a good understanding of the total space might indicate why (e.g. some fundamental advantage vs. a neutral or inferior approach that happens to be well-implemented, and thus preferred).
  4. (KM) I can't say how much they are used yet or how important they will become, but I have seen or read about implementations of each of the operation controls currently listed. @ptb I actually hear about interactive steering much more than I predicted several years ago. It is particularly useful for debugging simulations at larger scales.
  5. (JP) Triggers don't have to feed back to the simulation codes; they can simply inform the type, the spatial and/or temporal resolution, and the location of simulation artifact generation. We, for instance, have automatically tracked features by automatically identifying them and moving the camera to ensure the feature is represented well in the output. Operation controls can be static or dynamic, interactive or non-interactive.
  6. (JC) I work with some clients on graph data analysis. They are interested in having steering capabilities. A priori, we don't know the structure of the graph. With the visualization providing feedback, they want to prioritize processing of certain regions of the graph. Processing proceeds full-steam without user interaction; they just indirectly re-arrange the work queues by specifying regions of interest. This is human-in-the-loop, but in a non-blocking fashion. I wish I could say we had this working already...but it's a use case.

Interface

  • Direct source code integration
  • Protocol-based integration (likely through a middleware framework, e.g. Conduit, ADIOS, GLEAN)
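To make the first option concrete, here is a minimal sketch of what "direct source code integration" typically looks like: the simulation's main loop calls in situ hooks directly. The hook names (insitu_init, insitu_step, insitu_finish) are illustrative placeholders, not the API of any particular library.

```python
def insitu_init():
    print("vis: initialized")

def insitu_step(field, step):
    # The vis routine is handed the simulation's live data structure.
    print(f"vis: step {step}, mean = {sum(field) / len(field):.2f}")

def insitu_finish():
    print("vis: finalized")

def simulate(n_steps=3):
    field = [0.0] * 8
    insitu_init()
    for step in range(n_steps):
        for i in range(len(field)):
            field[i] += 1.0       # stand-in for the solver update
        insitu_step(field, step)  # direct, synchronous call into the vis code
    insitu_finish()

simulate()
```

A protocol-based integration would replace the direct calls with a description of the data handed to a middleware layer, which then decides where and how the vis code runs.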

Comments

  1. (HC) I made a lot of changes to this, based on Ken Moreland's observation that we were conflating this with location and access. So this section is pretty new. One question I have is whether the phrase I incorporated from Tom Fogal reflects folks' feelings about middleware frameworks. (If not, blame me, not Tom.)
  2. (PAN) I made the point above in access, but I'll make it here too: there are two interfaces to consider (1) the interface to the vis capability and (2) the interface to the simulation data. This seems to deal only with (1), and the access axis seems to deal with (2).
  3. (KM) I had an interesting conversation earlier this week about how the simulation development team made a distinction between "embedded" analysis and in situ visualization. The difference is that an embedded analysis is some code written by the simulation development team themselves and is part of the core simulation toolkit as opposed to calling a (logically) separate visualization routine written by a separate visualization development team. I admit that the distinction seems a little flimsy, but the developers felt it was important enough to justify having its own term, so probably we should, too.
  4. (SF) As I understand this at the moment, Interface is basically the implementation of Access (?). While I think it deserves to be its own axis as there are some independent choices to be made, I don't think it's orthogonal either. If that is correct, maybe we should reflect this in the naming scheme of the axis to make things more clear.
  5. (AB) Should framework be another option here?
  6. (JC) The idea is fuzzy in my head, but some tool-mediated code integration seems distinct from the two above. For example, TAU uses source analyzers and compilers to actually modify the executable to get performance data. It's invasive, very direct integration, BUT no programmer does it "by hand".

Other

We still have issues that we have not incorporated. Which of these deserve to be axes in our classification? (speak up in the comments):

  1. adaptivity of data structures, as in zero-copy (like SCIRun's templates) vs making a copy and putting it in your own layout
  2. adaptivity across simulation codes, as in custom code to solve one stakeholder's problem vs code that scales to many simulation codes
    1. "Some in situ visualization is a custom, lightweight, one function visualization routine designed specifically for a specific vis for a specific sim. On the other end of the spectrum is a large, general purpose visualization system (such as Catalyst or libsim) that is a full function and flexible service. In my experience this is a pretty big design consideration."
      1. (CDH) Generally can be further decomposed into 1) what range of operations you can do and 2) what types of data you can do them on. These are general characteristics of vis and analysis systems, but things that may have extra constraints in an in situ context.
  3. Resilience is another consideration. Memory overflow is a specific instance of a cause of a hard fault, but in general little has been done to make in situ workflows resilient to hard and soft errors.
    1. (DW) I agree that resilience or fault-tolerance should be discussed because these issues are more important for in situ vis, esp. in distributed settings.
  4. data lifetime: data can be freed/changed by simulation code during vis/analysis execution vs data will not be freed/changed
    1. "Data lifetime is a function of the in situ mechanism. If it's synchronous, then it's not an issue because the upstream task is blocked. If it's asynchronous, then data must be double-buffered, and the first copy can change."
    2. "certain operations (e.g. particle tracking, auto-correlation, etc.), even if performed synchronously, will require the in situ infrastructures to likely cache/double buffer information from previous time steps"

Finally, Wes Bethel made an interesting comment regarding in situ vs concurrent:

  • "in situ is not the opposite of post hoc; we are chasing the wrong terminology here. :) Instead, the opposite of post hoc processing is concurrent processing. In situ is one way concurrent processing is carried out (shallow share) vs in transit (deep share, which means data movement). Both may be done synchronously or asynchronously. there is plenty of historical precedent that reinforces this idea. In the 1990s, this kind of work was commonly referred to as concurrent processing or coprocessing (not In Situ processing). See, e.g., http://sda.iu.edu/docs/CoprocSurvey.pdf (Aug 1998), Concurrent Distributed VIsualization and Simulation Steering, Robert Haimes, In Parallel Computational Fluid Dynamics: Implementations and Results Using Parallel Computers, A. Ecer, J. Periaux, N. Satofuka, and S. Taylor (eds), 1995, Elsevier (google for “avs visualization system co-processing”)"

Comments

  1. (BDS) I just want to add a reference to the final comment by Wes. There was some work done at ORNL during the same time frame that included resilience (called fault tolerance back in the 90's). G. A. Geist, J. A. Kohl, P. M. Papadopoulos, "CUMULVS: Providing Fault-Tolerance, Visualization and Steering of Parallel Applications," International Journal of High Performance Computing Applications, Volume 11, Number 3, August 1997, pp. 224-236. The work in this reference used a different "location" for the visualization (to use our new terminology) than did the work by Haimes. Compare and contrast.
  2. (PAN) I really like Wes's comment and made a similar note above about in-situ vs. coprocessing. Are we really addressing all 'concurrent' processing modes, or just the shallow copy kind?
  3. (KM) I believe the discussion on zero-copy belongs in the Access axis. Just because sim and vis share the same memory space (location) does not mean they can directly touch each other's memory (access).
  4. (TP) Wes is correct that any discussion of different axes of in situ has to start by defining what we mean by in situ. Having a discussion of different axes implies that in situ can't just be one point in that space (synchronous, direct memory access, etc.) and must be broader. Wes calls this broader space concurrent processing, preferring to reserve the term in situ for one of those specific points (I think). I agree in principle, and I used to call run-time or online processing the broader term, also reserving in situ for the specific case. In practice, though, the term in situ has become so overloaded that lately I have been using it for the broader idea instead. In the workflows workshop report, we define in situ analysis as "multiple tasks running on the same supercomputer where the simulation is running. We do not differentiate between the various 'flavors' of this definition such as whether the tasks execute in the same or separate resources." That differentiation is the point of this activity. To summarize: a) define in situ before anything else, and b) the community and funding sponsors probably recognize the term in situ more readily than other terms, even if those terms are more accurate. I will check with Lucy if/when we can post that report so that it can be cited. We have been waiting for her comments on it.
  5. (KLM) I did use the word "runtime" when I first realized in situ visualization. K.-L. Ma. Runtime volume visualization of parallel CFD, in Proceedings of Parallel CFD '95, pp. 307-314.
  6. (RS) I very much like explicit addition of both above bullets of "adaptivity". However, I believe they can likely be integrated as further modifiers in the "interface" axis.
  7. (JCB) A primary goal of this exercise is to become self-consistent within our community as to how we use different terms. As we go about this process, I would advocate that we make every effort to default to definitions and usages of terms that are as consistent as possible with those used in other communities. For example, in the case that Wes brought up, I agree that the term "concurrent" rather than "in situ" best describes the opposite of post-hoc processing. In situ is used in a variety of disciplines, often to mean "in place". Concurrency (as defined in "An Introduction to Concurrency in Programming Languages" by Matthew J. Sottile, Timothy G. Mattson, and Craig E. Rasmussen, 2010) means: "A condition of a system in which multiple tasks are logically active at one time." This definition of concurrency very closely matches the definition TP mentioned above for "in situ analysis". I recognize that defaulting to known definitions is not always a simple process -- many of the terminology options we will debate between are overloaded terms in outside communities as well (concurrency is one such example). However, I think it is really useful if we as a community can speak to the "etymology" of our terms, and can clearly articulate how our usage of terms fits (is similar to/different from) that of other communities.
  8. (JC) Reading Wes's comment I realize that I was thinking of "concurrent visualization" during the last comment cycle. Not just the in-situ/shared-memory kind (though mostly that...just not exclusively).

Log

  • (HC) Hank Childs
  • (MD) Matthieu Dorier
  • (EPD) Earl Duque
  • (DW) Daniel Weiskopf
  • (JSM) Jeremy S. Meredith
  • (CDH) Cyrus Harrison
  • (AB) Andy Bauer
  • (PO) Patrick O'Leary
  • (FS) Franz Sauer
  • (CW) Chaoli Wang
  • (BDS) Dave Semeraro
  • (BG) Berk Geveci
  • (PAN) Paul Navrátil
  • (ptb) Peer-Timo Bremer
  • (KM) Ken Moreland
  • (JK) James Kress
  • (TP) Tom Peterka
  • (SZ) Sean Ziegeler
  • (JP) John Patchett
  • (HT) Bernd Hentschel
  • (WB) Wes Bethel
  • (AC) Amit Chourasia
  • (KLM) Kwan-Liu Ma
  • (LL) Laura Lediaev
  • (BJW) Brad Whitlock
  • (SF) Steffen Frey
  • (JCB) Janine Bennett
  • (RS) Rob Sisneros
  • (JC) Joseph Cottam
  • (DR) David Rogers
  • (MR) Michel Rasquin
Page last modified on February 22, 2016, at 01:18 pm