ROSE  0.9.6a
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
graphProcessing.h File Reference

Brief Overview of Algorithm: More...

#include <boost/regex.hpp>
#include <iostream>
#include <fstream>
#include <string>
#include <assert.h>
#include <staticCFG.h>
#include <boost/graph/adjacency_list.hpp>
#include <boost/bind.hpp>
#include <boost/foreach.hpp>
#include <boost/tuple/tuple.hpp>
#include <boost/graph/graphviz.hpp>
#include <boost/graph/dominator_tree.hpp>
#include <boost/graph/reverse_graph.hpp>
#include <boost/graph/transpose_graph.hpp>
#include <boost/algorithm/string.hpp>
#include <vector>
#include <algorithm>
#include <utility>
#include <sys/time.h>
#include <sys/resource.h>
Include dependency graph for graphProcessing.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Classes

class  SgGraphTraversal< CFG >
 

Macros

#define LP   1
 
#define PERFDEBUG   0
 

Detailed Description

Brief Overview of Algorithm:

Current Implementation

This implementation uses BOOSTs graph structure to analyze the paths of the graph

The path analyzer sends the user paths to be evaluated by the "analyzePath" function that is user defined

Further Improvements: TODO

Todo:
utilize BOOST visitors to take advantage of the BOOST graph structures abilities

Contact Info

Finally, blame can be assigned to and questions can be forwarded to the author, though response is not guaranteed

if I'm still at Lawrence hoffman34 AT llnl DOT gov

Author
Michael Hoffman

Current Implementation

The idea behind the algorithm here is to decompose the given Control Flow Graph into a Tree structure (still stored as a Control Flow Graph, though it may be possible to change this). This tree splits on children of a graph node, but does not connect in the case of multiple parents of a single node. In this we refer to out nodes as "children" and in nodes as "parents". However, we do this from the end node to the start node. This is best because then you get one value on the end node, and that value is the only one we want (however, the function takes a pointer to an empty SgIncidenceDirectedGraph as an argument, this can then be analyzed post traversal if that is wanted.

Also, following the algorithm explained above, one must realize that the tree has one leaf for EACH path. Thus small programs can lead from a small-ish graph to a VERY large tree. For example, a program with 20 if else statements would end with 2^20 paths, which means the number of nodes in the tree is greater than this. Thus with 32 or 64 you can overflow a 32 or 64 bit integer.

However, this can be partially resolved by setting the deleteTree boolean to true. This is set to false as the default, however if you don't need the tree then deleteTree will allow you to deal with much larger trees and thus much larger original graphs.

Realize that because of the potentially massive growth rate of path number, that in large cases path enumeration and counting is extremely prohibitive, as it is not difficult to create a program which has more paths than 2^64, thus an unsigned (signed?) 64 bit integer could not store the number.

Further, this is still a relatively compact method. You could just as easily force enumeration of paths, which could in some cases drastically increase the number of nodes necessary to store all the information.

The advantage of the tree structure is two-fold. First it relieves the algorithm of having to keep even more in memory than it has to at the moment, and if you want to deal with GoTo statements, one case can develop that cannot be solved otherwise, e.g.

Consider the four node tree with nodes a, b, c, d, and edges atb, atc, btc, ctb, btd, ctd. There are FOUR legitimate paths here (a, b, d; a, b, c, d; a, c, d; a, c, b, d), and any other method would recognize this as a loop and, without a special algorithm for loop behavior, this would be ignored.

The tree structure also allows for very strong parallelization, currently implemented in openMP. This is because depth has meaning in terms of a tree (it doesn't in terms of a general graph) and that you know you can evaluate all nodes at a certain depth if you have solved all the nodes at depth + 1.

Further Improvements: TODO

Todo:
*One improvement that should be implemented ASAP is changing the algorithm from a recursive algorithm to an iterative algorithm. Keeping the memory requirements down is much easier in this form and would probably increase the size of graph that the algorithm can handle.
Todo:
*Another improvement that should be implemented when possible is to allow for loop analysis. This could be implemented by simply running the algorithm on the loop, but there would need to be a provision that kept the algorithm from stopping as soon as it starts. This could be done by separating the node into two nodes, one with all the inedges and one with all the outedges. OR one could collect the loops when they are deleted (the whole loop is calculated necessarily), though nested loops would have to be considered further in order to find a way to deal with them.
Todo:
*It is possible that graph matching algorithms might prove useful to distinguish different types of graphs contained within the CFG and optimize traversal over them. Look up graph matching algorithms or pattern matching (potentially) for more information on such algorithms, though I do not believe there is an existant literature on matching subgraphs for this purpose.
Todo:
*The parallelism in this program should be optimized by someone experienced in parallelization optimization

Contact Info

Finally, blame can be assigned to and questions can be forwarded to the author, though response is not guaranteed

archangel DOT associate AT gmail DOT com or, if I'm still at Lawrence hoffman34 AT llnl DOT gov

Author
Michael Hoffman

Definition in file graphProcessing.h.

Macro Definition Documentation

#define LP   1

Definition at line 10 of file graphProcessing.h.

#define PERFDEBUG   0

Definition at line 11 of file graphProcessing.h.