Assignment 2

Due Apr 21 by 11:59pm
Points 100
Submitting a text entry box

In this assignment, you will learn about and use Clang's AST visitors to perform several different code analyses. In your uoregon-pat Bitbucket repository, create a subdirectory named a2 and place all files for this assignment there. Make sure to add source code (not object or binary files), commit, and push frequently.

1. Warm-up: Simple matching

(30 pts)

Implement a simple AST traversal -- you can choose any of the mechanisms described in week2 materials. The goal of your traversal is to output the locations (line and column) of all variables (declarations or definitions) that match a string provided as input. For example, if your binary is called find-waldo, you could apply it to the source file in problem 2 and produce the following output:

./find-waldo -match y example.c
y: 2,32
y: 5,5

(You don't have to match this exactly, but you should get approximately similar results.)

2. A more useful traversal: Counting ops

a. (30 pts)

Write a Clang tool named ops-counter that traverses the AST and counts the number of memory operations (i.e., variable references) and floating-point operations in the input source code. You do not have to consider loop iterations, so for example, given input code like this:

// File: example.c
void foo( double x[10], double y[10] ) {
  y[0] = 0;
  for (int i = 1; i < 10; i++) {
    y[i] = ( x[i] - x[i-1] ) / 2;
  }
}

Your tool could produce output such as:

./ops-counter example.c 
foo: FLOPS=2, MemOp=4

Clearly, this is not the actual number of operations that would be performed were you to run the code. To get the right number, we need to know how many iterations the loop has, which looks trivial here, but is more difficult in general and will be covered later in the term. You can, however, decide to optionally handle certain loops with simple bounds (like the above example), producing 18 FLOPS, and 25 MemOps in this case (+5 extra credit). That is still not completely right since it does not account for the fact that x[i-1] is the same memory location as x[i] in the previous iteration, but that's yet another type of analysis that is beyond the scope of this problem.

3. Finding paths

(40 pts)

One of the applications of code analysis and transformation is documentation generation. In a recent paper we were reading (https://openreview.net/pdf?id=H1gKYo09tX), ASTs are used to generate code representations suitable for different tasks. In this paper, the authors extract paths from the AST of Java or C# code, and use these paths to train a model that can be used for different generation tasks, such as summarization (Section 4.1), in which they predict Java methods’ names from their bodies, and captioning (Section 4.2), where they generate natural language descriptions of C# code snippets. You will not do all this in just one problem of your homework assignment, but you will try to produce one of the critical components of this approach -- the AST paths, described in Section 2 of the paper. The main idea is to generate all pairwise terminal-to-terminal paths through the AST, which are then encoded as numerical vectors and used as input to an encoder-decoder model. Since this is not a machine learning class, we will not worry about any aspects of this, other than producing the paths of tokens.

Write a tool called path-finder that prints out a random sample of k pairwise terminal-to-terminal paths (recall that all leaf nodes are terminals). The number of paths should be a command-line argument to your tool. Do not implement this as a text processor over an existing clang -ast-dump text output. Feel free to visualize the ASTs for any examples you use for debugging and testing, e.g., with clang -cc1 -ast-view example.c. For the example in Problem 2, you will get a dot (Graphviz) graph file that can be visualized with various tools, but most easily with GraphvizOnline to produce:

example.c.png

Some of the paths you can generate for this example (highlighted above) include:

./path-finder -k 3
DeclRefExpr ImplicitCastExpr ArraySubscriptExpr BinaryOperator ImplicitCastExpr IntegerLiteral
DeclRefExpr ImplicitCastExpr ArraySubscriptExpr BinaryOperator BinaryOperator ParenExpr BinaryOperator ImplicitCastExpr ArraySubscriptExpr ImplicitCastExpr DeclRefExpr
DeclRefExpr ImplicitCastExpr ArraySubscriptExpr BinaryOperator ImplicitCastExpr DeclRefExpr

How to submit:

Simply add, commit, and push your solutions to your Bitbucket repository. It's fine to push incremental work (in fact, I strongly encourage it!). When you are completely done, "Submit" the assignment in Canvas by simply copying and pasting your repository's URL in the submission text field.

Feel free to discuss any details on Discord, but please avoid sharing implementation details (code) directly among yourselves.

Additional resources

Clang compilation databases: https://eli.thegreenplace.net/2014/05/21/compilation-databases-for-clang-based-tools (you probably won't need to create them, but in case you get a compilation database error, check this)

Rubric

Find a Rubric

Title:

Title

Criteria

Ratings

Pts

Total Points: 5 out of 5