ROSE  0.9.6a
Partitioner::IPDParser Class Reference

This is the parser for the instruction partitioning data (IPD) files. More...

#include <Partitioner.h>

Classes

class  Exception
 Exception thrown when something cannot be parsed. More...
 

Public Member Functions

 IPDParser (Partitioner *p, const char *input, size_t len, const std::string &input_name="")
 
void parse ()
 Top-level parsing function. More...
 

Static Public Member Functions

static void unparse (std::ostream &, SgNode *ast)
 Unparse an AST into an IPD file. More...
 

Private Member Functions

void skip_space ()
 
bool is_terminal (const char *to_match)
 
bool is_symbol (const char *to_match)
 
bool is_string ()
 
bool is_number ()
 
void match_terminal (const char *to_match)
 
void match_symbol (const char *to_match)
 
std::string match_symbol ()
 
std::string match_string ()
 
rose_addr_t match_number ()
 
std::string match_asm ()
 
bool parse_File ()
 
bool parse_Declaration ()
 
bool parse_FuncDecl ()
 
bool parse_FuncBody ()
 
bool parse_FuncStmtList ()
 
bool parse_FuncStmt ()
 
bool parse_ReturnSpec ()
 
bool parse_BlockDecl ()
 
bool parse_BlockBody ()
 
bool parse_BlockStmtList ()
 
bool parse_BlockStmt ()
 
bool parse_Alias ()
 
bool parse_Successors ()
 

Private Attributes

Partitioner * partitioner
 Partitioner to be initialized. More...
 
const char * input
 Input to be parsed. More...
 
size_t len
 Length of input, not counting NUL termination (if any). More...
 
std::string input_name
 Optional name of input (usually a file name). More...
 
size_t at
 Current parse position w.r.t. "input". More...
 
Function * cur_func
 Non-null when inside a FuncBody nonterminal. More...
 
BlockConfig * cur_block
 Non-null when inside a BlockBody nonterminal. More...
 

Detailed Description

This is the parser for the instruction partitioning data (IPD) files.

These files are text-based descriptions of the functions and basic blocks used by the partitioner and allow the user to seed the partitioner with additional information that is not otherwise available to the partitioner.

For instance, the analyst may know that a function begins at a certain virtual address, but for some reason the partitioner does not discover this address in its normal mode of operation. The analyst can create an IPD file that describes the function so that the partitioning process finds it.

An IPD file is able to:

  • specify an entry address of a function that is otherwise not detected.
  • give a name to a function that doesn't have one.
  • specify whether the function ever returns to the caller.
  • list additional basic blocks that appear in the function.
  • specify the address of a basic block that is otherwise not detected.
  • indicate that a basic block is semantically equivalent to another basic block.
  • override the control-flow successors for a basic block.

The language non-terminals are:

File := Declaration+
Declaration := FuncDecl | BlockDecl
FuncDecl := 'function' Address [Name] [FuncBody]
FuncBody := '{' FuncStmtList '}'
FuncStmtList := FuncStmt [';' FuncStmtList]
FuncStmt := ( Empty | BlockDecl | ReturnSpec )
ReturnSpec := 'return' | 'returns' | 'noreturn'
BlockDecl := 'block' Address Integer [BlockBody]
BlockBody := '{' BlockStmtList '}'
BlockStmtList := BlockStmt [';' BlockStmtList]
BlockStmt := ( Empty | Alias | Successors ) ';'
Alias := 'alias' Address
Successors := ('successor' | 'successors') [SuccessorAddrList|AssemblyCode]
SuccessorAddrList := '{' (AddressList | AddressList '...' | '...') '}'
AddressList := Address ( ',' AddressList )*
Address: Integer
Integer: DECIMAL_INTEGER | OCTAL_INTEGER | HEXADECIMAL_INTEGER
Name: STRING
AssemblyCode: asm '{' ASSEMBLY '}'

Language terminals:

HEXADECIMAL_INTEGER: as in C, for example, 0x08045fe2
OCTAL_INTEGER: as in C, for example, 0775
DECIMAL_INTEGER: as in C, for example, 1234
STRING: double quoted. Use backslash to escape embedded double quotes
ASSEMBLY: x86 assembly instructions (must contain balanced curly braces, if any)
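Because the three integer terminals follow C syntax, a single conversion with base auto-detection covers all of them. The following is a minimal sketch, not the ROSE implementation: parse_ipd_integer is a hypothetical helper name, and it assumes the token has already been isolated from the surrounding input.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of ROSE): converts one IPD integer token
// using C rules. Base 0 makes strtoull choose hexadecimal for a "0x" prefix,
// octal for a leading "0", and decimal otherwise.
bool parse_ipd_integer(const std::string &token, std::uint64_t &out) {
    if (token.empty() || token[0] < '0' || token[0] > '9')
        return false;                     // reject signs and leading whitespace
    errno = 0;
    char *end = nullptr;
    unsigned long long value = std::strtoull(token.c_str(), &end, 0);
    if (errno == ERANGE || end != token.c_str() + token.size())
        return false;                     // overflow or trailing characters
    out = static_cast<std::uint64_t>(value);
    return true;
}
```

With this sketch, the three example tokens above (0x08045fe2, 0775, and 1234) are all accepted, while a token such as 089 is rejected because 8 and 9 are not octal digits.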

Comments begin with a hash ('#') and continue to the end of the line. The hash character is not treated specially inside quoted strings. Comments within an ASSEMBLY terminal must conform to the syntax accepted by the Netwide Assembler (nasm), namely semicolon in place of a hash.
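The comment rule for text outside ASSEMBLY terminals can be illustrated with a small line-oriented scanner. This is only a sketch under stated assumptions: strip_ipd_comment is a hypothetical name, strings are assumed not to span lines, and ASSEMBLY terminals (which use nasm comment syntax) are not handled.

```cpp
#include <string>

// Hypothetical helper (not part of ROSE): returns `line` with a trailing
// '#' comment removed. A '#' inside a double-quoted string is kept, and a
// backslash inside a string escapes the character that follows it.
std::string strip_ipd_comment(const std::string &line) {
    bool in_string = false;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_string) {
            if (c == '\\' && i + 1 < line.size())
                ++i;                      // skip the escaped character
            else if (c == '"')
                in_string = false;        // closing quote
        } else if (c == '"') {
            in_string = true;             // opening quote
        } else if (c == '#') {
            return line.substr(0, i);     // comment starts here
        }
    }
    return line;                          // no comment on this line
}
```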

Semantics

A block declaration specifies the virtual memory address of the block's first instruction. The integer after the address specifies the number of instructions in the block. If the specified length is less than the number of instructions that ROSE would otherwise place in the block at that address, then ROSE will create a block of exactly the specified size. Likewise, if the specified address is midway into a block that ROSE would otherwise create, ROSE will create a block at the specified address anyway, causing the previous instructions to be in a separate block (or blocks). If the specified block size is larger than what ROSE would otherwise place in the block, the block will be created with fewer instructions but the BlockBody will be ignored.

A function declaration specifies the virtual memory address of the entry point of a function. The body may specify whether the function returns. As of this writing [2010-05-13] a function declared as non-returning will be marked as returning if ROSE discovers that a basic block of the function returns.

If a block declaration appears inside a function declaration, then ROSE will assign the block to the function.

The block 'alias' attribute is used to indicate that two basic blocks perform the exact same operation. The specified address is the address of the basic block to use instead of this basic block. All control-flow edges pointing to this block will be rewritten to point to the specified address instead.
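The edge rewriting described for 'alias' can be pictured on a toy control-flow graph. The sketch below is illustrative only: Addr, Cfg, and apply_aliases are hypothetical names, the graph is a plain map from block address to successor addresses rather than any ROSE data structure, and chains of aliases are not followed.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint64_t Addr;
// Hypothetical control-flow graph: block address -> successor addresses.
typedef std::map<Addr, std::vector<Addr> > Cfg;

// Redirects every edge that points at an aliased block so that it points at
// the block named by the alias instead. `alias_of` maps the address of a
// block carrying an 'alias' attribute to the address given in that attribute.
void apply_aliases(Cfg &cfg, const std::map<Addr, Addr> &alias_of) {
    for (Cfg::iterator b = cfg.begin(); b != cfg.end(); ++b) {
        std::vector<Addr> &succs = b->second;
        for (std::size_t i = 0; i < succs.size(); ++i) {
            std::map<Addr, Addr>::const_iterator a = alias_of.find(succs[i]);
            if (a != alias_of.end())
                succs[i] = a->second;     // rewrite the edge target
        }
    }
}
```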

Example file:

function 0x805116 "func11" { # declare a new function named "func11"
returns; # this function returns to callers
block 0x805116 { # block at 0x805116 is part of func11
alias 0x8052116, 0x8052126 # use block 0x805116 in place of 0x8052116 and 0x8052126
}
}

Basic Block Successors

A block declaration can specify control-flow successors in two ways: as a list of addresses, or as an x86 assembly language program that is interpreted by ROSE. The benefit of using a program to determine the successors is that the program can directly extract information, such as jump tables, from the specimen executable.

The assembly source code is fed to the Netwide Assembler, nasm (http://www.nasm.us/), which assembles it into i386 machine code. When ROSE needs to figure out the successors for a basic block it will interpret the basic block, then load the successor program and interpret it, then extract the successor list from the program's return value. ROSE interprets the program rather than running it directly so that the program can operate on unknown, symbolic data values rather than actual 32-bit numbers.

The successor program is interpreted in a context that makes it appear to have been called (via CALL instruction) from the end of the basic block being analyzed. These arguments are passed to the program:

  • The address of an "svec" object to be filled in by the program. The first four-byte word at this address is the number of successor addresses that immediately follow and must be a known value upon return of the program. The following values are the successors, each either a known or an unknown value.
  • The size of the "svec" object in bytes. The object is allocated by ROSE and is a fixed size (8192 bytes at the time of this writing, enough to hold 2047 successors).
  • The starting virtual address of the first instruction of the basic block.
  • The address immediately after the last instruction of the basic block. Depending on the Partitioner settings, a basic block may or may not be contiguous in memory.
  • The value of the stack pointer at the end of the basic block. ROSE creates a new stack before starting the successor program because the basic block's stack might not be at a known memory address.

The successor program may either fall off the end or execute a RET instruction.
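The "svec" layout described in the argument list (a four-byte count followed by four-byte successor addresses) can be sketched as a small decoder. This is an illustration, not ROSE code: decode_svec and the two constants are hypothetical names, the buffer is assumed to hold only concrete values (ROSE itself may see unknown, symbolic values), and words are read in host byte order, which matches i386 only on a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Size quoted in the text above; one four-byte word is used for the count.
const std::size_t SVEC_BYTES = 8192;
const std::size_t SVEC_MAX_SUCCESSORS = SVEC_BYTES / 4 - 1;   // 2047

// Hypothetical decoder: reads the count word, then that many successor
// addresses. Returns false if the count does not fit in the buffer.
bool decode_svec(const unsigned char *svec, std::size_t nbytes,
                 std::vector<std::uint32_t> &successors) {
    if (nbytes < 4)
        return false;                         // no room for the count word
    std::uint32_t count = 0;
    std::memcpy(&count, svec, 4);
    if (count > (nbytes - 4) / 4)
        return false;                         // count exceeds the capacity
    successors.resize(count);
    if (count != 0)
        std::memcpy(&successors[0], svec + 4, count * 4u);
    return true;
}
```

The capacity check mirrors the arithmetic in the text: 8192 bytes hold 2048 four-byte words, one of which is the count, leaving room for 2047 successors.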

For instance, if the 5-instruction block at virtual address 0x00c01115 ends with an indirect jump through a 256-element jump table beginning at 0x00c037fa, then a program to compute the successors might look like this:

block 0x00c01115 5 {
successors asm {
push ebp
mov ebp, esp
; ecx is the base address of the successors return vector,
; the first element of which is the vector size.
mov ecx, [ebp+8]
add ecx, 4
; loop over the entries in the jump table, copying each
; address from the jump table to the svec return value
xor eax, eax
loop:
cmp eax, 256
je done
mov ebx, [0x00c037fa+eax*4]
mov [ecx+eax*4], ebx
inc eax
jmp loop
done:
; set the number of entries in the svec
mov ecx, [ebp+8]
mov DWORD [ecx], 256
mov esp, ebp
pop ebp
ret
}
}
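For readers less familiar with x86, the effect of the assembly program above can be restated in C++. This is purely explanatory, since ROSE expects the successor program as assembly: fill_svec_from_table is a hypothetical name, and the jump_table parameter stands in for the memory that the assembly reads directly at 0x00c037fa.

```cpp
#include <cstdint>

// Illustrative C++ restatement of the assembly example. `svec` corresponds
// to the first argument ([ebp+8]); `nentries` is 256 in the example.
void fill_svec_from_table(std::uint32_t *svec,
                          const std::uint32_t *jump_table,
                          std::uint32_t nentries) {
    // Copy each jump table entry into the slots after the count word.
    for (std::uint32_t i = 0; i < nentries; ++i)
        svec[1 + i] = jump_table[i];
    // Store the number of successors in the first word.
    svec[0] = nentries;
}
```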

Example Programmatic Usage

The easiest way to parse an IPD file is to read it into memory and then call the parse() method. The following code demonstrates the use of mmap to read the file into memory, parse it, and release it from memory. For simplicity, we do not check for errors in this example.

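#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

int fd = open("test.ipd", O_RDONLY);
struct stat sb;
fstat(fd, &sb);
char *content = (char*)mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Partitioner::IPDParser(p, content, sb.st_size).parse(); // p is a Partitioner*
munmap(content, sb.st_size);
close(fd);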
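Where error checking matters, one portable alternative to the mmap example is to read the file into a std::string first. The sketch below is an assumption-laden illustration: read_ipd_file is a hypothetical helper, and the commented-out call relies only on the constructor signature documented on this page.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ROSE): returns the whole file as a
// string, or throws std::runtime_error if the file cannot be opened or read.
std::string read_ipd_file(const std::string &path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();                    // copy the entire stream
    if (in.bad())
        throw std::runtime_error("error reading " + path);
    return buf.str();
}

// With ROSE available and p being a Partitioner*, the contents could then be
// parsed as follows. Parse failures are reported by throwing
// Partitioner::IPDParser::Exception.
//   std::string text = read_ipd_file("test.ipd");
//   Partitioner::IPDParser(p, text.data(), text.size(), "test.ipd").parse();
```

Passing the file name as the optional input_name argument identifies the input to the parser; the string must outlive the parser because the parser keeps only a pointer to the input.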

Definition at line 1750 of file Partitioner.h.

Constructor & Destructor Documentation

Partitioner::IPDParser::IPDParser ( Partitioner *  p,
const char *  input,
size_t  len,
const std::string &  input_name = "" 
)
inline

Definition at line 1761 of file Partitioner.h.

Member Function Documentation

void Partitioner::IPDParser::parse ( )

Top-level parsing function.

Referenced by Partitioner::load_config().

static void Partitioner::IPDParser::unparse ( std::ostream &  ,
SgNode *  ast 
)
static

Unparse an AST into an IPD file.

void Partitioner::IPDParser::skip_space ( )
private
bool Partitioner::IPDParser::is_terminal ( const char *  to_match)
private
bool Partitioner::IPDParser::is_symbol ( const char *  to_match)
private
bool Partitioner::IPDParser::is_string ( )
private
bool Partitioner::IPDParser::is_number ( )
private
void Partitioner::IPDParser::match_terminal ( const char *  to_match)
private
void Partitioner::IPDParser::match_symbol ( const char *  to_match)
private
std::string Partitioner::IPDParser::match_symbol ( )
private
std::string Partitioner::IPDParser::match_string ( )
private
rose_addr_t Partitioner::IPDParser::match_number ( )
private
std::string Partitioner::IPDParser::match_asm ( )
private
bool Partitioner::IPDParser::parse_File ( )
private
bool Partitioner::IPDParser::parse_Declaration ( )
private
bool Partitioner::IPDParser::parse_FuncDecl ( )
private
bool Partitioner::IPDParser::parse_FuncBody ( )
private
bool Partitioner::IPDParser::parse_FuncStmtList ( )
private
bool Partitioner::IPDParser::parse_FuncStmt ( )
private
bool Partitioner::IPDParser::parse_ReturnSpec ( )
private
bool Partitioner::IPDParser::parse_BlockDecl ( )
private
bool Partitioner::IPDParser::parse_BlockBody ( )
private
bool Partitioner::IPDParser::parse_BlockStmtList ( )
private
bool Partitioner::IPDParser::parse_BlockStmt ( )
private
bool Partitioner::IPDParser::parse_Alias ( )
private
bool Partitioner::IPDParser::parse_Successors ( )
private

Member Data Documentation

Partitioner* Partitioner::IPDParser::partitioner
private

Partitioner to be initialized.

Definition at line 1752 of file Partitioner.h.

const char* Partitioner::IPDParser::input
private

Input to be parsed.

Definition at line 1753 of file Partitioner.h.

size_t Partitioner::IPDParser::len
private

Length of input, not counting NUL termination (if any).

Definition at line 1754 of file Partitioner.h.

std::string Partitioner::IPDParser::input_name
private

Optional name of input (usually a file name).

Definition at line 1755 of file Partitioner.h.

size_t Partitioner::IPDParser::at
private

Current parse position w.r.t. "input".

Definition at line 1756 of file Partitioner.h.

Function* Partitioner::IPDParser::cur_func
private

Non-null when inside a FuncBody nonterminal.

Definition at line 1757 of file Partitioner.h.

BlockConfig* Partitioner::IPDParser::cur_block
private

Non-null when inside a BlockBody nonterminal.

Definition at line 1758 of file Partitioner.h.


The documentation for this class was generated from the following file: