This is the parser for the instruction partitioning data (IPD) files.

#include <Partitioner.h>

Classes
class	Exception
	Exception thrown when something cannot be parsed.

Public Member Functions
	IPDParser (Partitioner *p, const char *input, size_t len, const std::string &input_name="")

void	parse ()
	Top-level parsing function.

Static Public Member Functions
static void	unparse (std::ostream &, SgNode *ast)
	Unparse an AST into an IPD file.

Private Member Functions
void	skip_space ()

bool	is_terminal (const char *to_match)

bool	is_symbol (const char *to_match)

bool	is_string ()

bool	is_number ()

void	match_terminal (const char *to_match)

void	match_symbol (const char *to_match)

std::string	match_symbol ()

std::string	match_string ()

rose_addr_t	match_number ()

std::string	match_asm ()

bool	parse_File ()

bool	parse_Declaration ()

bool	parse_FuncDecl ()

bool	parse_FuncBody ()

bool	parse_FuncStmtList ()

bool	parse_FuncStmt ()

bool	parse_ReturnSpec ()

bool	parse_BlockDecl ()

bool	parse_BlockBody ()

bool	parse_BlockStmtList ()

bool	parse_BlockStmt ()

bool	parse_Alias ()

bool	parse_Successors ()

Private Attributes
Partitioner *	partitioner
	Partitioner to be initialized.

const char *	input
	Input to be parsed.

size_t	len
	Length of input, not counting NUL termination (if any).

std::string	input_name
	Optional name of input (usually a file name).

size_t	at
	Current parse position w.r.t. "input".

Function *	cur_func
	Non-null when inside a FuncBody nonterminal.

BlockConfig *	cur_block
	Non-null when inside a BlockBody nonterminal.

Detailed Description

This is the parser for the instruction partitioning data (IPD) files.

These files are text-based descriptions of the functions and basic blocks used by the partitioner and allow the user to seed the partitioner with additional information that is not otherwise available to the partitioner.

For instance, the analyst may know that a function begins at a certain virtual address but for some reason the partitioner does not discover this address in its normal mode of operation. The analyst can create an IPD file that describes the function so that the partitioning process finds it.

An IPD file is able to:

specify an entry address of a function that is otherwise not detected.
give a name to a function that doesn't have one.
specify whether the function ever returns to the caller.
list additional basic blocks that appear in the function.
specify the address of a basic block that is otherwise not detected.
indicate that a basic block is semantically equivalent to another basic block.
override the control-flow successors for a basic block.

The language non-terminals are:

File := Declaration+
Declaration := FuncDecl | BlockDecl
FuncDecl := 'function' Address [Name] [FuncBody]
FuncBody := '{' FuncStmtList '}'
FuncStmtList := FuncStmt [';' FuncStmtList]
FuncStmt := ( Empty | BlockDecl | ReturnSpec )
ReturnSpec := 'return' | 'returns' | 'noreturn'
BlockDecl := 'block' Address Integer [BlockBody]
BlockBody := '{' BlockStmtList '}'
BlockStmtList := BlockStmt [';' BlockStmtList]
BlockStmt := ( Empty | Alias | Successors ) ';'
Alias := 'alias' Address
Successors := ('successor' | 'successors') [SuccessorAddrList|AssemblyCode]
SuccessorAddrList := '{' (AddressList | AddressList '...' | '...') '}'
AddressList := Address ( ',' AddressList )*
Address: Integer
Integer: DECIMAL_INTEGER | OCTAL_INTEGER | HEXADECIMAL_INTEGER
Name: STRING
AssemblyCode: asm '{' ASSEMBLY '}'
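As an illustration of how one of these productions maps onto recursive-descent code, the following is a minimal sketch of a parser for SuccessorAddrList only. It is not the ROSE implementation: the names SuccessorList, skip_ws, accept, and parse_successor_addr_list are hypothetical, and the sketch assumes that addresses can be read with std::strtoull in base 0 (which accepts C-style decimal, octal, and hexadecimal literals). The grammar does not state what '...' means, so the sketch merely records whether it was present.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type; not part of ROSE.
struct SuccessorList {
    std::vector<uint64_t> addresses;
    bool has_ellipsis = false;   // true if '...' appeared in the list
};

// Hypothetical helper: advance past whitespace.
static void skip_ws(const std::string &s, size_t &at) {
    while (at < s.size() && std::isspace(static_cast<unsigned char>(s[at])))
        ++at;
}

// Hypothetical helper: consume `tok` if it appears at the current position.
static bool accept(const std::string &s, size_t &at, const std::string &tok) {
    skip_ws(s, at);
    if (s.compare(at, tok.size(), tok) != 0)
        return false;
    at += tok.size();
    return true;
}

// SuccessorAddrList := '{' (AddressList | AddressList '...' | '...') '}'
SuccessorList parse_successor_addr_list(const std::string &s) {
    size_t at = 0;
    SuccessorList out;
    if (!accept(s, at, "{"))
        throw std::runtime_error("expected '{'");
    if (accept(s, at, "...")) {
        out.has_ellipsis = true;
    } else {
        do {
            skip_ws(s, at);
            const char *begin = s.c_str() + at;
            char *end = nullptr;
            uint64_t addr = std::strtoull(begin, &end, 0);  // base 0: C-style prefixes
            if (end == begin)
                throw std::runtime_error("expected address");
            at += static_cast<size_t>(end - begin);
            out.addresses.push_back(addr);
        } while (accept(s, at, ","));
        out.has_ellipsis = accept(s, at, "...");
    }
    if (!accept(s, at, "}"))
        throw std::runtime_error("expected '}'");
    return out;
}
```

Each alternative of the production becomes one branch, and a failed match raises an exception, loosely mirroring the role of the Exception class listed above.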

Language terminals:

HEXADECIMAL_INTEGER: as in C, for example: 0x08045fe2
OCTAL_INTEGER: as in C, for example, 0775
DECIMAL_INTEGER: as in C, for example, 1234
STRING: double quoted. Use backslash to escape embedded double quotes
ASSEMBLY: x86 assembly instructions (must contain balanced curly braces, if any)
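The STRING terminal can be matched with a short scanning loop. The sketch below is illustrative only and is not taken from ROSE: match_quoted_string is a hypothetical free function, and it assumes that a backslash escapes whatever single character follows it, which is broader than the rule stated above for embedded double quotes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a STRING matcher; `at` must index the opening quote.
// On success, returns the unescaped contents and leaves `at` just past the
// closing quote.
std::string match_quoted_string(const std::string &input, size_t &at) {
    if (at >= input.size() || input[at] != '"')
        throw std::runtime_error("expected opening double quote");
    std::string out;
    size_t i = at + 1;
    while (i < input.size() && input[i] != '"') {
        if (input[i] == '\\' && i + 1 < input.size())
            ++i;                 // drop the backslash, keep the escaped character
        out += input[i++];
    }
    if (i >= input.size())
        throw std::runtime_error("unterminated string");
    at = i + 1;
    return out;
}
```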

Comments begin with a hash ('#') and continue to the end of the line. The hash character is not treated specially inside quoted strings. Comments within an ASSEMBLY terminal must conform to the syntax accepted by the Netwide Assembler (nasm), namely semicolon in place of a hash.
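The comment rule for the IPD text itself (outside ASSEMBLY terminals) can be folded into whitespace skipping. The following sketch assumes that approach; skip_space_and_comments is a hypothetical free function and is not the body of the private skip_space() member. Because it only runs between tokens, a hash inside a quoted string is never seen by it.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical free-function version of a skip_space step: advances `at`
// past whitespace and '#' comments, stopping at the next significant character.
size_t skip_space_and_comments(const std::string &input, size_t at) {
    while (at < input.size()) {
        if (std::isspace(static_cast<unsigned char>(input[at]))) {
            ++at;
        } else if (input[at] == '#') {
            while (at < input.size() && input[at] != '\n')
                ++at;            // a comment runs to the end of the line
        } else {
            break;
        }
    }
    return at;
}
```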

Semantics

A block declaration specifies the virtual memory address of the block's first instruction. The integer after the address specifies the number of instructions in the block. If the specified length is less than the number of instructions that ROSE would otherwise place in the block at that address, then ROSE will create a block of exactly the specified size. Likewise, if the specified address is midway into a block that ROSE would otherwise create, ROSE will create a block at the specified address anyway, causing the previous instructions to be in a separate block (or blocks). If the specified block size is larger than what ROSE would otherwise place in the block, the block will be created with fewer instructions but the BlockBody will be ignored.
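The size rules in the paragraph above can be summarized in a few lines. This is only a model of the described behavior, not ROSE code: BlockOutcome and resolve_block_size are hypothetical names, `natural` stands for the number of instructions ROSE would otherwise place in the block, and the sketch assumes that the BlockBody is honored when the declared size equals the natural size.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical summary of what happens to a declared block.
struct BlockOutcome {
    size_t ninsns;        // instructions in the resulting block
    bool body_applied;    // whether the BlockBody settings take effect
};

// `declared` is the Integer from the BlockDecl; `natural` is the size ROSE
// would choose on its own at that address.
BlockOutcome resolve_block_size(size_t declared, size_t natural) {
    BlockOutcome out;
    out.ninsns = std::min(declared, natural);   // never larger than the natural block
    out.body_applied = declared <= natural;     // an oversized declaration loses its body
    return out;
}
```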

A function declaration specifies the virtual memory address of the entry point of a function. The body may specify whether the function returns. As of this writing [2010-05-13] a function declared as non-returning will be marked as returning if ROSE discovers that a basic block of the function returns.

If a block declaration appears inside a function declaration, then ROSE will assign the block to the function.

The block 'alias' attribute is used to indicate that two basic blocks perform the exact same operation. The specified address is the address of the basic block to use instead of this basic block. All control-flow edges pointing to this block will be rewritten to point to the specified address instead.

Example file:

function 0x805116 "func11" {             # declare a new function named "func11"
    returns;                             # this function returns to callers
    block 0x805116 {                     # block at 0x805116 is part of func11
        alias 0x8052116, 0x8052126       # use block 0x805116 in place of 0x8052116 and 0x8052126
    }
}
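The edge rewriting described in the 'alias' paragraph can be pictured as a lookup applied to every control-flow edge. The sketch below is a simplified model under stated assumptions, not the partitioner's data structures: Address, Edge, and apply_aliases are hypothetical, the alias map goes from the aliased block's address to the address used in its place, and chains of aliases are not followed.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Address = uint64_t;
using Edge = std::pair<Address, Address>;   // (source block, target block)

// Hypothetical helper: redirect every edge that targets an aliased block
// to the block named by its alias. `aliases` maps block address -> replacement.
std::vector<Edge> apply_aliases(std::vector<Edge> edges,
                                const std::map<Address, Address> &aliases) {
    for (Edge &e : edges) {
        auto found = aliases.find(e.second);
        if (found != aliases.end())
            e.second = found->second;
    }
    return edges;
}
```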

Basic Block Successors

A block declaration can specify control-flow successors in two ways: as a list of addresses, or as an x86 assembly language program that is interpreted by ROSE. The benefit of using a program to determine the successors is that the program can directly extract information, such as jump tables, from the specimen executable.

The assembly source code is fed to the Netwide Assembler, nasm (http://www.nasm.us/), which assembles it into i386 machine code. When ROSE needs to figure out the successors for a basic block it will interpret the basic block, then load the successor program and interpret it, then extract the successor list from the program's return value. ROSE interprets the program rather than running it directly so that the program can operate on unknown, symbolic data values rather than actual 32-bit numbers.

The successor program is interpreted in a context that makes it appear to have been called (via a CALL instruction) from the end of the basic block being analyzed. These arguments are passed to the program:

The address of an "svec" object to be filled in by the program. The first four-byte word at this address is the number of successor addresses that immediately follow and must be a known value upon return of the program. The following values are the successors, each either a known or an unknown value.
The size of the "svec" object in bytes. The object is allocated by ROSE and is a fixed size (8192 bytes at the time of this writing–able to hold 2047 successors).
The starting virtual address of the first instruction of the basic block.
The address immediately after the last instruction of the basic block. Depending on the Partitioner settings, a basic block may or may not be contiguous in memory.
The value of the stack pointer at the end of the basic block. ROSE creates a new stack before starting the successor program because the basic block's stack might not be at a known memory address.
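The "svec" layout described in the first two arguments (a four-byte count word followed by four-byte successor words) can be decoded as sketched below. This is an assumption-laden illustration rather than ROSE code: decode_svec and the constants are hypothetical names, the buffer is assumed to hold only concrete values in host byte order, and the unknown (symbolic) values that ROSE supports are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Sizes taken from the text: an 8192-byte svec made of four-byte words.
constexpr size_t kSvecBytes = 8192;
constexpr size_t kWordBytes = 4;
constexpr size_t kMaxSuccessors = kSvecBytes / kWordBytes - 1;   // one word holds the count

// Hypothetical decoder: reads the count word, then that many successor words.
std::vector<uint32_t> decode_svec(const unsigned char *svec, size_t nbytes) {
    if (nbytes < kWordBytes)
        throw std::runtime_error("svec too small for count word");
    uint32_t count = 0;
    std::memcpy(&count, svec, kWordBytes);
    if (count > nbytes / kWordBytes - 1)
        throw std::runtime_error("count exceeds svec capacity");
    std::vector<uint32_t> successors(count);
    if (count > 0)
        std::memcpy(successors.data(), svec + kWordBytes, count * kWordBytes);
    return successors;
}
```

With these sizes, kMaxSuccessors works out to 8192 / 4 - 1 = 2047, matching the capacity quoted above.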

The successor program may either fall off the end or execute a RET instruction.

For instance, if the 5-instruction block at virtual address 0x00c01115 ends with an indirect jump through a 256-element jump table beginning at 0x00c037fa, then a program to compute the successors might look like this:

block 0x00c01115 5 {
  successors asm {
      push ebp
      mov ebp, esp
      ; ecx is the base address of the successors return vector,
      ; the first element of which is the vector size.
      mov ecx, [ebp+8]
      add ecx, 4
      ; loop over the entries in the jump table, copying each
      ; address from the jump table to the svec return value
      xor eax, eax
    loop:
      cmp eax, 256
      je done
      mov ebx, [0x00c037fa+eax*4]
      mov [ecx+eax*4], ebx
      inc eax
      jmp loop
    done:
      ; set the number of entries in the svec
      mov ecx, [ebp+8]
      mov DWORD [ecx], 256
      mov esp, ebp
      pop ebp
      ret
  }
}
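For readers less familiar with x86 assembly, the logic of the successor program above can be modeled in C++. This is purely an explanatory sketch and not something ROSE accepts as input: build_svec is a hypothetical function, and `jump_table` stands in for the 256 four-byte entries that the assembly reads from 0x00c037fa.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the successor program. The result is laid out like
// an svec: element 0 is the count, elements 1..n are the successors.
std::vector<uint32_t> build_svec(const std::vector<uint32_t> &jump_table) {
    std::vector<uint32_t> svec(jump_table.size() + 1);
    for (size_t i = 0; i < jump_table.size(); ++i)
        svec[i + 1] = jump_table[i];      // mov ebx,[table+eax*4]; mov [ecx+eax*4],ebx
    svec[0] = static_cast<uint32_t>(jump_table.size());   // mov DWORD [ecx], 256
    return svec;
}
```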

Example Programmatic Usage

The easiest way to parse an IPD file is to read it into memory and then call the parse() method. The following code demonstrates the use of mmap to read the file into memory, parse it, and release it from memory. For simplicity, we do not check for errors in this example.

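#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

Partitioner p;
int fd = open("test.ipd", O_RDONLY);
struct stat sb;
fstat(fd, &sb);
const char *content = (const char*)mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Partitioner::IPDParser(&p, content, sb.st_size).parse();   // the constructor takes a Partitioner*
munmap(const_cast<char*>(content), sb.st_size);            // munmap expects a non-const pointer
close(fd);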
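Where error checking matters, one alternative is to read the file with standard C++ streams and hand the resulting buffer to the parser. The sketch below is one possible approach, not the method used inside ROSE: read_ipd_file is a hypothetical helper, and the parser call is shown only as a comment because it requires the ROSE headers.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into memory, throwing on failure.
std::string read_ipd_file(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();
    if (in.bad())
        throw std::runtime_error("read error on " + path);
    return buf.str();
}

// Intended use with the parser documented here (requires ROSE, so shown as a comment):
//   std::string text = read_ipd_file("test.ipd");
//   Partitioner p;
//   Partitioner::IPDParser(&p, text.data(), text.size(), "test.ipd").parse();
```

Passing the file name as input_name is optional, per the constructor's default argument.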

Definition at line 1750 of file Partitioner.h.

Constructor & Destructor Documentation

Partitioner::IPDParser::IPDParser	(	Partitioner *	p,
		const char *	input,
		size_t	len,
		const std::string &	input_name = `""`
	)

inline

Definition at line 1761 of file Partitioner.h.

Member Function Documentation

void Partitioner::IPDParser::parse ( )

Top-level parsing function.

Referenced by Partitioner::load_config().

static void Partitioner::IPDParser::unparse	(	std::ostream &	,
		SgNode *	ast
	)

static

Unparse an AST into an IPD file.

void Partitioner::IPDParser::skip_space ( )

private

bool Partitioner::IPDParser::is_terminal ( const char * to_match)

private

bool Partitioner::IPDParser::is_symbol ( const char * to_match)

private

bool Partitioner::IPDParser::is_string ( )

private

bool Partitioner::IPDParser::is_number ( )

private

void Partitioner::IPDParser::match_terminal ( const char * to_match)

private

void Partitioner::IPDParser::match_symbol ( const char * to_match)

private

std::string Partitioner::IPDParser::match_symbol ( )

private

std::string Partitioner::IPDParser::match_string ( )

private

rose_addr_t Partitioner::IPDParser::match_number ( )

private

std::string Partitioner::IPDParser::match_asm ( )

private

bool Partitioner::IPDParser::parse_File ( )

private

bool Partitioner::IPDParser::parse_Declaration ( )

private

bool Partitioner::IPDParser::parse_FuncDecl ( )

private

bool Partitioner::IPDParser::parse_FuncBody ( )

private

bool Partitioner::IPDParser::parse_FuncStmtList ( )

private

bool Partitioner::IPDParser::parse_FuncStmt ( )

private

bool Partitioner::IPDParser::parse_ReturnSpec ( )

private

bool Partitioner::IPDParser::parse_BlockDecl ( )

private

bool Partitioner::IPDParser::parse_BlockBody ( )

private

bool Partitioner::IPDParser::parse_BlockStmtList ( )

private

bool Partitioner::IPDParser::parse_BlockStmt ( )

private

bool Partitioner::IPDParser::parse_Alias ( )

private

bool Partitioner::IPDParser::parse_Successors ( )

private

Member Data Documentation

Partitioner* Partitioner::IPDParser::partitioner

private

Partitioner to be initialized.

Definition at line 1752 of file Partitioner.h.

const char* Partitioner::IPDParser::input

private

Input to be parsed.

Definition at line 1753 of file Partitioner.h.

size_t Partitioner::IPDParser::len

private

Length of input, not counting NUL termination (if any).

Definition at line 1754 of file Partitioner.h.

std::string Partitioner::IPDParser::input_name

private

Optional name of input (usually a file name).

Definition at line 1755 of file Partitioner.h.

size_t Partitioner::IPDParser::at

private

Current parse position w.r.t. "input".

Definition at line 1756 of file Partitioner.h.

Function* Partitioner::IPDParser::cur_func

private

Non-null when inside a FuncBody nonterminal.

Definition at line 1757 of file Partitioner.h.

BlockConfig* Partitioner::IPDParser::cur_block

private

Non-null when inside a BlockBody nonterminal.

Definition at line 1758 of file Partitioner.h.

The documentation for this class was generated from the following file:

Partitioner.h
