| CodeCriteria () |
| CodeCriteria (const RegionStats *mean, const RegionStats *variance, double threshold) |
virtual | ~CodeCriteria () |
virtual CodeCriteria * | create () const |
virtual double | get_mean (size_t id) const |
virtual void | set_mean (size_t id, double mean) |
virtual double | get_variance (size_t id) const |
virtual void | set_variance (size_t id, double variance) |
virtual double | get_weight (size_t id) const |
virtual void | set_weight (size_t id, double weight) |
void | set_value (size_t id, double mean, double variance, double weight) |
double | get_threshold () const |
void | set_threshold (double th) |
virtual double | get_vote (const RegionStats *, std::vector< double > *votes=NULL) const |
virtual bool | satisfied_by (const RegionStats *, double *raw_vote_ptr=NULL, std::ostream *debug=NULL) const |
virtual void | print (std::ostream &, const RegionStats *stats=NULL, const std::vector< double > *votes=NULL, const double *total_vote=NULL) const |
Criteria to decide whether a region of memory contains code.
Ultimately, one often needs to answer the question of whether an arbitrary region of memory contains code or data. A CodeCriteria object can be used to help answer that question. Such an object contains criteria for multiple analyses. The criteria can be initialized by hand, or by running the analyses over parts of the program that we already know to be code (see Partitioner::aggregate_statistics()). In the latter case, the criteria are automatically fine tuned based on characteristics of the specimen executable itself.
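The calibration idea (deriving a criterion's mean and variance from regions already known to be code) can be sketched for a single analysis. This is a minimal illustration, not the ROSE implementation: it assumes each analysis yields one number per known-code region and that an undefined result is represented as NaN. `CriterionStats` and `calibrate` are hypothetical names, and using the population variance (dividing by n) is an assumption about what Partitioner::aggregate_statistics() does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of one analysis over known code (not a ROSE type).
struct CriterionStats {
    double mean = NAN;       // NaN when no defined samples were seen
    double variance = NAN;
    std::size_t count = 0;   // number of defined samples used
};

// Computes the mean and population variance of one analysis's results over
// regions already known to be code. Undefined results (NaN) are skipped.
CriterionStats calibrate(const std::vector<double> &samples) {
    CriterionStats s;
    double sum = 0.0;
    for (double x : samples) {
        if (std::isnan(x)) continue;
        sum += x;
        ++s.count;
    }
    if (s.count == 0) return s;
    s.mean = sum / s.count;
    double sq = 0.0;
    for (double x : samples) {
        if (std::isnan(x)) continue;
        sq += (x - s.mean) * (x - s.mean);
    }
    s.variance = sq / s.count;
    assert(s.variance >= 0.0);
    return s;
}
```

For example, samples {2, 4, 4, 4, 5, 5, 7, 9} plus one undefined result calibrate to a mean of 5 and a variance of 4 over 8 defined samples.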
Each criterion is assumed to have a Gaussian distribution (this class can be specialized if something else is needed) and therefore stores a mean and variance. Each criterion also stores a weight relative to the other criteria.
To determine the probability that a sample contains code, the analyses, $A_i$, are run over the sample to produce a set of analysis results, $R_i$. Each analysis result is compared against the corresponding probability density function to obtain the likelihood (in the range zero to one) that the sample is code. The probability density function is characterized by the criterion mean, $\mu_i$, and variance, $\sigma_i^2$. The Gaussian probability density function is:

$$ f_i(x) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-\frac{(x-\mu_i)^2}{2\sigma_i^2}} $$
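The likelihood, $P_i$, that $R_i$ is representative of valid code is computed as the area under the probability density curve, $f_i$, lying further from the mean than $R_i$ (counting both tails). In other words:

$$ P_i = 2\int_{\mu_i + |R_i - \mu_i|}^{\infty} f_i(x)\,dx = 1 - \operatorname{erf}\left(\frac{|R_i - \mu_i|}{\sqrt{2\sigma_i^2}}\right) $$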
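The two sides of the likelihood formula can be checked against each other numerically. The sketch below is illustrative only: `gaussian_pdf`, `likelihood`, and `likelihood_numeric` are hypothetical helpers, and the numerical version truncates the infinite tail at 12 standard deviations and uses the trapezoid rule.

```cpp
#include <cassert>
#include <cmath>

// Gaussian density with mean mu and variance var (var > 0).
double gaussian_pdf(double x, double mu, double var) {
    const double pi = std::acos(-1.0);
    const double d = x - mu;
    return std::exp(-d * d / (2.0 * var)) / std::sqrt(2.0 * pi * var);
}

// Closed form: P = 1 - erf(|r - mu| / sqrt(2 var)), computed as erfc(...).
double likelihood(double r, double mu, double var) {
    assert(var > 0.0);
    return std::erfc(std::fabs(r - mu) / std::sqrt(2.0 * var));
}

// Same quantity by integration: twice the area under the density beyond
// mu + |r - mu|, using the trapezoid rule up to mu + 12 standard deviations.
double likelihood_numeric(double r, double mu, double var, int steps = 200000) {
    const double lo = mu + std::fabs(r - mu);
    const double hi = mu + 12.0 * std::sqrt(var);  // truncate the infinite tail
    if (lo >= hi) return 0.0;                      // remaining tail is negligible
    const double h = (hi - lo) / steps;
    double area = 0.5 * (gaussian_pdf(lo, mu, var) + gaussian_pdf(hi, mu, var));
    for (int i = 1; i < steps; ++i)
        area += gaussian_pdf(lo + i * h, mu, var);
    return 2.0 * area * h;
}
```

A result exactly at the mean gets a likelihood of 1, and a result one standard deviation away gets about 0.3173 on either side of the mean, which matches the familiar "68% within one sigma" rule.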
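A criterion that has an undefined value does not contribute to the final vote. Similarly, criteria that have zero variance contribute a vote of exactly zero or one:

$$ P_i = \begin{cases} 1 & \text{if } R_i = \mu_i \\ 0 & \text{if } R_i \neq \mu_i \end{cases} \qquad \text{when } \sigma_i^2 = 0 $$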
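A single criterion's vote, including the two special cases, might look like the sketch below. It assumes that "undefined" is represented as NaN (for the analysis result or for the criterion's own mean or variance) and that an abstaining criterion is signalled by returning NaN. `criterion_vote` is a hypothetical helper, not a CodeCriteria method.

```cpp
#include <cassert>
#include <cmath>

// One criterion's vote for analysis result r, given the criterion's mean mu
// and variance var. Returns NaN when the criterion abstains.
double criterion_vote(double r, double mu, double var) {
    if (std::isnan(r) || std::isnan(mu) || std::isnan(var))
        return NAN;                           // undefined: no contribution
    assert(var >= 0.0);
    if (var == 0.0)
        return r == mu ? 1.0 : 0.0;          // degenerate distribution
    return std::erfc(std::fabs(r - mu) / std::sqrt(2.0 * var));
}
```

Returning NaN instead of zero keeps "no opinion" distinct from "definitely not code", so the caller can leave abstaining criteria out of the weighted total.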
The individual probabilities from each analysis are weighted relative to one another to obtain a final probability, which is then compared against a threshold. If the probability is equal to or greater than the threshold, then the sample is considered to be code.
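The text does not spell out the exact weighting rule. The sketch below assumes a normalized weighted mean over the criteria that actually voted, $P = \sum_i w_i P_i / \sum_i w_i$, followed by the "greater than or equal to the threshold" test. `Vote`, `combined_probability`, and `looks_like_code` are hypothetical names, and this is not necessarily how get_vote() combines votes.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-criterion record; not the ROSE data layout.
struct Vote {
    double probability;   // P_i in [0, 1], or NaN if the criterion abstains
    double weight;        // relative weight w_i >= 0
};

// Assumed combination rule: weighted mean of the defined votes.
// Returns NaN if no criterion voted with positive weight.
double combined_probability(const std::vector<Vote> &votes) {
    double num = 0.0, den = 0.0;
    for (const Vote &v : votes) {
        if (std::isnan(v.probability)) continue;   // abstaining criterion
        assert(v.weight >= 0.0);
        num += v.weight * v.probability;
        den += v.weight;
    }
    return den > 0.0 ? num / den : NAN;
}

// The sample is considered code when the combined probability reaches the threshold.
bool looks_like_code(const std::vector<Vote> &votes, double threshold) {
    const double p = combined_probability(votes);
    return !std::isnan(p) && p >= threshold;
}
```

With votes of 1.0 and 0.0 at equal weight, plus one abstaining criterion, the combined probability is 0.5. That passes a threshold of 0.5 and fails a threshold of 0.6.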
The Partitioner never instantiates a CodeCriteria object directly, but rather always uses the new_code_criteria() virtual method. This allows the user to easily augment this class to do something more interesting.
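The benefit of routing construction through a virtual factory method can be shown with heavily simplified stand-ins. In this sketch, `Criteria`, `UniformCriteria`, `Detector`, `UniformDetector`, and `new_criteria` are all hypothetical and only mirror the roles of CodeCriteria, Partitioner, and new_code_criteria(). The uniform replacement distribution is an illustrative choice, with half-width $\sqrt{3\sigma^2}$ so that it has the same variance as the Gaussian.

```cpp
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical base criteria: Gaussian two-tailed likelihood.
struct Criteria {
    virtual ~Criteria() = default;
    virtual double vote(double r, double mu, double var) const {
        assert(var >= 0.0);
        if (var == 0.0) return r == mu ? 1.0 : 0.0;
        return std::erfc(std::fabs(r - mu) / std::sqrt(2.0 * var));
    }
};

// User specialization: a uniform window of half-width sqrt(3 var).
struct UniformCriteria : Criteria {
    double vote(double r, double mu, double var) const override {
        return std::fabs(r - mu) <= std::sqrt(3.0 * var) ? 1.0 : 0.0;
    }
};

// Hypothetical stand-in for the partitioner: it never names a concrete
// criteria class directly, only the virtual factory hook.
struct Detector {
    virtual ~Detector() = default;
    virtual std::unique_ptr<Criteria> new_criteria() const {
        return std::unique_ptr<Criteria>(new Criteria());
    }
    double classify(double r, double mu, double var) const {
        return new_criteria()->vote(r, mu, var);
    }
};

// Overriding only the factory changes the behavior of classify().
struct UniformDetector : Detector {
    std::unique_ptr<Criteria> new_criteria() const override {
        return std::unique_ptr<Criteria>(new UniformCriteria());
    }
};
```

Because `classify()` obtains its criteria only through the overridable hook, a subclass can swap the distribution without modifying or copying any of the base class's decision logic.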
Here's an example of using this class to determine if some uncategorized region of memory contains code. First we compute aggregate statistics across all the known functions. Then we use the mean and variance in those statistics to create a code criteria specification. Then we run the same analyses over the uncategorized region of memory and ask whether the results satisfy the criteria. This example is essentially the implementation of Partitioner::is_code().
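```cpp
partitioner->aggregate_statistics();  // statistics over all known functions
// ... create `cc` from the aggregate mean and variance, and compute
// the RegionStats `stats` for the uncategorized region ...
if (cc->satisfied_by(stats))
    std::cout <<"this looks like code" <<std::endl;
delete stats;
delete cc;
```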
Definition at line 841 of file Partitioner.h.