diffblue-cbmc/doc/cprover-manual/properties.md

11 KiB

Automatically Generating Properties

What is a "Property"?

We have mentioned properties several times so far, but we never explained what kind of properties CBMC can verify. We cover this topic in more detail in this section.

CBMC uses assertions to specify program properties. Assertions are properties of the state of the program when the program reaches a particular program location. Assertions are often written by the programmer using the assert macro.

In addition to the assertions written by the programmer, assertions for specific properties can also be generated automatically by CBMC, often relieving the programmer from expressing properties that should hold in any well-behaved program.

CBMC comes with an assertion generator, which performs a conservative static analysis to determine program locations that potentially contain a bug. Due to the imprecision of the static analysis, it is important to emphasize that these generated assertions are only potential bugs, and that the Model Checker first needs to confirm that they are indeed genuine bugs.

The assertion generator can generate assertions for the verification of the following properties:

  • Buffer overflows. For each array access, check whether the upper and lower bounds are violated.

  • Pointer safety. Search for NULL-pointer dereferences or dereferences of other invalid pointers.

  • Memory leaks. Check whether the program constructs dynamically allocated data structures that are subsequently inaccessible.

  • Division by zero. Check whether there is a division by zero in the program.

  • Not-a-Number. Check whether floating-point computation may result in NaNs.

  • Arithmetic overflow. Check whether a numerical overflow occurs during an arithmetic operation.

  • Undefined shifts. Check for shifts with excessive distance.

We won't explain the properties in detail. Most of them relate to behaviors that are left undefined by the respective language semantics. For a discussion on why these behaviors are usually very undesirable, read this blog post by John Regehr.

All the properties described above are reachability properties. They are always of the form

"Is there a path through the program such that some property is violated?"

The counterexamples to such properties are always program paths. Users of the Eclipse plugin can step through these counterexamples in a way that is similar to debugging programs. The installation of this plugin is explained here.

Using goto-instrument

The goto-instrument static analyzer operates on goto-binaries, which is a binary representation of control-flow graphs. The goto-binary is extracted from program source code using goto-cc, which is explained here. Given a goto-program, goto-instrument operates as follows:

  1. A goto-binary is read in.
  2. The specified static analyses are performed.
  3. Any potential bugs found are transformed into corresponding assertions, and are added into the program.
  4. A new goto-binary (with assertions) is written to disc.

As an example, we begin with small C program we call expr.c (taken from here):

int *ptr;

int main(void) {
  if (ptr)
    *ptr = 0;
  if (!ptr)
    *ptr = 1;
}

The program contains an obvious NULL-pointer dereference. We first compile the example program with goto-cc and then instrument the resulting goto-binary with pointer checks.

goto-cc expr.c -o in.gb   goto-instrument in.gb out.gb --pointer-check

We can now get a list of the assertions that have been generated:

goto-instrument out.gb --show-properties

Using CBMC on out.gb, we can obtain a counterexample trace for the NULL-pointer dereference:

cbmc out.gb

The goto-instrument program supports these checks:

Flag Check
--no-assertions ignore user assertions
--bounds-check add array bounds checks
--div-by-zero-check add division by zero checks
--pointer-check add pointer checks
--signed-overflow-check add arithmetic over- and underflow checks
--unsigned-overflow-check add arithmetic over- and underflow checks
--undefined-shift-check add range checks for shift distances
--nan-check add floating-point NaN checks
--uninitialized-check add checks for uninitialized locals (experimental)
--error-label label check that given label is unreachable

As all of these checks apply across the entire input program, we may wish to disable them for selected statements in the program. For example, unsigned overflows can be expected and acceptable in certain instructions even when elsewhere we do not expect them. To selectively disable automatically generated properties use #pragma CPROVER check disable "<name_of_check>", which remains in effect until a #pragma CPROVER check pop (to re-enable all properties disabled before or since the last #pragma CPROVER check push) is provided. For example, for unsigned overflow checks, use

unsigned foo(unsigned x)
{
#pragma CPROVER check push
#pragma CPROVER check disable "unsigned-overflow"
  x = x + 1; // immediately follows the pragma, no unsigned overflow check here
#pragma CPROVER check pop
  x = x + 2; // unsigned overflow checks are generated here

Generating function bodies

Sometimes implementations for called functions are not available in the goto program, or it is desirable to replace bodies of functions with certain predetermined stubs (for example to confirm that these functions are never called, or to indicate that these functions will never return). For this purpose goto-instrument provides the --generate-function-body option, that takes a regular expression (in ECMAScript syntax) that describes the names of the functions to generate. Note that this will only generate bodies for functions that do not already have one; If one wishes to replace the body of a function with an existing definition, the --remove-function-body option can be used to remove the body of the function prior to generating a new one.

The shape of the stub itself can be chosen with the --generate-function-body-options parameter, which can take these values:

Option Result
nondet-return Do nothing and return a nondet result (this is the default)
assert-false Make the body contain an assert(false)
assume-false Make the body contain an assume(false)
assert-false-assume-false Combines assert-false and assume-false
havoc Set the contents of parameters and globals to nondet

The various combinations of assert-false and assume-false can be used to indicate that functions shouldn't be called, that they will never return or both.

Example: We have a program like this:

// error_example.c
#include <stdlib.h>

void api_error(void);
void internal_error(void);

int main(void)
{
  int arr[10] = {1,2,3,4,5, 6, 7, 8, 9, 10};
  int sum = 0;
  for(int i = 1; i < 10; ++i)
  {
    sum += arr[i];
  }
  if(sum != 55)
  {
    // we made a mistake when calculating the sum
    internal_error();
  }
  if(rand() < 0)
  {
    // we think this cannot happen
    api_error();
  }
  return 0;
}

Now, we can compile the program and detect that the error functions are indeed called by invoking these commands:

goto-cc error_example.c -o error_example.goto
# Replace all functions ending with _error
# (Excluding those starting with __)
# With ones that have an assert(false) body
goto-instrument error_example.goto error_example_replaced.goto \
  --generate-function-body '(?!__).*_error' \
  --generate-function-body-options assert-false
cbmc error_example_replaced.goto

This generates the following output:

** Results:
error_example.c function api_error
[api_error.assertion.1] line 4 assertion false: FAILURE

error_example.c function internal_error
[internal_error.assertion.1] line 5 assertion false: FAILURE

** 2 of 2 failed (2 iterations)
VERIFICATION FAILED

Without the instrumentation step we would have seen "VERIFICATION SUCCESSFUL".

The havoc option takes further parameters globals and params with this syntax: havoc[,globals:<regex>][,params:<regex>] (where the square brackets indicate an optional part). The regular expressions have the same format as the those for the --generate-function-body option and indicate which globals and function parameters should be set to nondet. All regular expressions require exact matches (i.e. the regular expression a|b will match 'a' and 'b' but not 'adrian' or 'bertha').

Example: With a C program like this

struct Complex {
  double real;
  double imag;
};

struct Complex AGlobalComplex;
int do_something_with_complex(struct Complex *complex);

And the command line

goto-instrument in.goto out.goto
  --generate-function-body do_something_with_complex
  --generate-function-body-options
    'havoc,params:.*,globals:AGlobalComplex'

The goto code equivalent of the following will be generated:

int do_something_with_complex(struct Complex *complex)
{
  if(complex)
  {
    complex->real = nondet_double();
    complex->imag = nondet_double();
  }
  AGlobalComplex.real = nondet_double();
  AGlobalComplex.imag = nondet_double();
  return nondet_int();
}

A note on limitations: Because only static information is used for code generation, arrays of unknown size and pointers will not be affected by this. Which means that for code like this:

struct Node {
  int val;
  struct Node *next;
};

void do_something_with_node(struct Node *node);

Code like this will be generated:

void do_something_with_node(struct Node *node)
{
   if(node)
   {
     node->val = nondet_int();
     node->next = nondet_0();
   }
}

Note that no attempt to follow the next pointer is made. If an array of unknown (or 0) size is encountered, a diagnostic is emitted and the array is not further examined.

Some care must be taken when choosing the regular expressions for globals and functions. Names starting with __ are reserved for internal purposes; For example, replacing functions or setting global variables with the __CPROVER prefix might make analysis impossible. To avoid doing this by accident, negative lookahead can be used. For example, (?!__).* matches all names not starting with __.