715 lines
30 KiB
TeX
715 lines
30 KiB
TeX
\documentclass{article}
|
|
|
|
\newcommand{\dir}[1]{\texttt{#1}}
|
|
\newcommand{\file}[1]{\texttt{#1}}
|
|
\newcommand{\code}[1]{\texttt{#1}}
|
|
\newcommand{\prog}[1]{\texttt{#1}}
|
|
|
|
\title{Beginner's Guide to CPROVER}
|
|
\author{Martin Brain\thanks{But most of the content is from Michael Tautschnig}}
|
|
|
|
\begin{document}
|
|
|
|
\maketitle
|
|
|
|
\section{Background Information}
|
|
|
|
First off; read the CPROVER manual. It describes how to get, build
|
|
and use CBMC and SATABS. This document covers the internals of the
|
|
system and how to get started on development.
|
|
|
|
|
|
\subsection{Documentation}
|
|
|
|
Apart from the (user-orientated) CPROVER manual and this document,
|
|
most of the rest of the documentation is inline in the code
|
|
as \texttt{doxygen} and some comments. A man page for CBMC, goto-cc
|
|
and goto-instrument is contained in the \dir{doc/} directory and gives
|
|
some options for these tools. All of these could be improved
|
|
and patches are very welcome. In some cases the algorithms used are
|
|
described in the relevant papers.
|
|
|
|
\subsection{Architecture}
|
|
|
|
CPROVER is structured in a similar fashion to a compiler. It has
|
|
language specific front-ends which perform limited syntactic analysis
|
|
and then convert to an intermediate format. The intermediate format
|
|
can be output to files (this is what \texttt{goto-cc} does) and are
|
|
(informally) referred to as ``goto binaries'' or ``goto programs''.
|
|
The back-end are tools process this format, either directly from the
|
|
front-end or from it's saved output. These include a wide range of
|
|
analysis and transformation tools (see Section \ref{section:other-apps}).
|
|
|
|
\subsection{Coding Standards}
|
|
|
|
CPROVER is written in a fairly minimalist subset of C++; templates and
|
|
meta-programming are avoided except where necessary. The standard
|
|
library is used but in many cases there are alternatives provided in
|
|
\dir{util/} (see Section \ref{section:util}) which are preferred.
|
|
Boost is not used.
|
|
|
|
Patches should be formatted so that code is indented with two space
|
|
characters, not tab and wrapped to 75 or 72 columns. Headers for
|
|
doxygen should be given (and preferably filled!) and the author will
|
|
be the person who first created the file.
|
|
|
|
Identifiers should be lower case with underscores to separate words.
|
|
Types (classes, structures and typedefs) names must\footnote{There are
|
|
a couple of exceptions, including the graph classes} end with a
|
|
\code{t}. Types that model types (i.e. C types in the program that is
|
|
being interpreted) are named with \code{\_typet}.
|
|
For example \code{ui\_message\_handlert} rather than
|
|
\code{UI\_message\_handlert} or \code{UIMessageHandler} and
|
|
\code{union\_typet}.
|
|
|
|
|
|
|
|
\subsection{How to Contribute}
|
|
|
|
Fixes, changes and enhancements to the CPROVER code base should be
|
|
developed against the \texttt{trunk} version and submitted to Daniel
|
|
as patches produced by \texttt{diff -Naur} or \texttt{svn diff}.
|
|
Entire applications are best developed independently (\texttt{git svn}
|
|
is a popular choice for tracking the main trunk but also having local
|
|
development) until it is clear what their utility, future and
|
|
maintenance is likely to be.
|
|
|
|
|
|
\subsection{Other Useful Code}
|
|
\label{section:other-apps}
|
|
|
|
The CPROVER subversion archive contains a number of separate
|
|
programs. Others are developed separately as patches or separate
|
|
branches.% New applications are initially developed in their version
|
|
%control system and may be merged into the main subversion system
|
|
%depending on their utility, popularity and maintenance.
|
|
Interfaces are have been and are continuing to stablise but older code
|
|
may require work to compile and function correctly.
|
|
|
|
In the main archive:
|
|
|
|
\begin{description}
|
|
\item[\prog{CBMC}]{A bounded model checking tool for C and C++. See
|
|
Section \ref{section:CBMC}.}
|
|
\item[\prog{goto-cc}]{A drop-in, flag compatible replacement for GCC
|
|
and other compilers that produces goto-programs rather than
|
|
executable binaries. See Section \ref{section:goto-cc}.}
|
|
\item[\prog{goto-instrument}]{A collection of functions for
|
|
instrumenting and modifying goto-programs. See Section
|
|
\ref{section:goto-instrument}.}
|
|
\end{description}
|
|
|
|
Model checkers and similar tools:
|
|
|
|
\begin{description}
|
|
\item[\prog{SatABS}]{A CEGAR model checker using predicate
|
|
abstraction. Is roughly 10,000 lines of code (on top of the CPROVER
|
|
code base) and is developed in its own subversion archive. It
|
|
uses an external model checker to find potentially feasible paths.
|
|
Key limitations are related to code with pointers and there is
|
|
scope for significant improvement.}
|
|
|
|
\item[\prog{Scratch}]{Alistair Donaldson's k-induction based tool.
|
|
The front-end is in the old project CVS and some of the
|
|
functionality is in \prog{goto-instrument}.}
|
|
|
|
\item[\prog{Wolverine}]{An implementation of Ken McMillan's IMPACT
|
|
algorithm for sequential programs. In the old project CVS.}
|
|
|
|
\item[\prog{C-Impact}]{An implementation of Ken McMillan's IMPACT
|
|
algorithm for parallel programs. In the old project CVS.}
|
|
|
|
\item[\prog{LoopFrog}]{A loop summarisation tool.}
|
|
|
|
\item[\prog{???}]{Christoph's termination analyser.}
|
|
|
|
\end{description}
|
|
|
|
|
|
Test case generation:
|
|
|
|
\begin{description}
|
|
\item[\prog{cover}]{A basic test-input generation tool. In the old
|
|
project CVS.}
|
|
|
|
\item[\prog{FShell}]{A test-input generation tool that allows the
|
|
user to specify the desired coverage using a custom language
|
|
(which includes regular expressions over paths). It uses
|
|
incremental SAT and is thus faster than the na\"ive ``add
|
|
assertions one at a time and use the counter-examples''
|
|
approach. Is developed in its own subversion.}
|
|
\end{description}
|
|
|
|
|
|
|
|
Alternative front-ends and input translators:
|
|
|
|
\begin{description}
|
|
\item[\prog{Scoot}]{A System-C to C translator. Probably in the old
|
|
project CVS.}
|
|
|
|
\item[\prog{???}]{A Simulink to C translator. In the old project CVS.}
|
|
|
|
\item[\prog{???}]{A Verilog front-end. In the old project CVS.}
|
|
|
|
\item[\prog{???}]{A converter from Codewarrior project files to
|
|
Makefiles. In the old project CVS.}
|
|
\end{description}
|
|
|
|
|
|
Other tools:
|
|
|
|
\begin{description}
|
|
\item[\prog{ai}]{Leo's hybrid abstract interpretation / CEGAR tool.}
|
|
|
|
\item[\prog{DeltaCheck?}]{Ajitha's slicing tool, aimed at locating
|
|
changes and differential verification. In the old project CVS.}
|
|
\end{description}
|
|
|
|
|
|
There are tools based on the CPROVER framework from other research
|
|
groups which are not listed here.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\section{Source Walkthrough}
|
|
|
|
This section walks through the code bases in a rough order of interest
|
|
/ comprehensibility to the new developer.
|
|
|
|
|
|
|
|
\subsection{\dir{doc}}
|
|
|
|
At the moment just contains the CBMC man page.
|
|
|
|
\subsection{\dir{regression/}}
|
|
|
|
The regression tests are currently being moved from CVS. The
|
|
\dir{regression/} directory contains all of those that have been
|
|
moved. They are grouped into directories for each of the tools. Each
|
|
of these contains a directory per test case, containing a C or C++
|
|
file that triggers the bug and a \file{.dsc} file that describes the
|
|
tests, expected output and so on. There is a Perl script,
|
|
\file{test.pl} that is used to invoke the tests as:
|
|
|
|
\begin{center}
|
|
\code{../test.pl -c PATH\_TO\_CBMC}
|
|
\end{center}
|
|
|
|
The \code{--help} option gives instructions for use and the format of
|
|
the description files.
|
|
|
|
|
|
|
|
\subsection{\dir{src/}}
|
|
|
|
The source code is divided into a number of sub-directories, each
|
|
containing the code for a different part of the system. In the top
|
|
level files there are only a few files:
|
|
|
|
\begin{description}
|
|
\item[\file{config.inc}]{The user-editable configuration parameters
|
|
for the build process. The main use of this file is setting the
|
|
paths for the various external SAT solvers that are used. As
|
|
such, anyone building from source will likely need to edit this.}
|
|
\item[\file{Makefile}]{The main systems Make file. Parallel builds
|
|
are supported and encouraged; please don't break them!}
|
|
\item[\file{common}]{System specific magic required to get the
|
|
system to build. This should only need to be edited if porting
|
|
CBMC to a new platform / build environment.}
|
|
\item[\file{doxygen.cfg}]{The config file for doxygen.cfg}
|
|
\end{description}
|
|
|
|
|
|
|
|
\subsubsection{\dir{util/}}
|
|
\label{section:util}
|
|
|
|
\dir{util/} contains the low-level data structures and manipulation
|
|
functions that are used through-out the CPROVER code-base. For almost
|
|
any low-level task, the code required is probably in \dir{util/}. Key
|
|
files include:
|
|
|
|
\begin{description}
|
|
\item[\file{irep.h}]{This contains the definition of \code{irept},
|
|
the basis of many of the data structures in the project. They
|
|
should not be used directly; one of the derived classes should be
|
|
used. For more information see Section \ref{section:irept}.}
|
|
\item[\file{expr.h}]{The parent class for all of the expressions.
|
|
Provides a number of generic functions, \code{exprt} can be used
|
|
with these but when creating data, subclasses of \code{exprt}
|
|
should be used.}
|
|
\item[\file{std\_expr.h}]{Provides subclasses of \code{exprt} for
|
|
common kinds of expression for example \code{plus\_exprt},
|
|
\code{minus\_exprt}, \code{dereference\_exprt}. These are the
|
|
intended interface for creating expressions.}
|
|
\item[\file{std\_types.h}]{Provides subclasses of \code{typet} (a
|
|
subclass of \code{irept}) to model C and C++ types. This is one
|
|
of the preferred interfaces to \code{irept}. The front-ends handle
|
|
type promotion and most coercision so the type system and checking
|
|
goto-programs is simpler than C.}
|
|
\item[\file{dstring.h}]{The CPROVER string class. This enables
|
|
sharing between strings which significantly reduces the amount of
|
|
memory required and speeds comparison. \code{dstring} should not
|
|
be used directly, \code{irep\_idt} should be used instead, which
|
|
(dependent on build options) is an alias for \code{dstring}.}
|
|
\item[\file{mp\_arith.h}]{The wrapper class for multi-precision
|
|
arithmetic within CPROVER. Also see \file{arith\_tools.h}.}
|
|
\item[\file{ieee\_float.h}]{The arbitrary precision float model used
|
|
within CPROVER. Based on \code{mp\_integer}s.}
|
|
\item[\file{context.h}]{A generic container for symbol table like
|
|
constructs such as namespaces. Lookup gives type, location of
|
|
declaration, name, `pretty name', whether it is static or not.}
|
|
\item[\file{namespace.h}]{The preferred interface for the context
|
|
class. The key function is \code{lookup} which converts a string
|
|
(\code{irep\_idt}) to a symbol which gives the scope of
|
|
declaration, type and so on. This works for functions as well as variables.}
|
|
\end{description}
|
|
|
|
|
|
|
|
|
|
\subsubsection{\dir{langapi/}}
|
|
|
|
This contains the basic interfaces and support classes for programming
|
|
language front ends. Developers only really need look at this if they
|
|
are adding support for a new language. It's main users are the two
|
|
(in trunk) language front-ends; \dir{ansi-c/} and \dir{cpp/}.
|
|
|
|
|
|
\subsubsection{\dir{ansi-c/}}
|
|
|
|
Contains the front-end for ANSI C, plus a variety of common
|
|
extensions. This parses the file, performs some basic sanity checks
|
|
(this is one area in which the UI could be improved; patches most
|
|
welcome) and then produces a goto-program (see below). The parser is
|
|
a traditional Flex / Bison system.
|
|
|
|
\file{internal\_addition.c} contains the implementation of various
|
|
`magic' functions that are that allow control of the analysis from the
|
|
source code level. These include assertions, assumptions, atomic
|
|
blocks, memory fences and rounding modes.
|
|
|
|
The \dir{library/} subdirectory contains versions of some of the C
|
|
standard header files that make use of the CPROVER built-in
|
|
functions. This allows CPROVER programs to be `aware' of the
|
|
functionality and model it correctly. Examples include
|
|
\file{stdio.c}, \file{string.c}, \file{setjmp.c} and various threading
|
|
interfaces.
|
|
|
|
|
|
\subsubsection{\dir{cpp/}}
|
|
|
|
This directory contains the C++ front-end. It supports the subset of
|
|
C++ commonly found in embedded and system applications.
|
|
Consequentially it doesn't have full support for templates and many of
|
|
the more advanced and obscure C++ features. The subset of the
|
|
language that can be handled is being extended over time so bug
|
|
reports of programs that cannot be parsed are useful.
|
|
|
|
The functionality is very similar to the ANSI C front end; parsing the
|
|
code and converting to goto-programs. It makes use of code from
|
|
\dir{langapi} and \dir{ansi-c}.
|
|
|
|
|
|
|
|
|
|
\subsubsection{\dir{goto-programs/}}
|
|
|
|
Goto programs are the intermediate representation of the CPROVER tool
|
|
chain. They are language independent and similar to many of the
|
|
compiler intermediate languages. Section \ref{section:goto-programs}
|
|
describes the \code{goto\_programt} and \code{goto\_functionst} data
|
|
structures in detail. However it useful to understand some of the
|
|
basic concepts. Each function is a list of instructions, each of
|
|
which has a type (one of 18 kinds of instruction), a code expression,
|
|
a guard expression and potentially some targets for the next
|
|
instruction. They are not natively in static single-assign (SSA)
|
|
form. Transitions are nondeterministic (although in practise the
|
|
guards on the transitions normally cover form a disjoint cover of all
|
|
possibilities). Local variables have non-deterministic values if they
|
|
are not initialised. Variables and data within the program is
|
|
commonly one of three types (parameterised by width):
|
|
\code{unsignedbv\_typet}, \code{signedbv\_typet} and
|
|
\code{floatbv\_typet}, see \file{util/std\_types.h} for more
|
|
information. Goto programs can be serialised in a binary (wrapped in
|
|
ELF headers) format or in XML (see the various \code{\_serialization}
|
|
files).
|
|
|
|
|
|
The \prog{cbmc} option \code{--show-goto-programs} is often a
|
|
good starting point as it outputs goto-programs in a human
|
|
readable form. However there are a few things to be aware of.
|
|
Functions have an internal name (for example \code{c::f00}) and a
|
|
`pretty name' (for example \code{f00}) and which is used depends on
|
|
whether it is internal or being presented to the user. The
|
|
\code{main} method is the `logical' main which is not necessarily the
|
|
main method from the code. In the output \code{NONDET} is use to
|
|
represent a nondeterministic assignment to a variable. Likewise
|
|
\code{IF} as a beautified \code{GOTO} instruction where the guard
|
|
expression is used as the condition. \code{RETURN} instructions may
|
|
be dropped if they precede an \code{END\_FUNCTION} instruction. The
|
|
comment lines are generated from the \code{locationt} field of the
|
|
\code{instructiont} structure.
|
|
|
|
\dir{goto-programs/} is one of the few places in the CPROVER codebase
|
|
that templates are used. The intention is to allow the general
|
|
architecture of program and functions to be used for other
|
|
formalisms. At the moment most of the templates have a single
|
|
instantiation; for example \code{goto\_functionst} and
|
|
\code{goto\_function\_templatet} and \code{goto\_programt} and \code{goto\_program\_templatet}.
|
|
|
|
|
|
|
|
\subsubsection{\dir{goto-symex/}}
|
|
|
|
This directory contains a symbolic evaluation system for
|
|
goto-programs. This takes a goto-program and translates it to an
|
|
equation system by traversing the program, branching and merging and
|
|
unwinding loops as needed. Each reverse goto has a separate counter
|
|
(the actual counting is handled by \prog{cbmc}, see the \code{--unwind}
|
|
and \code{--unwind-set} options). When a counter limit
|
|
is reach, an assertion can be added to explicitly show when analysis
|
|
is incomplete. The symbolic execution includes constant folding so
|
|
loops that have a constant number of iterations will be handled
|
|
completely (assuming the unwinding limit is sufficient).
|
|
|
|
The output of the symbolic execution is a system of equations; an
|
|
object containing a list of \code{symex\_target\_elements}, each of
|
|
which are equalities between \prog{expr} expressions. See
|
|
\file{symex\_target\_equation.h}. The output is in static, single
|
|
assignment (SSA) form, which is \emph{not} the case for goto-programs.
|
|
|
|
|
|
|
|
\subsubsection{\dir{pointer-analysis/}}
|
|
|
|
To perform symbolic execution on programs with dereferencing of
|
|
arbitrary pointers, some alias analysis is needed.
|
|
\dir{pointer-analysis} contains the three levels of analysis; flow and
|
|
context insensitive, context sensitive and flow and context
|
|
sensitive. The code needed is subtle and sophisticated and thus there
|
|
may be bugs.
|
|
|
|
|
|
|
|
|
|
\subsubsection{\dir{solvers/}}
|
|
|
|
The \dir{solvers/} directory contains interfaces to a number of
|
|
different decision procedures, roughly one per directory.
|
|
|
|
\begin{description}
|
|
|
|
\item[prop/]{The basic and common functionality. The key file is
|
|
\file{prop\_conv.h} which defines \code{prop\_convt}. This is the
|
|
base class that is used to interface to the decision procedures.
|
|
The key functions are \code{convert} which takes an \code{exprt}
|
|
and converts it to the appropriate, solver specific, data
|
|
structures and \code{dec\_solve} (inherited from
|
|
\code{decision\_proceduret}) which invokes the actual decision
|
|
procedures. Individual decision procedures (named
|
|
\code{*\_dect}) objects can be created but \code{prop\_convt} is
|
|
the preferred interface for code that uses them.}
|
|
|
|
\item[flattening/]{A library that converts operations to
|
|
bit-vectors, including calling the conversions in \dir{floatbv} as
|
|
necessary. Is implemented as a simple conversion (with caching)
|
|
and then a post-processing function that adds extra constraints.
|
|
This is not used by the SMT or CVC back-ends.}
|
|
|
|
%%%%
|
|
|
|
\item[dplib/]{Provides the \code{dplib\_dect} object which used the
|
|
decision procedure library from ``Decision Procedures : An
|
|
Algorithmic Point of View''.}
|
|
|
|
\item[cvc/]{Provides the \code{cvc\_dect} type which interfaces to
|
|
the old (pre SMTLib) input format for the CVC family of solvers.
|
|
This format is still supported by depreciated in favour of SMTLib 2.}
|
|
|
|
\item[smt1/]{Provides the \code{smt1\_dect} type which converts the
|
|
formulae to SMTLib version 1 and then invokes one of Boolector,
|
|
CVC3, OpenSMT, Yices, MathSAT or Z3. Again, note that this format
|
|
is depreciated.}
|
|
|
|
\item[smt2/]{Provides the \code{smt2\_dect} type which functions in
|
|
a similar way to \code{smt1\_dect}, calling Boolector, CVC3,
|
|
MathSAT, Yices or Z3. Note that the interaction with the solver
|
|
is batched and uses temporary files rather than using the
|
|
interactive command supported by SMTLib 2. With the \code{--fpa}
|
|
option, this output mode will not flatten the floating point
|
|
arithmetic and instead output the proposed SMTLib floating point
|
|
standard.}
|
|
|
|
\item[qbf/]{Back-ends for a variety of QBF solvers. Appears to be
|
|
no longer used or maintained.}
|
|
|
|
\item[sat/]{Back-ends for a variety of SAT solvers and DIMACS
|
|
output.}
|
|
\end{description}
|
|
|
|
|
|
|
|
|
|
|
|
\subsubsection{\dir{cbmc/}}
|
|
\label{section:CBMC}
|
|
|
|
This contains the first full application. CBMC is a bounded model
|
|
checker that uses the front ends (\dir{ansi-c}, \dir{cpp}, goto-program or others)
|
|
to create a goto-program, \dir{goto-symex} to unwind the loops the given
|
|
number of times and to produce and equation system and finally
|
|
\dir{solvers} to find a counter-example (technically, \dir{goto-symex}
|
|
is then used to construct the counter-example trace).
|
|
|
|
|
|
|
|
\subsubsection{\dir{goto-cc/}}
|
|
\label{section:goto-cc}
|
|
|
|
\dir{goto-cc} is a compiler replacement that just performs the first
|
|
step of the process; converting C or C++ programs to goto-binaries.
|
|
It is intended to be dropped in to an existing build procedure in
|
|
place of the compiler, thus it emulates flags that would affect the
|
|
semantics of the code produced. Which set of flags are emulated
|
|
depends on the naming of the \dir{goto-cc/} binary. If it is called
|
|
\prog{goto-cc} then it emulates GCC flags, \prog{goto-armcc} emulates
|
|
the ARM compiler, \prog{goto-cl} emulates VCC and \prog{goto-cw}
|
|
emulates the Code Warrior compiler. The output of this tool can then
|
|
be used with \prog{cbmc} or \prog{goto-instrument}.
|
|
|
|
|
|
|
|
|
|
\subsubsection{\dir{goto-instrument/}}
|
|
\label{section:goto-instrument}
|
|
|
|
The \dir{goto-instrument/} directory contains a number of tools, one
|
|
per file, that are built into the \prog{goto-instrument} program. All
|
|
of them take in a goto-program (produced by \prog{goto-cc}) and either
|
|
modify it or perform some analysis. Examples include
|
|
\file{nondet\_static.cpp} which initialises static variables to a
|
|
non-deterministic value, \file{nondet\_volatile.cpp} which assigns
|
|
a non-deterministic value to any volatile variable before it is read
|
|
and \file{weak\_memory.h} which performs the necessary transformations
|
|
to reason about weak memory models. The exception to the ``one file
|
|
for each piece of functionality'' rule are the program instrumentation
|
|
options (mostly those given as ``Safety checks'' in the
|
|
\prog{goto-instrument} help text) which are included in the
|
|
\prog{goto-program/} directory. An example of this is
|
|
\file{goto-program/stack\_depth.h} and the general rule seems to be
|
|
that transformations and instrumentation that \prog{cbmc} uses should
|
|
be in \dir{goto-program/}, others should be in \dir{goto-instrument}.
|
|
|
|
\prog{goto-instrument} is a very good template for new analysis
|
|
tools. New developers are advised to copy the directory, remove all
|
|
files apart from \file{main.*}, \file{parseoptions.*} and the
|
|
\file{Makefile} and use these as the skeleton of their application.
|
|
The \code{doit()} method in \file{parseoptions.cpp} is the preferred
|
|
location for the top level control for the program.
|
|
|
|
|
|
|
|
\subsubsection{\dir{linking/}}
|
|
|
|
Probably the code to emulate a linker. This allows multiple `object
|
|
files' (goto-programs) to be linked into one `executable' (another
|
|
goto-program), thus allowing existing build systems to be used to
|
|
build complete goto-program binaries.
|
|
|
|
|
|
\subsubsection{\dir{big-int/}}
|
|
|
|
CPROVER is distributed with its own multi-precision arithmetic
|
|
library; mainly for historical and portability reasons. The library is externally
|
|
developed and thus \dir{big-int} contains the source as it is
|
|
distributed. This should not be used directly, see
|
|
\file{util/mp\_arith.h} for the CPROVER interface.
|
|
|
|
|
|
|
|
\subsubsection{\dir{xmllang/}}
|
|
|
|
CPROVER has optional XML output for results and there is an XML format
|
|
for goto-programs. It is used to interface to various IDEs. The
|
|
\dir{xmllang/} directory contains the parser and helper functions for
|
|
handling this format.
|
|
|
|
|
|
|
|
\subsubsection{\dir{floatbv/}}
|
|
|
|
This library contains the code that is used to convert floating point
|
|
variables (\code{floatbv}) to bit vectors (\code{bv}). This is
|
|
referred to as `bit-blasting' and is called in the \dir{solver} code
|
|
during conversion to SAT or SMT. It also contains the abstraction
|
|
code described in the FMCAD09 paper.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\section{Data Structures}
|
|
|
|
This section discusses some of the key data-structures used in the
|
|
CPROVER codebase.
|
|
|
|
\subsection{\code{irept}}
|
|
\label{section:irept}
|
|
|
|
There are a large number of kind of tree structured or tree-like data
|
|
in CPROVER. \code{irept} provides a single, unified representation for
|
|
all of these, allowing structure sharing and reference counting of
|
|
data. As such \code{irept} is the basic unit of data in CPROVER.
|
|
Each \code{irept} contains\footnote{Or references, if reference
|
|
counted data sharing is enabled. It is enabled by default; see the
|
|
\code{SHARING} macro.} a basic unit of data (of type \code{dt})
|
|
which contains four things:
|
|
|
|
\begin{description}
|
|
\item[\code{data}]{A string\footnote{When \code{USE\_DSTRING} is enabled (it
|
|
is by default), this is actually a \code{dstring} and thus an
|
|
integer which is a reference into a string table}, which is
|
|
returned when the \code{id()} function is used.}
|
|
\item[\code{named\_sub}]{A map from \code{irep\_namet} (a string) to
|
|
an \code{irept}. This is used for named children,
|
|
i.e. subexpressions, parameters, etc.}
|
|
\item[\code{comments}]{Another map from \code{irep\_namet} to
|
|
\code{irept} which is used for annotations and other `non-semantic' information}
|
|
\item[\code{sub}]{A vector of \code{irept} which is used to store
|
|
ordered but unnamed children.}
|
|
\end{description}
|
|
|
|
The \code{irept::pretty} function outputs the contents of an
|
|
\code{irept} directly and can be used to understand an debug problems
|
|
with \code{irept}s.
|
|
|
|
|
|
On their own \code{irept}s do not ``mean'' anything; they are
|
|
effectively generic tree nodes. Their interpretation depends on the
|
|
contents of result of the \code{id} function (the \code{data}) field.
|
|
\file{util/irep\_ids.txt} contains the complete list of \code{id}
|
|
values. During the build process it is used to generate
|
|
\file{util/irep\_ids.h} which gives constants for each id (named
|
|
\code{ID\_*}). These can then be used to identify what kind of data
|
|
\code{irept} stores and thus what can be done with it.
|
|
|
|
To simplify this process, there are a variety of classes that inherit
|
|
from \code{irept}, roughly corresponding to the ids listed
|
|
(i.e. \code{ID\_or} (the string \code{"or''}) corresponds to the class
|
|
\code{or\_exprt}). These give semantically relevant accessor
|
|
functions for the data; effectively different APIs for the same
|
|
underlying data structure. None of these classes add fields (only
|
|
methods) and so static casting can be used. The inheritance graph of
|
|
the subclasses of \code{irept} is a useful starting point for working
|
|
out how to manipulate data.
|
|
|
|
There are three main groups of classes (or APIs); those derived from
|
|
\code{typet}, \code{codet} and \code{exprt} respectively. Although
|
|
all of these inherit from \code{irept}, these are the most abstract
|
|
level that code should handle data. If code is manipulating plain
|
|
\code{irept}s then something is wrong with the architecture of the
|
|
code.
|
|
|
|
Many of the key descendent of \code{exprt} are declared in
|
|
\file{std\_expr.h}. All expressions have a named subfield /
|
|
annotation which gives the type of the expression (slightly
|
|
simplified from C/C++ as \code{unsignedbv\_typet},
|
|
\code{signedbv\_typet}, \code{floatbv\_typet}, etc.). All type
|
|
conversions are explicit with an expression with \code{id() ==
|
|
ID\_typecast} and an `interface class' named
|
|
\code{typecast\_exprt}. One key descendent of \code{exprt} is
|
|
\code{symbol\_exprt} which creates \code{irept} instances with the id
|
|
of ``symbol''. These are used to represent variables; the name of
|
|
which can be found using the \code{get\_identifier} accessor function.
|
|
|
|
|
|
\code{codet} inherits from \code{exprt} and is defined in
|
|
\file{std\_code.h}. They represent executable code; statements in C
|
|
rather than expressions. In the front-end there are versions of these
|
|
that hold whole code blocks, but in goto-programs these have been
|
|
flattened so that each \code{irept} represents one sequence point
|
|
(almost one line of code / one semi-colon). The most common
|
|
descendents of \code{codet} are \code{code\_assignt} so a common
|
|
pattern is to cast the \code{codet} to an assignment and then recurse
|
|
on the expression on either side.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{\code{goto-programs}}
|
|
\label{section:goto-programs}
|
|
|
|
The common starting point for working with goto-programs is the
|
|
\code{read\_goto\_binary} function which populates an object of
|
|
\code{goto\_functionst} type. This is defined in
|
|
\file{goto\_functions.h} and is an instantiation of the template
|
|
\code{goto\_functions\_templatet} which is contained in
|
|
\file{goto\_functions\_template.h}. They are wrappers around a map from
|
|
strings to \code{goto\_programt}'s and iteration macros are provided.
|
|
Note that \code{goto\_function\_templatet} (no \code{s}) is defined in
|
|
the same header as \code{goto\_functions\_templatet} and is gives the
|
|
C type for the function and Boolean which indicates whether the body
|
|
is available (before linking this might not always be true). Also
|
|
note the slightly counter-intuitive naming; \code{goto\_functionst}
|
|
instances are the top level structure representing the program and
|
|
contain \code{goto\_programt} instances which represent the individual
|
|
functions. At the time of writing \code{goto\_functionst} is the only
|
|
instantiation of the template \code{goto\_functions\_templatet} but
|
|
other could be produced if a different data-structures / kinds of models
|
|
were needed for functions.
|
|
|
|
\code{goto\_programt} is also an instantiation of a template. In a
|
|
similar fashion it is \code{goto\_program\_templatet} and allows the
|
|
types of the guard and expression used in instructions to be
|
|
parameterised. Again, this is currently the only use of the template.
|
|
As such there are only really helper functions in
|
|
\file{goto\_program.h} and thus \code{goto\_program\_template.h} is
|
|
probably the key file that describes the representation of (C)
|
|
functions in the goto-program format. It is reasonably stable and
|
|
reasonably documented and thus is a good place to start looking at the code.
|
|
|
|
An instance of \code{goto\_program\_templatet} is effectively a list
|
|
of instructions (and inner template called \code{instructiont}). It
|
|
is important to use the copy and insertion functions that are provided
|
|
as iterators are used to link instructions to their predecessors and
|
|
targets and careless manipulation of the list could break these.
|
|
Likewise there are helper macros for iterating over the instructions
|
|
in an instance of \code{goto\_program\_templatet} and the use of these
|
|
is good style and strongly encouraged.
|
|
|
|
Individual instructions are instances of type \code{instructiont}.
|
|
They represent one step in the function. Each has a type, an instance
|
|
of \code{goto\_program\_instruction\_typet} which denotes what kind of
|
|
instruction it is. They can be computational (such as \code{ASSIGN}
|
|
or \code{FUNCTION\_CALL}), logical (such as \code{ASSUME} and
|
|
\code{ASSERT}) or informational (such as \code{LOCATION} and
|
|
\code{DEAD}). At the time of writing there are 18 possible values for
|
|
\code{goto\_program\_instruction\_typet} / kinds of instruction.
|
|
Instructions also have a guard field (the condition under which it is
|
|
executed) and a code field (what the instruction does). These may be
|
|
empty depending on the kind of instruction. In the default
|
|
instantiations these are of type \code{exprt} and \code{codet}
|
|
respectively and thus covered by the previous discussion of
|
|
\code{irept} and its descendents. The next instructions (remembering
|
|
that transitions are guarded by non-deterministic) are given by the
|
|
list \code{targets} (with the corresponding list of labels
|
|
\code{labels}) and the corresponding set of previous instructions is
|
|
get by \code{incoming\_edges}. Finally \code{instructiont} have
|
|
informational \code{function} and \code{location} fields that indicate
|
|
where they are in the code.
|
|
|
|
|
|
|
|
\end{document}
|