12 Jun 2019 - tsp
Last update 27 Nov 2019
19 mins
Disclaimer: This is only a short primer that shows basic principles of formal verification of C programs. This article may be updates at a later point in time. Note that formal verification is very different than most programmers think (as one has to clearly specify behaviour - as one should normally think about software but - let’s be fair - normally programms don’t do). Formal verification takes a lot more time than just hacking something and is not really required in most circumstances where basic testing (unit tests, smoke tests, etc.) is sufficient. But it adds the ability to be really sure about properties about program behaviour which is interesting in critical applications.
Frama-C is a collection of tools to help with static and dynamic analysis of software. It can be used on plain C code but is mostly used with ACSL annotated C code that defines some expected behaviour (Hoare logic). Frama-C is a tool developed at Inria and by the French Alternative Energies and Atomic Energy Commission (CEA). It has a modular plugin architecture that provides
It depends on the C intermediate language during parsing - Frama-C does support most of but not all of ANSI C. There is one major feature often used in C that currently cannot be handeled by Frama-C: Function pointers.
Note that formal verification does not proof that your program is errorless. It can only proof that the program fulfills some properties that you require. If you specify the wrong properties wrong behaving programs are of course also verified correctly to conform to the wrong specification.
The weakest precondition calculus is used to proof that a codeblock C
indeed transforms
a set of preconditions P
into a set of postconditions Q
. These three parts form
the Horae tripple {P,C,Q}
. For forward calculations one could in theory calculate if the
specified postconditions is as strong or weaker than the calculated postcondition (based on the
preconditions P
transformed by the transformation C
) to reason that the proof is
satisfied. For weakest precondition the proof is normally done backwards using Dijkstras approach - the
postcondition is inversely transformed. If the specified precondition is at least as strong as the
calculated required precondition the proof is satisfied.
Currently the usage of Frama-C is easy on most operating systems despite Microsoft Windows. It is
runnable on Windows in some cases but it’s hard to achieve. To install it on FreeBSD one can use
the devel/frama-c
port or package, on most Linux systems it’s available via the frama-c
package (i.e. apt-get install frama-c
, yum install frama-c
, etc.).
If one already has a running opam installation one can use opam to install frama-c and alt-ergo together with all dependencies:
opam install frama-c
opam install alt-ergo
If one decides to do that separately one also has to install either math/coq
or math/alt-ergo
(i.e. on Linux the coq and alt-ergo packages) or both to be used as a proof assistant.
One can use a really simple example to perform some basic arithmetics and proof the program is correct to verify that the installation worked indeed. We’ll write a short program that proofs that a swap and an add function performs correctly:
#include <stdio.h>
/*@
requires \valid(a) && \valid(b);
assigns *a, *b;
ensures *a == \old(*b);
ensures *b == \old(*a);
*/
static void swap(int* a, int* b) {
int temp;
temp = (*a);
(*a) = (*b);
(*b) = temp;
}
/*@
assigns \result;
ensures \result == (a - b);
*/
static int difference(int a, int b) {
return a-b;
}
int main(int argc, char* argv[]) {
int a = 21;
int b = 42;
swap(&a, &b);
//@ assert a == 42 && b == 21;
int c = difference(b, a);
//@ assert c == 22;
return 0;
}
First one can add the sourcefile to the Frama-C project when using the GUI
As one can see frama-c already parses and somewhat preprocesses the original code shown on the right into a slightly modified form shown on the left. If no proof assistant has been run all ACSL statements are marked with blue circles - that means that these are just statements that have not been proofen / processed until now.
Then one can use the WP plugin to proof the behaviour of the code. To run analysis one selects the Analyses item from the Analyses menu. Then one can configure the WP plugin to use the WP module under WPs goal selection.
As one can see all ACSL statements have been processed successfully except the error we’ve made in our assertion.
After we’ve spotted both errors (hint: There is one in the assertion and one logical error) and fixed it we can re-run the proofer after re-parsing the code with File/Reparse
To demonstrate that Frama-C / Alt-Ergo helps us to catch errors that are most often overlooked we add the
postcondition, that the subtraction result of a-b
should be smaller than a
.
This is of course wrong because of two conditions. The first one is easy to spot: b may be negative or zero.
Let’s just assume that b
always has to be positive as well as non-zero and add the requirement to the
subtraction function.
As we expect this is returns success.
Now let’s demonstrate another possible error that’s not easy to spot. Let’s implement a absolute value function. This is mathematically defined to be the value itself for positive or zero input and the value times -1 for negative input data - i.e. this is a function that always returns the positive numeric value and discards any sign. Let’s implement that naively in C - and think about if you spot a bug here (interestingly it’s the same kind of bug that lead to the anomality during the Ariane 5 launch which caused the rocket to explode):
#include <limits.h>
/*@
ensures \result >= 0;
behavior inPositive:
assumes inValue >= 0;
ensures \result == inValue;
assigns \result;
behavior inNegative:
assumes inValue < 0;
ensures \result == -inValue;
assigns \result;
complete behaviors inPositive, inNegative;
disjoint behaviors inPositive, inNegative;
*/
int absValue(int inValue) {
if(inValue >= 0) {
return inValue;
} else {
return -1 * inValue;
}
}
int main(int argc, char* argv[]) {
int a = +1;
int b = -1;
int c = -9223372036854775808;
int tmp = absValue(a);
//@ assert(tmp == 1);
tmp = absValue(b);
//@ assert(tmp == 1);
tmp = absValue(c);
//@ assert(tmp == 9223372036854775808);
}
One has to enable RTE guards this time.
As we can see we spot an error in our absValue
function. The RTE guards inserted
automatically assertions that clamp our signed integer to the supported range (on 32 bit
systems normally in the range −2147483648 to 2147483647 and on 64 bit systems normally
in the range from −9223372036854775808 to 9223372036854775807). As we can see there exists
the minimal signed integer value that is not projectable onto any positive number. This is
because of the encoding of integers on most platforms.
If one re-visits the previous example - especially the difference function - with RTE guards in place
/*@
requires b > 0;
assigns \result;
ensures \result == (a - b);
ensures (\result < a);
*/
static int difference(int a, int b) {
return a-b;
}
one can also spot that this function does not satisfy the specified requirements!
Indeed this can only be fixed if we limit the input range of the function inside our
precondition (the requires
statement) - which will of course be checked against each
of our calling functions as well.
As we’ve already seen with the examples function contracts specify preconditions and postconditions as well as (not seen) axioms for functions.
The postcondition is specified via the ensures
statement. As all contracts it’s terminated
with a semicolon at the end. As usual there can be multiple ensures
statements per contract,
globally specified postconditions as well as per-behaviour conditions. Note that one could
also simply join postconditions with logical ands (since these are simple mathematical statements
and not code that’s run in sequence) but using multiple ensure statements is often more readable.
As usual the normal well known connectors are available:
!
~
&
, Bitwise or |
, Implication -->
as well as equivalence relation <-->
&&
, Boolean or ||
, Boolean exclusive or ^^
==
, inqeuality !=
and of course well known larger and smaller comparistons*
and address-of operators &
are of course also supported.One can always name the properties one is defining:
/*@
requires \valid(a) && \valid(b);
assigns *a, *b;
ensures elements_swapped : (*a == \old(*b)) && (*b == \old(*a));
*/
static void swap(int* a, int* b) {
int temp;
temp = (*a);
(*a) = (*b);
(*b) = temp;
}
Naming is especially useful for longer contracts.
Preconditions are specified via requires
. Normally a C programmer would also check them
with parameter validation statements - but if the program is always proofen one can discard these
checks and rely on the proof alone - since the proofer proofs the whole program.
One can use typical mathematical notation:
/*@
requires 0 < seed <= 100;
*/
void exampleFunctionRandomSeed1To100(int seed) {
/* ... */
}
As we have also seen we can access the values before applying the function via \old
as well
as the values after the execution without any modifier. If one wants to add statements about intermediate
state at the end one can add a label there and use the \at
modifier:
int a = 21;
notthewholetruth:
a = 42;
//@ assert a == 42;
//@ assert \at(a, notthewholetruth) == 21;
There are 6 predefined labels in ACSL:
Pre
and Old
is the value before the function callPost
is the value after the function callLoopEntry
is the value at the loop entry (i.e. before the first run of a loop)LoopCurrent
is the value at the beginning of the current step in the loopHere
is the value of the current program point - which is how we think of variables in a sequential
program.One can also require that pointers point to valid memory locations. This is done via the \valid
function.
Note that the memory model of proofers cannot handle arbitrary casts - in fact it normally models disjoint
memory locations for different types! One can verify a single pointer location with \valid(a)
or a whole
memory range with a statement like \valid(a + (0 .. 100))
which would require a[0]
up to a[100]
to be valid. If one only accesses a variable with reads one can also use \valid_read
that’s also true
for const pointers.
Side effects are specified via the assigns
clause. One has to list all memory locations that
get modified by the function. If a function is side-effect free one can use assigns \nothing
To ensure that pointers do not point to overlapping memory locations one can use the \separated
statement
that receives a list of memory locations that are not allowed to overlap:
/*@
requires \separated(a,b,...,c);
*/
As has also been seen above one can specify different behaviours that have different preconditions that lead
to different postconditions. Note that behaviours have to be complete
(i.e. describe all possible outcomes)
and disjoint
which means they do not overlap (i.e. are true alternatives).
Note that for all examples we have annotated the functions directly at their implementation. To use modular verification one should add the specifications to the definitions (i.e. the header files). This allows to run the verification against the implementation (i.e. the source file) as well as use the definitions to validate users of the function. In case one verifies using the modular structure fetching definitions from the header files Frama-C assumes that the postconditions are true (this is shown when using the UI using half-green, half-blue bullets besides the annotations).
This is also how ANSI-C standard library is realized for Frama-C. It uses it’s own set of header files that contain standards conformant annotations for all standards library functions.
Up until now we’ve only seen the usage of ACSL to annotate functions one can also use it to provide statements
about loops. There is the loop invariant
and loop assigns
value. With loop invariant
we can
specify the range of our loop variable. For example when emulating a for
loop with an while
:
int i = 1;
int h = 10;
/*
Note: The following program contains an error!
*/
/*@
loop invariant 1 <= i < 100;
loop invariant h == \at(h, Pre) + (i-1);
loop assigns i;
*/
while(i < 100) {
i++;
h++;
}
//@ assert i == 99
One can also specify a loop variant
- that is an expression that has to be strictly decreasing
on each loop iteration (and thus guarantees termination of the loop)
/*@
loop assigns \nothing;
loop variant n-i;
*/
for(int i = 0; i < n; i++) {
// ...
}
In mathematics one often uses the forall
operator to specify that a statement applies to
a number of cases. This can be done in ACSL too. For example to specify that all entries in an
array will be set to a value in the range from 0 to 10 inclusive:
/*@
requires \valid(data + (0 .. len-1));
assigns data[0 .. len-1];
ensures \forall size_t i; 0 <= i < len ==> 0 <= data[i] <= 10;
*/
void setToRange(int* data, size_t len);
If we do that with an random statement:
/*@
requires \valid(data + (0 .. len-1));
assigns data[0 .. len-1];
ensures \forall size_t i; 0 <= i < len ==> 0 <= data[i] <= 10;
*/
void setToRange(int* data, size_t len) {
/*@
loop invariant 0 <= i <= len;
loop invariant \forall size_t j; 0 <= j < len ==> 0 <= data[j] <= 10;
loop assigns i, data[0 .. len-1];
loop variant len-i;
*/
for(size_t i = 0; i < len; i++) {
data[i] = rand() % 11;
}
}
To annotate a more complex clamping function that should limit an array of input integers to values between 0 to 255:
/*@
requires \valid(data + (0 .. len-1));
assigns data[0 .. len-1];
ensures \forall size_t i; 0 <= i < len && \old(data[i]) > 255 ==> data[i] == 255;
ensures \forall size_t i; 0 <= i < len && \old(data[i]) < 0 ==> data[i] == 0;
ensures \forall size_t i; 0 <= i < len && \old(data[i]) >= 0 && \old(data[i] <= 255) ==> data[i] == \old(data[i]);
*/
void clampData(int* data, size_t len) {
/*@
loop invariant 0 <= i <= len;
loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) > 255 ==> data[j] == 255;
loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) < 0 ==> data[j] == 0;
loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) >= 0 && \at(data[j], Pre) <= 255 ==> data[j] == \at(data[j], Pre);
loop assigns i, data[0 .. len-1];
loop variant len-i;
*/
for(size_t i = 0; i < len; i++) {
if(data[i] > 255) { data[i] = 255; }
else if(data[i] < 0) { data[i] = 0; }
}
}
Note that assertions about unreachable code are - as normal with mathematical implication - always true since it’s not deducible that a statement that never runs is false or an event that never happens leads to an error.
The following example function does a binary search (by reducing a bracket around the target value step by step) inside an ordered array:
/*@
requires len > 0;
requires \valid_read(data + (0 .. len-1));
requires \forall int i,j; 0 <= i <= j < len ==> data[i] < data[j];
behavior found:
requires \exists int i; 0 <= i < len ==> data[i] == key;
ensures \result >= 0 && \result < len;
behavior notfound:
requires \forall int i; 0 <= i < len ==> data[i] != key;
ensures \result == -1;
assigns \nothing;
*/
int searchData(int* data, int len, int key) {
if(len <= 0) { return -1; }
int low = 0;
int up = len-1;
/*@
loop assigns low,up;
loop invariant up < len;
loop invariant low < len;
loop invariant low >= 0;
loop invariant up >= 0;
*/
while(low < up) {
int mid = low + (up - low)/2;
if(data[mid] > key) {
low = mid;
} else if(data[mid] < key) {
up = mid;
} else {
return mid;
}
}
if(data[low] == key) { return low; }
if(data[up] == key) { return up; }
return -1;
}
Predicates are the (pure) function equivalent to C functions. They allow to write the conditions we have already used and name them. They can receive arguments as functions from programming languages and have to be either true or false. These can then be used later on in different places.
For example the previous searchData function can be rewritten with predicates:
/*@
predicate orderedIntArray(int* data, size_t len, int val) =
len > 0 &&
\valid_read(data + (0 .. len-1)) &&
\forall int i,j; 0 <= i <= j < len ==> data[i] < data[j];
predicate elementInIntArray(int* data, size_t len, int val) =
len > 0 &&
\valid_read(data + (0 .. len-1)) &&
\exists int i; 0 <= i < len ==> data[i] == key;
*/
/*@
requires orderedIntArray(data, len, key);
behavior found:
requires elementInIntArray(data, len, key);
ensures \result >= 0 && \result < len;
behavior notfound:
requires !elementInIntArray(data, len, key);
ensures \result == -1;
assigns \nothing;
*/
int searchData(int* data, int len, int key) {
if(len <= 0) { return -1; }
int low = 0;
int up = len-1;
/*@
loop assigns low,up;
loop invariant up < len;
loop invariant low < len;
loop invariant low >= 0;
loop invariant up >= 0;
*/
while(low < up) {
int mid = low + (up - low)/2;
if(data[mid] > key) {
low = mid;
} else if(data[mid] < key) {
up = mid;
} else {
return mid;
}
}
if(data[low] == key) { return low; }
if(data[up] == key) { return up; }
return -1;
}
As with C++ predicates can be overloaded as long as they differ in their type signature.
Logic functions on the other hand can be functions that describe program behaviour in a logical expression (for example the calculation performed by an function). These can be used in any comparison or similar context:
/*@
logic vectorInProduct2D(int* v1, int* v2) = v1[0] * v2[0] + v1[1] + v2[1];
*/
One can of course use recursion to realize loops but has to take care of the limits of the proof assistant used to verify the expression.
This article is tagged: Programming, Math, Formal, Frama C, Security
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/