How to use Frama-C to proof C code (short primer)

12 Jun 2019 - tsp
Last update 27 Nov 2019
Reading time 19 mins

Disclaimer: This is only a short primer that shows basic principles of formal verification of C programs. This article may be updates at a later point in time. Note that formal verification is very different than most programmers think (as one has to clearly specify behaviour - as one should normally think about software but - let’s be fair - normally programms don’t do). Formal verification takes a lot more time than just hacking something and is not really required in most circumstances where basic testing (unit tests, smoke tests, etc.) is sufficient. But it adds the ability to be really sure about properties about program behaviour which is interesting in critical applications.

Frama-C is a collection of tools to help with static and dynamic analysis of software. It can be used on plain C code but is mostly used with ACSL annotated C code that defines some expected behaviour (Hoare logic). Frama-C is a tool developed at Inria and by the French Alternative Energies and Atomic Energy Commission (CEA). It has a modular plugin architecture that provides

Value analysis
Deductive proofes (based on Why that uses theorem proofers like Alt-Ergo or Coq)
Weakest precondition based proofs
Impact analysis
Slicing of programs (i.e. generating subsets of programs that preserve some of the original properties)
Dead code / spare code removal
Dominator computation
Functional dependency computation

It depends on the C intermediate language during parsing - Frama-C does support most of but not all of ANSI C. There is one major feature often used in C that currently cannot be handeled by Frama-C: Function pointers.

Note that formal verification does not proof that your program is errorless. It can only proof that the program fulfills some properties that you require. If you specify the wrong properties wrong behaving programs are of course also verified correctly to conform to the wrong specification.

The weakest precondition calculus is used to proof that a codeblock C indeed transforms a set of preconditions P into a set of postconditions Q. These three parts form the Horae tripple {P,C,Q}. For forward calculations one could in theory calculate if the specified postconditions is as strong or weaker than the calculated postcondition (based on the preconditions P transformed by the transformation C) to reason that the proof is satisfied. For weakest precondition the proof is normally done backwards using Dijkstras approach - the postcondition is inversely transformed. If the specified precondition is at least as strong as the calculated required precondition the proof is satisfied.

Installation

Currently the usage of Frama-C is easy on most operating systems despite Microsoft Windows. It is runnable on Windows in some cases but it’s hard to achieve. To install it on FreeBSD one can use the devel/frama-c port or package, on most Linux systems it’s available via the frama-c package (i.e. apt-get install frama-c, yum install frama-c, etc.).

If one already has a running opam installation one can use opam to install frama-c and alt-ergo together with all dependencies:

opam install frama-c
opam install alt-ergo

If one decides to do that separately one also has to install either math/coq or math/alt-ergo (i.e. on Linux the coq and alt-ergo packages) or both to be used as a proof assistant.

A simple example

One can use a really simple example to perform some basic arithmetics and proof the program is correct to verify that the installation worked indeed. We’ll write a short program that proofs that a swap and an add function performs correctly:

#include <stdio.h>

/*@
    requires \valid(a) && \valid(b);
    assigns *a, *b;
    ensures *a == \old(*b);
    ensures *b == \old(*a);
*/
static void swap(int* a, int* b) {
    int temp;
    temp = (*a);
    (*a) = (*b);
    (*b) = temp;
}

/*@
    assigns \result;
    ensures \result == (a - b);
*/
static int difference(int a, int b) {
    return a-b;
}

int main(int argc, char* argv[]) {
    int a = 21;
    int b = 42;

    swap(&a, &b);
    //@ assert a == 42 && b == 21;

    int c = difference(b, a);
    //@ assert c == 22;

    return 0;
}

First one can add the sourcefile to the Frama-C project when using the GUI

Opening a file with Frama-C using the File/Sourcefiles menu, selecting the files and adding all of them using add

As one can see frama-c already parses and somewhat preprocesses the original code shown on the right into a slightly modified form shown on the left. If no proof assistant has been run all ACSL statements are marked with blue circles - that means that these are just statements that have not been proofen / processed until now.

Then one can use the WP plugin to proof the behaviour of the code. To run analysis one selects the Analyses item from the Analyses menu. Then one can configure the WP plugin to use the WP module under WPs goal selection.

As one can see all ACSL statements have been processed successfully except the error we’ve made in our assertion.

After we’ve spotted both errors (hint: There is one in the assertion and one logical error) and fixed it we can re-run the proofer after re-parsing the code with File/Reparse

To demonstrate that Frama-C / Alt-Ergo helps us to catch errors that are most often overlooked we add the postcondition, that the subtraction result of a-b should be smaller than a.

This is of course wrong because of two conditions. The first one is easy to spot: b may be negative or zero. Let’s just assume that b always has to be positive as well as non-zero and add the requirement to the subtraction function.

As we expect this is returns success.

Now let’s demonstrate another possible error that’s not easy to spot. Let’s implement a absolute value function. This is mathematically defined to be the value itself for positive or zero input and the value times -1 for negative input data - i.e. this is a function that always returns the positive numeric value and discards any sign. Let’s implement that naively in C - and think about if you spot a bug here (interestingly it’s the same kind of bug that lead to the anomality during the Ariane 5 launch which caused the rocket to explode):

#include <limits.h>

/*@
    ensures \result >= 0;
    behavior inPositive:
        assumes inValue >= 0;
        ensures \result == inValue;
        assigns \result;
    behavior inNegative:
        assumes inValue < 0;
        ensures \result == -inValue;
        assigns \result;
    complete behaviors inPositive, inNegative;
    disjoint behaviors inPositive, inNegative;
*/
int absValue(int inValue) {
    if(inValue >= 0) {
        return inValue;
    } else {
        return -1 * inValue;
    }
}

int main(int argc, char* argv[]) {
    int a = +1;
    int b = -1;
    int c = -9223372036854775808;

    int tmp = absValue(a);
    //@ assert(tmp == 1);
    tmp = absValue(b);
    //@ assert(tmp == 1);
    tmp = absValue(c);
    //@ assert(tmp == 9223372036854775808);
}

One has to enable RTE guards this time.

As we can see we spot an error in our absValue function. The RTE guards inserted automatically assertions that clamp our signed integer to the supported range (on 32 bit systems normally in the range −2147483648 to 2147483647 and on 64 bit systems normally in the range from −9223372036854775808 to 9223372036854775807). As we can see there exists the minimal signed integer value that is not projectable onto any positive number. This is because of the encoding of integers on most platforms.

If one re-visits the previous example - especially the difference function - with RTE guards in place

/*@
    requires b > 0;
    assigns \result;
    ensures \result == (a - b);
    ensures (\result < a);
*/
static int difference(int a, int b) {
    return a-b;
}

one can also spot that this function does not satisfy the specified requirements!

Indeed this can only be fixed if we limit the input range of the function inside our precondition (the requires statement) - which will of course be checked against each of our calling functions as well.

Function contracts

As we’ve already seen with the examples function contracts specify preconditions and postconditions as well as (not seen) axioms for functions.

The postcondition is specified via the ensures statement. As all contracts it’s terminated with a semicolon at the end. As usual there can be multiple ensures statements per contract, globally specified postconditions as well as per-behaviour conditions. Note that one could also simply join postconditions with logical ands (since these are simple mathematical statements and not code that’s run in sequence) but using multiple ensure statements is often more readable.

As usual the normal well known connectors are available:

Negation via !
Bitwise complement via ~
Bitwise and &, Bitwise or |, Implication --> as well as equivalence relation <-->
Boolean and &&, Boolean or ||, Boolean exclusive or ^^
All of the well known operations like addition, subtraction, multiplication, division, modulo, bitshifts
Equality via double equal signs ==, inqeuality != and of course well known larger and smaller comparistons
Pointer dereferencing via * and address-of operators & are of course also supported.

One can always name the properties one is defining:

/*@
    requires \valid(a) && \valid(b);
    assigns *a, *b;
    ensures elements_swapped : (*a == \old(*b)) && (*b == \old(*a));
*/
static void swap(int* a, int* b) {
    int temp;
    temp = (*a);
    (*a) = (*b);
    (*b) = temp;
}

Naming is especially useful for longer contracts.

Preconditions are specified via requires. Normally a C programmer would also check them with parameter validation statements - but if the program is always proofen one can discard these checks and rely on the proof alone - since the proofer proofs the whole program.

One can use typical mathematical notation:

/*@
    requires 0 < seed <= 100;
*/
void exampleFunctionRandomSeed1To100(int seed) {
    /* ... */
}

As we have also seen we can access the values before applying the function via \old as well as the values after the execution without any modifier. If one wants to add statements about intermediate state at the end one can add a label there and use the \at modifier:

    int a = 21;
notthewholetruth:
    a = 42;

    //@ assert a == 42;
    //@ assert \at(a, notthewholetruth) == 21;

There are 6 predefined labels in ACSL:

Pre and Old is the value before the function call
Post is the value after the function call
LoopEntry is the value at the loop entry (i.e. before the first run of a loop)
LoopCurrent is the value at the beginning of the current step in the loop
Here is the value of the current program point - which is how we think of variables in a sequential program.

One can also require that pointers point to valid memory locations. This is done via the \valid function. Note that the memory model of proofers cannot handle arbitrary casts - in fact it normally models disjoint memory locations for different types! One can verify a single pointer location with \valid(a) or a whole memory range with a statement like \valid(a + (0 .. 100)) which would require a[0] up to a[100] to be valid. If one only accesses a variable with reads one can also use \valid_read that’s also true for const pointers.

Side effects are specified via the assigns clause. One has to list all memory locations that get modified by the function. If a function is side-effect free one can use assigns \nothing

To ensure that pointers do not point to overlapping memory locations one can use the \separated statement that receives a list of memory locations that are not allowed to overlap:

/*@
    requires \separated(a,b,...,c);
*/

As has also been seen above one can specify different behaviours that have different preconditions that lead to different postconditions. Note that behaviours have to be complete (i.e. describe all possible outcomes) and disjoint which means they do not overlap (i.e. are true alternatives).

Where to put ACSL annotations

Note that for all examples we have annotated the functions directly at their implementation. To use modular verification one should add the specifications to the definitions (i.e. the header files). This allows to run the verification against the implementation (i.e. the source file) as well as use the definitions to validate users of the function. In case one verifies using the modular structure fetching definitions from the header files Frama-C assumes that the postconditions are true (this is shown when using the UI using half-green, half-blue bullets besides the annotations).

This is also how ANSI-C standard library is realized for Frama-C. It uses it’s own set of header files that contain standards conformant annotations for all standards library functions.

Verifying loops

Up until now we’ve only seen the usage of ACSL to annotate functions one can also use it to provide statements about loops. There is the loop invariant and loop assigns value. With loop invariant we can specify the range of our loop variable. For example when emulating a for loop with an while:

    int i = 1;
    int h = 10;
    /*
        Note: The following program contains an error!
    */
    /*@
        loop invariant 1 <= i < 100;
        loop invariant h == \at(h, Pre) + (i-1);
        loop assigns i;
    */
    while(i < 100) {
        i++;
        h++;
    }
    //@ assert i == 99

One can also specify a loop variant - that is an expression that has to be strictly decreasing on each loop iteration (and thus guarantees termination of the loop)

    /*@
        loop assigns \nothing;
        loop variant n-i;
    */
    for(int i = 0; i < n; i++) {
        // ...
    }

Forall

In mathematics one often uses the forall operator to specify that a statement applies to a number of cases. This can be done in ACSL too. For example to specify that all entries in an array will be set to a value in the range from 0 to 10 inclusive:

/*@
    requires \valid(data + (0 .. len-1));
    assigns data[0 .. len-1];
    ensures \forall size_t i; 0 <= i < len ==> 0 <= data[i] <= 10;
*/
void setToRange(int* data, size_t len);

If we do that with an random statement:

/*@
    requires \valid(data + (0 .. len-1));
    assigns data[0 .. len-1];
    ensures \forall size_t i; 0 <= i < len ==> 0 <= data[i] <= 10;
*/
void setToRange(int* data, size_t len) {
    /*@
        loop invariant 0 <= i <= len;
        loop invariant \forall size_t j; 0 <= j < len ==> 0 <= data[j] <= 10;
        loop assigns i, data[0 .. len-1];
        loop variant len-i;
    */
    for(size_t i = 0; i < len; i++) {
        data[i] = rand() % 11;
    }
}

To annotate a more complex clamping function that should limit an array of input integers to values between 0 to 255:

/*@
    requires \valid(data + (0 .. len-1));
    assigns data[0 .. len-1];
    ensures \forall size_t i; 0 <= i < len && \old(data[i]) > 255 ==> data[i] == 255;
    ensures \forall size_t i; 0 <= i < len && \old(data[i]) < 0 ==> data[i] == 0;
    ensures \forall size_t i; 0 <= i < len && \old(data[i]) >= 0 && \old(data[i] <= 255) ==> data[i] == \old(data[i]);
*/
void clampData(int* data, size_t len) {
    /*@
        loop invariant 0 <= i <= len;
        loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) > 255 ==> data[j] == 255;
        loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) < 0 ==> data[j] == 0;
        loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) >= 0 && \at(data[j], Pre) <= 255 ==> data[j] == \at(data[j], Pre);

        loop assigns i, data[0 .. len-1];

        loop variant len-i;
    */
    for(size_t i = 0; i < len; i++) {
        if(data[i] > 255) { data[i] = 255; }
        else if(data[i] < 0) { data[i] = 0; }
    }
}

Unreachable code

Note that assertions about unreachable code are - as normal with mathematical implication - always true since it’s not deducible that a statement that never runs is false or an event that never happens leads to an error.

Example: Searching inside sorted array

The following example function does a binary search (by reducing a bracket around the target value step by step) inside an ordered array:

/*@
    requires len > 0;
    requires \valid_read(data + (0 .. len-1));
    requires \forall int i,j; 0 <= i <= j < len ==> data[i] < data[j];
    behavior found:
        requires \exists int i; 0 <= i < len ==> data[i] == key;
        ensures \result >= 0 && \result < len;
    behavior notfound:
        requires \forall int i; 0 <= i < len ==> data[i] != key;
        ensures \result == -1;
    assigns \nothing;
*/
int searchData(int* data, int len, int key) {
    if(len <= 0) { return -1; }

    int low = 0;
    int up = len-1;

    /*@
        loop assigns low,up;
        loop invariant up < len;
        loop invariant low < len;
        loop invariant low >= 0;
        loop invariant up >= 0;
    */
    while(low < up) {
        int mid = low + (up - low)/2;
        if(data[mid] > key) {
            low = mid;
        } else if(data[mid] < key) {
            up = mid;
        } else {
            return mid;
        }
    }
    if(data[low] == key) { return low; }
    if(data[up] == key) { return up; }
    return -1;
}

Predicates and logic functions

Predicates are the (pure) function equivalent to C functions. They allow to write the conditions we have already used and name them. They can receive arguments as functions from programming languages and have to be either true or false. These can then be used later on in different places.

For example the previous searchData function can be rewritten with predicates:

/*@
    predicate orderedIntArray(int* data, size_t len, int val) =
        len > 0 &&
        \valid_read(data + (0 .. len-1)) &&
        \forall int i,j; 0 <= i <= j < len ==> data[i] < data[j];
    predicate elementInIntArray(int* data, size_t len, int val) =
        len > 0 &&
        \valid_read(data + (0 .. len-1)) &&
        \exists int i; 0 <= i < len ==> data[i] == key;
*/
/*@
    requires orderedIntArray(data, len, key);
    behavior found:
        requires elementInIntArray(data, len, key);
        ensures \result >= 0 && \result < len;
    behavior notfound:
        requires !elementInIntArray(data, len, key);
        ensures \result == -1;
    assigns \nothing;
*/
int searchData(int* data, int len, int key) {
    if(len <= 0) { return -1; }

    int low = 0;
    int up = len-1;

    /*@
        loop assigns low,up;
        loop invariant up < len;
        loop invariant low < len;
        loop invariant low >= 0;
        loop invariant up >= 0;
    */
    while(low < up) {
        int mid = low + (up - low)/2;
        if(data[mid] > key) {
            low = mid;
        } else if(data[mid] < key) {
            up = mid;
        } else {
            return mid;
        }
    }
    if(data[low] == key) { return low; }
    if(data[up] == key) { return up; }
    return -1;
}

As with C++ predicates can be overloaded as long as they differ in their type signature.

Logic functions on the other hand can be functions that describe program behaviour in a logical expression (for example the calculation performed by an function). These can be used in any comparison or similar context:

    /*@
        logic vectorInProduct2D(int* v1, int* v2) = v1[0] * v2[0] + v1[1] + v2[1];
    */

One can of course use recursion to realize loops but has to take care of the limits of the proof assistant used to verify the expression.