The Eiffel Compiler / Interpreter (tecomp)

Nearly all languages with imperative elements have some kind of an array. In C an array is just a typed pointer to a memory region. You can address the different elements by giving an offset to the start of the memory region. The compiler multiplies the offset with the size of the objects contained in the array to get the actual offset in bytes.

The C view of arrays has certain advantages: It is memory efficient because it uses just one pointer. Accessing elements is easy because it just needs some simple address arithmetic.

But the C view has some pitfalls as well. First you might fail to allocate memory for the array. Then the pointer to the memory region has some undefined value. Second you might read elements of the array which have not yet been initialized. Third you might read from or write to elements of the array which are not within the allocated region.

Modern languages avoid this pitfalls by certain methods which can be executed at compile time or at runtime. The failure to initialize the pointer to the memory region is usually handled by initializing all pointers with zero. Any access to a zero pointer results in a runtime exception. Addressing elements outside the region is usually handled at runtime as well by storing with the array its capacity and generating an exception if elements outside the region are addressed.

All these solutions avoid memory corruption but they have a cost. Additional code is required to check null pointer accesses and to check out of bound accesses. The single pointer implementation is no longer sufficient because the size of the allocated region has to be stored in the runtime object.

A language which allows formal verification can avoid all these pitfalls without any cost. Without any cost? Well -- without any runtime and memory footprint cost. But nothing is free. Some assertions must be provable at compile time. I.e. whenever you want to access some element of the array you have to be sure that the index is within the array bounds. And it is not sufficient that you are sure. You have to convince the verifier of this fact.

In the following article we show an array structure in our programming language which allows us to convince the verifier that we make no illegal accesses.

Since an array is a mutable structure we have to address the framing problem as well. The framing problem is addressed appropriately if for any modifying procedure it is clear not only what it does change but also what it leaves unchanged.

A lot of effort is done currently in the verification community to address the framing problem. Frame specifications like modify and use clauses for each procedure, separation logic, different sophisticated heap models etc.

In this article we demonstrate that the framing problem can be solved without all these complicated structures. It is sufficient to regard access to data structures as functions and have a sufficient powerful function model. It is amazing to see that proper abstraction makes some difficult seeming problems a lot easier.

As a warm up let us see what kind of operations we expect from arrays. First of all we want to be able to declare variables of type array which hold elements of a certain type.

local s: STRING a: ARRAY[STRING] do ...

The declaration does not allocate the object. Therefore accessing any element of the array has to be illegal at this point.

a[4] := "Hello" -- Illegal, array not yet allocated!

To allocate the object we need some kind of object creation.

create a.with_capacity(10)

This instruction creates an array object with the possibility to store up to 10 strings. The creation just allocates the structure but does not populate the array with any string.

At this point it is illegal to read elements.

s := a[0] -- Illegal, access to an uninitialized object!

But we can write some elements of the array and access these elements

a[0] := "first string" a[1] := "second string" s := a[1] s := a[2] -- Illegal, access to an uninitialized object! s := a[100] -- Illegal, out of bound access! ... end

If the verifying compiler is able to detect these illegal accesses and let pass only the legal accesses, then the actual runtime code can have the same size as the corresponding C code and the actual array object can be represented by a single pointer to the memory region.

The type ARRAY is declared within the module array

```
-- module: array
```

We need a formal generic which represents the type of the elements.

G: ANY -- Element type

There are no constraints on the element type. Any type can be used as element type.

The generic class ARRAY has three basic access functions and an invariant.

class ARRAY[G] feature capacity: ghost NATURAL -- The allocated capacity of the array. domain: ghost NATURAL? -- The set of indices of elements which have been initialized. [] (i:NATURAL): G -- Element at position `i'. require i in domain note built_in end invariant domain <= {i: i<capacity} -- Note: '<=' for sets is the subset relation end

'capacity' returns the allocated capacity of the array. Since it is a ghost attribute it can be used only in assertions and cannot flow into actual computations. The attribute 'domain' represents the set of indices of array elements which have already been initialized. This is a ghost attribute as well.

The bracket operator ([]) represents the function which returns the element at position 'i'. It uses 'domain' in its precondition to express that only elements at initialized positions can be allocated.

The invariant states that only elements within the allocated capacity can be initialized.

We have presented the above declaration of the class ARRAY in an object oriented manner. Programmers familiar with object oriented languages like C++, Java, C# should have no problems to read such a declaration. But it should be noted that such a form of declaration hides one important thing. All routines declared within the class ARRAY have one implicit argument which is not explicitly mentioned, namely the array object (called 'this' in C++ and Java). In order to make this argument more explicit we present this declaration in a different but equivalent form.

class ARRAY[G] end feature capacity(a:CURRENT): ghost NATURAL domain(a:CURRENT): ghost NATURAL? [](a:CURRENT, i:NATURAL): G require i in a.domain note built_in end end invariant all(a:CURRENT) ensure a.domain <= {i: i < a.capacity} end end

This form of the declaration makes transparent that the used identifiers and operators have the following types:

capacity: ghost CURRENT->NATURAL domain: ghost CURRENT->NATURAL? ([]): [CURRENT,NATURAL] -> G

Remark: In our programming language we use the Haskell convention that an operator (unary or binary) put between parentheses can be used like an ordinary function. I.e. for numbers the expressions '7+3' and '(+)(7,3)' are equivalent, but the first one is preferable because it is much more readable.

If something looks like an attribute in object oriented notation it is actually a function because it needs an argument. Therefore an assignment to an attribute is something fundamentally different from the assignment to a local variable. An assignment of a new value to an attribute is a modification of the corresponding function.

Note that the unfolded form of the class invariant makes evident that the invariant is not only an object invariant but an invariant for the whole system. In the case of such simple classes as ARRAY this makes no difference. But if we look at linked structures the difference is fundamental.

The function domain has been used in the precondition of the function ([]). This creates the following implicit invariant.

invariant ([]).domain = {a:CURRENT, i:NATURAL: i in a.domain} end

Since procedures might modify the functions ([]) and domain, they have to respect the invariant, the explicit and the implicit invariant.

The basic access functions capacity, domain and ([]) are independent in the sense that the value of one function does not depend on the value of any other or these basic functions. There is a consistency condition between domain and ([]) because the function domain is used in the precondition of ([]) (see previous chapter). However the value returned by ([]) does not depend on the function domain.

The basic access functions capacity and domain have another consistency condition expressed in the invariant. Note that consistency conditions do not establish a dependence.

Based on the basic access functions we can define other functions. E.g. we might define a function is_full which returns true if all positions in the array are accessible.

is_full(a:CURRENT): ghost -- Are all positions of the array `a' accessible? ensure Result = ({i: i < a.capacity} = a.domain) end

This function has a clear dependency: It depends on the functions capacity and domain. This implies that any modification of the functions capacity and/or domain modifies implicitly the function is_full as well.

In order to be able to create an object of type array we need a creation procedure (or a constructor called in languages like C++, Java and C#).

class ARRAY[G] create with_capacity(c:NATURAL) -- Create an array with capacity `c'. note built_in ensure domain = 0 -- '0' is the empty set capacity = c end end

In order to make the hidden argument more transparent we take the declaration out of the class and get the following equivalent declaration.

create with_capacity(a:CURRENT, c:NATURAL) note built_in ensure a.domain = 0 a.capacity = c end end

Each creation procedure receives as its first argument an allocated but uninitialized object of the corresponding type. The task of the creation procedure is to initialize the object such that it satisfies the consistency conditions expressed in the invariant.

The postcondition states that the creation procedure has an effect on the functions 'domain' and 'capacity'. Remember that the corresponding types are CURRENT->NATURAL? and CURRENT->NATURAL.

Expressions like 'a.domain=0' in postconditions of procedures are interpreted in our language rather strictly. It reads: the identifier 'domain' identifies a new function which is the same as the function which it identified before the call except for the argument 'a'. Only functions mentioned in the postcondition (and functions which depend on them) are modified by a procedure. I.e. the declaration of the creation procedure is interpreted in the following manner.

create with_capacity(a:CURRENT, c:NATURAL) note built_in ensure domain = old (domain + [a,0]) capacity = old (capacity + [a,c]) ([]) = old ([]) end end

For any function f:A->B the expression 'f+[a,b]' is defined as

f + [a,b] = (x -> if x=a then b else f(x) end)

with the consequences

(f+[a,b]).domain = f.domain + {a} x=a => (f+[a,b])(x) = b x/=a => (f+[a,b])(x) = f(x)

I.e. we view functions as sets of key value pairs then the expression 'f+[a,b]' is the set of key value pairs of the function 'f' except for the key 'a' where the pair [a,b] either overrides the one already contained in 'f' or is added to the set of key value pairs.

The strict interpretation of the postcondition of 'with_capacity' corresponds to our intuition if we read expressions like 'a.domain=0' in the postcondition of a procedure. Our intuition tells us the 'a.domain' has been set to '0' but 'b.domain', 'c.domain', ... for 'b' and 'c' different from 'a' have still the same value.

The strict interpretation of postconditions of procedures allows us to interpret the postconditions as framing conditions. A postcondition of a procedure contains everything the procedure has modified and all information needed to derived the things which have not been modified. Postconditions of procedures are treated as implicit framing conditions.

We need a procedure to put elements at certain positions.

class ARRAY[G] feature put(i:NATURAL, v:G) -- Update (or set) the value `v' at position `i'. require i < capacity note built_in ensure domain = old (domain + {i}) Current[i] = v end end

In order to make the arguments more transparent we use the following equivalent declaration of the procedure 'put'.

feature put(a:CURRENT, i:NATURAL, v:G) require i < a.capacity note built_in ensure a.domain = old (a.domain + {i}) Current[i] = v end end

The procedure 'put' modifies two functions, the ghost function 'domain' and the element access function represented by the bracket operator ([]). The verifier extracts the following strict interpretation of this procedure.

feature put(a:CURRENT, i:NATURAL, v:G) require i < a.capacity note built_in ensure domain = old (domain + [a, a.domain + {i}]) ([]) = old (([]) + [[a,i],v]) capacity = old capacity end end

I.e. the procedure put assigns new values to the functions domain and ([]). Since the function capacity is independent of the functions domain and ([]) and the postcondition does not mention neither the function capacity nor any function which depend on it, the strict postcondition is extracted that capacity remains unchanged.

There is one interesting thing to note about the procedure put and the function ([]). The procedure put modifies the function ([]) at just one position and updates the domain appropriately. The signatures of ([]) and put are consistent in the sense that they share the same arguments. The procedure put has just one argument more which is the value to be set. These facts are sufficient for the compiler to derive that put is an assigner for the function ([]). We can use this to write more readable code. Instead of writing

a.put(i,v)

we can use the more suggestive notation

a[i] := v

It is easy to write a procedure which swaps two elements of an array.

feature swap(a:CURRENT, i,j:NATURAL) require i in a.domain j in a.domain ensure a[i] = old a[j] a[j] = old a[i] end end

Because of the fact that the function ([]) has the assigner command put the procedure swap needs no explicit implementation. The postcondition is sufficient because it states that the function ([]) gets new values at the argument positions [a,i] and [a,j]. The assigner command put can be used to derive the implementation. The automatically derived implementation looks like

local tmp1, tmp2:G := a[i],a[j] do a.put(i,tmp2) a.put(j,tmp1) end

As for the previous procedures we give the strict interpretation of the postcondition.

feature swap(a:CURRENT, i,j:NATURAL) require i in a.domain j in a.domain ensure domain = old domain capacity = old capacity ([]) = old (([]) <+ {i -> a[j], j -> a[j]}) end end

In this postcondition we have used a notation which needs some explanation. In our programming language we can define a function f:A->B by giving a set of key value pairs.

local f:STRING->NATURAL do ... f := {"one" -> 1, "two" -> 2} -- An illegal assignment f := {"one" -> 1, "two" -> 2, "one" -> 2} ... end

The verifier generates the proof obligation that the same key must have the same value. Therefore the second assignment in the example is illegal. Let us look at the proof obligation generated for the second assignment.

a1: "one" = "two" => 1 = 2 a2: "one" = "one" => 1 = 2

The assertion a1 is provable because the antecedent of the implication is false and from a false antecedent anything can be in the consequent. The assertion a2 is definitely not provable because it requires the verifier to prove '1=2' which is not possible.

Going back to the procedure swap we can see that '{i->a[j],j->[a[i]}' is a function of type NATURAL->G with generates the proof obligation

i = j => a[i] = a[j]

which is trivially valid.

Furthermore in the postcondition we have used the function override operator '<+' which has the following definition for two functions 'f,g:A->B'.

f <+ g = (x -> if x in g.domain then g(x) elseif x in f.domain then f(x) end)

with the consequences

(f<+g).domain = f.domain + g.domain x in g.domain => (f<+g)(x) = g(x) x in f.domain-g.domain => (f<+g)(x) = f(x)

Using these definitions one can verify that the above mentioned strict postcondition has the intended meaning.

Local Variables: mode: outline coding: iso-latin-1 outline-regexp: "=\\(=\\)*" End: