SourceForge.net LogoThe Eiffel Compiler / Interpreter (tecomp)

doc/papers/mutable/array

Arrays and framing

Introduction

Nearly all languages with imperative elements have some kind of an array. In C an array is just a typed pointer to a memory region. You can address the different elements by giving an offset to the start of the memory region. The compiler multiplies the offset with the size of the objects contained in the array to get the actual offset in bytes.

The C view of arrays has certain advantages: It is memory efficient because it uses just one pointer. Accessing elements is easy because it just needs some simple address arithmetic.

But the C view has some pitfalls as well. First you might fail to allocate memory for the array. Then the pointer to the memory region has some undefined value. Second you might read elements of the array which have not yet been initialized. Third you might read from or write to elements of the array which are not within the allocated region.

Modern languages avoid this pitfalls by certain methods which can be executed at compile time or at runtime. The failure to initialize the pointer to the memory region is usually handled by initializing all pointers with zero. Any access to a zero pointer results in a runtime exception. Addressing elements outside the region is usually handled at runtime as well by storing with the array its capacity and generating an exception if elements outside the region are addressed.

All these solutions avoid memory corruption but they have a cost. Additional code is required to check null pointer accesses and to check out of bound accesses. The single pointer implementation is no longer sufficient because the size of the allocated region has to be stored in the runtime object.

A language which allows formal verification can avoid all these pitfalls without any cost. Without any cost? Well -- without any runtime and memory footprint cost. But nothing is free. Some assertions must be provable at compile time. I.e. whenever you want to access some element of the array you have to be sure that the index is within the array bounds. And it is not sufficient that you are sure. You have to convince the verifier of this fact.

In the following article we show an array structure in our programming language which allows us to convince the verifier that we make no illegal accesses.

Since an array is a mutable structure we have to address the framing problem as well. The framing problem is addressed appropriately if for any modifying procedure it is clear not only what it does change but also what it leaves unchanged.

A lot of effort is done currently in the verification community to address the framing problem. Frame specifications like modify and use clauses for each procedure, separation logic, different sophisticated heap models etc.

In this article we demonstrate that the framing problem can be solved without all these complicated structures. It is sufficient to regard access to data structures as functions and have a sufficient powerful function model. It is amazing to see that proper abstraction makes some difficult seeming problems a lot easier.

As a warm up let us see what kind of operations we expect from arrays. First of all we want to be able to declare variables of type array which hold elements of a certain type.

      local
          s: STRING
          a: ARRAY[STRING]
      do
          ...
 

The declaration does not allocate the object. Therefore accessing any element of the array has to be illegal at this point.

          a[4] := "Hello"     -- Illegal, array not yet allocated!
 

To allocate the object we need some kind of object creation.

          create a.with_capacity(10)
 

This instruction creates an array object with the possibility to store up to 10 strings. The creation just allocates the structure but does not populate the array with any string.

At this point it is illegal to read elements.

          s := a[0]           -- Illegal, access to an uninitialized object!
 

But we can write some elements of the array and access these elements

          a[0] := "first string"
          a[1] := "second string"
 
          s    := a[1]
 
          s    := a[2]         -- Illegal, access to an uninitialized object!
          s    := a[100]       -- Illegal, out of bound access!
          ...
      end
 

If the verifying compiler is able to detect these illegal accesses and let pass only the legal accesses, then the actual runtime code can have the same size as the corresponding C code and the actual array object can be represented by a single pointer to the memory region.

Basic declarations and access functions

The type ARRAY is declared within the module array

  -- module: array
 

We need a formal generic which represents the type of the elements.

  G: ANY              -- Element type
 

There are no constraints on the element type. Any type can be used as element type.

The generic class ARRAY has three basic access functions and an invariant.

  class
      ARRAY[G]
  feature
      capacity: ghost NATURAL
          -- The allocated capacity of the array.
 
      domain:   ghost NATURAL?
          -- The set of indices of elements which have been initialized.
 
      [] (i:NATURAL): G
              -- Element at position `i'.
          require
              i in domain
          note
              built_in
          end
  invariant
      domain <= {i: i<capacity}    
             -- Note: '<=' for sets is the subset relation
  end
 

'capacity' returns the allocated capacity of the array. Since it is a ghost attribute it can be used only in assertions and cannot flow into actual computations. The attribute 'domain' represents the set of indices of array elements which have already been initialized. This is a ghost attribute as well.

The bracket operator ([]) represents the function which returns the element at position 'i'. It uses 'domain' in its precondition to express that only elements at initialized positions can be allocated.

The invariant states that only elements within the allocated capacity can be initialized.

We have presented the above declaration of the class ARRAY in an object oriented manner. Programmers familiar with object oriented languages like C++, Java, C# should have no problems to read such a declaration. But it should be noted that such a form of declaration hides one important thing. All routines declared within the class ARRAY have one implicit argument which is not explicitly mentioned, namely the array object (called 'this' in C++ and Java). In order to make this argument more explicit we present this declaration in a different but equivalent form.

  class
      ARRAY[G]
  end
 
  feature
      capacity(a:CURRENT): ghost NATURAL
 
      domain(a:CURRENT):   ghost NATURAL?
 
      [](a:CURRENT, i:NATURAL): G
          require
              i in a.domain
          note
              built_in
          end
  end
 
  invariant
      all(a:CURRENT)
          ensure
              a.domain <= {i: i < a.capacity}
          end
  end
 

This form of the declaration makes transparent that the used identifiers and operators have the following types:

      capacity: ghost CURRENT->NATURAL
 
      domain:   ghost CURRENT->NATURAL?
 
      ([]):     [CURRENT,NATURAL] -> G
 

Remark: In our programming language we use the Haskell convention that an operator (unary or binary) put between parentheses can be used like an ordinary function. I.e. for numbers the expressions '7+3' and '(+)(7,3)' are equivalent, but the first one is preferable because it is much more readable.

If something looks like an attribute in object oriented notation it is actually a function because it needs an argument. Therefore an assignment to an attribute is something fundamentally different from the assignment to a local variable. An assignment of a new value to an attribute is a modification of the corresponding function.

Note that the unfolded form of the class invariant makes evident that the invariant is not only an object invariant but an invariant for the whole system. In the case of such simple classes as ARRAY this makes no difference. But if we look at linked structures the difference is fundamental.

Implicit invariant

The function domain has been used in the precondition of the function ([]). This creates the following implicit invariant.

  invariant
      ([]).domain = {a:CURRENT, i:NATURAL: i in a.domain}
  end
 

Since procedures might modify the functions ([]) and domain, they have to respect the invariant, the explicit and the implicit invariant.

Dependence of the access functions

The basic access functions capacity, domain and ([]) are independent in the sense that the value of one function does not depend on the value of any other or these basic functions. There is a consistency condition between domain and ([]) because the function domain is used in the precondition of ([]) (see previous chapter). However the value returned by ([]) does not depend on the function domain.

The basic access functions capacity and domain have another consistency condition expressed in the invariant. Note that consistency conditions do not establish a dependence.

Based on the basic access functions we can define other functions. E.g. we might define a function is_full which returns true if all positions in the array are accessible.

      is_full(a:CURRENT): ghost
              -- Are all positions of the array `a' accessible?
          ensure
              Result = ({i: i < a.capacity} = a.domain)
          end
 

This function has a clear dependency: It depends on the functions capacity and domain. This implies that any modification of the functions capacity and/or domain modifies implicitly the function is_full as well.

Creation of an object

In order to be able to create an object of type array we need a creation procedure (or a constructor called in languages like C++, Java and C#).

  class
      ARRAY[G]
  create
      with_capacity(c:NATURAL)
              -- Create an array with capacity `c'.
          note
              built_in
          ensure
              domain   = 0          -- '0' is the empty set
              capacity = c
          end
  end
 

In order to make the hidden argument more transparent we take the declaration out of the class and get the following equivalent declaration.

  create
      with_capacity(a:CURRENT, c:NATURAL)
          note
              built_in
          ensure
              a.domain   = 0
              a.capacity = c
          end
  end
 

Each creation procedure receives as its first argument an allocated but uninitialized object of the corresponding type. The task of the creation procedure is to initialize the object such that it satisfies the consistency conditions expressed in the invariant.

The postcondition states that the creation procedure has an effect on the functions 'domain' and 'capacity'. Remember that the corresponding types are CURRENT->NATURAL? and CURRENT->NATURAL.

Expressions like 'a.domain=0' in postconditions of procedures are interpreted in our language rather strictly. It reads: the identifier 'domain' identifies a new function which is the same as the function which it identified before the call except for the argument 'a'. Only functions mentioned in the postcondition (and functions which depend on them) are modified by a procedure. I.e. the declaration of the creation procedure is interpreted in the following manner.

  create
      with_capacity(a:CURRENT, c:NATURAL)
          note
              built_in
          ensure
              domain   = old (domain + [a,0])
              capacity = old (capacity + [a,c])
              ([])     = old ([])
          end
  end
 

For any function f:A->B the expression 'f+[a,b]' is defined as

      f + [a,b] =  (x -> if x=a then b else f(x) end)
 

with the consequences

      (f+[a,b]).domain = f.domain + {a}
 
      x=a  => (f+[a,b])(x) = b
 
      x/=a => (f+[a,b])(x) = f(x)
 

I.e. we view functions as sets of key value pairs then the expression 'f+[a,b]' is the set of key value pairs of the function 'f' except for the key 'a' where the pair [a,b] either overrides the one already contained in 'f' or is added to the set of key value pairs.

The strict interpretation of the postcondition of 'with_capacity' corresponds to our intuition if we read expressions like 'a.domain=0' in the postcondition of a procedure. Our intuition tells us the 'a.domain' has been set to '0' but 'b.domain', 'c.domain', ... for 'b' and 'c' different from 'a' have still the same value.

The strict interpretation of postconditions of procedures allows us to interpret the postconditions as framing conditions. A postcondition of a procedure contains everything the procedure has modified and all information needed to derived the things which have not been modified. Postconditions of procedures are treated as implicit framing conditions.

Put elements into the array

We need a procedure to put elements at certain positions.

  class
      ARRAY[G]
  feature
      put(i:NATURAL, v:G)
              -- Update (or set) the value `v' at position `i'.
          require
              i < capacity
          note
              built_in
          ensure
              domain     = old (domain + {i})
              Current[i] = v
          end
  end
 

In order to make the arguments more transparent we use the following equivalent declaration of the procedure 'put'.

  feature
      put(a:CURRENT, i:NATURAL, v:G)
          require
              i < a.capacity
          note
              built_in
          ensure
              a.domain   = old (a.domain + {i})
              Current[i] = v
          end
  end
 

The procedure 'put' modifies two functions, the ghost function 'domain' and the element access function represented by the bracket operator ([]). The verifier extracts the following strict interpretation of this procedure.

  feature
      put(a:CURRENT, i:NATURAL, v:G)
          require
              i < a.capacity
          note
              built_in
          ensure
              domain   = old (domain + [a, a.domain + {i}])
              ([])     = old (([])   + [[a,i],v])
              capacity = old capacity
          end
  end
 

I.e. the procedure put assigns new values to the functions domain and ([]). Since the function capacity is independent of the functions domain and ([]) and the postcondition does not mention neither the function capacity nor any function which depend on it, the strict postcondition is extracted that capacity remains unchanged.

There is one interesting thing to note about the procedure put and the function ([]). The procedure put modifies the function ([]) at just one position and updates the domain appropriately. The signatures of ([]) and put are consistent in the sense that they share the same arguments. The procedure put has just one argument more which is the value to be set. These facts are sufficient for the compiler to derive that put is an assigner for the function ([]). We can use this to write more readable code. Instead of writing

      a.put(i,v)
 

we can use the more suggestive notation

      a[i] := v
 

Swap elements

It is easy to write a procedure which swaps two elements of an array.

  feature
      swap(a:CURRENT, i,j:NATURAL)
          require
              i in a.domain
              j in a.domain
          ensure
              a[i] = old a[j]
              a[j] = old a[i]
          end
  end
 

Because of the fact that the function ([]) has the assigner command put the procedure swap needs no explicit implementation. The postcondition is sufficient because it states that the function ([]) gets new values at the argument positions [a,i] and [a,j]. The assigner command put can be used to derive the implementation. The automatically derived implementation looks like

      local
          tmp1, tmp2:G := a[i],a[j]
      do
          a.put(i,tmp2)
          a.put(j,tmp1)
      end
 

As for the previous procedures we give the strict interpretation of the postcondition.

  feature
      swap(a:CURRENT, i,j:NATURAL)
          require
              i in a.domain
              j in a.domain
          ensure
              domain   = old domain
              capacity = old capacity
 
              ([]) = old (([]) <+ {i -> a[j], j -> a[j]})
          end
  end
 

In this postcondition we have used a notation which needs some explanation. In our programming language we can define a function f:A->B by giving a set of key value pairs.

      local
           f:STRING->NATURAL
      do
           ...
           f := {"one" -> 1, "two" -> 2}
 
           -- An illegal assignment
           f := {"one" -> 1, "two" -> 2, "one" -> 2}
           ...
      end
 

The verifier generates the proof obligation that the same key must have the same value. Therefore the second assignment in the example is illegal. Let us look at the proof obligation generated for the second assignment.

      a1: "one" = "two" => 1 = 2
      a2: "one" = "one" => 1 = 2
 

The assertion a1 is provable because the antecedent of the implication is false and from a false antecedent anything can be in the consequent. The assertion a2 is definitely not provable because it requires the verifier to prove '1=2' which is not possible.

Going back to the procedure swap we can see that '{i->a[j],j->[a[i]}' is a function of type NATURAL->G with generates the proof obligation

      i = j  =>  a[i] = a[j]
 

which is trivially valid.

Furthermore in the postcondition we have used the function override operator '<+' which has the following definition for two functions 'f,g:A->B'.

      f <+ g = (x -> if     x in g.domain then g(x)
                     elseif x in f.domain then f(x)
                     end)
 

with the consequences

      (f<+g).domain = f.domain + g.domain
 
      x in g.domain          => (f<+g)(x) = g(x)
 
      x in f.domain-g.domain => (f<+g)(x) = f(x)
 

Using these definitions one can verify that the above mentioned strict postcondition has the intended meaning.

Emacs suffix

 Local Variables:
 mode: outline
 coding: iso-latin-1
 outline-regexp: "=\\(=\\)*"
 End:
Table of contents

- Introduction

- Basic declarations and access functions

- Implicit invariant

- Dependence of the access functions

- Creation of an object

- Put elements into the array

- Swap elements

- Emacs suffix


ip-location