SourceForge.net LogoThe Eiffel Compiler / Interpreter (tecomp)

doc/papers/proof/insertion sort

The insertion sort algorithm

Introduction

Insertion sort is one of the simplest sorting algorithms. It is quite effective for small arrays. Its runtime increases with the square of the number of elements to be sorted. Therefore it should not be used with large data sets. For large data sets algorithms like merge sort or heap sort are far more efficient.

All sort algorithms work on abstract arrays. An abstract array can have different implementations. The sort algorithm just need random access to the elements.

An abstract array has the following interface.

  deferred class
    ABSTRACT_ARRAY[G]
  feature
    content: ghost LIST[G]
 
    count: NATURAL
      deferred
      ensure
        Result = content.count
      end
 
    [](i:NATURAL)
      require
        i < count
      deferred
      ensure
        Result ~ content[i]
      end
 
    swap(i,j:NATURAL)
      require
        i < j
        j < a.count
      deferred
      ensure
        a.content = old a.content.swapped(i,j)
      end
  end
 

For the purposes of sorting we need access only to the content (ghost access is sufficient), the length of the array, the individual elements in a random access manner. The only procedure we need is to swap two elements.

Needed list functions

The ghost function "content" returns the elements as a list. This function is needed to describe the properties well. We never need the list in actual computations. But the data type is such a powerful type that it can serve as a "swiss army kife" in many algorithms. The following subsections describe what properties are needed from lists.

Univerally quantified expressions

If we need to verify that all elements of the concatenation of two lists "u+v" satisfy a certain condition "e", then it is sufficient to prove that all elements of the lists "u" and "v" individually satisfy the corresponding condition.

  all(u,v:LIST[G], a:G, e:BOOLEAN)
    require
      all(x:G) x in u => e
      all(x:G) x in v => e
      e[x:=a]
    ensure
      all(x:G) x in (u+v) => e
      all(x:G) x in (u+a) => e
      all(x:G) x in (a::u) => e
    end
 

Take and drop

The function "l.take(i)" returns a list with the first "i" elements of the list "l" or the whole list in case that "i" exceeds the number of elements in the list. "l.drop(i)" returns the list "l" where the first "i" elements have been removed. The functions "take" and "drop" satisfy the following properties.

  all(l:LIST[G], i:NATURAL)
    ensure
      l.take(i).count = min(i,l.count)
 
      l.take(i) + l.drop(i) = l
 
      l.take(l.count) = l
      l.drop(l.count) = nil
 
      i/=0 => i<=l.count
      => l.take(i) = l.take(i-1) + l[i-1]
 
      i<l.count
      => l.drop(i) = l[i]::l.drop(i+1)
 
      i/=0 => i<l.count
      => l = l.take(i-1) + l[i-1]::l[i]::l.drop(i-1)
    end
 

Tail zipping

A list can be zipped with its own tail. Tail zipping enjoys the following properties.

  all(u,v:LIST[G], e1,e2:G)
    ensure
      nil.tail_zipped = nil
 
      (e1::nil).tail_zipped = nil
 
      (e1::e2::u).tail_zipped = [e1,e2]::(e2::u).tail_zipped
 
      u=nil => (u+v).tail_zipped = v.tail_zipped
 
      v=nil => (u+v).tail_zipped = u.tail_zipped
 
      (u+e1+v).tail_zipped = (u+e1).tail_zipped + (e1::v).tail_zipped
 
      u/=nil
      => v/=nil
      => (u+v).tail_zipped
         = u.tail_zipped + [u.last,v.first] + v.tail_zipped
 
    end
 

Example:

  ([1,2,3,4]).tail_zipped = [[1,2],[2,3],[3,4]]
 

A list is sorted if in the tail zipped list all tuples have a first component which is less or equal than its second component.

Swapping

The expression "l.swapped(i,j)" returns a list which is equal to "l" except that the elements in position "i" and "j" are swapped.

  swapped:
  all(l:LIST[G], j:NATURAL)
    require
      j /= 0
      j < l.count
    ensure
      swap1: l.swapped(j-1,j).is_permutation(l)
          -- "is_permutation" is an equivalence relation
 
      swap2: l.swapped(j-1,j).take(j-1) = l.take(j-1)
      swap3: l.swapped(j-1,j).drop(j)   = l[j-1]::l.drop(j+1)
      swap4: l.swapped(j-1,j).drop(j-1) = l[j]::l[j-1]::l.drop(j+1)
    end
 

The insertion sort algorithm

The algorithm starts with an empty prefix of the content. An empty list is always sorted. Then it loops over all elements successively and inserts the new elements into the correct position of the already sorted prefix of the list. The algorithm terminates if all elements are inserted at their proper location.

We put the needed functions and procedures in a module "insertion_sorter".

  -- module: insertion_sorter
 
  immutable class
    INSERTION_SORTER[G:ORDER]
  end
 

The class INSERTION_SORTER is not needed to create objects from it. It just serves to express the constraint that the formal generic G has to satisfy the concept of a total order relation.

Predicate "is_sorted"

We need a predicate which tells whether a list is sorted and whether an array is sorted.

  is_sorted(l:LIST[G]): ghost BOOLEAN
    ensure
      Result = all(x,y:G) [x,y] in l.tail_zipped =>  (x<=y)
    end
 
 
  is_sorted(a:ABSTRACT_ARRAY[G]): ghost BOOLEAN
    ensure
      Result = a.content.is_sorted
    end
 

Note that the predicates need not be computable. It is easy to make them computable. But it is not worth the effort.

If we have two lists "u+e" and "e::v" which are individually sorted the sum "u+e+v" is sorted as well.

  glueing:
  all(u,v:LIST[G], e:G)
    require
      (u+e).is_sorted
      (e::v).is_sorted
    ensure
      (u+e+v).is_sorted
    end
 

This claim can be proved by

  all(u,v:LIST[G], e:G)
    require
      r1: (u+e).is_sorted
      r2: (e::v).is_sorted
    check
      all(x,y:G)
        require
          [x,y] in (u+e+v).tail_zipped
        check
          (u+e+v).tail_zipped = (u+e).tail_zipped + (e::v).tail_zipped
          if [x,y] in (u+e).tail_zipped then
            x<=y  -- r1
          else
            [x,y] in (e::v).tail_zipped
            x<=y  -- r2
          end
        ensure
          x<=y
        end
    ensure
      (u+e+v).is_sorted
    end
 

The basic algorithm

The procedure "sort" has to satisfy the contract:

  feature
    sort(a:ABSTRACT_ARRAY[G])
      ensure
        a.is_sorted
        a.content.is_permutation(old a.content)
      end
  end
 

The algorithm needs a loop which keeps some prefix of length "i" of "a" sorted. I.e. the loop works with the invariant

  i <= a.count
  a.content.take(i).is_sorted
  a.content.is_permutation(old a.content)
 

The invariant is satisfied if we initialize "i" with zero. At each step of the algorithm we have to make the sorted prefix of "a" one element larger. The following algorithm does the job.

  feature
    sort(a:ABSTRACT_ARRAY[G])
      local
        i: NATURAL := 0
      do
        from invariant
          i <= a.count
          a.content.take(i).is_sorted
          a.content.is_permutation(old a.content)
        variant
          a.count - i
        until
          i = a.count
        loop
          insert_ith(a,i)  -- see below
          i := i + 1
        end
      ensure
        a.is_sorted
      end
  end
 

It is easy to verify that after termination of the loop the result is valid. The exit condition states "i=a.count" and the invariant guarantees "a.content.take(i).is_sorted". Both imply the postcondition.

In order to prove the algorithm correct we need to make sure that the invariant is maintained in each iteration of the loop. In order to prove this we need the assignment axiom.

  -- Assigment axiom: "x" must be a local variable and
  --                  must not be an attribute of the surrounding class!
  all[G](x,exp:G, cond:BOOLEAN)
    require
      cond[x:=exp]
    do
      x := exp
    ensure
      cond
    end
 

It basically states the following: "If we want to have a boolean condition "cond" valid after the assignment statement "x:=exp" then we have to guarantee that "cond[x:=exp]" (i.e. cond with all free occurrences of "x" substituted by "exp") is valid before the assignment.

With the assignment axiom we can backtransform the invariant through the statement "i:=i+1" and get

  require
    r1: i+1 <= a.count
    r2: a.content.take(i).is_sorted
    r3: a.content.is_permutation(old a.content)
  do
    i := i + 1
  ensure
    i <= a.count
    a.content.take(i+1).is_sorted
    a.content.is_permutation(old a.content)
  end
 

The procedure "insert_ith" cannot modify "i" (NATURAL is immutable). Therefore the condition "r1" can be derived from the invariant "i<=a.count" and the negated exit condition "i/=a.count".

The condition "a.content.take(i+1).is_sorted" must be established by "insert_ith" and the condition "a.content.is_permutation(old a.content)" must be maintained.

Therefore the procedure "insert_ith" has to satisfy the contract

  require
    i < a.count                  -- from the loop invariant and negated exit
                                 -- condition
 
    a.content.take(i).is_sorted  -- from the loop invariant
 
    a.content.is_permutation(old a.content) -- from the loop invariant
  do
 
    insert_ith(a,i)
 
  ensure
    a.content.take(i+1).is_sorted
    a.content.is_permutation(old a.content)
  end
 

The procedure "insert_ith"

Contract of "insert_ith"

We need a procedure "insert_ith" which satisfies the contract

  feature{NONE}
    insert_ith(a:ABSTRACT_ARRAY[G], i:NATURAL)
      require
        i < a.count
        a.content.take(i).is_sorted
      ensure
        a.content.take(i+1).is_sorted
        a.content.is_permutation(old a.content)
      end
  end
 

The procedure can be private, therefore the "feature{NONE}".

Description of the loop

We need a loop to establish the postcondition, because the element "a[i]" has to be put in the correct position. We swap it with the previous element as long as the previous element greater than the element to be inserted. The loop terminates if the previous element is not greater than the element to be inserted.

In order to specify the algorithm we need a predicate that a list is sorted except at one position.

  is_sorted_except(l:LIST[G], j:NATURAL): ghost BOOLEAN
    ensure
      Result = check
        sorted_without_j: (l.take(j)+l.drop(j+1)).is_sorted
        sorted_from_j:    l.drop(j).is_sorted
      end
    end
 

At the enty point of the routine the condition

  a.content.take(i+1).is_sorted_except(i)
 

is valid because "a.content.take(i).is_sorted" is already valid and "a.content.take(i+1)" has just one element more (the element "i"). Therfore it is sorted except at position "i". Note that "a.content.take(i+1).drop(i+1)" is empty and "a.content.take(i+1).drop(i)" has just one element.

The algorithm uses a loop with the loop variable "j". The variable "j" has initially the value "i". The loop body decreases the variable "j" at each iteration maintaining the invariant.

We use the following invariant

  j <= i
  a.content.is_permutation(old a.content)
  a.content.take(i+1).is_sorted_except(j)
 

and the following pattern for the algorithm:

  insert_ith(a:ABSTRACT_ARRAY[G], i:NATURAL)
    require
      i < a.count
      a.content.take(i).is_sorted
    local
      j:NATURAL := i
    do
      from invariant
        inv1: j <= i
        inv2: a.content.is_permutation(old a.content)
        inv3: a.content.take(i+1).is_sorted_except(j)
      variant
        j
      until
        -- a.content.take(i+1).is_sorted
      loop
        -- establish the invariant for decreased j
        j := j-1
      end
    ensure
      a.content.take(i+1).is_sorted
      a.content.is_permutation(old a.content)
    end
 

The algorithm is already correct, we just need to fill in the missing parts.

Exit condition

Since we have the invariant "inv3" we know that "a.content.take(i+1).drop(j)" is always sorted. In case of "j=0" this is identical to "a.content.take(i+1)" and we can terminate the loop. Therefore "j=0" is one possible exit condition.

But there is a second possibility. The element at the exception position "j" might fit already well. This is the case if "a[j-1]<=a[j]" is valid. I.e. the complete exit condition is

  j=0 or a[j-1]<=a[j]  -- exit condition
 

We have to prove that the exit condition and the invariant guarantee the desired result. The following lemma verifies exactly that condition.

  all(l:LIST[G], j:NATURAL)
    require
      r1: j < l.count
      r2: l.is_sorted_except(j)
      r3: j=0 or l[j-1]<=l[j]
    check
      if j=0 then
        c1: l.drop(0).is_sorted  -- r2
        c2: l.is_sorted
      else
        c3: l[j-1] <= l[j]                         -- r3 and "j/=0"
        c4: (l[j]::l.drop(j+1)).is_sorted          -- r3, "sorted_from_j"
        c5: (l[j-1]::l[j]::l.drop(j+1).is_sorted   -- c3,c4
        c6: (l.take(j-1)+l[j-1]).is_sorted         -- r3, "sorted_without_j"
        c7: (l.take(j-1)+l[j-1]::l[j]::l.drop(j+1)).is_sorted
            -- paste c5,c6
        c8: l.is_sorted  -- c7
      end
    ensure
      l.is_sorted -- c2,c8
    end
 

Establish the invariant for decreased "j"

At the entry point of the loop body the exit condition is invalid (otherwise the loop would be terminated). The negated exit condition is

  j/=0 and a[j] < a[j-1]  -- negated exit condition
 

i.e. "a[j-1]" and a[j] are out of order. The element a[j] is still to small for its position. We simply swap the two elements. Since the sequence without "a[j]" is perfectly sorted, the invariant is kept. Just the possible out of order element has been pushed one position down.

We get the complete loop

  from invariant
    inv1: j <= i
    inv2: a.content.is_permutation(old a.content)
    inv3: a.content.take(i+1).is_sorted_except(j)
  variant
    j
  until
    j=0  or a[j-1]<=a[j]
  loop
    a.swap(j-1,j)
    j := j-1
  end
 

The following lemma verifies that the invariant "inv3" is maintained.

  all(l,ls:LIST[G], i,j:NATURAL)
    require
      r1: j /= 0
      r2: j < l.count
      r3: l.is_sorted_except(j)
      r4: l[j]<=l[j-1]
    check
      c01: (l.take(j) + l.drop(j+1)).is_sorted  -- r3, def "is_sorted_except"
      c02: l.drop(j).is_sorted                  -- r3, def "is_sorted_except"
      c03: (l[j]::l.drop(j+1)).is_sorted        -- c02
      c04: (l[j-1]::l.drop(j+1)).is_sorted      -- c01
 
      c09: (l.take(j-1) + l[j-1]::l.drop(j+1)).is_sorted  -- c01, assoc
      c10: (l[j]::l[j-1]::l.drop(j+1)).is_sorted          -- c03,c04,r4
    ensure
      l.swapped(j-1,j).is_sorted_except(j-1)
               -- c09, c10, swap2, swap3, swap4, def "is_sorted_except"
    end
  end
 

The complete algorithm

  -- module: insertion_sorter
 
  immutable class
      INSERTION_SORTER[G:ORDER]
  end
 
 
  feature
      is_sorted(l:LIST[G]): ghost BOOLEAN
          ensure
              Result = all(x,y:G) [x,y] in l.tail_zipped =>  (x<=y)
          end
 
 
      is_sorted(a:ABSTRACT_ARRAY[G]): ghost BOOLEAN
          ensure
              Result = a.content.is_sorted
          end
 
      sort(a:ABSTRACT_ARRAY[G])
          local
              i: NATURAL := 0
          do
              from invariant
                  i <= a.count
                  a.content.take(i).is_sorted
                  a.content.is_permutation(old a.content)
              variant
                  a.count - i
              until
                  i = a.count
              loop
                  insert_ith(a,i)  -- see below
                  i := i + 1
              end
          ensure
              a.is_sorted
          end
  end
 
 
  feature {NONE}
      is_sorted_except(l:LIST[G], j:NATURAL): ghost BOOLEAN
          ensure
              Result = check
                  sorted_without_j: (l.take(j)+l.drop(j+1)).is_sorted
                  sorted_from_j:    l.drop(j).is_sorted
              end
          end
 
      insert_ith(a:ABSTRACT_ARRAY[G], i:NATURAL)
          require
              i < a.count
              a.content.take(i).is_sorted
          local
              j:NATURAL := i
          do
              from invariant
                  inv1: j <= i
                  inv2: a.content.is_permutation(old a.content)
                  inv3: a.content.take(i+1).is_sorted_except(j)
              variant
                  j
              until
                  j=0  or a[j-1]<=a[j]
              loop
                  a.swap(j-1,j)
                  j := j-1
              end
          ensure
              a.content.take(i+1).is_sorted
              a.content.is_permutation(old a.content)
          end
  end
 

Emacs suffix

 Local Variables:
 mode: outline
 coding: iso-latin-1
 outline-regexp: "=\\(=\\)*"
 End:
Table of contents

- Introduction

- Needed list functions

- Univerally quantified expressions

- Take and drop

- Tail zipping

- Swapping

- The insertion sort algorithm

- Predicate "is_sorted"

- The basic algorithm

- The procedure "insert_ith"

- Contract of "insert_ith"

- Description of the loop

- Exit condition

- Establish the invariant for decreased "j"

- The complete algorithm

- Emacs suffix


ip-location