SourceForge.net LogoThe Eiffel Compiler / Interpreter (tecomp)

doc/papers/proof/heapsort

The heap sort algorithm

Introduction

The heapsort algorithm is a sorting algorithm which has time complexity O(n log n) and does not need additional space. Merge sort has the same time complexity but usually needs additional space. As opposed to merge sort, the heap sort algorithm is not stable i.e. it might reverse the order of equal elements.

The basic program elements of the heapsort algorithm not only work for sorting. They can be applied to implement priority queues. A priority queue is a collection of elements where the largest element is accessible very fast.

The basic definitions

Abstract array

The program elements of heapsort work on abstract arrays, i.e. mutable structures which inherit conformantly from

  deferred class ABSTRACT_ARRAY[G] feature
      content: ghost LIST[G]
              -- List representing the content of the array.
          deferred end
 
      count: NATURAL
              -- The number of elements in the array
          deferred ensure Result = content.count end
 
      [] (i:NATURAL): G
              -- The ith element of the array
          require i < count
          deferred ensure  Result ~ content[i]
          end
 
      swap(i,j:NATURAL)
              -- Swap the elements at position `i' and `j'.
          require i<count; j<count deferred ensure
              content = old content.swapped(i,j)
          end
  end
 

The module

We put all code for heap sorting into a module call "heap_sorter". The module defines a class HEAP_SORTER with a formal generic G which has the constraint that the type G must have an order structure (i.e. a total order).

  -- module: heap_sorter
 
  immutable class HEAP_SORTER[G:ORDER] end
 
  ...
 

The only purpose of the class HEAP_SORTER is to define the formal generic G and its constraint. With this technique it can be avoided that each function, procedure and asssertion within the module needs to define its own formal generic.

Heap structure

The content of the abstract array will be viewed as a binary tree. The root of the tree is the first element. The element at index position "i" has a left child at position "2*i+1" and a right child at position "2*i+2".

The binary tree is depicted in the following drawing.

                       l[0]
                     /      \
                    /        \
                   /          \
                 l[1]         l[2]
                 /  \         /
                /    \       /
              l[3]  l[4]    l[5]   ....
 
              ...........
 

Note that each level except the last one of the binary tree is full.

We define some index functions to compute the index of the left and right child and the index of the parent.

  feature      -- Index functions
      left(i:NATURAL): NATURAL     ensure Result = 2*i+1 end
 
      right(i:NATURAL): NATURAL    ensure Result = 2*i+2 end
 
      is_left(i:NATURAL): BOOLEAN  ensure Result = (i mod 2 = 1) end
 
      is_right(i:NATURAL): BOOLEAN ensure Result = (i>0 and i mod 2 = 0) end
 
      parent(i:NATURAL): NATURAL
          require i>0 ensure Result = (i-1)/2 end
 
      all(i:NATURAL)
          ensure
              i.is_left  => i>0
              i.is_right => i>0
 
              i.left.is_left
              i.right.is_right
 
              i.left.parent  = i
-             i.right.parent = i
 
              i.is_left  => i.parent.left  = i
              i.is_right => i.parent.right = i
          end
  end
 

A sequence has a heap structure if each parent has a higher value than its both childs. In order to define this property formally we use a ghost predicate "is_heap".

  feature
      is_heap(l:LIST[G]): ghost BOOLEAN
               -- Does the list `l' has a heap structure, i.e. each child
               -- is less or equal than its parent?
           ensure
               Result = all(i:NATURAL)
                            require 
                                0<i
                                i<l.count
                            ensure
                                l[i]<=l[i.parent]
                            end
           end
  end
 

Since the order relation is transitive not only the direct childs of a parent have lower values but also all the children of the children and so on.

If a sequence has a heap structure then the element with the highest value is always the first element of the sequence.

Build a heap structure

Initially our array is completely unsorted. Before sorting the array we rearrange the elements such that the content has a heap structure.

Outline of the algorithm

We need a loop to achieve a heap structure. Therefore we need an invariant which is easy to establish and to maintain and which together with the exit condition guarantees the heap structure.

Let us look at the sequence

  [5, 0, 1, 5, 3, 4]
 

which is represented by the binary tree (2:1 is value 1 at index 2)

           .. 0:5 ..
          /         \
         /           \
       1:0           2:1
       /  \          /
      /    \        /
    3:5    4:3    5:4
 

The leave nodes at index 3, 4 and 5 have already a heap structure because they have no children. In order to describe this situation we define the predicate "is_heap_from".

  feature
      is_heap_from(l:LIST[G], i:NATURAL): ghost BOOLEAN
              -- Are all elements of the list `l' from index `i' on valid
              -- parents of a heap structure?
          ensure
              Result = all(j:NATURAL)
                           require
                               0 <  j
                               j <  l.count
                               i <= j.parent
                           ensure
                               l[j]<=l[j.parent]
                           end
          end
 
      all(l:LIST[G], c:NATURAL)
          ensure
              (c/2).left >= c
              l.is_heap_from(l.count/2)
              l.is_heap_from(0) => l.is_heap
              l.is_heap_from(i) => l.is_heap_from(i+1)
          end
  end  
 

The predicate "l.is_heap_from(i)" is a good canditate for a loop invariant since it is valid for "i=l.count/2" and the condition "i=0" implies the goal that the list has a heap structure. I.e. the procedure "make_heap" has the outline

  make_heap(a:ABSTRACT_ARRAY[G])
      local
          i: NATURAL
      do
          from
              i := a.count/2
          invariant
              i <= a.count
              a.content.is_heap_from(i)
              a.content.is_permutation(old a.content)
          variant
              i
          until
              i = 0
          loop
              i := i-1                -- decrease the variant
              a.sift_down(i,a.count)  -- reestablish invariant (see below)
          end
      ensure
          a.content.is_heap
          a.content.is_permutation(old a.content)
      end
 

Reestablish the invariant: "sift_down"

If the invariant "l.is_heap_from(i)" is valid and we decrease the index "i" we might temporarily violate the invariant because we add a root element to the partial heap which is not yet a valid parent for its children because its value might be less or equal than the left or the right child.

E.g. in the above tree

           .. 0:5 ..
          /         \
         /           \
       1:0           2:1
       /  \          /
      /    \        /
    3:5    4:3    5:4
 

we have "l.is_heap_from(3)" because the elements at index 3, 4 and 5 are valid parents. However the condition "l.is_heap_from(2) is not valid, because the value at position 2 is less than the value of its left child at position 5 (the node at position 2 does not have a right child in that case).

This defect can be corrected easily. Just identify the child which has a greater value than the root and swap the two values. This pushes the defect one level down the tree. I.e. we have to continue until the node is a valid parent or until it has no children. This can be easily done with a loop.

In order to describe the invariant we have to find a predicate which describes a partial heap with one defect position.

  feature           -- partial heap with one defect position
      is_nearly_heap_from(l:LIST[G], i,k:NATURAL)
              -- Are all elements of the list `l' from index `i' on except at
              -- position `k' valid parents of a heap structure?
          ensure
              Result = all(j:NATURAL)
                           require
                               0 <  j
                               j <  l.count
                               i <= j.parent
                               k /= j.parent
                           ensure
                               l[j] <= l[j.parent]
                           end
          end
 
      is_valid_parent(p:NATURAL, l:LIST[G]: ghost BOOLEAN
              -- Is the element at position `p' within the list `l' a valid
              -- parent of a heap structure (i.e. left and right child have
              -- lower values)?
          ensure
              Result = ((p.left<l.count => l[p.left]<=l[p])
                        and
                        (p.right<l.count => l[p.right]<=l[p]))
          end
 
      all(l:LIST[G], i,k:NATURAL)
          require
              l.is_nearly_heap_from(i,k)
              l.count<=k.left or k.is_valid_parent(l)
          ensure
              l.is_heap_from(i)
          end
 
      all(l:LIST[G], i:NATURAL)
          ensure
              l.is_heap_from(i+1) => l.is_nearly_heap_from(i,i)
          end
  end
 

With this definition and assertion we can formulate the procedure "sift_down".

  feature
      sift_down(a:ABSTRACT_ARRAY[G], i,cnt:NATURAL)
              -- Sift down a defect at position `i' of a partial heap of the
              -- array `a'. Only consider the elements below `cnt'.
          require
              cnt <= a.count
              i < cnt
              a.content.take(cnt).is_heap_from(i+1)
          local
              parent,child: NATURAL
              good_parent:  BOOLEAN
          do
              from
                  parent,good_parent := i,False
              invariant
                  parent <  cnt
                  a.content.take(cnt).is_nearly_heap_from(i,parent)
                  a.content.is_permutation(old a.content)
                  good_parent => parent.is_valid_parent(a.content.take(cnt))
              variant
                  cnt - parent
              until
                  parent.left>=cnt or good_parent
              loop
                  child := parent.left
                  if child+1<a.count and a[child]<a[child+1]
                            -- find the child with the highest value
                  then
                      child := child+1
                  end
                  if a[child] > a[parent] then
                      a.swap(parent,child)   -- Swap if necessary
                  else
                      good_parent := True
                  end
                  parent := child
              end
          ensure
              a.content.take(cnt).is_heap_from(i)
              a.content.is_permutation(old a.content)
          end
  end
 

Sort the heap

If the array has already a heap structure (i.e. all parents have larger values than their children) it is easy to sort the array.

We maintain the condition that the first part of the array has a heap structure and the second part of the array is sorted. We describe sortedness by the following predicate.

  feature
      is_sorted(l:LIST[G]): ghost BOOLEAN
          ensure
              Result = all(a,b:G) [a,b] in l.tail_zipped => a<=b
          end
  end
 

Then conditions

  a.content.take(i).is_heap
  a.content.drop(i).is_sorted
  all(e,f:G) e in a.content.take(i) => f in a.content.drop(i) => e<=f
 

are easy to establish with "i=a.count". The condition "i=0" guarantees that the whole array is sorted.

If we decrement "i" we violate the second part of the invariant. The condition can be reestablished by swapping the i-th element with the first (the first is the largest in the heap and all elements in the heap are smaller than the elements of the already sorted part).

But this violates the first condition of the invariant because we put an element at the top of the tree which has usually a too small value to be the top of the tree. With a call to the procedure "sift_down" it is possible to reestablish this condition as well.

  feature
      sort_heap(a:ABSTRACT_ARRAY[G])
          require
              a.content.is_heap
          local
              i:NATURAL
          do
              from
                  i := a.count
              invariant
                  i_1: i <= a.count
                  i_2: a.content.take(i).is_heap
                  i_3: a.content.drop(i).is_sorted
                  i_4: all(e,f:G) e in a.content.take(i) 
                                  => f in a.content.drop(i) 
                                  => e<=f
                  i_5: a.content.is_permutation(old a.content)
              variant
                  i
              until
                  i = 0
              loop
                  i := i-1         -- decrease variant
                  a.swap(0,i)      -- reestablish i_3
                  a.sift_down(0,i) -- reestablish i_2
                  -- i_1, i_4 and i_5 are never violated
              end
          ensure
              a.content.is_sorted
          end
  end
 

The complete heap sort

  feature
      sort(a:ABSTRACT_ARRAY[G])
          do
              a.make_heap
              a.sort_heap
          ensure
              a.content.is_sorted
              a.content.is_permutation(old a.content)
          end
  end
 

Priority queue

A priority queue is a sequence of elements which is maintained to have a heap structure. With this condition it is possible to access the largest element very fast since it is always the first element.

An empty sequence satisfies the condition of being a heap. Since arrayed containers allow to insert elements rapidly at the end an insertion violates the heap condition in a different manner as in the above algorithm "make_heap". We do not introduce a defect at the top of the tree, we introduce a defect at the bottom of the tree.

A single defect in a heap

In order to reestablish the heap condition we need to sift the element up in the tree until it reaches its appropriate position. The following predicate describes a singe defect in a heap structure.

  feature
      is_heap_except_child(l:LIST[G], i:NATURAL) ghost BOOLEAN
              -- Does `l' have a heap structure except for the child at `i'?
          ensure
              Result = all(j:NATURAL)
                           require
                               0 < j
                               j < l.count
                               j/= i
                           ensure
                               l[j]<=l[j.parent]
                           end
          end
 
      all(l:LIST[G], e:G)
          ensure
              l.is_heap_except_child(0) => l.is_heap
              l.is_heap => (l+e).is_heap_except_child(l.count)
          end
 
      all(l:LIST[G], i:NATURAL)
          require
              0 < i
              i < l.count
              l.is_heap_except_child(i)
              l[i] <= l[i.parent]
          ensure
              l.is_heap
          end
  end
 

The procedure "sift_up"

The procedure "sift_up" accepts an array which has a heap structure when neglecting the last element. The algorithm consists of a loop which pushes the defect one level up until there is no longer any defect. I.e. the algorithm reestablishes the heap condition for the full array including the last element.

  feature
      sift_up(a:ABSTRACT_ARRAY[G])
          require
              0 < a.count
              a.content.without_last.is_heap
          local
              i:NATURAL
          do
              from
                  i := a.count-1
              invariant
                  i < a.count
                  a.content.is_heap_except_child(i)
                  a.content.is_permutation(old a.content)
              variant
                  i
              until
                  i=0 or a[i]<=a[i.parent]
              loop
                  a.swap(i.parent,i)
                  i := i-1
              end
          ensure
              a.content.is_heap
              a.content.is_permutation(old a.content)
          end
  end
 

Inserting into and deleting from a priority queue

With the procedures "sift_up" and "sift_down" it is easy to maintain a priority queue with a arrayed structure.

Emacs suffix

 Local Variables:
 mode: outline
 coding: iso-latin-1
 outline-regexp: "=\\(=\\)*"
 End:
Table of contents

- Introduction

- The basic definitions

- Abstract array

- The module

- Heap structure

- Build a heap structure

- Outline of the algorithm

- Reestablish the invariant: "sift_down"

- Sort the heap

- The complete heap sort

- Priority queue

- A single defect in a heap

- The procedure "sift_up"

- Inserting into and deleting from a priority queue

- Emacs suffix


ip-location