heapq.py revision fbb299226d486b63459c0ec084e191c7eee6ddb9
# -*- coding: Latin-1 -*-

"""Heap queue algorithm (a.k.a. priority queue).

Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0.  For the sake of comparison,
non-existing elements are considered to be infinite.  The interesting
property of a heap is that a[0] is always its smallest element.

Usage:

heap = []            # creates an empty heap
heappush(heap, item) # pushes a new item on the heap
item = heappop(heap) # pops the smallest item from the heap
item = heap[0]       # smallest item on the heap without popping it
heapify(heap)        # transform list into a heap, in-place, in linear time

Our API differs from textbook heap algorithms as follows:

- We use 0-based indexing.  This makes the relationship between the
  index for a node and the indexes for its children slightly less
  obvious, but is more suitable since Python uses 0-based indexing.

- Our heappop() method returns the smallest item, not the largest.

These two make it possible to view the heap as a regular Python list
without surprises: heap[0] is the smallest item, and heap.sort()
maintains the heap invariant!
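
For example, with 0-based indexing the parent of the node at index k is
at index (k-1)//2 and its children are at 2*k+1 and 2*k+2.  With an
illustrative heap (the list below is just an example):

heap = [0, 1, 2, 5, 3, 7, 9]      # satisfies the heap invariant
heap[0]                           # 0, the smallest item
heap[(4 - 1) // 2]                # 1, the parent of heap[4]
(heap[2*1 + 1], heap[2*1 + 2])    # (5, 3), the children of heap[1]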
"""

# Original code by Kevin O'Connor, augmented by Tim Peters

__about__ = """Heap queues

[explanation by François Pinard]

Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0.  For the sake of comparison,
non-existing elements are considered to be infinite.  The interesting
property of a heap is that a[0] is always its smallest element.

The strange invariant above is meant to be an efficient memory
representation for a tournament.  The numbers below are `k', not a[k]:

                                   0

                  1                                 2

          3               4                5               6

      7       8       9       10      11      12      13      14

    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30


In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'.  In
the usual binary tournament we see in sports, each cell is the winner
over the two cells it tops, and we can trace the winner down the tree
to see all opponents s/he had.  However, in many computer applications
of such tournaments, we do not need to trace the history of a winner.
To be more memory efficient, when a winner is promoted, we try to
replace it by something else at a lower level, and the rule becomes
that a cell and the two cells it tops contain three different items,
but the top cell "wins" over the two topped cells.
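
For example (a small sketch with a made-up list), the invariant can be
checked directly:

    heap = [0, 2, 1, 4, 3, 9, 7]
    assert all(heap[k] <= heap[c]
               for k in range(len(heap))
               for c in (2*k + 1, 2*k + 2) if c < len(heap))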

If this heap invariant is protected at all times, index 0 is clearly
the overall winner.  The simplest algorithmic way to remove it and
find the "next" winner is to move some loser (let's say cell 30 in the
diagram above) into the 0 position, and then percolate this new 0 down
the tree, exchanging values, until the invariant is re-established.
This is clearly logarithmic in the total number of items in the tree.
By iterating over all items, you get an O(n ln n) sort.
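
As a quick sketch of that sort, using the functions defined in this
module:

    data = [5, 1, 4, 2, 3]
    heapify(data)                        # O(n)
    ordered = []
    while data:
        ordered.append(heappop(data))    # each pop is O(log n)
    assert ordered == [1, 2, 3, 4, 5]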

A nice feature of this sort is that you can efficiently insert new
items while the sort is going on, provided that the inserted items are
not "better" than the last 0'th element you extracted.  This is
especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled
time.  When an event schedules other events for execution, they are
scheduled into the future, so they can easily go into the heap.  So, a
heap is a good structure for implementing schedulers (this is what I
used for my MIDI sequencer :-).
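
A minimal scheduler along those lines might look like this (a sketch;
the event names and times are made up for illustration):

    events = []                          # heap of (time, action) pairs
    heappush(events, (10, 'retry'))
    heappush(events, (5, 'timeout'))
    while events:
        when, action = heappop(events)   # always the earliest event
        # handle the action here, pushing any follow-up events into the
        # future with heappush(events, (when + delay, ...))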

Various structures for implementing schedulers have been extensively
studied, and heaps are good for this, as they are reasonably speedy,
the speed is almost constant, and the worst case is not much different
from the average case.  However, there are other representations which
are more efficient overall, yet the worst cases might be terrible.

Heaps are also very useful in big disk sorts.  You most probably all
know that a big sort implies producing "runs" (which are pre-sorted
sequences, whose size is usually related to the amount of CPU memory),
followed by merging passes for these runs, and the merging is often
very cleverly organised[1].  It is very important that the initial
sort produces the longest runs possible.  Tournaments are a good way
to achieve that.  If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the
current run, you'll produce runs which are twice the size of the
memory for random input, and much longer for fuzzily ordered input.

Moreover, if you output the 0'th item on disk and get an input which
may not fit in the current tournament (because the value "wins" over
the last output value), it cannot fit in the heap, so the size of the
heap decreases.  The freed memory could be cleverly reused immediately
for progressively building a second heap, which grows at exactly the
same rate the first heap is melting.  When the first heap completely
vanishes, you switch heaps and start a new run.  Clever and quite
effective!
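
A bare-bones sketch of that run-producing idea, using the functions in
this module (the memory limit and names are made up; a real external
sort would add buffering and a merge phase on top):

    def sorted_runs(iterable, memory=8):
        it = iter(iterable)
        heap = []
        for x in it:                     # fill the available "memory"
            heap.append(x)
            if len(heap) == memory:
                break
        heapify(heap)
        while heap:
            run = []
            pending = []                 # the second heap, grown as this one melts
            while heap:
                x = heappop(heap)
                run.append(x)
                for nxt in it:           # pull at most one replacement item
                    if nxt >= x:
                        heappush(heap, nxt)    # still fits in the current run
                    else:
                        pending.append(nxt)    # must wait for the next run
                    break
            yield run                    # one sorted run, ready to write out
            heapify(pending)
            heap = pending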

In a word, heaps are useful memory structures to know.  I use them in
a few applications, and I think it is good to keep a `heap' module
around. :-)

--------------------
[1] The disk balancing algorithms in current use nowadays are more
annoying than clever, and this is a consequence of the seeking
capabilities of the disks.  On devices which cannot seek, like big
tape drives, the story was quite different, and one had to be very
clever to ensure (far in advance) that each tape movement would be the
most effective possible (that is, would best contribute to
"progressing" the merge).  Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time.
Believe me, really good tape sorts were quite spectacular to watch!
Sorting has always been a Great Art! :-)
"""

def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    pos = len(heap)
    heap.append(None)
    # Follow the path toward the root, moving parents down until finding
    # a place where the new item fits.
    while pos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if item >= parent:
            break
        heap[pos] = parent
        pos = parentpos
    heap[pos] = item

# The subtrees rooted at the children of heap index pos are already heaps,
# and we want to make a heap at index pos too.
def _siftdown(heap, pos):
    endpos = len(heap)
    assert pos < endpos
    item = heap[pos]
    # Sift item into position, down from pos, moving the smaller
    # child up, until finding pos such that item <= pos's children.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos and child to reflect smaller child.
        child = heap[childpos]
        rightpos = childpos + 1
        if rightpos < endpos:
            rightchild = heap[rightpos]
            if rightchild < child:
                childpos = rightpos
                child = rightchild
        # If item is no larger than smaller child, we're done, else
        # move the smaller child up.
        if item <= child:
            break
        heap[pos] = child
        pos = childpos
        childpos = 2*pos + 1
    heap[pos] = item

def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftdown(heap, 0)
    else:
        returnitem = lastelt
    return returnitem

def heapify(heap):
    """Transform list heap into a heap, in-place, in O(len(heap)) time."""
    n = len(heap)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j - 1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2 - 1.
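    # For example, n = 10 gives n//2 - 1 == 4, and index 4's left child is
    # index 9, the last element; n = 11 also starts at index 4, whose
    # children are indices 9 and 10.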
    for i in xrange(n//2 - 1, -1, -1):
        _siftdown(heap, i)

if __name__ == "__main__":
    # Simple sanity test
    heap = []
    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
    for item in data:
        heappush(heap, item)
    sort = []
    while heap:
        sort.append(heappop(heap))
    print sort
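    # Also check that heapify() followed by repeated pops gives the same
    # sorted result.
    heap2 = list(data)
    heapify(heap2)
    sort2 = []
    while heap2:
        sort2.append(heappop(heap2))
    assert sort2 == sort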