heapq.py revision fbb299226d486b63459c0ec084e191c7eee6ddb9
1# -*- coding: Latin-1 -*- 2 3"""Heap queue algorithm (a.k.a. priority queue). 4 5Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for 6all k, counting elements from 0. For the sake of comparison, 7non-existing elements are considered to be infinite. The interesting 8property of a heap is that a[0] is always its smallest element. 9 10Usage: 11 12heap = [] # creates an empty heap 13heappush(heap, item) # pushes a new item on the heap 14item = heappop(heap) # pops the smallest item from the heap 15item = heap[0] # smallest item on the heap without popping it 16heapify(heap) # transform list into a heap, in-place, in linear time 17 18Our API differs from textbook heap algorithms as follows: 19 20- We use 0-based indexing. This makes the relationship between the 21 index for a node and the indexes for its children slightly less 22 obvious, but is more suitable since Python uses 0-based indexing. 23 24- Our heappop() method returns the smallest item, not the largest. 25 26These two make it possible to view the heap as a regular Python list 27without surprises: heap[0] is the smallest item, and heap.sort() 28maintains the heap invariant! 29""" 30 31# Original code by Kevin O'Connor, augmented by Tim Peters 32 33__about__ = """Heap queues 34 35[explanation by Fran�ois Pinard] 36 37Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for 38all k, counting elements from 0. For the sake of comparison, 39non-existing elements are considered to be infinite. The interesting 40property of a heap is that a[0] is always its smallest element. 41 42The strange invariant above is meant to be an efficient memory 43representation for a tournament. The numbers below are `k', not a[k]: 44 45 0 46 47 1 2 48 49 3 4 5 6 50 51 7 8 9 10 11 12 13 14 52 53 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 54 55 56In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In 57an usual binary tournament we see in sports, each cell is the winner 58over the two cells it tops, and we can trace the winner down the tree 59to see all opponents s/he had. However, in many computer applications 60of such tournaments, we do not need to trace the history of a winner. 61To be more memory efficient, when a winner is promoted, we try to 62replace it by something else at a lower level, and the rule becomes 63that a cell and the two cells it tops contain three different items, 64but the top cell "wins" over the two topped cells. 65 66If this heap invariant is protected at all time, index 0 is clearly 67the overall winner. The simplest algorithmic way to remove it and 68find the "next" winner is to move some loser (let's say cell 30 in the 69diagram above) into the 0 position, and then percolate this new 0 down 70the tree, exchanging values, until the invariant is re-established. 71This is clearly logarithmic on the total number of items in the tree. 72By iterating over all items, you get an O(n ln n) sort. 73 74A nice feature of this sort is that you can efficiently insert new 75items while the sort is going on, provided that the inserted items are 76not "better" than the last 0'th element you extracted. This is 77especially useful in simulation contexts, where the tree holds all 78incoming events, and the "win" condition means the smallest scheduled 79time. When an event schedule other events for execution, they are 80scheduled into the future, so they can easily go into the heap. So, a 81heap is a good structure for implementing schedulers (this is what I 82used for my MIDI sequencer :-). 83 84Various structures for implementing schedulers have been extensively 85studied, and heaps are good for this, as they are reasonably speedy, 86the speed is almost constant, and the worst case is not much different 87than the average case. However, there are other representations which 88are more efficient overall, yet the worst cases might be terrible. 89 90Heaps are also very useful in big disk sorts. You most probably all 91know that a big sort implies producing "runs" (which are pre-sorted 92sequences, which size is usually related to the amount of CPU memory), 93followed by a merging passes for these runs, which merging is often 94very cleverly organised[1]. It is very important that the initial 95sort produces the longest runs possible. Tournaments are a good way 96to that. If, using all the memory available to hold a tournament, you 97replace and percolate items that happen to fit the current run, you'll 98produce runs which are twice the size of the memory for random input, 99and much better for input fuzzily ordered. 100 101Moreover, if you output the 0'th item on disk and get an input which 102may not fit in the current tournament (because the value "wins" over 103the last output value), it cannot fit in the heap, so the size of the 104heap decreases. The freed memory could be cleverly reused immediately 105for progressively building a second heap, which grows at exactly the 106same rate the first heap is melting. When the first heap completely 107vanishes, you switch heaps and start a new run. Clever and quite 108effective! 109 110In a word, heaps are useful memory structures to know. I use them in 111a few applications, and I think it is good to keep a `heap' module 112around. :-) 113 114-------------------- 115[1] The disk balancing algorithms which are current, nowadays, are 116more annoying than clever, and this is a consequence of the seeking 117capabilities of the disks. On devices which cannot seek, like big 118tape drives, the story was quite different, and one had to be very 119clever to ensure (far in advance) that each tape movement will be the 120most effective possible (that is, will best participate at 121"progressing" the merge). Some tapes were even able to read 122backwards, and this was also used to avoid the rewinding time. 123Believe me, real good tape sorts were quite spectacular to watch! 124From all times, sorting has always been a Great Art! :-) 125""" 126 127def heappush(heap, item): 128 """Push item onto heap, maintaining the heap invariant.""" 129 pos = len(heap) 130 heap.append(None) 131 while pos: 132 parentpos = (pos - 1) >> 1 133 parent = heap[parentpos] 134 if item >= parent: 135 break 136 heap[pos] = parent 137 pos = parentpos 138 heap[pos] = item 139 140# The child indices of heap index pos are already heaps, and we want to make 141# a heap at index pos too. 142def _siftdown(heap, pos): 143 endpos = len(heap) 144 assert pos < endpos 145 item = heap[pos] 146 # Sift item into position, down from pos, moving the smaller 147 # child up, until finding pos such that item <= pos's children. 148 childpos = 2*pos + 1 # leftmost child position 149 while childpos < endpos: 150 # Set childpos and child to reflect smaller child. 151 child = heap[childpos] 152 rightpos = childpos + 1 153 if rightpos < endpos: 154 rightchild = heap[rightpos] 155 if rightchild < child: 156 childpos = rightpos 157 child = rightchild 158 # If item is no larger than smaller child, we're done, else 159 # move the smaller child up. 160 if item <= child: 161 break 162 heap[pos] = child 163 pos = childpos 164 childpos = 2*pos + 1 165 heap[pos] = item 166 167def heappop(heap): 168 """Pop the smallest item off the heap, maintaining the heap invariant.""" 169 lastelt = heap.pop() # raises appropriate IndexError if heap is empty 170 if heap: 171 returnitem = heap[0] 172 heap[0] = lastelt 173 _siftdown(heap, 0) 174 else: 175 returnitem = lastelt 176 return returnitem 177 178def heapify(heap): 179 """Transform list heap into a heap, in-place, in O(len(heap)) time.""" 180 n = len(heap) 181 # Transform bottom-up. The largest index there's any point to looking at 182 # is the largest with a child index in-range, so must have 2*i + 1 < n, 183 # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so 184 # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is 185 # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1. 186 for i in xrange(n//2 - 1, -1, -1): 187 _siftdown(heap, i) 188 189if __name__ == "__main__": 190 # Simple sanity test 191 heap = [] 192 data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0] 193 for item in data: 194 heappush(heap, item) 195 sort = [] 196 while heap: 197 sort.append(heappop(heap)) 198 print sort 199