183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# -*- coding: latin-1 -*- 283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh"""Heap queue algorithm (a.k.a. priority queue). 483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehHeaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for 683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehall k, counting elements from 0. For the sake of comparison, 783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehnon-existing elements are considered to be infinite. The interesting 883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehproperty of a heap is that a[0] is always its smallest element. 983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 1083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehUsage: 1183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 1283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehheap = [] # creates an empty heap 1383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehheappush(heap, item) # pushes a new item on the heap 1483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehitem = heappop(heap) # pops the smallest item from the heap 1583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehitem = heap[0] # smallest item on the heap without popping it 1683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehheapify(x) # transforms list into a heap, in-place, in linear time 1783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehitem = heapreplace(heap, item) # pops and returns smallest item, and adds 1883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # new item; the heap size is unchanged 1983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 2083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehOur API differs from textbook heap algorithms as follows: 2183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 2283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh- We use 0-based indexing. This makes the relationship between the 2383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh index for a node and the indexes for its children slightly less 2483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh obvious, but is more suitable since Python uses 0-based indexing. 2583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 2683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh- Our heappop() method returns the smallest item, not the largest. 2783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 2883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehThese two make it possible to view the heap as a regular Python list 2983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehwithout surprises: heap[0] is the smallest item, and heap.sort() 3083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehmaintains the heap invariant! 3183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh""" 3283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 3383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger 3483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 3583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh__about__ = """Heap queues 3683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 3783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh[explanation by Fran�ois Pinard] 3883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 3983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehHeaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for 4083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehall k, counting elements from 0. For the sake of comparison, 4183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehnon-existing elements are considered to be infinite. The interesting 4283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehproperty of a heap is that a[0] is always its smallest element. 4383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 4483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehThe strange invariant above is meant to be an efficient memory 4583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehrepresentation for a tournament. The numbers below are `k', not a[k]: 4683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 4783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 0 4883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 4983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 1 2 5083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 5183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 3 4 5 6 5283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 5383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 7 8 9 10 11 12 13 14 5483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 5583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 5683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 5783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 5883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehIn the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In 5983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehan usual binary tournament we see in sports, each cell is the winner 6083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehover the two cells it tops, and we can trace the winner down the tree 6183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehto see all opponents s/he had. However, in many computer applications 6283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehof such tournaments, we do not need to trace the history of a winner. 6383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehTo be more memory efficient, when a winner is promoted, we try to 6483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehreplace it by something else at a lower level, and the rule becomes 6583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehthat a cell and the two cells it tops contain three different items, 6683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehbut the top cell "wins" over the two topped cells. 6783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 6883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehIf this heap invariant is protected at all time, index 0 is clearly 6983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehthe overall winner. The simplest algorithmic way to remove it and 7083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehfind the "next" winner is to move some loser (let's say cell 30 in the 7183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdiagram above) into the 0 position, and then percolate this new 0 down 7283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehthe tree, exchanging values, until the invariant is re-established. 7383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehThis is clearly logarithmic on the total number of items in the tree. 7483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehBy iterating over all items, you get an O(n ln n) sort. 7583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 7683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehA nice feature of this sort is that you can efficiently insert new 7783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehitems while the sort is going on, provided that the inserted items are 7883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehnot "better" than the last 0'th element you extracted. This is 7983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehespecially useful in simulation contexts, where the tree holds all 8083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehincoming events, and the "win" condition means the smallest scheduled 8183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehtime. When an event schedule other events for execution, they are 8283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehscheduled into the future, so they can easily go into the heap. So, a 8383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehheap is a good structure for implementing schedulers (this is what I 8483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehused for my MIDI sequencer :-). 8583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 8683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehVarious structures for implementing schedulers have been extensively 8783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehstudied, and heaps are good for this, as they are reasonably speedy, 8883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehthe speed is almost constant, and the worst case is not much different 8983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehthan the average case. However, there are other representations which 9083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehare more efficient overall, yet the worst cases might be terrible. 9183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 9283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehHeaps are also very useful in big disk sorts. You most probably all 9383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehknow that a big sort implies producing "runs" (which are pre-sorted 9483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehsequences, which size is usually related to the amount of CPU memory), 9583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehfollowed by a merging passes for these runs, which merging is often 9683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehvery cleverly organised[1]. It is very important that the initial 9783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehsort produces the longest runs possible. Tournaments are a good way 9883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehto that. If, using all the memory available to hold a tournament, you 9983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehreplace and percolate items that happen to fit the current run, you'll 10083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehproduce runs which are twice the size of the memory for random input, 10183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehand much better for input fuzzily ordered. 10283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 10383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehMoreover, if you output the 0'th item on disk and get an input which 10483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehmay not fit in the current tournament (because the value "wins" over 10583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehthe last output value), it cannot fit in the heap, so the size of the 10683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehheap decreases. The freed memory could be cleverly reused immediately 10783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehfor progressively building a second heap, which grows at exactly the 10883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehsame rate the first heap is melting. When the first heap completely 10983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehvanishes, you switch heaps and start a new run. Clever and quite 11083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieheffective! 11183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 11283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehIn a word, heaps are useful memory structures to know. I use them in 11383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieha few applications, and I think it is good to keep a `heap' module 11483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieharound. :-) 11583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 11683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh-------------------- 11783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh[1] The disk balancing algorithms which are current, nowadays, are 11883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehmore annoying than clever, and this is a consequence of the seeking 11983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehcapabilities of the disks. On devices which cannot seek, like big 12083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehtape drives, the story was quite different, and one had to be very 12183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehclever to ensure (far in advance) that each tape movement will be the 12283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehmost effective possible (that is, will best participate at 12383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh"progressing" the merge). Some tapes were even able to read 12483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehbackwards, and this was also used to avoid the rewinding time. 12583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehBelieve me, real good tape sorts were quite spectacular to watch! 12683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew HsiehFrom all times, sorting has always been a Great Art! :-) 12783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh""" 12883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 12983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge', 13083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 'nlargest', 'nsmallest', 'heappushpop'] 13183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 13283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehfrom itertools import islice, count, imap, izip, tee, chain 13383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehfrom operator import itemgetter 13483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 13583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef cmp_lt(x, y): 13683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Use __lt__ if available; otherwise, try __le__. 13783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # In Py3.x, only __lt__ will be called. 13883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return (x < y) if hasattr(x, '__lt__') else (not y <= x) 13983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 14083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef heappush(heap, item): 14183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Push item onto heap, maintaining the heap invariant.""" 14283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap.append(item) 14383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftdown(heap, 0, len(heap)-1) 14483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 14583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef heappop(heap): 14683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Pop the smallest item off the heap, maintaining the heap invariant.""" 14783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh lastelt = heap.pop() # raises appropriate IndexError if heap is empty 14883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if heap: 14983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh returnitem = heap[0] 15083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[0] = lastelt 15183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftup(heap, 0) 15283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh else: 15383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh returnitem = lastelt 15483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return returnitem 15583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 15683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef heapreplace(heap, item): 15783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Pop and return the current smallest value, and add the new item. 15883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 15983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh This is more efficient than heappop() followed by heappush(), and can be 16083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh more appropriate when using a fixed-size heap. Note that the value 16183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh returned may be larger than item! That constrains reasonable uses of 16283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh this routine unless written as part of a conditional replacement: 16383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 16483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if item > heap[0]: 16583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh item = heapreplace(heap, item) 16683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """ 16783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh returnitem = heap[0] # raises appropriate IndexError if heap is empty 16883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[0] = item 16983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftup(heap, 0) 17083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return returnitem 17183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 17283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef heappushpop(heap, item): 17383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Fast version of a heappush followed by a heappop.""" 17483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if heap and cmp_lt(heap[0], item): 17583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh item, heap[0] = heap[0], item 17683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftup(heap, 0) 17783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return item 17883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 17983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef heapify(x): 18083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Transform list into a heap, in-place, in O(len(x)) time.""" 18183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh n = len(x) 18283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Transform bottom-up. The largest index there's any point to looking at 18383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # is the largest with a child index in-range, so must have 2*i + 1 < n, 18483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so 18583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is 18683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1. 18783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh for i in reversed(xrange(n//2)): 18883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftup(x, i) 18983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 19083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef _heappushpop_max(heap, item): 19183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Maxheap version of a heappush followed by a heappop.""" 19283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if heap and cmp_lt(item, heap[0]): 19383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh item, heap[0] = heap[0], item 19483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftup_max(heap, 0) 19583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return item 19683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 19783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef _heapify_max(x): 19883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Transform list into a maxheap, in-place, in O(len(x)) time.""" 19983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh n = len(x) 20083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh for i in reversed(range(n//2)): 20183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftup_max(x, i) 20283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 20383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef nlargest(n, iterable): 20483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Find the n largest elements in a dataset. 20583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 20683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh Equivalent to: sorted(iterable, reverse=True)[:n] 20783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """ 20883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if n < 0: 20983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [] 21083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = iter(iterable) 21183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result = list(islice(it, n)) 21283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if not result: 21383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return result 21483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heapify(result) 21583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heappushpop = heappushpop 21683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh for elem in it: 21783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heappushpop(result, elem) 21883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result.sort(reverse=True) 21983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return result 22083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 22183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef nsmallest(n, iterable): 22283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Find the n smallest elements in a dataset. 22383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 22483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh Equivalent to: sorted(iterable)[:n] 22583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """ 22683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if n < 0: 22783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [] 22883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = iter(iterable) 22983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result = list(islice(it, n)) 23083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if not result: 23183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return result 23283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heapify_max(result) 23383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heappushpop = _heappushpop_max 23483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh for elem in it: 23583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heappushpop(result, elem) 23683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result.sort() 23783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return result 23883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 23983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos 24083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# is the index of a leaf with a possibly out-of-order value. Restore the 24183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# heap invariant. 24283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef _siftdown(heap, startpos, pos): 24383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh newitem = heap[pos] 24483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Follow the path to the root, moving parents down until finding a place 24583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # newitem fits. 24683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh while pos > startpos: 24783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh parentpos = (pos - 1) >> 1 24883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh parent = heap[parentpos] 24983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if cmp_lt(newitem, parent): 25083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = parent 25183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pos = parentpos 25283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh continue 25383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh break 25483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = newitem 25583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 25683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# The child indices of heap index pos are already heaps, and we want to make 25783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# a heap at index pos too. We do this by bubbling the smaller child of 25883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# pos up (and so on with that child's children, etc) until hitting a leaf, 25983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# then using _siftdown to move the oddball originally at index pos into place. 26083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 26183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# We *could* break out of the loop as soon as we find a pos where newitem <= 26283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# both its children, but turns out that's not a good idea, and despite that 26383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# many books write the algorithm that way. During a heap pop, the last array 26483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# element is sifted in, and that tends to be large, so that comparing it 26583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# against values starting from the root usually doesn't pay (= usually doesn't 26683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# get us out of the loop early). See Knuth, Volume 3, where this is 26783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# explained and quantified in an exercise. 26883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 26983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# Cutting the # of comparisons is important, since these routines have no 27083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# way to extract "the priority" from an array element, so that intelligence 27183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# is likely to be hiding in custom __cmp__ methods, or in array elements 27283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# storing (priority, record) tuples. Comparisons are thus potentially 27383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# expensive. 27483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 27583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# On random arrays of length 1000, making this change cut the number of 27683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# comparisons made by heapify() a little, and those made by exhaustive 27783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# heappop() a lot, in accord with theory. Here are typical results from 3 27883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# runs (3 just to demonstrate how small the variance is): 27983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 28083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# Compares needed by heapify Compares needed by 1000 heappops 28183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# -------------------------- -------------------------------- 28283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 1837 cut to 1663 14996 cut to 8680 28383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 1855 cut to 1659 14966 cut to 8678 28483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 1847 cut to 1660 15024 cut to 8703 28583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 28683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# Building the heap by using heappush() 1000 times instead required 28783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 2198, 2148, and 2219 compares: heapify() is more efficient, when 28883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# you can use it. 28983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 29083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# The total compares needed by list.sort() on the same lists were 8627, 29183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# 8627, and 8632 (this should be compared to the sum of heapify() and 29283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# heappop() compares): list.sort() is (unsurprisingly!) more efficient 29383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# for sorting. 29483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 29583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef _siftup(heap, pos): 29683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh endpos = len(heap) 29783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh startpos = pos 29883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh newitem = heap[pos] 29983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Bubble up the smaller child until hitting a leaf. 30083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh childpos = 2*pos + 1 # leftmost child position 30183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh while childpos < endpos: 30283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Set childpos to index of smaller child. 30383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh rightpos = childpos + 1 30483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if rightpos < endpos and not cmp_lt(heap[childpos], heap[rightpos]): 30583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh childpos = rightpos 30683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Move the smaller child up. 30783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = heap[childpos] 30883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pos = childpos 30983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh childpos = 2*pos + 1 31083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # The leaf at pos is empty now. Put newitem there, and bubble it up 31183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # to its final resting place (by sifting its parents down). 31283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = newitem 31383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftdown(heap, startpos, pos) 31483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 31583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef _siftdown_max(heap, startpos, pos): 31683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 'Maxheap variant of _siftdown' 31783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh newitem = heap[pos] 31883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Follow the path to the root, moving parents down until finding a place 31983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # newitem fits. 32083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh while pos > startpos: 32183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh parentpos = (pos - 1) >> 1 32283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh parent = heap[parentpos] 32383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if cmp_lt(parent, newitem): 32483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = parent 32583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pos = parentpos 32683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh continue 32783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh break 32883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = newitem 32983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 33083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef _siftup_max(heap, pos): 33183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 'Maxheap variant of _siftup' 33283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh endpos = len(heap) 33383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh startpos = pos 33483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh newitem = heap[pos] 33583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Bubble up the larger child until hitting a leaf. 33683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh childpos = 2*pos + 1 # leftmost child position 33783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh while childpos < endpos: 33883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Set childpos to index of larger child. 33983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh rightpos = childpos + 1 34083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if rightpos < endpos and not cmp_lt(heap[rightpos], heap[childpos]): 34183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh childpos = rightpos 34283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Move the larger child up. 34383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = heap[childpos] 34483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pos = childpos 34583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh childpos = 2*pos + 1 34683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # The leaf at pos is empty now. Put newitem there, and bubble it up 34783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # to its final resting place (by sifting its parents down). 34883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap[pos] = newitem 34983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _siftdown_max(heap, startpos, pos) 35083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 35183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# If available, use C implementation 35283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehtry: 35383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh from _heapq import * 35483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehexcept ImportError: 35583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pass 35683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 35783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef merge(*iterables): 35883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh '''Merge multiple sorted inputs into a single sorted output. 35983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 36083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh Similar to sorted(itertools.chain(*iterables)) but returns a generator, 36183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh does not pull the data into memory all at once, and assumes that each of 36283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh the input streams is already sorted (smallest to largest). 36383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 36483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 36583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25] 36683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 36783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh ''' 36883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration 36983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 37083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh h = [] 37183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh h_append = h.append 37283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh for itnum, it in enumerate(map(iter, iterables)): 37383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh try: 37483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh next = it.next 37583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh h_append([next(), itnum, next]) 37683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh except _StopIteration: 37783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pass 37883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heapify(h) 37983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 38083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh while 1: 38183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh try: 38283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh while 1: 38383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh v, itnum, next = s = h[0] # raises IndexError when h is empty 38483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh yield v 38583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh s[0] = next() # raises StopIteration when exhausted 38683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heapreplace(h, s) # restore heap condition 38783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh except _StopIteration: 38883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh _heappop(h) # remove empty iterator 38983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh except IndexError: 39083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return 39183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 39283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh# Extend the implementations of nsmallest and nlargest to use a key= argument 39383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh_nsmallest = nsmallest 39483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef nsmallest(n, iterable, key=None): 39583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Find the n smallest elements in a dataset. 39683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 39783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh Equivalent to: sorted(iterable, key=key)[:n] 39883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """ 39983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Short-cut for n==1 is to use min() when len(iterable)>0 40083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if n == 1: 40183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = iter(iterable) 40283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh head = list(islice(it, 1)) 40383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if not head: 40483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [] 40583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if key is None: 40683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [min(chain(head, it))] 40783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [min(chain(head, it), key=key)] 40883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 40983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # When n>=size, it's faster to use sorted() 41083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh try: 41183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh size = len(iterable) 41283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh except (TypeError, AttributeError): 41383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pass 41483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh else: 41583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if n >= size: 41683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return sorted(iterable, key=key)[:n] 41783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 41883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # When key is none, use simpler decoration 41983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if key is None: 42083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = izip(iterable, count()) # decorate 42183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result = _nsmallest(n, it) 42283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return map(itemgetter(0), result) # undecorate 42383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 42483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # General case, slowest method 42583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh in1, in2 = tee(iterable) 42683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = izip(imap(key, in1), count(), in2) # decorate 42783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result = _nsmallest(n, it) 42883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return map(itemgetter(2), result) # undecorate 42983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 43083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh_nlargest = nlargest 43183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehdef nlargest(n, iterable, key=None): 43283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """Find the n largest elements in a dataset. 43383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 43483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh Equivalent to: sorted(iterable, key=key, reverse=True)[:n] 43583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh """ 43683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 43783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Short-cut for n==1 is to use max() when len(iterable)>0 43883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if n == 1: 43983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = iter(iterable) 44083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh head = list(islice(it, 1)) 44183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if not head: 44283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [] 44383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if key is None: 44483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [max(chain(head, it))] 44583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return [max(chain(head, it), key=key)] 44683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 44783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # When n>=size, it's faster to use sorted() 44883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh try: 44983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh size = len(iterable) 45083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh except (TypeError, AttributeError): 45183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh pass 45283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh else: 45383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if n >= size: 45483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return sorted(iterable, key=key, reverse=True)[:n] 45583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 45683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # When key is none, use simpler decoration 45783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh if key is None: 45883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = izip(iterable, count(0,-1)) # decorate 45983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result = _nlargest(n, it) 46083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return map(itemgetter(0), result) # undecorate 46183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 46283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # General case, slowest method 46383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh in1, in2 = tee(iterable) 46483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh it = izip(imap(key, in1), count(0,-1), in2) # decorate 46583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh result = _nlargest(n, it) 46683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh return map(itemgetter(2), result) # undecorate 46783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 46883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsiehif __name__ == "__main__": 46983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh # Simple sanity test 47083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heap = [] 47183760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0] 47283760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh for item in data: 47383760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh heappush(heap, item) 47483760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh sort = [] 47583760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh while heap: 47683760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh sort.append(heappop(heap)) 47783760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh print sort 47883760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh 47983760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh import doctest 48083760d213fb3bec7b4117d266fcfbf6fe2ba14abAndrew Hsieh doctest.testmod() 481