heapq.py revision 9b342c6fd4455aa5ee988007a0cac09032b3219c
1b6e112bd952c2023b95212364ed07ad9c235da41Benjamin Peterson# -*- coding: latin-1 -*- 2c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 3c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger"""Heap queue algorithm (a.k.a. priority queue). 4c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 5c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerHeaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for 6c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerall k, counting elements from 0. For the sake of comparison, 7c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingernon-existing elements are considered to be infinite. The interesting 8c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerproperty of a heap is that a[0] is always its smallest element. 9c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 10c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerUsage: 11c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 12c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerheap = [] # creates an empty heap 13c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerheappush(heap, item) # pushes a new item on the heap 14c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingeritem = heappop(heap) # pops the smallest item from the heap 15c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingeritem = heap[0] # smallest item on the heap without popping it 16c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerheapify(x) # transforms list into a heap, in-place, in linear time 17c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingeritem = heapreplace(heap, item) # pops and returns smallest item, and adds 18c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # new item; the heap size is unchanged 19c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 20c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerOur API differs from textbook heap algorithms as follows: 21c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 22c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger- We use 0-based indexing. This makes the relationship between the 23c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger index for a node and the indexes for its children slightly less 24c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger obvious, but is more suitable since Python uses 0-based indexing. 25c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 26c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger- Our heappop() method returns the smallest item, not the largest. 27c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 28c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerThese two make it possible to view the heap as a regular Python list 29c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerwithout surprises: heap[0] is the smallest item, and heap.sort() 30c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingermaintains the heap invariant! 31c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger""" 32c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 3333ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger 34c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 35c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger__about__ = """Heap queues 36c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 37c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger[explanation by Fran�ois Pinard] 38c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 39c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerHeaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for 40c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerall k, counting elements from 0. For the sake of comparison, 41c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingernon-existing elements are considered to be infinite. The interesting 42c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerproperty of a heap is that a[0] is always its smallest element. 43c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 44c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerThe strange invariant above is meant to be an efficient memory 45c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerrepresentation for a tournament. The numbers below are `k', not a[k]: 46c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 47c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 0 48c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 49c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 1 2 50c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 51c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 3 4 5 6 52c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 53c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 7 8 9 10 11 12 13 14 54c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 55c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 56c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 57c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 58c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerIn the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In 59c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingeran usual binary tournament we see in sports, each cell is the winner 60c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerover the two cells it tops, and we can trace the winner down the tree 61c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerto see all opponents s/he had. However, in many computer applications 62c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerof such tournaments, we do not need to trace the history of a winner. 63c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerTo be more memory efficient, when a winner is promoted, we try to 64c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerreplace it by something else at a lower level, and the rule becomes 65c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerthat a cell and the two cells it tops contain three different items, 66c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerbut the top cell "wins" over the two topped cells. 67c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 68c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerIf this heap invariant is protected at all time, index 0 is clearly 69c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerthe overall winner. The simplest algorithmic way to remove it and 70c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerfind the "next" winner is to move some loser (let's say cell 30 in the 71c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerdiagram above) into the 0 position, and then percolate this new 0 down 72c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerthe tree, exchanging values, until the invariant is re-established. 73c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerThis is clearly logarithmic on the total number of items in the tree. 74c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerBy iterating over all items, you get an O(n ln n) sort. 75c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 76c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerA nice feature of this sort is that you can efficiently insert new 77c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingeritems while the sort is going on, provided that the inserted items are 78c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingernot "better" than the last 0'th element you extracted. This is 79c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerespecially useful in simulation contexts, where the tree holds all 80c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerincoming events, and the "win" condition means the smallest scheduled 81c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingertime. When an event schedule other events for execution, they are 82c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerscheduled into the future, so they can easily go into the heap. So, a 83c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerheap is a good structure for implementing schedulers (this is what I 84c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerused for my MIDI sequencer :-). 85c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 86c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerVarious structures for implementing schedulers have been extensively 87c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerstudied, and heaps are good for this, as they are reasonably speedy, 88c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerthe speed is almost constant, and the worst case is not much different 89c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerthan the average case. However, there are other representations which 90c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerare more efficient overall, yet the worst cases might be terrible. 91c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 92c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerHeaps are also very useful in big disk sorts. You most probably all 93c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerknow that a big sort implies producing "runs" (which are pre-sorted 94c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingersequences, which size is usually related to the amount of CPU memory), 95c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerfollowed by a merging passes for these runs, which merging is often 96c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingervery cleverly organised[1]. It is very important that the initial 97c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingersort produces the longest runs possible. Tournaments are a good way 98c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerto that. If, using all the memory available to hold a tournament, you 99c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerreplace and percolate items that happen to fit the current run, you'll 100c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerproduce runs which are twice the size of the memory for random input, 101c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerand much better for input fuzzily ordered. 102c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 103c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerMoreover, if you output the 0'th item on disk and get an input which 104c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingermay not fit in the current tournament (because the value "wins" over 105c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerthe last output value), it cannot fit in the heap, so the size of the 106c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerheap decreases. The freed memory could be cleverly reused immediately 107c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerfor progressively building a second heap, which grows at exactly the 108c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingersame rate the first heap is melting. When the first heap completely 109c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingervanishes, you switch heaps and start a new run. Clever and quite 110c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingereffective! 111c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 112c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerIn a word, heaps are useful memory structures to know. I use them in 113c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingera few applications, and I think it is good to keep a `heap' module 114c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingeraround. :-) 115c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 116c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger-------------------- 117c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger[1] The disk balancing algorithms which are current, nowadays, are 118c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingermore annoying than clever, and this is a consequence of the seeking 119c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingercapabilities of the disks. On devices which cannot seek, like big 120c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingertape drives, the story was quite different, and one had to be very 121c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerclever to ensure (far in advance) that each tape movement will be the 122c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingermost effective possible (that is, will best participate at 123c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger"progressing" the merge). Some tapes were even able to read 124c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerbackwards, and this was also used to avoid the rewinding time. 125c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerBelieve me, real good tape sorts were quite spectacular to watch! 126c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond HettingerFrom all times, sorting has always been a Great Art! :-) 127c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger""" 128c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 12900166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge', 13053bdf093437349907807da9143f9c2bdcea9ab3aRaymond Hettinger 'nlargest', 'nsmallest', 'heappushpop'] 13133ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger 132b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettingerfrom itertools import islice, repeat, count, imap, izip, tee, chain 133be9b765c073eefcc109320b651d977ff03090f2fRaymond Hettingerfrom operator import itemgetter 134b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettingerimport bisect 135c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 1369b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettingerdef cmp_lt(x, y): 1379b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger # Use __lt__ if available; otherwise, try __le__. 1389b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger # In Py3.x, only __lt__ will be called. 1399b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger return (x < y) if hasattr(x, '__lt__') else (not y <= x) 1409b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger 141c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerdef heappush(heap, item): 142c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger """Push item onto heap, maintaining the heap invariant.""" 143c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heap.append(item) 144c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger _siftdown(heap, 0, len(heap)-1) 145c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 146c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerdef heappop(heap): 147c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger """Pop the smallest item off the heap, maintaining the heap invariant.""" 148c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger lastelt = heap.pop() # raises appropriate IndexError if heap is empty 149c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger if heap: 150c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger returnitem = heap[0] 151c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heap[0] = lastelt 152c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger _siftup(heap, 0) 153c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger else: 154c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger returnitem = lastelt 155c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger return returnitem 156c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 157c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerdef heapreplace(heap, item): 158c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger """Pop and return the current smallest value, and add the new item. 159c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 160c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger This is more efficient than heappop() followed by heappush(), and can be 161c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger more appropriate when using a fixed-size heap. Note that the value 162c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger returned may be larger than item! That constrains reasonable uses of 1638158e849305d0e0ab3e19cdc93a86bb7d5fc0651Raymond Hettinger this routine unless written as part of a conditional replacement: 16428224f897a1849dd616ad82538bdda45f3351d42Raymond Hettinger 1658158e849305d0e0ab3e19cdc93a86bb7d5fc0651Raymond Hettinger if item > heap[0]: 1668158e849305d0e0ab3e19cdc93a86bb7d5fc0651Raymond Hettinger item = heapreplace(heap, item) 167c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger """ 168c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger returnitem = heap[0] # raises appropriate IndexError if heap is empty 169c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heap[0] = item 170c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger _siftup(heap, 0) 171c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger return returnitem 172c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 17353bdf093437349907807da9143f9c2bdcea9ab3aRaymond Hettingerdef heappushpop(heap, item): 17453bdf093437349907807da9143f9c2bdcea9ab3aRaymond Hettinger """Fast version of a heappush followed by a heappop.""" 1759b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger if heap and cmp_lt(heap[0], item): 17653bdf093437349907807da9143f9c2bdcea9ab3aRaymond Hettinger item, heap[0] = heap[0], item 17753bdf093437349907807da9143f9c2bdcea9ab3aRaymond Hettinger _siftup(heap, 0) 17853bdf093437349907807da9143f9c2bdcea9ab3aRaymond Hettinger return item 17953bdf093437349907807da9143f9c2bdcea9ab3aRaymond Hettinger 180c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerdef heapify(x): 181c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger """Transform list into a heap, in-place, in O(len(heap)) time.""" 182c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger n = len(x) 183c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # Transform bottom-up. The largest index there's any point to looking at 184c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # is the largest with a child index in-range, so must have 2*i + 1 < n, 185c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so 186c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is 187c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1. 188c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger for i in reversed(xrange(n//2)): 189c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger _siftup(x, i) 190c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 191e1defa4175426594be53c1bc6c3d2f02a0952baeRaymond Hettingerdef nlargest(n, iterable): 19233ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger """Find the n largest elements in a dataset. 19333ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger 19433ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger Equivalent to: sorted(iterable, reverse=True)[:n] 19533ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger """ 19633ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger it = iter(iterable) 19733ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger result = list(islice(it, n)) 19833ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger if not result: 19933ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger return result 20033ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger heapify(result) 20183aa6a3b1a6a406b3cde2fa7daa5d5b1db0cc6a7Raymond Hettinger _heappushpop = heappushpop 20233ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger for elem in it: 203fb921e2c0034e00fcc75fa23358cab4c48f6d450Benjamin Peterson _heappushpop(result, elem) 20433ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger result.sort(reverse=True) 20533ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger return result 20633ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger 207e1defa4175426594be53c1bc6c3d2f02a0952baeRaymond Hettingerdef nsmallest(n, iterable): 20833ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger """Find the n smallest elements in a dataset. 20933ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger 21033ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger Equivalent to: sorted(iterable)[:n] 21133ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger """ 212b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger if hasattr(iterable, '__len__') and n * 10 <= len(iterable): 213b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger # For smaller values of n, the bisect method is faster than a minheap. 214b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger # It is also memory efficient, consuming only n elements of space. 215b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger it = iter(iterable) 216b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger result = sorted(islice(it, 0, n)) 217b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger if not result: 218b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger return result 219b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger insort = bisect.insort 220b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger pop = result.pop 221b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger los = result[-1] # los --> Largest of the nsmallest 222b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger for elem in it: 2239b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger if cmp_lt(elem, los): 2249b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger insort(result, elem) 2259b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger pop() 2269b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger los = result[-1] 227b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger return result 228b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger # An alternative approach manifests the whole iterable in memory but 229b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger # saves comparisons by heapifying all at once. Also, saves time 230b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger # over bisect.insort() which has O(n) data movement time for every 231b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger # insertion. Finding the n smallest of an m length iterable requires 232b25aa36f83a3cd2200f2bc479e594458e27794a3Raymond Hettinger # O(m) + O(n log m) comparisons. 23333ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger h = list(iterable) 23433ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger heapify(h) 23533ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger return map(heappop, repeat(h, min(n, len(h)))) 23633ecffb65ae43ece95e4d828f95819395187d579Raymond Hettinger 237c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos 238c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# is the index of a leaf with a possibly out-of-order value. Restore the 239c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# heap invariant. 240c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerdef _siftdown(heap, startpos, pos): 241c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger newitem = heap[pos] 242c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # Follow the path to the root, moving parents down until finding a place 243c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # newitem fits. 244c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger while pos > startpos: 245c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger parentpos = (pos - 1) >> 1 246c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger parent = heap[parentpos] 2479b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger if cmp_lt(newitem, parent): 2486d7702ecd11e067f72029244b3f7aa8baa3ed2fcRaymond Hettinger heap[pos] = parent 2496d7702ecd11e067f72029244b3f7aa8baa3ed2fcRaymond Hettinger pos = parentpos 2506d7702ecd11e067f72029244b3f7aa8baa3ed2fcRaymond Hettinger continue 2516d7702ecd11e067f72029244b3f7aa8baa3ed2fcRaymond Hettinger break 252c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heap[pos] = newitem 253c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 254c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# The child indices of heap index pos are already heaps, and we want to make 255c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# a heap at index pos too. We do this by bubbling the smaller child of 256c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# pos up (and so on with that child's children, etc) until hitting a leaf, 257c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# then using _siftdown to move the oddball originally at index pos into place. 258c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 259c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# We *could* break out of the loop as soon as we find a pos where newitem <= 260c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# both its children, but turns out that's not a good idea, and despite that 261c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# many books write the algorithm that way. During a heap pop, the last array 262c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# element is sifted in, and that tends to be large, so that comparing it 263c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# against values starting from the root usually doesn't pay (= usually doesn't 264c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# get us out of the loop early). See Knuth, Volume 3, where this is 265c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# explained and quantified in an exercise. 266c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 267c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# Cutting the # of comparisons is important, since these routines have no 268c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# way to extract "the priority" from an array element, so that intelligence 269c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# is likely to be hiding in custom __cmp__ methods, or in array elements 270c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# storing (priority, record) tuples. Comparisons are thus potentially 271c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# expensive. 272c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 273c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# On random arrays of length 1000, making this change cut the number of 274c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# comparisons made by heapify() a little, and those made by exhaustive 275c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# heappop() a lot, in accord with theory. Here are typical results from 3 276c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# runs (3 just to demonstrate how small the variance is): 277c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 278c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# Compares needed by heapify Compares needed by 1000 heappops 279c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# -------------------------- -------------------------------- 280c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 1837 cut to 1663 14996 cut to 8680 281c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 1855 cut to 1659 14966 cut to 8678 282c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 1847 cut to 1660 15024 cut to 8703 283c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 284c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# Building the heap by using heappush() 1000 times instead required 285c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 2198, 2148, and 2219 compares: heapify() is more efficient, when 286c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# you can use it. 287c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 288c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# The total compares needed by list.sort() on the same lists were 8627, 289c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# 8627, and 8632 (this should be compared to the sum of heapify() and 290c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# heappop() compares): list.sort() is (unsurprisingly!) more efficient 291c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# for sorting. 292c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 293c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerdef _siftup(heap, pos): 294c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger endpos = len(heap) 295c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger startpos = pos 296c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger newitem = heap[pos] 297c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # Bubble up the smaller child until hitting a leaf. 298c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger childpos = 2*pos + 1 # leftmost child position 299c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger while childpos < endpos: 300c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # Set childpos to index of smaller child. 301c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger rightpos = childpos + 1 3029b342c6fd4455aa5ee988007a0cac09032b3219cRaymond Hettinger if rightpos < endpos and not cmp_lt(heap[childpos], heap[rightpos]): 303c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger childpos = rightpos 304c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # Move the smaller child up. 305c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heap[pos] = heap[childpos] 306c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger pos = childpos 307c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger childpos = 2*pos + 1 308c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # The leaf at pos is empty now. Put newitem there, and bubble it up 309c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # to its final resting place (by sifting its parents down). 310c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heap[pos] = newitem 311c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger _siftdown(heap, startpos, pos) 312c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 313c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger# If available, use C implementation 314c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingertry: 315b006fcc30ce0277e5fa45fbab059cc7d3154bbccRaymond Hettinger from _heapq import * 316c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerexcept ImportError: 317c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger pass 318c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger 31900166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettingerdef merge(*iterables): 32000166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger '''Merge multiple sorted inputs into a single sorted output. 32100166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger 3223035d2397f4a6d028d7b1f87563c649457d5cbb4Raymond Hettinger Similar to sorted(itertools.chain(*iterables)) but returns a generator, 323cbac8ce5b0394fe68329ac839a07474969dd7493Raymond Hettinger does not pull the data into memory all at once, and assumes that each of 324cbac8ce5b0394fe68329ac839a07474969dd7493Raymond Hettinger the input streams is already sorted (smallest to largest). 32500166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger 32600166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 32700166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25] 32800166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger 32900166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger ''' 33045eb0f141964bf59d20949fe82bea0af124d6854Raymond Hettinger _heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration 33100166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger 33200166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger h = [] 33300166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger h_append = h.append 33454da9819cc74fe6091d090d12753116cfb6c6c62Raymond Hettinger for itnum, it in enumerate(map(iter, iterables)): 33500166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger try: 33600166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger next = it.next 33754da9819cc74fe6091d090d12753116cfb6c6c62Raymond Hettinger h_append([next(), itnum, next]) 33800166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger except _StopIteration: 33900166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger pass 34000166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger heapify(h) 34100166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger 34200166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger while 1: 34300166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger try: 34400166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger while 1: 34554da9819cc74fe6091d090d12753116cfb6c6c62Raymond Hettinger v, itnum, next = s = h[0] # raises IndexError when h is empty 34600166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger yield v 34754da9819cc74fe6091d090d12753116cfb6c6c62Raymond Hettinger s[0] = next() # raises StopIteration when exhausted 34845eb0f141964bf59d20949fe82bea0af124d6854Raymond Hettinger _heapreplace(h, s) # restore heap condition 34900166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger except _StopIteration: 35054da9819cc74fe6091d090d12753116cfb6c6c62Raymond Hettinger _heappop(h) # remove empty iterator 35100166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger except IndexError: 35200166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger return 35300166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger 3544901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger# Extend the implementations of nsmallest and nlargest to use a key= argument 3554901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger_nsmallest = nsmallest 3564901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettingerdef nsmallest(n, iterable, key=None): 3574901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger """Find the n smallest elements in a dataset. 3584901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger 3594901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger Equivalent to: sorted(iterable, key=key)[:n] 3604901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger """ 361b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # Short-cut for n==1 is to use min() when len(iterable)>0 362b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if n == 1: 363b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger it = iter(iterable) 364b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger head = list(islice(it, 1)) 365b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if not head: 366b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return [] 367b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if key is None: 368b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return [min(chain(head, it))] 369b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return [min(chain(head, it), key=key)] 370b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger 371b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # When n>=size, it's faster to use sort() 372b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger try: 373b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger size = len(iterable) 374b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger except (TypeError, AttributeError): 375b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger pass 376b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger else: 377b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if n >= size: 378b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return sorted(iterable, key=key)[:n] 379b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger 380b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # When key is none, use simpler decoration 381fe427895b58769840f1f61a82ea0cdfe55150347Georg Brandl if key is None: 382fe427895b58769840f1f61a82ea0cdfe55150347Georg Brandl it = izip(iterable, count()) # decorate 383fe427895b58769840f1f61a82ea0cdfe55150347Georg Brandl result = _nsmallest(n, it) 384fe427895b58769840f1f61a82ea0cdfe55150347Georg Brandl return map(itemgetter(0), result) # undecorate 385b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger 386b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # General case, slowest method 3874901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger in1, in2 = tee(iterable) 3884901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger it = izip(imap(key, in1), count(), in2) # decorate 3894901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger result = _nsmallest(n, it) 3904901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger return map(itemgetter(2), result) # undecorate 3914901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger 3924901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger_nlargest = nlargest 3934901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettingerdef nlargest(n, iterable, key=None): 3944901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger """Find the n largest elements in a dataset. 3954901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger 3964901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger Equivalent to: sorted(iterable, key=key, reverse=True)[:n] 3974901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger """ 398b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger 399b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # Short-cut for n==1 is to use max() when len(iterable)>0 400b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if n == 1: 401b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger it = iter(iterable) 402b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger head = list(islice(it, 1)) 403b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if not head: 404b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return [] 405b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if key is None: 406b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return [max(chain(head, it))] 407b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return [max(chain(head, it), key=key)] 408b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger 409b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # When n>=size, it's faster to use sort() 410b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger try: 411b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger size = len(iterable) 412b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger except (TypeError, AttributeError): 413b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger pass 414b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger else: 415b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger if n >= size: 416b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger return sorted(iterable, key=key, reverse=True)[:n] 417b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger 418b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # When key is none, use simpler decoration 419fe427895b58769840f1f61a82ea0cdfe55150347Georg Brandl if key is None: 420be9b765c073eefcc109320b651d977ff03090f2fRaymond Hettinger it = izip(iterable, count(0,-1)) # decorate 421fe427895b58769840f1f61a82ea0cdfe55150347Georg Brandl result = _nlargest(n, it) 422fe427895b58769840f1f61a82ea0cdfe55150347Georg Brandl return map(itemgetter(0), result) # undecorate 423b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger 424b5bc33cdabc2361afa60463fedd262a6b457dfdeRaymond Hettinger # General case, slowest method 4254901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger in1, in2 = tee(iterable) 426be9b765c073eefcc109320b651d977ff03090f2fRaymond Hettinger it = izip(imap(key, in1), count(0,-1), in2) # decorate 4274901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger result = _nlargest(n, it) 4284901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger return map(itemgetter(2), result) # undecorate 4294901a1f267e9d632f85054ce8b21ff23bff305e1Raymond Hettinger 430c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettingerif __name__ == "__main__": 431c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger # Simple sanity test 432c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heap = [] 433c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0] 434c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger for item in data: 435c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger heappush(heap, item) 436c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger sort = [] 437c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger while heap: 438c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger sort.append(heappop(heap)) 439c46cb2a1a92c26e01ddb3921aa6828bcd3576f3eRaymond Hettinger print sort 44000166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger 44100166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger import doctest 44200166c5532fc7562c07383a0ae2985b3ffaf253aRaymond Hettinger doctest.testmod() 443