1.. _sortinghowto: 2 3Sorting HOW TO 4************** 5 6:Author: Andrew Dalke and Raymond Hettinger 7:Release: 0.1 8 9 10Python lists have a built-in :meth:`list.sort` method that modifies the list 11in-place. There is also a :func:`sorted` built-in function that builds a new 12sorted list from an iterable. 13 14In this document, we explore the various techniques for sorting data using Python. 15 16 17Sorting Basics 18============== 19 20A simple ascending sort is very easy: just call the :func:`sorted` function. It 21returns a new sorted list:: 22 23 >>> sorted([5, 2, 3, 1, 4]) 24 [1, 2, 3, 4, 5] 25 26You can also use the :meth:`list.sort` method of a list. It modifies the list 27in-place (and returns ``None`` to avoid confusion). Usually it's less convenient 28than :func:`sorted` - but if you don't need the original list, it's slightly 29more efficient. 30 31 >>> a = [5, 2, 3, 1, 4] 32 >>> a.sort() 33 >>> a 34 [1, 2, 3, 4, 5] 35 36Another difference is that the :meth:`list.sort` method is only defined for 37lists. In contrast, the :func:`sorted` function accepts any iterable. 38 39 >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 40 [1, 2, 3, 4, 5] 41 42Key Functions 43============= 44 45Starting with Python 2.4, both :meth:`list.sort` and :func:`sorted` added a 46*key* parameter to specify a function to be called on each list element prior to 47making comparisons. 48 49For example, here's a case-insensitive string comparison: 50 51 >>> sorted("This is a test string from Andrew".split(), key=str.lower) 52 ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 53 54The value of the *key* parameter should be a function that takes a single argument 55and returns a key to use for sorting purposes. This technique is fast because 56the key function is called exactly once for each input record. 57 58A common pattern is to sort complex objects using some of the object's indices 59as keys. For example: 60 61 >>> student_tuples = [ 62 ... ('john', 'A', 15), 63 ... ('jane', 'B', 12), 64 ... ('dave', 'B', 10), 65 ... ] 66 >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 67 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 68 69The same technique works for objects with named attributes. For example: 70 71 >>> class Student: 72 ... def __init__(self, name, grade, age): 73 ... self.name = name 74 ... self.grade = grade 75 ... self.age = age 76 ... def __repr__(self): 77 ... return repr((self.name, self.grade, self.age)) 78 79 >>> student_objects = [ 80 ... Student('john', 'A', 15), 81 ... Student('jane', 'B', 12), 82 ... Student('dave', 'B', 10), 83 ... ] 84 >>> sorted(student_objects, key=lambda student: student.age) # sort by age 85 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 86 87Operator Module Functions 88========================= 89 90The key-function patterns shown above are very common, so Python provides 91convenience functions to make accessor functions easier and faster. The operator 92module has :func:`operator.itemgetter`, :func:`operator.attrgetter`, and 93starting in Python 2.5 an :func:`operator.methodcaller` function. 94 95Using those functions, the above examples become simpler and faster: 96 97 >>> from operator import itemgetter, attrgetter 98 99 >>> sorted(student_tuples, key=itemgetter(2)) 100 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 101 102 >>> sorted(student_objects, key=attrgetter('age')) 103 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 104 105The operator module functions allow multiple levels of sorting. For example, to 106sort by *grade* then by *age*: 107 108 >>> sorted(student_tuples, key=itemgetter(1,2)) 109 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 110 111 >>> sorted(student_objects, key=attrgetter('grade', 'age')) 112 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 113 114The :func:`operator.methodcaller` function makes method calls with fixed 115parameters for each object being sorted. For example, the :meth:`str.count` 116method could be used to compute message priority by counting the 117number of exclamation marks in a message: 118 119 >>> from operator import methodcaller 120 >>> messages = ['critical!!!', 'hurry!', 'standby', 'immediate!!'] 121 >>> sorted(messages, key=methodcaller('count', '!')) 122 ['standby', 'hurry!', 'immediate!!', 'critical!!!'] 123 124Ascending and Descending 125======================== 126 127Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 128boolean value. This is used to flag descending sorts. For example, to get the 129student data in reverse *age* order: 130 131 >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 132 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 133 134 >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 135 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 136 137Sort Stability and Complex Sorts 138================================ 139 140Starting with Python 2.2, sorts are guaranteed to be `stable 141<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 142when multiple records have the same key, their original order is preserved. 143 144 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 145 >>> sorted(data, key=itemgetter(0)) 146 [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 147 148Notice how the two records for *blue* retain their original order so that 149``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 150 151This wonderful property lets you build complex sorts in a series of sorting 152steps. For example, to sort the student data by descending *grade* and then 153ascending *age*, do the *age* sort first and then sort again using *grade*: 154 155 >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 156 >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 157 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 158 159The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 160does multiple sorts efficiently because it can take advantage of any ordering 161already present in a dataset. 162 163The Old Way Using Decorate-Sort-Undecorate 164========================================== 165 166This idiom is called Decorate-Sort-Undecorate after its three steps: 167 168* First, the initial list is decorated with new values that control the sort order. 169 170* Second, the decorated list is sorted. 171 172* Finally, the decorations are removed, creating a list that contains only the 173 initial values in the new order. 174 175For example, to sort the student data by *grade* using the DSU approach: 176 177 >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 178 >>> decorated.sort() 179 >>> [student for grade, i, student in decorated] # undecorate 180 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 181 182This idiom works because tuples are compared lexicographically; the first items 183are compared; if they are the same then the second items are compared, and so 184on. 185 186It is not strictly necessary in all cases to include the index *i* in the 187decorated list, but including it gives two benefits: 188 189* The sort is stable -- if two items have the same key, their order will be 190 preserved in the sorted list. 191 192* The original items do not have to be comparable because the ordering of the 193 decorated tuples will be determined by at most the first two items. So for 194 example the original list could contain complex numbers which cannot be sorted 195 directly. 196 197Another name for this idiom is 198`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 199after Randal L. Schwartz, who popularized it among Perl programmers. 200 201For large lists and lists where the comparison information is expensive to 202calculate, and Python versions before 2.4, DSU is likely to be the fastest way 203to sort the list. For 2.4 and later, key functions provide the same 204functionality. 205 206The Old Way Using the *cmp* Parameter 207===================================== 208 209Many constructs given in this HOWTO assume Python 2.4 or later. Before that, 210there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword 211arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to 212handle user specified comparison functions. 213 214In Python 3, the *cmp* parameter was removed entirely (as part of a larger effort to 215simplify and unify the language, eliminating the conflict between rich 216comparisons and the :meth:`__cmp__` magic method). 217 218In Python 2, :meth:`~list.sort` allowed an optional function which can be called for doing the 219comparisons. That function should take two arguments to be compared and then 220return a negative value for less-than, return zero if they are equal, or return 221a positive value for greater-than. For example, we can do: 222 223 >>> def numeric_compare(x, y): 224 ... return x - y 225 >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP 226 [1, 2, 3, 4, 5] 227 228Or you can reverse the order of comparison with: 229 230 >>> def reverse_numeric(x, y): 231 ... return y - x 232 >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP 233 [5, 4, 3, 2, 1] 234 235When porting code from Python 2.x to 3.x, the situation can arise when you have 236the user supplying a comparison function and you need to convert that to a key 237function. The following wrapper makes that easy to do:: 238 239 def cmp_to_key(mycmp): 240 'Convert a cmp= function into a key= function' 241 class K(object): 242 def __init__(self, obj, *args): 243 self.obj = obj 244 def __lt__(self, other): 245 return mycmp(self.obj, other.obj) < 0 246 def __gt__(self, other): 247 return mycmp(self.obj, other.obj) > 0 248 def __eq__(self, other): 249 return mycmp(self.obj, other.obj) == 0 250 def __le__(self, other): 251 return mycmp(self.obj, other.obj) <= 0 252 def __ge__(self, other): 253 return mycmp(self.obj, other.obj) >= 0 254 def __ne__(self, other): 255 return mycmp(self.obj, other.obj) != 0 256 return K 257 258To convert to a key function, just wrap the old comparison function: 259 260.. testsetup:: 261 262 from functools import cmp_to_key 263 264.. doctest:: 265 266 >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) 267 [5, 4, 3, 2, 1] 268 269In Python 2.7, the :func:`functools.cmp_to_key` function was added to the 270functools module. 271 272Odd and Ends 273============ 274 275* For locale aware sorting, use :func:`locale.strxfrm` for a key function or 276 :func:`locale.strcoll` for a comparison function. 277 278* The *reverse* parameter still maintains sort stability (so that records with 279 equal keys retain their original order). Interestingly, that effect can be 280 simulated without the parameter by using the builtin :func:`reversed` function 281 twice: 282 283 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 284 >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 285 >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 286 >>> assert standard_way == double_reversed 287 >>> standard_way 288 [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 289 290* To create a standard sort order for a class, just add the appropriate rich 291 comparison methods: 292 293 >>> Student.__eq__ = lambda self, other: self.age == other.age 294 >>> Student.__ne__ = lambda self, other: self.age != other.age 295 >>> Student.__lt__ = lambda self, other: self.age < other.age 296 >>> Student.__le__ = lambda self, other: self.age <= other.age 297 >>> Student.__gt__ = lambda self, other: self.age > other.age 298 >>> Student.__ge__ = lambda self, other: self.age >= other.age 299 >>> sorted(student_objects) 300 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 301 302 For general purpose comparisons, the recommended approach is to define all six 303 rich comparison operators. The :func:`functools.total_ordering` class 304 decorator makes this easy to implement. 305 306* Key functions need not depend directly on the objects being sorted. A key 307 function can also access external resources. For instance, if the student grades 308 are stored in a dictionary, they can be used to sort a separate list of student 309 names: 310 311 >>> students = ['dave', 'john', 'jane'] 312 >>> grades = {'john': 'F', 'jane':'A', 'dave': 'C'} 313 >>> sorted(students, key=grades.__getitem__) 314 ['jane', 'dave', 'john'] 315