1.. _sortinghowto: 2 3Sorting HOW TO 4************** 5 6:Author: Andrew Dalke and Raymond Hettinger 7:Release: 0.1 8 9 10Python lists have a built-in :meth:`list.sort` method that modifies the list 11in-place. There is also a :func:`sorted` built-in function that builds a new 12sorted list from an iterable. 13 14In this document, we explore the various techniques for sorting data using Python. 15 16 17Sorting Basics 18============== 19 20A simple ascending sort is very easy: just call the :func:`sorted` function. It 21returns a new sorted list:: 22 23 >>> sorted([5, 2, 3, 1, 4]) 24 [1, 2, 3, 4, 5] 25 26You can also use the :meth:`list.sort` method. It modifies the list 27in-place (and returns ``None`` to avoid confusion). Usually it's less convenient 28than :func:`sorted` - but if you don't need the original list, it's slightly 29more efficient. 30 31 >>> a = [5, 2, 3, 1, 4] 32 >>> a.sort() 33 >>> a 34 [1, 2, 3, 4, 5] 35 36Another difference is that the :meth:`list.sort` method is only defined for 37lists. In contrast, the :func:`sorted` function accepts any iterable. 38 39 >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 40 [1, 2, 3, 4, 5] 41 42Key Functions 43============= 44 45Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a 46function (or other callable) to be called on each list element prior to making 47comparisons. 48 49For example, here's a case-insensitive string comparison: 50 51 >>> sorted("This is a test string from Andrew".split(), key=str.lower) 52 ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 53 54The value of the *key* parameter should be a function (or other callable) that 55takes a single argument and returns a key to use for sorting purposes. This 56technique is fast because the key function is called exactly once for each 57input record. 58 59A common pattern is to sort complex objects using some of the object's indices 60as keys. For example: 61 62 >>> student_tuples = [ 63 ... ('john', 'A', 15), 64 ... ('jane', 'B', 12), 65 ... ('dave', 'B', 10), 66 ... ] 67 >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 68 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 69 70The same technique works for objects with named attributes. For example: 71 72 >>> class Student: 73 ... def __init__(self, name, grade, age): 74 ... self.name = name 75 ... self.grade = grade 76 ... self.age = age 77 ... def __repr__(self): 78 ... return repr((self.name, self.grade, self.age)) 79 80 >>> student_objects = [ 81 ... Student('john', 'A', 15), 82 ... Student('jane', 'B', 12), 83 ... Student('dave', 'B', 10), 84 ... ] 85 >>> sorted(student_objects, key=lambda student: student.age) # sort by age 86 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 87 88Operator Module Functions 89========================= 90 91The key-function patterns shown above are very common, so Python provides 92convenience functions to make accessor functions easier and faster. The 93:mod:`operator` module has :func:`~operator.itemgetter`, 94:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function. 95 96Using those functions, the above examples become simpler and faster: 97 98 >>> from operator import itemgetter, attrgetter 99 100 >>> sorted(student_tuples, key=itemgetter(2)) 101 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 102 103 >>> sorted(student_objects, key=attrgetter('age')) 104 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 105 106The operator module functions allow multiple levels of sorting. For example, to 107sort by *grade* then by *age*: 108 109 >>> sorted(student_tuples, key=itemgetter(1,2)) 110 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 111 112 >>> sorted(student_objects, key=attrgetter('grade', 'age')) 113 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 114 115Ascending and Descending 116======================== 117 118Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 119boolean value. This is used to flag descending sorts. For example, to get the 120student data in reverse *age* order: 121 122 >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 123 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 124 125 >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 126 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 127 128Sort Stability and Complex Sorts 129================================ 130 131Sorts are guaranteed to be `stable 132<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 133when multiple records have the same key, their original order is preserved. 134 135 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 136 >>> sorted(data, key=itemgetter(0)) 137 [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 138 139Notice how the two records for *blue* retain their original order so that 140``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 141 142This wonderful property lets you build complex sorts in a series of sorting 143steps. For example, to sort the student data by descending *grade* and then 144ascending *age*, do the *age* sort first and then sort again using *grade*: 145 146 >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 147 >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 148 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 149 150This can be abstracted out into a wrapper function that can take a list and 151tuples of field and order to sort them on multiple passes. 152 153 >>> def multisort(xs, specs): 154 ... for key, reverse in reversed(specs): 155 ... xs.sort(key=attrgetter(key), reverse=reverse) 156 ... return xs 157 158 >>> multisort(list(student_objects), (('grade', True), ('age', False))) 159 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 160 161The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 162does multiple sorts efficiently because it can take advantage of any ordering 163already present in a dataset. 164 165The Old Way Using Decorate-Sort-Undecorate 166========================================== 167 168This idiom is called Decorate-Sort-Undecorate after its three steps: 169 170* First, the initial list is decorated with new values that control the sort order. 171 172* Second, the decorated list is sorted. 173 174* Finally, the decorations are removed, creating a list that contains only the 175 initial values in the new order. 176 177For example, to sort the student data by *grade* using the DSU approach: 178 179 >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 180 >>> decorated.sort() 181 >>> [student for grade, i, student in decorated] # undecorate 182 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 183 184This idiom works because tuples are compared lexicographically; the first items 185are compared; if they are the same then the second items are compared, and so 186on. 187 188It is not strictly necessary in all cases to include the index *i* in the 189decorated list, but including it gives two benefits: 190 191* The sort is stable -- if two items have the same key, their order will be 192 preserved in the sorted list. 193 194* The original items do not have to be comparable because the ordering of the 195 decorated tuples will be determined by at most the first two items. So for 196 example the original list could contain complex numbers which cannot be sorted 197 directly. 198 199Another name for this idiom is 200`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 201after Randal L. Schwartz, who popularized it among Perl programmers. 202 203Now that Python sorting provides key-functions, this technique is not often needed. 204 205 206The Old Way Using the *cmp* Parameter 207===================================== 208 209Many constructs given in this HOWTO assume Python 2.4 or later. Before that, 210there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword 211arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to 212handle user specified comparison functions. 213 214In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to 215simplify and unify the language, eliminating the conflict between rich 216comparisons and the :meth:`__cmp__` magic method). 217 218In Py2.x, sort allowed an optional function which can be called for doing the 219comparisons. That function should take two arguments to be compared and then 220return a negative value for less-than, return zero if they are equal, or return 221a positive value for greater-than. For example, we can do: 222 223 >>> def numeric_compare(x, y): 224 ... return x - y 225 >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP 226 [1, 2, 3, 4, 5] 227 228Or you can reverse the order of comparison with: 229 230 >>> def reverse_numeric(x, y): 231 ... return y - x 232 >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP 233 [5, 4, 3, 2, 1] 234 235When porting code from Python 2.x to 3.x, the situation can arise when you have 236the user supplying a comparison function and you need to convert that to a key 237function. The following wrapper makes that easy to do:: 238 239 def cmp_to_key(mycmp): 240 'Convert a cmp= function into a key= function' 241 class K: 242 def __init__(self, obj, *args): 243 self.obj = obj 244 def __lt__(self, other): 245 return mycmp(self.obj, other.obj) < 0 246 def __gt__(self, other): 247 return mycmp(self.obj, other.obj) > 0 248 def __eq__(self, other): 249 return mycmp(self.obj, other.obj) == 0 250 def __le__(self, other): 251 return mycmp(self.obj, other.obj) <= 0 252 def __ge__(self, other): 253 return mycmp(self.obj, other.obj) >= 0 254 def __ne__(self, other): 255 return mycmp(self.obj, other.obj) != 0 256 return K 257 258To convert to a key function, just wrap the old comparison function: 259 260.. testsetup:: 261 262 from functools import cmp_to_key 263 264.. doctest:: 265 266 >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) 267 [5, 4, 3, 2, 1] 268 269In Python 3.2, the :func:`functools.cmp_to_key` function was added to the 270:mod:`functools` module in the standard library. 271 272Odd and Ends 273============ 274 275* For locale aware sorting, use :func:`locale.strxfrm` for a key function or 276 :func:`locale.strcoll` for a comparison function. 277 278* The *reverse* parameter still maintains sort stability (so that records with 279 equal keys retain the original order). Interestingly, that effect can be 280 simulated without the parameter by using the builtin :func:`reversed` function 281 twice: 282 283 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 284 >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 285 >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 286 >>> assert standard_way == double_reversed 287 >>> standard_way 288 [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 289 290* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons 291 between two objects. So, it is easy to add a standard sort order to a class by 292 defining an :meth:`__lt__` method:: 293 294 >>> Student.__lt__ = lambda self, other: self.age < other.age 295 >>> sorted(student_objects) 296 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 297 298* Key functions need not depend directly on the objects being sorted. A key 299 function can also access external resources. For instance, if the student grades 300 are stored in a dictionary, they can be used to sort a separate list of student 301 names: 302 303 >>> students = ['dave', 'john', 'jane'] 304 >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} 305 >>> sorted(students, key=newgrades.__getitem__) 306 ['jane', 'dave', 'john'] 307