1.. _sortinghowto:
2
3Sorting HOW TO
4**************
5
6:Author: Andrew Dalke and Raymond Hettinger
7:Release: 0.1
8
9
10Python lists have a built-in :meth:`list.sort` method that modifies the list
11in-place.  There is also a :func:`sorted` built-in function that builds a new
12sorted list from an iterable.
13
14In this document, we explore the various techniques for sorting data using Python.
15
16
17Sorting Basics
18==============
19
20A simple ascending sort is very easy: just call the :func:`sorted` function. It
21returns a new sorted list::
22
23    >>> sorted([5, 2, 3, 1, 4])
24    [1, 2, 3, 4, 5]
25
26You can also use the :meth:`list.sort` method of a list. It modifies the list
27in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
28than :func:`sorted` - but if you don't need the original list, it's slightly
29more efficient.
30
31    >>> a = [5, 2, 3, 1, 4]
32    >>> a.sort()
33    >>> a
34    [1, 2, 3, 4, 5]
35
36Another difference is that the :meth:`list.sort` method is only defined for
37lists. In contrast, the :func:`sorted` function accepts any iterable.
38
39    >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
40    [1, 2, 3, 4, 5]
41
42Key Functions
43=============
44
45Starting with Python 2.4, both :meth:`list.sort` and :func:`sorted` added a
46*key* parameter to specify a function to be called on each list element prior to
47making comparisons.
48
49For example, here's a case-insensitive string comparison:
50
51    >>> sorted("This is a test string from Andrew".split(), key=str.lower)
52    ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
53
54The value of the *key* parameter should be a function that takes a single argument
55and returns a key to use for sorting purposes. This technique is fast because
56the key function is called exactly once for each input record.
57
58A common pattern is to sort complex objects using some of the object's indices
59as keys. For example:
60
61    >>> student_tuples = [
62    ...     ('john', 'A', 15),
63    ...     ('jane', 'B', 12),
64    ...     ('dave', 'B', 10),
65    ... ]
66    >>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
67    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
68
69The same technique works for objects with named attributes. For example:
70
71    >>> class Student:
72    ...     def __init__(self, name, grade, age):
73    ...         self.name = name
74    ...         self.grade = grade
75    ...         self.age = age
76    ...     def __repr__(self):
77    ...         return repr((self.name, self.grade, self.age))
78
79    >>> student_objects = [
80    ...     Student('john', 'A', 15),
81    ...     Student('jane', 'B', 12),
82    ...     Student('dave', 'B', 10),
83    ... ]
84    >>> sorted(student_objects, key=lambda student: student.age)   # sort by age
85    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
86
87Operator Module Functions
88=========================
89
90The key-function patterns shown above are very common, so Python provides
91convenience functions to make accessor functions easier and faster. The operator
92module has :func:`operator.itemgetter`, :func:`operator.attrgetter`, and
93starting in Python 2.5 an :func:`operator.methodcaller` function.
94
95Using those functions, the above examples become simpler and faster:
96
97    >>> from operator import itemgetter, attrgetter
98
99    >>> sorted(student_tuples, key=itemgetter(2))
100    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
101
102    >>> sorted(student_objects, key=attrgetter('age'))
103    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
104
105The operator module functions allow multiple levels of sorting. For example, to
106sort by *grade* then by *age*:
107
108    >>> sorted(student_tuples, key=itemgetter(1,2))
109    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
110
111    >>> sorted(student_objects, key=attrgetter('grade', 'age'))
112    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
113
114The :func:`operator.methodcaller` function makes method calls with fixed
115parameters for each object being sorted.  For example, the :meth:`str.count`
116method could be used to compute message priority by counting the
117number of exclamation marks in a message:
118
119    >>> from operator import methodcaller
120    >>> messages = ['critical!!!', 'hurry!', 'standby', 'immediate!!']
121    >>> sorted(messages, key=methodcaller('count', '!'))
122    ['standby', 'hurry!', 'immediate!!', 'critical!!!']
123
124Ascending and Descending
125========================
126
127Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
128boolean value. This is used to flag descending sorts. For example, to get the
129student data in reverse *age* order:
130
131    >>> sorted(student_tuples, key=itemgetter(2), reverse=True)
132    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
133
134    >>> sorted(student_objects, key=attrgetter('age'), reverse=True)
135    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
136
137Sort Stability and Complex Sorts
138================================
139
140Starting with Python 2.2, sorts are guaranteed to be `stable
141<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
142when multiple records have the same key, their original order is preserved.
143
144    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
145    >>> sorted(data, key=itemgetter(0))
146    [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
147
148Notice how the two records for *blue* retain their original order so that
149``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
150
151This wonderful property lets you build complex sorts in a series of sorting
152steps. For example, to sort the student data by descending *grade* and then
153ascending *age*, do the *age* sort first and then sort again using *grade*:
154
155    >>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
156    >>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
157    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
158
159The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
160does multiple sorts efficiently because it can take advantage of any ordering
161already present in a dataset.
162
163The Old Way Using Decorate-Sort-Undecorate
164==========================================
165
166This idiom is called Decorate-Sort-Undecorate after its three steps:
167
168* First, the initial list is decorated with new values that control the sort order.
169
170* Second, the decorated list is sorted.
171
172* Finally, the decorations are removed, creating a list that contains only the
173  initial values in the new order.
174
175For example, to sort the student data by *grade* using the DSU approach:
176
177    >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
178    >>> decorated.sort()
179    >>> [student for grade, i, student in decorated]               # undecorate
180    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
181
182This idiom works because tuples are compared lexicographically; the first items
183are compared; if they are the same then the second items are compared, and so
184on.
185
186It is not strictly necessary in all cases to include the index *i* in the
187decorated list, but including it gives two benefits:
188
189* The sort is stable -- if two items have the same key, their order will be
190  preserved in the sorted list.
191
192* The original items do not have to be comparable because the ordering of the
193  decorated tuples will be determined by at most the first two items. So for
194  example the original list could contain complex numbers which cannot be sorted
195  directly.
196
197Another name for this idiom is
198`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
199after Randal L. Schwartz, who popularized it among Perl programmers.
200
201For large lists and lists where the comparison information is expensive to
202calculate, and Python versions before 2.4, DSU is likely to be the fastest way
203to sort the list. For 2.4 and later, key functions provide the same
204functionality.
205
206The Old Way Using the *cmp* Parameter
207=====================================
208
209Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
210there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
211arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to
212handle user specified comparison functions.
213
214In Python 3, the *cmp* parameter was removed entirely (as part of a larger effort to
215simplify and unify the language, eliminating the conflict between rich
216comparisons and the :meth:`__cmp__` magic method).
217
218In Python 2, :meth:`~list.sort` allowed an optional function which can be called for doing the
219comparisons. That function should take two arguments to be compared and then
220return a negative value for less-than, return zero if they are equal, or return
221a positive value for greater-than. For example, we can do:
222
223    >>> def numeric_compare(x, y):
224    ...     return x - y
225    >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP
226    [1, 2, 3, 4, 5]
227
228Or you can reverse the order of comparison with:
229
230    >>> def reverse_numeric(x, y):
231    ...     return y - x
232    >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP
233    [5, 4, 3, 2, 1]
234
235When porting code from Python 2.x to 3.x, the situation can arise when you have
236the user supplying a comparison function and you need to convert that to a key
237function. The following wrapper makes that easy to do::
238
239    def cmp_to_key(mycmp):
240        'Convert a cmp= function into a key= function'
241        class K(object):
242            def __init__(self, obj, *args):
243                self.obj = obj
244            def __lt__(self, other):
245                return mycmp(self.obj, other.obj) < 0
246            def __gt__(self, other):
247                return mycmp(self.obj, other.obj) > 0
248            def __eq__(self, other):
249                return mycmp(self.obj, other.obj) == 0
250            def __le__(self, other):
251                return mycmp(self.obj, other.obj) <= 0
252            def __ge__(self, other):
253                return mycmp(self.obj, other.obj) >= 0
254            def __ne__(self, other):
255                return mycmp(self.obj, other.obj) != 0
256        return K
257
258To convert to a key function, just wrap the old comparison function:
259
260.. testsetup::
261
262    from functools import cmp_to_key
263
264.. doctest::
265
266    >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
267    [5, 4, 3, 2, 1]
268
269In Python 2.7, the :func:`functools.cmp_to_key` function was added to the
270functools module.
271
272Odd and Ends
273============
274
275* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
276  :func:`locale.strcoll` for a comparison function.
277
278* The *reverse* parameter still maintains sort stability (so that records with
279  equal keys retain their original order). Interestingly, that effect can be
280  simulated without the parameter by using the builtin :func:`reversed` function
281  twice:
282
283    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
284    >>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
285    >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
286    >>> assert standard_way == double_reversed
287    >>> standard_way
288    [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
289
290* To create a standard sort order for a class, just add the appropriate rich
291  comparison methods:
292
293    >>> Student.__eq__ = lambda self, other: self.age == other.age
294    >>> Student.__ne__ = lambda self, other: self.age != other.age
295    >>> Student.__lt__ = lambda self, other: self.age < other.age
296    >>> Student.__le__ = lambda self, other: self.age <= other.age
297    >>> Student.__gt__ = lambda self, other: self.age > other.age
298    >>> Student.__ge__ = lambda self, other: self.age >= other.age
299    >>> sorted(student_objects)
300    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
301
302  For general purpose comparisons, the recommended approach is to define all six
303  rich comparison operators.  The :func:`functools.total_ordering` class
304  decorator makes this easy to implement.
305
306* Key functions need not depend directly on the objects being sorted. A key
307  function can also access external resources. For instance, if the student grades
308  are stored in a dictionary, they can be used to sort a separate list of student
309  names:
310
311    >>> students = ['dave', 'john', 'jane']
312    >>> grades = {'john': 'F', 'jane':'A', 'dave': 'C'}
313    >>> sorted(students, key=grades.__getitem__)
314    ['jane', 'dave', 'john']
315