1.. _sortinghowto:
2
3Sorting HOW TO
4**************
5
6:Author: Andrew Dalke and Raymond Hettinger
7:Release: 0.1
8
9
10Python lists have a built-in :meth:`list.sort` method that modifies the list
11in-place.  There is also a :func:`sorted` built-in function that builds a new
12sorted list from an iterable.
13
14In this document, we explore the various techniques for sorting data using Python.
15
16
17Sorting Basics
18==============
19
20A simple ascending sort is very easy: just call the :func:`sorted` function. It
21returns a new sorted list::
22
23    >>> sorted([5, 2, 3, 1, 4])
24    [1, 2, 3, 4, 5]
25
26You can also use the :meth:`list.sort` method. It modifies the list
27in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
28than :func:`sorted` - but if you don't need the original list, it's slightly
29more efficient.
30
31    >>> a = [5, 2, 3, 1, 4]
32    >>> a.sort()
33    >>> a
34    [1, 2, 3, 4, 5]
35
36Another difference is that the :meth:`list.sort` method is only defined for
37lists. In contrast, the :func:`sorted` function accepts any iterable.
38
39    >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
40    [1, 2, 3, 4, 5]
41
42Key Functions
43=============
44
45Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
46function (or other callable) to be called on each list element prior to making
47comparisons.
48
49For example, here's a case-insensitive string comparison:
50
51    >>> sorted("This is a test string from Andrew".split(), key=str.lower)
52    ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
53
54The value of the *key* parameter should be a function (or other callable) that
55takes a single argument and returns a key to use for sorting purposes. This
56technique is fast because the key function is called exactly once for each
57input record.
58
59A common pattern is to sort complex objects using some of the object's indices
60as keys. For example:
61
62    >>> student_tuples = [
63    ...     ('john', 'A', 15),
64    ...     ('jane', 'B', 12),
65    ...     ('dave', 'B', 10),
66    ... ]
67    >>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
68    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
69
70The same technique works for objects with named attributes. For example:
71
72    >>> class Student:
73    ...     def __init__(self, name, grade, age):
74    ...         self.name = name
75    ...         self.grade = grade
76    ...         self.age = age
77    ...     def __repr__(self):
78    ...         return repr((self.name, self.grade, self.age))
79
80    >>> student_objects = [
81    ...     Student('john', 'A', 15),
82    ...     Student('jane', 'B', 12),
83    ...     Student('dave', 'B', 10),
84    ... ]
85    >>> sorted(student_objects, key=lambda student: student.age)   # sort by age
86    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
87
88Operator Module Functions
89=========================
90
91The key-function patterns shown above are very common, so Python provides
92convenience functions to make accessor functions easier and faster. The
93:mod:`operator` module has :func:`~operator.itemgetter`,
94:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
95
96Using those functions, the above examples become simpler and faster:
97
98    >>> from operator import itemgetter, attrgetter
99
100    >>> sorted(student_tuples, key=itemgetter(2))
101    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
102
103    >>> sorted(student_objects, key=attrgetter('age'))
104    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
105
106The operator module functions allow multiple levels of sorting. For example, to
107sort by *grade* then by *age*:
108
109    >>> sorted(student_tuples, key=itemgetter(1,2))
110    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
111
112    >>> sorted(student_objects, key=attrgetter('grade', 'age'))
113    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
114
115Ascending and Descending
116========================
117
118Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
119boolean value. This is used to flag descending sorts. For example, to get the
120student data in reverse *age* order:
121
122    >>> sorted(student_tuples, key=itemgetter(2), reverse=True)
123    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
124
125    >>> sorted(student_objects, key=attrgetter('age'), reverse=True)
126    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
127
128Sort Stability and Complex Sorts
129================================
130
131Sorts are guaranteed to be `stable
132<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
133when multiple records have the same key, their original order is preserved.
134
135    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
136    >>> sorted(data, key=itemgetter(0))
137    [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
138
139Notice how the two records for *blue* retain their original order so that
140``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
141
142This wonderful property lets you build complex sorts in a series of sorting
143steps. For example, to sort the student data by descending *grade* and then
144ascending *age*, do the *age* sort first and then sort again using *grade*:
145
146    >>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
147    >>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
148    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
149
150This can be abstracted out into a wrapper function that can take a list and
151tuples of field and order to sort them on multiple passes.
152
153    >>> def multisort(xs, specs):
154    ...     for key, reverse in reversed(specs):
155    ...         xs.sort(key=attrgetter(key), reverse=reverse)
156    ...     return xs
157
158    >>> multisort(list(student_objects), (('grade', True), ('age', False)))
159    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
160
161The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
162does multiple sorts efficiently because it can take advantage of any ordering
163already present in a dataset.
164
165The Old Way Using Decorate-Sort-Undecorate
166==========================================
167
168This idiom is called Decorate-Sort-Undecorate after its three steps:
169
170* First, the initial list is decorated with new values that control the sort order.
171
172* Second, the decorated list is sorted.
173
174* Finally, the decorations are removed, creating a list that contains only the
175  initial values in the new order.
176
177For example, to sort the student data by *grade* using the DSU approach:
178
179    >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
180    >>> decorated.sort()
181    >>> [student for grade, i, student in decorated]               # undecorate
182    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
183
184This idiom works because tuples are compared lexicographically; the first items
185are compared; if they are the same then the second items are compared, and so
186on.
187
188It is not strictly necessary in all cases to include the index *i* in the
189decorated list, but including it gives two benefits:
190
191* The sort is stable -- if two items have the same key, their order will be
192  preserved in the sorted list.
193
194* The original items do not have to be comparable because the ordering of the
195  decorated tuples will be determined by at most the first two items. So for
196  example the original list could contain complex numbers which cannot be sorted
197  directly.
198
199Another name for this idiom is
200`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
201after Randal L. Schwartz, who popularized it among Perl programmers.
202
203Now that Python sorting provides key-functions, this technique is not often needed.
204
205
206The Old Way Using the *cmp* Parameter
207=====================================
208
209Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
210there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
211arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to
212handle user specified comparison functions.
213
214In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to
215simplify and unify the language, eliminating the conflict between rich
216comparisons and the :meth:`__cmp__` magic method).
217
218In Py2.x, sort allowed an optional function which can be called for doing the
219comparisons. That function should take two arguments to be compared and then
220return a negative value for less-than, return zero if they are equal, or return
221a positive value for greater-than. For example, we can do:
222
223    >>> def numeric_compare(x, y):
224    ...     return x - y
225    >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP
226    [1, 2, 3, 4, 5]
227
228Or you can reverse the order of comparison with:
229
230    >>> def reverse_numeric(x, y):
231    ...     return y - x
232    >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP
233    [5, 4, 3, 2, 1]
234
235When porting code from Python 2.x to 3.x, the situation can arise when you have
236the user supplying a comparison function and you need to convert that to a key
237function. The following wrapper makes that easy to do::
238
239    def cmp_to_key(mycmp):
240        'Convert a cmp= function into a key= function'
241        class K:
242            def __init__(self, obj, *args):
243                self.obj = obj
244            def __lt__(self, other):
245                return mycmp(self.obj, other.obj) < 0
246            def __gt__(self, other):
247                return mycmp(self.obj, other.obj) > 0
248            def __eq__(self, other):
249                return mycmp(self.obj, other.obj) == 0
250            def __le__(self, other):
251                return mycmp(self.obj, other.obj) <= 0
252            def __ge__(self, other):
253                return mycmp(self.obj, other.obj) >= 0
254            def __ne__(self, other):
255                return mycmp(self.obj, other.obj) != 0
256        return K
257
258To convert to a key function, just wrap the old comparison function:
259
260.. testsetup::
261
262    from functools import cmp_to_key
263
264.. doctest::
265
266    >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
267    [5, 4, 3, 2, 1]
268
269In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
270:mod:`functools` module in the standard library.
271
272Odd and Ends
273============
274
275* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
276  :func:`locale.strcoll` for a comparison function.
277
278* The *reverse* parameter still maintains sort stability (so that records with
279  equal keys retain the original order). Interestingly, that effect can be
280  simulated without the parameter by using the builtin :func:`reversed` function
281  twice:
282
283    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
284    >>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
285    >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
286    >>> assert standard_way == double_reversed
287    >>> standard_way
288    [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
289
290* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons
291  between two objects. So, it is easy to add a standard sort order to a class by
292  defining an :meth:`__lt__` method::
293
294    >>> Student.__lt__ = lambda self, other: self.age < other.age
295    >>> sorted(student_objects)
296    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
297
298* Key functions need not depend directly on the objects being sorted. A key
299  function can also access external resources. For instance, if the student grades
300  are stored in a dictionary, they can be used to sort a separate list of student
301  names:
302
303    >>> students = ['dave', 'john', 'jane']
304    >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
305    >>> sorted(students, key=newgrades.__getitem__)
306    ['jane', 'dave', 'john']
307