1:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5   :synopsis: Write and read tabular data to and from delimited files.
6
7.. sectionauthor:: Skip Montanaro <skip@pobox.com>
8
9**Source code:** :source:`Lib/csv.py`
10
11.. index::
12   single: csv
13   pair: data; tabular
14
15--------------
16
17The so-called CSV (Comma Separated Values) format is the most common import and
18export format for spreadsheets and databases.  CSV format was used for many
19years prior to attempts to describe the format in a standardized way in
20:rfc:`4180`.  The lack of a well-defined standard means that subtle differences
21often exist in the data produced and consumed by different applications.  These
22differences can make it annoying to process CSV files from multiple sources.
23Still, while the delimiters and quoting characters vary, the overall format is
24similar enough that it is possible to write a single module which can
25efficiently manipulate such data, hiding the details of reading and writing the
26data from the programmer.
27
28The :mod:`csv` module implements classes to read and write tabular data in CSV
29format.  It allows programmers to say, "write this data in the format preferred
30by Excel," or "read data from this file which was generated by Excel," without
31knowing the precise details of the CSV format used by Excel.  Programmers can
32also describe the CSV formats understood by other applications or define their
33own special-purpose CSV formats.
34
35The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
36write sequences.  Programmers can also read and write data in dictionary form
37using the :class:`DictReader` and :class:`DictWriter` classes.
38
39.. seealso::
40
41   :pep:`305` - CSV File API
42      The Python Enhancement Proposal which proposed this addition to Python.
43
44
45.. _csv-contents:
46
47Module Contents
48---------------
49
50The :mod:`csv` module defines the following functions:
51
52
53.. index::
54   single: universal newlines; csv.reader function
55
56.. function:: reader(csvfile, dialect='excel', **fmtparams)
57
58   Return a reader object which will iterate over lines in the given *csvfile*.
59   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
60   string each time its :meth:`!__next__` method is called --- :term:`file objects
61   <file object>` and list objects are both suitable.   If *csvfile* is a file object,
62   it should be opened with ``newline=''``. [1]_  An optional
63   *dialect* parameter can be given which is used to define a set of parameters
64   specific to a particular CSV dialect.  It may be an instance of a subclass of
65   the :class:`Dialect` class or one of the strings returned by the
66   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
67   can be given to override individual formatting parameters in the current
68   dialect.  For full details about the dialect and formatting parameters, see
69   section :ref:`csv-fmt-params`.
70
71   Each row read from the csv file is returned as a list of strings.  No
72   automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
73   option is specified (in which case unquoted fields are transformed into floats).
74
75   A short usage example::
76
77      >>> import csv
78      >>> with open('eggs.csv', newline='') as csvfile:
79      ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
80      ...     for row in spamreader:
81      ...         print(', '.join(row))
82      Spam, Spam, Spam, Spam, Spam, Baked Beans
83      Spam, Lovely Spam, Wonderful Spam
84
85
86.. function:: writer(csvfile, dialect='excel', **fmtparams)
87
88   Return a writer object responsible for converting the user's data into delimited
89   strings on the given file-like object.  *csvfile* can be any object with a
90   :func:`write` method.  If *csvfile* is a file object, it should be opened with
91   ``newline=''`` [1]_.  An optional *dialect*
92   parameter can be given which is used to define a set of parameters specific to a
93   particular CSV dialect.  It may be an instance of a subclass of the
94   :class:`Dialect` class or one of the strings returned by the
95   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
96   can be given to override individual formatting parameters in the current
97   dialect.  For full details about the dialect and formatting parameters, see
98   section :ref:`csv-fmt-params`. To make it
99   as easy as possible to interface with modules which implement the DB API, the
100   value :const:`None` is written as the empty string.  While this isn't a
101   reversible transformation, it makes it easier to dump SQL NULL data values to
102   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
103   All other non-string data are stringified with :func:`str` before being written.
104
105   A short usage example::
106
107      import csv
108      with open('eggs.csv', 'w', newline='') as csvfile:
109          spamwriter = csv.writer(csvfile, delimiter=' ',
110                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
111          spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
112          spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
113
114
115.. function:: register_dialect(name[, dialect[, **fmtparams]])
116
117   Associate *dialect* with *name*.  *name* must be a string. The
118   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
119   by *fmtparams* keyword arguments, or both, with keyword arguments overriding
120   parameters of the dialect. For full details about the dialect and formatting
121   parameters, see section :ref:`csv-fmt-params`.
122
123
124.. function:: unregister_dialect(name)
125
126   Delete the dialect associated with *name* from the dialect registry.  An
127   :exc:`Error` is raised if *name* is not a registered dialect name.
128
129
130.. function:: get_dialect(name)
131
132   Return the dialect associated with *name*.  An :exc:`Error` is raised if
133   *name* is not a registered dialect name.  This function returns an immutable
134   :class:`Dialect`.
135
136.. function:: list_dialects()
137
138   Return the names of all registered dialects.
139
140
141.. function:: field_size_limit([new_limit])
142
143   Returns the current maximum field size allowed by the parser. If *new_limit* is
144   given, this becomes the new limit.
145
146
147The :mod:`csv` module defines the following classes:
148
149.. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \
150                      dialect='excel', *args, **kwds)
151
152   Create an object that operates like a regular reader but maps the
153   information in each row to an :mod:`OrderedDict <collections.OrderedDict>`
154   whose keys are given by the optional *fieldnames* parameter.
155
156   The *fieldnames* parameter is a :term:`sequence`.  If *fieldnames* is
157   omitted, the values in the first row of file *f* will be used as the
158   fieldnames.  Regardless of how the fieldnames are determined, the ordered
159   dictionary preserves their original ordering.
160
161   If a row has more fields than fieldnames, the remaining data is put in a
162   list and stored with the fieldname specified by *restkey* (which defaults
163   to ``None``).  If a non-blank row has fewer fields than fieldnames, the
164   missing values are filled-in with ``None``.
165
166   All other optional or keyword arguments are passed to the underlying
167   :class:`reader` instance.
168
169   .. versionchanged:: 3.6
170      Returned rows are now of type :class:`OrderedDict`.
171
172   A short usage example::
173
174       >>> import csv
175       >>> with open('names.csv', newline='') as csvfile:
176       ...     reader = csv.DictReader(csvfile)
177       ...     for row in reader:
178       ...         print(row['first_name'], row['last_name'])
179       ...
180       Eric Idle
181       John Cleese
182
183       >>> print(row)
184       OrderedDict([('first_name', 'John'), ('last_name', 'Cleese')])
185
186
187.. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \
188                      dialect='excel', *args, **kwds)
189
190   Create an object which operates like a regular writer but maps dictionaries
191   onto output rows.  The *fieldnames* parameter is a :mod:`sequence
192   <collections.abc>` of keys that identify the order in which values in the
193   dictionary passed to the :meth:`writerow` method are written to file
194   *f*.  The optional *restval* parameter specifies the value to be
195   written if the dictionary is missing a key in *fieldnames*.  If the
196   dictionary passed to the :meth:`writerow` method contains a key not found in
197   *fieldnames*, the optional *extrasaction* parameter indicates what action to
198   take.
199   If it is set to ``'raise'``, the default value, a :exc:`ValueError`
200   is raised.
201   If it is set to ``'ignore'``, extra values in the dictionary are ignored.
202   Any other optional or keyword arguments are passed to the underlying
203   :class:`writer` instance.
204
205   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
206   of the :class:`DictWriter` class is not optional.
207
208   A short usage example::
209
210       import csv
211
212       with open('names.csv', 'w', newline='') as csvfile:
213           fieldnames = ['first_name', 'last_name']
214           writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
215
216           writer.writeheader()
217           writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
218           writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
219           writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
220
221
222.. class:: Dialect
223
224   The :class:`Dialect` class is a container class relied on primarily for its
225   attributes, which are used to define the parameters for a specific
226   :class:`reader` or :class:`writer` instance.
227
228
229.. class:: excel()
230
231   The :class:`excel` class defines the usual properties of an Excel-generated CSV
232   file.  It is registered with the dialect name ``'excel'``.
233
234
235.. class:: excel_tab()
236
237   The :class:`excel_tab` class defines the usual properties of an Excel-generated
238   TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
239
240
241.. class:: unix_dialect()
242
243   The :class:`unix_dialect` class defines the usual properties of a CSV file
244   generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
245   all fields.  It is registered with the dialect name ``'unix'``.
246
247   .. versionadded:: 3.2
248
249
250.. class:: Sniffer()
251
252   The :class:`Sniffer` class is used to deduce the format of a CSV file.
253
254   The :class:`Sniffer` class provides two methods:
255
256   .. method:: sniff(sample, delimiters=None)
257
258      Analyze the given *sample* and return a :class:`Dialect` subclass
259      reflecting the parameters found.  If the optional *delimiters* parameter
260      is given, it is interpreted as a string containing possible valid
261      delimiter characters.
262
263
264   .. method:: has_header(sample)
265
266      Analyze the sample text (presumed to be in CSV format) and return
267      :const:`True` if the first row appears to be a series of column headers.
268
269An example for :class:`Sniffer` use::
270
271   with open('example.csv', newline='') as csvfile:
272       dialect = csv.Sniffer().sniff(csvfile.read(1024))
273       csvfile.seek(0)
274       reader = csv.reader(csvfile, dialect)
275       # ... process CSV file contents here ...
276
277
278The :mod:`csv` module defines the following constants:
279
280.. data:: QUOTE_ALL
281
282   Instructs :class:`writer` objects to quote all fields.
283
284
285.. data:: QUOTE_MINIMAL
286
287   Instructs :class:`writer` objects to only quote those fields which contain
288   special characters such as *delimiter*, *quotechar* or any of the characters in
289   *lineterminator*.
290
291
292.. data:: QUOTE_NONNUMERIC
293
294   Instructs :class:`writer` objects to quote all non-numeric fields.
295
296   Instructs the reader to convert all non-quoted fields to type *float*.
297
298
299.. data:: QUOTE_NONE
300
301   Instructs :class:`writer` objects to never quote fields.  When the current
302   *delimiter* occurs in output data it is preceded by the current *escapechar*
303   character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
304   any characters that require escaping are encountered.
305
306   Instructs :class:`reader` to perform no special processing of quote characters.
307
308The :mod:`csv` module defines the following exception:
309
310
311.. exception:: Error
312
313   Raised by any of the functions when an error is detected.
314
315.. _csv-fmt-params:
316
317Dialects and Formatting Parameters
318----------------------------------
319
320To make it easier to specify the format of input and output records, specific
321formatting parameters are grouped together into dialects.  A dialect is a
322subclass of the :class:`Dialect` class having a set of specific methods and a
323single :meth:`validate` method.  When creating :class:`reader` or
324:class:`writer` objects, the programmer can specify a string or a subclass of
325the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
326of, the *dialect* parameter, the programmer can also specify individual
327formatting parameters, which have the same names as the attributes defined below
328for the :class:`Dialect` class.
329
330Dialects support the following attributes:
331
332
333.. attribute:: Dialect.delimiter
334
335   A one-character string used to separate fields.  It defaults to ``','``.
336
337
338.. attribute:: Dialect.doublequote
339
340   Controls how instances of *quotechar* appearing inside a field should
341   themselves be quoted.  When :const:`True`, the character is doubled. When
342   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
343   defaults to :const:`True`.
344
345   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
346   :exc:`Error` is raised if a *quotechar* is found in a field.
347
348
349.. attribute:: Dialect.escapechar
350
351   A one-character string used by the writer to escape the *delimiter* if *quoting*
352   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
353   :const:`False`. On reading, the *escapechar* removes any special meaning from
354   the following character. It defaults to :const:`None`, which disables escaping.
355
356
357.. attribute:: Dialect.lineterminator
358
359   The string used to terminate lines produced by the :class:`writer`. It defaults
360   to ``'\r\n'``.
361
362   .. note::
363
364      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
365      end-of-line, and ignores *lineterminator*. This behavior may change in the
366      future.
367
368
369.. attribute:: Dialect.quotechar
370
371   A one-character string used to quote fields containing special characters, such
372   as the *delimiter* or *quotechar*, or which contain new-line characters.  It
373   defaults to ``'"'``.
374
375
376.. attribute:: Dialect.quoting
377
378   Controls when quotes should be generated by the writer and recognised by the
379   reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
380   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
381
382
383.. attribute:: Dialect.skipinitialspace
384
385   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
386   The default is :const:`False`.
387
388
389.. attribute:: Dialect.strict
390
391   When ``True``, raise exception :exc:`Error` on bad CSV input.
392   The default is ``False``.
393
394Reader Objects
395--------------
396
397Reader objects (:class:`DictReader` instances and objects returned by the
398:func:`reader` function) have the following public methods:
399
400.. method:: csvreader.__next__()
401
402   Return the next row of the reader's iterable object as a list (if the object
403   was returned from :func:`reader`) or a dict (if it is a :class:`DictReader`
404   instance), parsed according to the current dialect.  Usually you should call
405   this as ``next(reader)``.
406
407
408Reader objects have the following public attributes:
409
410.. attribute:: csvreader.dialect
411
412   A read-only description of the dialect in use by the parser.
413
414
415.. attribute:: csvreader.line_num
416
417   The number of lines read from the source iterator. This is not the same as the
418   number of records returned, as records can span multiple lines.
419
420
421DictReader objects have the following public attribute:
422
423.. attribute:: csvreader.fieldnames
424
425   If not passed as a parameter when creating the object, this attribute is
426   initialized upon first access or when the first record is read from the
427   file.
428
429
430
431Writer Objects
432--------------
433
434:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
435the :func:`writer` function) have the following public methods.  A *row* must be
436an iterable of strings or numbers for :class:`Writer` objects and a dictionary
437mapping fieldnames to strings or numbers (by passing them through :func:`str`
438first) for :class:`DictWriter` objects.  Note that complex numbers are written
439out surrounded by parens. This may cause some problems for other programs which
440read CSV files (assuming they support complex numbers at all).
441
442
443.. method:: csvwriter.writerow(row)
444
445   Write the *row* parameter to the writer's file object, formatted according to
446   the current dialect.
447
448   .. versionchanged:: 3.5
449      Added support of arbitrary iterables.
450
451.. method:: csvwriter.writerows(rows)
452
453   Write all elements in *rows* (an iterable of *row* objects as described
454   above) to the writer's file object, formatted according to the current
455   dialect.
456
457Writer objects have the following public attribute:
458
459
460.. attribute:: csvwriter.dialect
461
462   A read-only description of the dialect in use by the writer.
463
464
465DictWriter objects have the following public method:
466
467
468.. method:: DictWriter.writeheader()
469
470   Write a row with the field names (as specified in the constructor).
471
472   .. versionadded:: 3.2
473
474
475.. _csv-examples:
476
477Examples
478--------
479
480The simplest example of reading a CSV file::
481
482   import csv
483   with open('some.csv', newline='') as f:
484       reader = csv.reader(f)
485       for row in reader:
486           print(row)
487
488Reading a file with an alternate format::
489
490   import csv
491   with open('passwd', newline='') as f:
492       reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
493       for row in reader:
494           print(row)
495
496The corresponding simplest possible writing example is::
497
498   import csv
499   with open('some.csv', 'w', newline='') as f:
500       writer = csv.writer(f)
501       writer.writerows(someiterable)
502
503Since :func:`open` is used to open a CSV file for reading, the file
504will by default be decoded into unicode using the system default
505encoding (see :func:`locale.getpreferredencoding`).  To decode a file
506using a different encoding, use the ``encoding`` argument of open::
507
508   import csv
509   with open('some.csv', newline='', encoding='utf-8') as f:
510       reader = csv.reader(f)
511       for row in reader:
512           print(row)
513
514The same applies to writing in something other than the system default
515encoding: specify the encoding argument when opening the output file.
516
517Registering a new dialect::
518
519   import csv
520   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
521   with open('passwd', newline='') as f:
522       reader = csv.reader(f, 'unixpwd')
523
524A slightly more advanced use of the reader --- catching and reporting errors::
525
526   import csv, sys
527   filename = 'some.csv'
528   with open(filename, newline='') as f:
529       reader = csv.reader(f)
530       try:
531           for row in reader:
532               print(row)
533       except csv.Error as e:
534           sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
535
536And while the module doesn't directly support parsing strings, it can easily be
537done::
538
539   import csv
540   for row in csv.reader(['one,two,three']):
541       print(row)
542
543
544.. rubric:: Footnotes
545
546.. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
547   will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
548   on write an extra ``\r`` will be added.  It should always be safe to specify
549   ``newline=''``, since the csv module does its own
550   (:term:`universal <universal newlines>`) newline handling.
551