:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.

.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr.  This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution.  No fee may be charged for this document in any representation,
   either on paper or electronically.  This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

--------------

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler.  The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this.  This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application.  It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.
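
As a rough sketch of the approach the note above recommends, the following
parses an expression with :mod:`ast`, edits the tree, and compiles the result.
The ``DoubleIntegers`` class and the sample expression are invented for this
illustration, and the ``visit_Constant`` hook assumes Python 3.8 or later::

   import ast


   class DoubleIntegers(ast.NodeTransformer):
       """Hypothetical transformer: replace each integer literal n with 2 * n."""

       def visit_Constant(self, node):
           # Leave strings, floats, booleans and None untouched.
           if type(node.value) is int:
               doubled = ast.Constant(value=node.value * 2)
               return ast.copy_location(doubled, node)
           return node


   tree = ast.parse("a + 5", mode="eval")    # parse the source into an AST
   tree = DoubleIntegers().visit(tree)       # edit the tree
   ast.fix_missing_locations(tree)           # fill in any missing positions
   code = compile(tree, "<ast>", "eval")     # compile the edited tree
   result = eval(code, {"a": 5})             # evaluates ``a + 10``

The edited tree is compiled directly, so the modified expression never has to
be turned back into source text.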

There are a few things to note about this module which are important to making
use of the data structures created.  This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required.  For full information on the language syntax, refer
to :ref:`reference-index`.  The parser itself is created from a grammar
specification defined in the file :file:`Grammar/Grammar` in the standard Python
distribution.  The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below.  The ST objects created by :func:`sequence2st` faithfully
simulate those structures.  Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised.  However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs.  The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form.  Sequences representing non-terminal elements in the grammar
always have a length greater than one.  The first element is an integer which
identifies a production in the grammar.  These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`.  Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.  Note that keywords used to identify the
parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment.  For example, the :keyword:`!if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user.  In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.
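
The numeric value shared by all :const:`NAME` tokens can be checked with the
:mod:`token` module.  The sketch below uses the separate :mod:`tokenize` module,
not :mod:`parser`, to obtain tokens, and assembles the ``(type, text, line)``
triples by hand to mirror the form described above.  The line number it records
is the line on which each token starts, and the sample source is invented for
this illustration::

   import io
   import token
   import tokenize

   source = "if x:\n    y = 1\n"

   # Build (type, text, line) triples in the style of parse tree leaves,
   # keeping only NAME tokens (keywords and identifiers alike).
   leaves = [
       (tok.type, tok.string, tok.start[0])
       for tok in tokenize.generate_tokens(io.StringIO(source).readline)
       if tok.type == token.NAME
   ]

   first = leaves[0]                        # the ``if`` keyword
   name_label = token.tok_name[first[0]]    # symbolic name for the number

The keyword and the two identifiers all carry the same token type, which is why
the parse tree needs the source text to tell them apart.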

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified.  The
example of the :keyword:`if` keyword above is representative.  The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.
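
Because terminals carry their source text and non-terminals carry only child
sequences, a tree in this form can be walked with a few lines of plain Python.
The following is a minimal sketch over a hand-written tree; the production
numbers ``300`` and ``301`` and the helper names are placeholders invented for
this example, since real values come from :mod:`symbol` and differ between
Python versions::

   import token

   # Hand-written tree in the nested-tuple form described above.
   EXPR, TERM = 300, 301           # placeholder non-terminal numbers
   sample_tree = (
       EXPR,
       (TERM, (token.NAME, "a")),
       (token.PLUS, "+"),
       (TERM, (token.NUMBER, "5", 1)),   # optional third element: line number
   )


   def is_terminal(node):
       """A terminal carries its source text as the second element."""
       return isinstance(node[1], str)


   def iter_terminals(node):
       """Yield (token_type, text) for every leaf, left to right."""
       if is_terminal(node):
           yield node[0], node[1]
       else:
           for child in node[1:]:
               yield from iter_terminals(child)


   source_text = " ".join(text for _, text in iter_terminals(sample_tree))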

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees.  A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes.  The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise, an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise, an
   appropriate exception is raised.
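
The difference between the ``'eval'`` and ``'exec'`` forms can be seen with the
built-in :func:`compile` alone.  The sketch below does not use :mod:`parser`;
``accepts`` is a hypothetical helper written for this illustration::

   def accepts(source, mode):
       """Return True if *source* compiles in the given compile() mode."""
       try:
           compile(source, "file.py", mode)
       except SyntaxError:
           return False
       return True


   expression_as_eval = accepts("a + 5", "eval")   # an expression
   statement_as_eval = accepts("a = 5", "eval")    # assignment is a statement
   statement_as_exec = accepts("a = 5", "exec")
   expression_as_exec = accepts("a + 5", "exec")   # an expression statement

Only the statement compiled in ``'eval'`` mode is rejected, since an expression
is also valid as a statement in ``'exec'`` mode.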


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible.  If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller.  If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be raised when the ST object is
   passed to :func:`compilest`.  This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``.  If the third element is present, it is assumed to be a
   valid line number.  The line number may be specified for any subset of the
   terminal
   symbols in the input tree.
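
   The two terminal forms can be normalized in plain Python before a tree is
   compared or rebuilt.  The following sketch strips the optional line numbers;
   the helper name ``strip_line_numbers`` and the production number ``300`` are
   placeholders invented for this example::

      def strip_line_numbers(tree):
          """Return a copy of *tree* with line numbers removed from terminals."""
          if isinstance(tree[1], str):
              # Terminal: keep only the token type and its source text.
              return (tree[0], tree[1])
          # Non-terminal: keep the production number, recurse into children.
          children = tuple(strip_line_numbers(child) for child in tree[1:])
          return (tree[0],) + children


      with_lines = (300, (1, "name", 56), (300, (1, "other")))
      without_lines = strip_line_numbers(with_lines)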


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`.  This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as nested lists or tuples, or may be compiled into
executable code objects.  Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python list representing the equivalent parse tree.  The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form.  This function does not fail so long as memory is available to build
   the list representation.  If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation.  When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token.  Note
   that the line number provided specifies the line on which the token *ends*.
   If *col_info* is true, column offset information is likewise included for
   each terminal token.  This information is omitted if the corresponding flag
   is false or omitted.
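
   For comparison, the slower alternative mentioned above, converting a tuple
   tree to nested lists in Python, might look like the following sketch.  The
   helper name and the sample tree, including the production number ``300``,
   are invented for this illustration::

      def tuple_tree_to_lists(tree):
          """Recursively convert a nested-tuple parse tree into nested lists."""
          return [
              tuple_tree_to_lists(item) if isinstance(item, tuple) else item
              for item in tree
          ]


      tuple_tree = (300, (1, "a", 1), (14, "+", 1), (2, "5", 1))
      list_tree = tuple_tree_to_lists(tuple_tree)
      list_tree[3][1] = "6"    # lists can be edited in place, tuples cannot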


.. function:: st2tuple(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python tuple representing the equivalent parse tree.  Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token.  This
   information is omitted if the flag is false or omitted.


.. function:: compilest(st, filename='<syntax-tree>')

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions. This function provides the interface to the compiler, passing the
   internal parse tree from *st* to the compiler, using the source file name
   specified by the *filename* parameter. The default value supplied for *filename*
   indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct.  The :exc:`SyntaxError` raised for this
   condition is normally generated by the Python byte-compiler itself, which is
   why it can also be raised at this point by the :mod:`parser` module.  Most
   causes of compilation failure can be diagnosed programmatically by inspection
   of the parse
   tree.
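
   The same construct is rejected when compiled from source with the built-in
   :func:`compile`, as the sketch below shows.  It does not use :mod:`parser`,
   and which stage reports the error, along with the exact message, may differ
   between Python versions::

      try:
          compile("del f(0)", "<example>", "exec")
      except SyntaxError as exc:
          # The statement is not a legal language construct.
          error_message = exc.msg
      else:
          error_message = None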


.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite.  Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(st)

   .. index:: builtin: compile

   When *st* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false.  This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions.  Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(st)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
   assume that this function is equivalent to ``not isexpr(st)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment.  See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module.  This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string.  Calls to :func:`sequence2st` need to be able to
   handle either form of the argument, while calls to other functions in the
   module will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process.  These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile(filename='<syntax-tree>')

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist(line_info=False, col_info=False)

   Same as ``st2list(st, line_info, col_info)``.


.. method:: ST.totuple(line_info=False, col_info=False)

   Same as ``st2tuple(st, line_info, col_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.  For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       st = parser.expr(source_string)
       return st, st.compile()