ANTLR(1)               PCCTS Manual Pages                ANTLR(1)

NAME
     antlr - ANother Tool for Language Recognition

SYNTAX
     antlr [options] grammar_files

DESCRIPTION
     Antlr converts an extended form of context-free grammar into
     a set of C functions which directly implement an efficient
     form of deterministic recursive-descent LL(k) parser.
     Context-free grammars may be augmented with predicates to
     allow semantics to influence parsing; this allows a form of
     context-sensitive parsing.  Selective backtracking is also
     available to handle non-LL(k) and even non-LALR(k)
     constructs.  Antlr also produces a definition of a lexer
     which can be automatically converted into C code for a
     DFA-based lexer by dlg.  Hence, antlr serves a function much
     like that of yacc; however, it is notably more flexible and
     is more integrated with a lexer generator (antlr directly
     generates dlg code, whereas yacc and lex are given
     independent descriptions).  Unlike yacc, which accepts
     LALR(1) grammars, antlr accepts LL(k) grammars in an
     extended BNF notation, which eliminates the need for
     precedence rules.

     Like yacc grammars, antlr grammars can use
     automatically-maintained symbol attribute values referenced
     as dollar variables.  Further, because antlr generates
     top-down parsers, arbitrary values may be inherited from
     parent rules (passed like function parameters).  Antlr also
     has a mechanism for creating and manipulating
     abstract-syntax-trees.

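     The idea of inheriting values from a parent rule can be
     pictured with a hand-written recursive-descent sketch in C.
     This is not antlr output or antlr grammar syntax; the rule
     names, the token representation and the symbol table are all
     hypothetical.  The parent function decl passes the declared
     type down to var_list as an ordinary function argument, which
     is the sense in which inherited values are passed like
     function parameters.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 8

/* Hypothetical symbol table filled in while parsing. */
struct symbol { const char *name; const char *type; };
static struct symbol symtab[MAX_SYMS];
static int nsyms = 0;

/* Hypothetical pre-split token stream, e.g. { "int", "x", "y", NULL }. */
static const char **next_tok;

/* var_list: ID+ ;  the declared type is inherited from the caller. */
static void var_list(const char *inherited_type)
{
    while (*next_tok != NULL) {
        assert(nsyms < MAX_SYMS);
        symtab[nsyms].name = *next_tok++;
        symtab[nsyms].type = inherited_type;
        nsyms++;
    }
}

/* decl: TYPE var_list ;  passes the type down as a parameter. */
void decl(const char **tokens)
{
    const char *type;
    next_tok = tokens;
    assert(*next_tok != NULL);
    type = *next_tok++;
    var_list(type);
}

/* Look up the recorded type of a name, or NULL if it was not declared. */
const char *type_of(const char *name)
{
    int i;
    for (i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].type;
    return NULL;
}
```

     Parsing the tokens "int", "x", "y" with decl records both x
     and y with the type "int", although var_list itself never
     sees the type token.
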
     There are various other niceties in antlr, including the
     ability to spread one grammar over multiple files or even
     to put multiple grammars in a single file, the ability to
     generate a version of the grammar with actions stripped out
     (for documentation purposes), and lots more.

OPTIONS
     -ck n
          Use up to n symbols of lookahead when using compressed
          (linear approximation) lookahead.  This type of
          lookahead is very cheap to compute and is attempted
          before full LL(k) lookahead, which is of exponential
          complexity in the worst case.  In general, the
          compressed lookahead can be much deeper (e.g., -ck 10)
          than the full lookahead (which usually must be less
          than 4).

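          As a rough illustration of why compressed lookahead is
          cheaper but less precise than full LL(k) lookahead, the
          following C sketch compares the two for k=2.  The token
          names, the sequence table and both helper functions are
          hypothetical and are not taken from antlr's output or
          internals; the sketch only assumes that the linear
          approximation keeps one token set per lookahead depth
          instead of a set of k-token sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token numbers, for illustration only. */
enum { TOK_A = 1, TOK_B, TOK_C, TOK_D };

/* Full LL(2) lookahead for one alternative: a set of 2-token sequences. */
static const int full_seqs[][2] = { { TOK_A, TOK_B }, { TOK_C, TOK_D } };

/* Compressed lookahead for the same alternative: one bit set per depth. */
static const unsigned depth_set[2] = {
    (1u << TOK_A) | (1u << TOK_C),   /* tokens allowed at depth 1 */
    (1u << TOK_B) | (1u << TOK_D)    /* tokens allowed at depth 2 */
};

/* Exact test: the pair (t1, t2) must be one of the listed sequences. */
int full_predicts(int t1, int t2)
{
    size_t i;
    for (i = 0; i < sizeof full_seqs / sizeof full_seqs[0]; i++)
        if (full_seqs[i][0] == t1 && full_seqs[i][1] == t2)
            return 1;
    return 0;
}

/* Approximate test: each token is checked against its own depth set. */
int compressed_predicts(int t1, int t2)
{
    assert(t1 > 0 && t1 < 32 && t2 > 0 && t2 < 32);
    return ((depth_set[0] >> t1) & 1u) && ((depth_set[1] >> t2) & 1u);
}
```

          Here the pair (TOK_A, TOK_D) is rejected by
          full_predicts but accepted by compressed_predicts: the
          approximation needs only one set per depth, at the
          price of admitting sequences the exact test excludes.
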
     -CC  Generate C++ output from both ANTLR and DLG.

     -cr  Generate a cross-reference for all rules.  For each
          rule, print a list of all other rules that reference
          it.

     -e1  Ambiguities/errors shown in low detail (default).

     -e2  Ambiguities/errors shown in more detail.

     -e3  Ambiguities/errors shown in excruciating detail.

     -fe file
          Rename err.c to file.

     -fh file
          Rename stdpccts.h header (turns on -gh) to file.

     -fl file
          Rename lexical output, parser.dlg, to file.

     -fm file
          Rename file with lexical mode definitions, mode.h, to
          file.

     -fr file
          Rename file which remaps globally visible symbols,
          remap.h, to file.

     -ft file
          Rename tokens.h to file.

     -ga  Generate ANSI-compatible code (default case).  This has
          not been rigorously tested to be ANSI X3J11 C
          compliant, but it is close.  The normal output of antlr
          currently compiles as K&R C, ANSI C, and C++; this
          option does nothing because antlr generates a bunch of
          #ifdef's to do the right thing depending on the
          language.

     -gc  Indicates that antlr should generate no C code, i.e.,
          only perform analysis on the grammar.

     -gd  C code is inserted in each of the antlr-generated
          parsing functions to provide for user-defined handling
          of a detailed parse trace.  The inserted code consists
          of calls to the user-supplied macros or functions
          called zzTRACEIN and zzTRACEOUT.  The only argument is
          a char * pointing to a C-style string which is the
          grammar rule recognized by the current parsing
          function.  If no definition is given for the trace
          functions, upon rule entry and exit, a message will be
          printed indicating that a particular rule has been
          entered or exited.

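          A minimal sketch of user-supplied trace hooks for -gd
          follows.  Only the names zzTRACEIN and zzTRACEOUT and
          their single string argument come from the description
          above; the depth counter, the output format and the
          hand-written stand-in functions expr and term are
          assumptions made for illustration, and real
          antlr-generated functions look different.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical nesting counter, used only to indent the trace. */
static int trace_depth = 0;

/* User-supplied hooks: each receives the rule name as a C string. */
#define zzTRACEIN(r)  trace_enter(r)
#define zzTRACEOUT(r) trace_exit(r)

static void trace_enter(const char *rule)
{
    fprintf(stderr, "%*senter %s\n", 2 * trace_depth, "", rule);
    trace_depth++;
}

static void trace_exit(const char *rule)
{
    assert(trace_depth > 0);
    trace_depth--;
    fprintf(stderr, "%*sexit  %s\n", 2 * trace_depth, "", rule);
}

/* Stand-ins for generated parsing functions (not real antlr output). */
void term(void)
{
    zzTRACEIN("term");
    /* ... match tokens here ... */
    zzTRACEOUT("term");
}

void expr(void)
{
    zzTRACEIN("expr");
    term();
    zzTRACEOUT("expr");
}
```

          Calling expr() writes an enter/exit line for expr to
          stderr, with the lines for the nested term indented
          between them, and leaves trace_depth at zero.
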
     -ge  Generate an error class for each non-terminal.

     -gh  Generate stdpccts.h for non-ANTLR-generated files to
          include.  This file contains all defines needed to
          describe the type of parser generated by antlr (e.g.,
          how much lookahead is used and whether or not trees are
          constructed) and contains the header action specified
          by the user.

     -gk  Generate parsers that delay lookahead fetches until
          needed.  Without this option, antlr generates parsers
          which always have k tokens of lookahead available.

     -gl  Generate line info about grammar actions in the C
          parser, of the form # line "file".  This makes error
          messages from the C/C++ compiler more useful, as they
          point into the grammar file rather than the resulting
          C file.  Debugging is easier as well, because you step
          through the grammar rather than the C file.

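          The effect of such line information can be seen with a
          small C sketch.  The file name expr.g and the line
          number 120 are made up for illustration; the sketch
          relies only on the standard C #line directive, which
          resets the position the compiler reports through
          __FILE__ and __LINE__ (and therefore in its
          diagnostics).

```c
#include <assert.h>
#include <string.h>

/* Records where the compiler believes the enclosed action came from. */
struct origin { const char *file; int line; };

struct origin action_origin(void)
{
    struct origin o;
/* The next directive mimics line info for a grammar action; the file
   name and number are hypothetical. */
#line 120 "expr.g"
    o.file = __FILE__;   /* this source line now counts as expr.g:120 */
    o.line = __LINE__;   /* and this one as expr.g:121 */
    return o;
}

/* True if the recorded origin lies in the named file. */
int points_into(struct origin o, const char *file)
{
    assert(file != NULL);
    return strcmp(o.file, file) == 0;
}
```

          A compiler error or debugger breakpoint inside
          action_origin would likewise be attributed to expr.g
          rather than to the C file that contains the function.
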
     -gs  Do not generate sets for token expression lists;
          instead generate a ||-separated sequence of
          LA(1)==token_number.  The default is to generate sets.

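          The two code shapes can be contrasted with a short C
          sketch.  Everything here is hypothetical: the token
          numbers, the bit-set layout and the helper names are
          invented for illustration, and LA(1) is modelled as a
          plain int argument instead of antlr's lookahead macro.
          The first function shows the kind of ||-separated
          comparison sequence described for -gs; the second shows
          a set membership test of the general kind a set-based
          default might use.

```c
#include <assert.h>

/* Hypothetical token numbers, for illustration only. */
enum { PLUS = 3, MINUS = 4, STAR = 9, SLASH = 10, NUM_TOKENS = 16 };

/* -gs style: one comparison per token, joined with ||. */
int is_operator_chain(int la1)
{
    return la1 == PLUS || la1 == MINUS || la1 == STAR || la1 == SLASH;
}

/* Set style: one bit per token number, eight tokens per byte. */
static const unsigned char operator_set[NUM_TOKENS / 8] = {
    (1u << PLUS) | (1u << MINUS),              /* tokens 0..7  */
    (1u << (STAR - 8)) | (1u << (SLASH - 8))   /* tokens 8..15 */
};

int is_operator_set(int la1)
{
    assert(la1 >= 0 && la1 < NUM_TOKENS);
    return (operator_set[la1 / 8] >> (la1 % 8)) & 1;
}
```

          Both functions accept exactly the same tokens; the
          chain grows with the number of tokens listed, while the
          set test stays a single table lookup.
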
     -gt  Generate code for Abstract-Syntax Trees.

     -gx  Do not create the lexical analyzer files (dlg-related).
          This option should be given when the user wishes to
          provide a customized lexical analyzer.  It may also be
          used in make scripts to cause only the parser to be
          rebuilt when a change not affecting the lexical
          structure is made to the input grammars.

     -k n Set k of LL(k) to n; i.e., set the number of tokens of
          lookahead (default=1).

     -o dir
          Directory where output files should go (default=".").
          This is very nice for keeping the source directory
          clear of ANTLR and DLG spawn.

     -p   The complete grammar, collected from all input grammar
          files and stripped of all comments and embedded
          actions, is listed to stdout.  This is intended to aid
          in viewing the entire grammar as a whole and to
          eliminate the need to keep actions concisely stated so
          that the grammar is easier to read.  Hence, it is
          preferable to embed even complex actions directly in
          the grammar, rather than to call them as subroutines,
          since the subroutine call overhead will be saved.

     -pa  This option is the same as -p except that the output is
          annotated with the first sets determined from grammar
          analysis.

     -prc on
          Turn on the computation and hoisting of predicate
          context.

     -prc off
          Turn off the computation and hoisting of predicate
          context.  This option makes 1.10 behave like the 1.06
          release with option -pr on.  Context computation is off
          by default.

     -rl n
          Limit the maximum number of tree nodes used by grammar
          analysis to n.  Occasionally, antlr is unable to
          analyze a grammar submitted by the user.  This rare
          situation can only occur when the grammar is large and
          the amount of lookahead is greater than one.  A
          non-linear analysis algorithm is used by PCCTS to
          handle the general case of LL(k) parsing.  The average
          complexity of analysis, however, is near linear due to
          some fancy footwork in the implementation which reduces
          the number of calls to the full LL(k) algorithm.  If
          this limit is reached, an error message is displayed
          which indicates the grammar construct being analyzed
          when antlr hit a non-linearity.  Use this option if
          antlr seems to go out to lunch and your disk starts
          thrashing; try n=10000 to start.  Once the offending
          construct has been identified, try to remove the
          ambiguity that antlr was trying to overcome with large
          lookahead analysis.  The introduction of (...)?
          backtracking blocks eliminates some of these problems:
          antlr does not analyze alternatives that begin with
          (...)? (it simply backtracks, if necessary, at run
          time).

     -w1  Set low warning level.  Do not warn if semantic
          predicates and/or (...)? blocks are assumed to cover
          ambiguous alternatives.

     -w2  Ambiguous parsing decisions yield warnings even if
          semantic predicates or (...)? blocks are used.  Warn if
          predicate context is computed and semantic predicates
          incompletely disambiguate alternative productions.

     -    Read grammar from standard input and generate stdin.c
          as the parser file.

SPECIAL CONSIDERATIONS
     Antlr works...  we think.  There is no implicit guarantee of
     anything.  We reserve no legal rights to the software known
     as the Purdue Compiler Construction Tool Set (PCCTS); PCCTS
     is in the public domain.  An individual or company may do
     whatever they wish with source code distributed with PCCTS
     or the code generated by PCCTS, including the incorporation
     of PCCTS, or its output, into commercial software.  We
     encourage users to develop software with PCCTS.  However, we
     do ask that credit be given to us for developing PCCTS.  By
     "credit", we mean that if you incorporate our source code
     into one of your programs (commercial product, research
     project, or otherwise), you acknowledge this fact somewhere
     in the documentation, research report, etc.  If you like
     PCCTS and have developed a nice tool with the output, please
     mention that you developed it using PCCTS.  As long as these
     guidelines are followed, we expect to continue enhancing
     this system and expect to make other tools available as they
     are completed.

FILES
     *.c  output C parser.

     *.cpp
          output C++ parser when C++ mode is used.

     parser.dlg
          output dlg lexical analyzer.

     err.c
          token string array, error sets and error support
          routines.  Not used in C++ mode.

     remap.h
          file that redefines all globally visible parser
          symbols.  The use of the #parser directive creates this
          file.  Not used in C++ mode.

     stdpccts.h
          list of definitions needed by C files, not generated by
          PCCTS, that reference PCCTS objects.  This is not
          generated by default.  Not used in C++ mode.

     tokens.h
          output #defines for tokens used and function prototypes
          for functions generated for rules.

SEE ALSO
     dlg(1), pccts(1)