1======================================================================
2
3                       CHANGES_SUMMARY.TXT
4
5        A QUICK overview of changes from 1.33 in reverse order
6
7  A summary of additions rather than bug fixes and minor code changes.
8
9          Numbers refer to items in CHANGES_FROM_133*.TXT
10             which may contain additional information.
11
12                          DISCLAIMER
13
 The software and these notes are provided "as is".  They may include
 typographical or technical errors and their authors disclaim all
 liability of any kind or nature for damages due to error, fault,
 defect, or deficiency regardless of cause.  All warranties of any
 kind, either express or implied, including, but not limited to, the
 implied warranties of merchantability and fitness for a particular
 purpose are disclaimed.
21
22======================================================================
23
24#258. You can specify a user-defined base class for your parser
25
    The base class constructor must have a signature similar to
    that of ANTLRParser.
28
29#253. Generation of block preamble (-preamble and -preamble_first)
30
31    The antlr option -preamble causes antlr to insert the code
32    BLOCK_PREAMBLE at the start of each rule and block.
33
34    The antlr option -preamble_first is similar, but inserts the
35    code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
36    PreambleFirst_123 is equivalent to the first set defined by
37    the #FirstSetSymbol described in Item #248.
38
39#248. Generate symbol for first set of an alternative
40
41        rr : #FirstSetSymbol(rr_FirstSet)  ( Foo | Bar ) ;
42
43#216. Defer token fetch for C++ mode
44
45    When the ANTLRParser class is built with the pre-processor option
46    ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred
47    until LA(i) or LT(i) is called.
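
    The timing of a deferred fetch can be illustrated with a small
    stand-alone sketch.  It does not use the PCCTS headers: the class
    DeferredBuffer and its members are hypothetical names, and the real
    ANTLRParser code may be organized quite differently.  The only point
    shown is that consume() records a pending request and the lexer is
    not asked for a token until LA(i) needs one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a token buffer with lookahead.
// Not the PCCTS implementation; it only models when fetches happen.
class DeferredBuffer {
public:
    explicit DeferredBuffer(std::function<int()> nextToken)
        : next_(std::move(nextToken)) {}

    // consume() drops the current token but does not touch the lexer.
    void consume() { ++pendingConsumes_; }

    // LA(i), i >= 1: carry out any deferred consumes, then fetch only
    // as many tokens as are needed to answer the request.
    int LA(std::size_t i) {
        assert(i >= 1);
        while (pendingConsumes_ > 0) {
            if (window_.empty()) fetchOne();
            window_.pop_front();
            --pendingConsumes_;
        }
        while (window_.size() < i) fetchOne();
        return window_[i - 1];
    }

    // Number of tokens requested from the lexer so far.
    int fetches() const { return fetches_; }

private:
    void fetchOne() {
        window_.push_back(next_());
        ++fetches_;
    }

    std::function<int()> next_;   // stands in for the lexer
    std::deque<int> window_;      // tokens fetched but not yet consumed
    int pendingConsumes_ = 0;
    int fetches_ = 0;
};
```

    With a lexer that returns 1, 2, 3, ... the call sequence LA(1),
    consume(), LA(1) performs the second fetch inside the second LA(1),
    not inside consume().  This is the property that matters for
    interactive input.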
48
49#215. Use reset() to reset DLGLexerBase
50#188. Added pccts/h/DLG_stream_input.h
51#180. Added ANTLRParser::getEofToken()
52#173. -glms for Microsoft style filenames with -gl
53#170. Suppression for predicates with lookahead depth >1
54
55      Consider the following grammar with -ck 2 and the predicate in rule
56      "a" with depth 2:
57
58            r1  : (ab)* "@"
59                ;
60
61            ab  : a
62                | b
63                ;
64
65            a   : (A B)? => <<p(LATEXT(2))>>? A B C
66                ;
67
68            b   : A B C
69                ;
70
71      Normally, the predicate would be hoisted into rule r1 in order to
72      determine whether to call rule "ab".  However it should *not* be
73      hoisted because, even if p is false, there is a valid alternative
74      in rule b.  With "-mrhoistk on" the predicate will be suppressed.
75
      If the "-info p" command line option is present the following
      information will appear in the generated code:
78
79                while ( (LA(1)==A)
80        #if 0
81
82        Part (or all) of predicate with depth > 1 suppressed by alternative
83            without predicate
84
85        pred  <<  p(LATEXT(2))>>?
86                  depth=k=2  ("=>" guard)  rule a  line 8  t1.g
87          tree context:
88            (root = A
89               B
90            )
91
92        The token sequence which is suppressed: ( A B )
93        The sequence of references which generate that sequence of tokens:
94
95           1 to ab          r1/1       line 1     t1.g
96           2 ab             ab/1       line 4     t1.g
97           3 to b           ab/2       line 5     t1.g
98           4 b              b/1        line 11    t1.g
99           5 #token A       b/1        line 11    t1.g
100           6 #token B       b/1        line 11    t1.g
101
102        #endif
103
104      A slightly more complicated example:
105
106            r1  : (ab)* "@"
107                ;
108
109            ab  : a
110                | b
111                ;
112
113            a   : (A B)? => <<p(LATEXT(2))>>? (A  B | D E)
114                ;
115
116            b   : <<q(LATEXT(2))>>? D E
117                ;
118
119
120      In this case, the sequence (D E) in rule "a" which lies behind
121      the guard is used to suppress the predicate with context (D E)
122      in rule b.
123
124                while ( (LA(1)==A || LA(1)==D)
125            #if 0
126
127            Part (or all) of predicate with depth > 1 suppressed by alternative
128                without predicate
129
130            pred  <<  q(LATEXT(2))>>?
131                              depth=k=2  rule b  line 11  t2.g
132              tree context:
133                (root = D
134                   E
135                )
136
137            The token sequence which is suppressed: ( D E )
138            The sequence of references which generate that sequence of tokens:
139
140               1 to ab          r1/1       line 1     t2.g
141               2 ab             ab/1       line 4     t2.g
142               3 to a           ab/1       line 4     t2.g
143               4 a              a/1        line 8     t2.g
144               5 #token D       a/1        line 8     t2.g
145               6 #token E       a/1        line 8     t2.g
146
147            #endif
148            &&
149            #if 0
150
151            pred  <<  p(LATEXT(2))>>?
152                              depth=k=2  ("=>" guard)  rule a  line 8  t2.g
153              tree context:
154                (root = A
155                   B
156                )
157
158            #endif
159
160            (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) )  {
161                ab();
162                ...
163
164#165. (Changed in MR13) option -newAST
165
166      To create ASTs from an ANTLRTokenPtr antlr usually calls
167      "new AST(ANTLRTokenPtr)".  This option generates a call
168      to "newAST(ANTLRTokenPtr)" instead.  This allows a user
169      to define a parser member function to create an AST object.
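
      A minimal stand-alone sketch of the idea follows.  Token, AST and
      MyParser are hypothetical stand-ins (real code would use
      ANTLRTokenPtr and a class derived from ANTLRParser).  The sketch
      only shows why routing node creation through a parser member
      function is convenient: the parser can track or own every node it
      creates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for ANTLRTokenPtr and the user's AST class (hypothetical).
struct Token { std::string text; };
using TokenPtr = std::shared_ptr<Token>;

struct AST {
    explicit AST(TokenPtr t) : token(std::move(t)) {}
    TokenPtr token;
};

// Hypothetical parser class.  With -newAST the generated code calls
// newAST(tok) where it would otherwise call "new AST(tok)", so the
// parser decides how nodes are created and who owns them.
class MyParser {
public:
    AST* newAST(TokenPtr tok) {
        assert(tok != nullptr);
        nodes_.push_back(std::make_unique<AST>(std::move(tok)));
        return nodes_.back().get();
    }

    std::size_t nodeCount() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<AST>> nodes_;  // parser owns every node
};
```

      Here the nodes are released when the parser object is destroyed.
      That is a design choice of this sketch, not something required by
      the option.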
170
171#161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h
172
173#158. (Changed in MR13) #header causes problem for pre-processors
174
      A user who runs the C pre-processor on antlr source suggested
      that another syntax be allowed.  With MR13, directives such as
      #header, #pragma, etc. may be written as "\#header", "\#pragma",
      etc.  To escape pre-processor directives inside a #header use
      something like the following:
180
181            \#header
182            <<
183                \#include <stdio.h>
184            >>
185
186#155. (Changed in MR13) Context behind predicates can suppress
187
188      With -mrhoist enabled the context behind a guarded predicate can
189      be used to suppress other predicates.  Consider the following grammar:
190
        r0 : (r1)+;

        r1  : rp
            | rq
            ;
        rp  : <<p(LATEXT(1))>>? B ;
        rq : (A)? => <<q(LATEXT(1))>>? (A|B);
198
199      In earlier versions both predicates "p" and "q" would be hoisted into
200      rule r0. With MR12c predicate p is suppressed because the context which
201      follows predicate q includes "B" which can "cover" predicate "p".  In
202      other words, in trying to decide in r0 whether to call r1, it doesn't
203      really matter whether p is false or true because, either way, there is
204      a valid choice within r1.
205
206#154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>
207
208      A common error, even among experienced pccts users, is to code
209      an init-action to inhibit hoisting rather than a leading action.
210      An init-action does not inhibit hoisting.
211
212      This was coded:
213
214        rule1 : <<;>> rule2
215
216      This is what was meant:
217
218        rule1 : <<;>> <<;>> rule2
219
220      With MR13, the user can code:
221
222        rule1 : <<;>> <<nohoist>> rule2
223
224      The following will give an error message:
225
226        rule1 : <<nohoist>> rule2
227
228      If the <<nohoist>> appears as an init-action rather than a leading
229      action an error message is issued.  The meaning of an init-action
230      containing "nohoist" is unclear: does it apply to just one
231      alternative or to all alternatives ?
232
233#151a. Addition of ANTLRParser::getLexer(), ANTLRTokenStream::getLexer()
234
      You must manually cast the ANTLRTokenStream to your program's
      lexer class.  Because the name of the lexer's class is not fixed,
      it is impossible to incorporate it into the DLGLexerBase class.
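
      The shape of such a cast is sketched below with simplified
      stand-in classes.  TokenStream, Parser, MyLexer and myLexerOf are
      hypothetical names; the return type of the real getLexer()
      should be taken from the PCCTS headers.  The sketch only shows
      that the library hands back a base-class pointer and that the
      caller, who knows the concrete lexer class, performs the cast.

```cpp
#include <cassert>

// Stand-in for the token stream base class (greatly simplified).
class TokenStream {
public:
    virtual ~TokenStream() {}
};

// The user's lexer class.  Its name is chosen by the user, so the
// library cannot mention it.
class MyLexer : public TokenStream {
public:
    int commentDepth = 0;   // some user-defined lexer state
};

// Stand-in for the parser, which only knows the base class.
class Parser {
public:
    explicit Parser(TokenStream* in) : input_(in) {}
    TokenStream* getLexer() const { return input_; }

private:
    TokenStream* input_;
};

// The caller supplies the concrete type.  dynamic_cast yields a null
// pointer if the stream is not really a MyLexer.
MyLexer* myLexerOf(const Parser& p) {
    return dynamic_cast<MyLexer*>(p.getLexer());
}
```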
239
240#151b.(Changed in MR12) ParserBlackBox member getLexer()
241
242#150. (Changed in MR12) syntaxErrCount and lexErrCount now public
243
244#149. (Changed in MR12) antlr option -info o (letter o for orphan)
245
246      If there is more than one rule which is not referenced by any
247      other rule then all such rules are listed.  This is useful for
248      alerting one to rules which are not used, but which can still
249      contribute to ambiguity.
250
251#148. (Changed in MR11) #token names appearing in zztokens,token_tbl
252
253      One can write:
254
            #token Plus ("+")             "\+"
            #token LP   ("(")             "\("
            #token COM  ("comment begin") "/\*"
258
      The string in parentheses will be used in syntax error messages.
260
261#146. (Changed in MR11) Option -treport for locating "difficult" alts
262
263      It can be difficult to determine which alternatives are causing
264      pccts to work hard to resolve an ambiguity.  In some cases the
265      ambiguity is successfully resolved after much CPU time so there
266      is no message at all.
267
      A rough measure of the amount of work being performed which is
      independent of the CPU speed and system load is the number of
      tnodes created.  Using "-info t" gives information about the
      total number of tnodes created and the peak number of tnodes.
272
273        Tree Nodes:  peak 1300k  created 1416k  lost 0
274
275      It also puts in the generated C or C++ file the number of tnodes
276      created for a rule (at the end of the rule).  However this
277      information is not sufficient to locate the alternatives within
278      a rule which are causing the creation of tnodes.
279
280      Using:
281
282             antlr -treport 100000 ....
283
284      causes antlr to list on stdout any alternatives which require the
285      creation of more than 100,000 tnodes, along with the lookahead sets
286      for those alternatives.
287
      The following is a trivial case from the ansi.g grammar which shows
      the format of the report.  This report might be of more interest
      in cases where 1,000,000 tnodes were created to resolve the ambiguity.
291
292      -------------------------------------------------------------------------
293        There were 0 tuples whose ambiguity could not be resolved
294             by full lookahead
295        There were 157 tnodes created to resolve ambiguity between:
296
297          Choice 1: statement/2  line 475  file ansi.g
298          Choice 2: statement/3  line 476  file ansi.g
299
300            Intersection of lookahead[1] sets:
301
302               IDENTIFIER
303
304            Intersection of lookahead[2] sets:
305
306               LPARENTHESIS     COLON            AMPERSAND        MINUS
307               STAR             PLUSPLUS         MINUSMINUS       ONESCOMPLEMENT
308               NOT              SIZEOF           OCTALINT         DECIMALINT
309               HEXADECIMALINT   FLOATONE         FLOATTWO         IDENTIFIER
310               STRING           CHARACTER
311      -------------------------------------------------------------------------
312
313#143. (Changed in MR11) Optional ";" at end of #token statement
314
315      Fixes problem of:
316
317            #token X "x"
318
319            <<
320                parser action
321            >>
322
323      Being confused with:
324
325            #token X "x" <<lexical action>>
326
327#142. (Changed in MR11) class BufFileInput subclass of DLGInputStream
328
329      Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class
330      BufFileInput derived from DLGInputStream which provides a
331      function lookahead(char *string) to test characters in the
332      input stream more than one character ahead.
333      The class is located in pccts/h/BufFileInput.* of the kit.
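
      The idea of testing several characters ahead without consuming
      them can be sketched as follows.  LookaheadInput is a hypothetical
      class written for this note on top of std::istream.  It is not
      the BufFileInput class from the kit, whose actual interface and
      return conventions should be taken from pccts/h/BufFileInput.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical illustration of buffered multi-character lookahead.
class LookaheadInput {
public:
    explicit LookaheadInput(std::istream& in) : in_(in) {}

    // Return the next character, or EOF when the input is exhausted.
    int nextChar() {
        if (!fill(1)) return EOF;
        int c = static_cast<unsigned char>(buf_.front());
        buf_.pop_front();
        return c;
    }

    // True if the upcoming characters match s.  Nothing is consumed:
    // characters read ahead stay in the buffer for nextChar().
    bool lookahead(const std::string& s) {
        if (!fill(s.size())) return false;
        for (std::size_t i = 0; i < s.size(); ++i)
            if (buf_[i] != s[i]) return false;
        return true;
    }

private:
    // Buffer at least n characters; false if the input ends first.
    bool fill(std::size_t n) {
        while (buf_.size() < n) {
            int c = in_.get();
            if (c == std::istream::traits_type::eof()) return false;
            buf_.push_back(static_cast<char>(c));
        }
        return true;
    }

    std::istream& in_;
    std::deque<char> buf_;
};
```

      For example, with a std::istringstream holding "/* x", a call to
      lookahead("/*") succeeds and the following nextChar() still
      returns '/'.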
334
335#140. #pred to define predicates
336
337      +---------------------------------------------------+
338      | Note: Assume "-prc on" for this entire discussion |
339      +---------------------------------------------------+
340
341      A problem with predicates is that each one is regarded as
342      unique and capable of disambiguating cases where two
343      alternatives have identical lookahead.  For example:
344
345        rule : <<pred(LATEXT(1))>>? A
346             | <<pred(LATEXT(1))>>? A
347             ;
348
349      will not cause any error messages or warnings to be issued
350      by earlier versions of pccts.  To compare the text of the
351      predicates is an incomplete solution.
352
353      In 1.33MR11 I am introducing the #pred statement in order to
354      solve some problems with predicates.  The #pred statement allows
355      one to give a symbolic name to a "predicate literal" or a
356      "predicate expression" in order to refer to it in other predicate
357      expressions or in the rules of the grammar.
358
359      The predicate literal associated with a predicate symbol is C
360      or C++ code which can be used to test the condition.  A
361      predicate expression defines a predicate symbol in terms of other
362      predicate symbols using "!", "&&", and "||".  A predicate symbol
363      can be defined in terms of a predicate literal, a predicate
364      expression, or *both*.
365
      When a predicate symbol is defined with both a predicate literal
      and a predicate expression, the predicate literal is used to generate
      code, but the predicate expression is used to check whether two
      alternatives have identical predicates.
370
371      Here are some examples of #pred statements:
372
        #pred  IsLabel       <<isLabel(LATEXT(1))>>?
        #pred  IsLocalVar    <<isLocalVar(LATEXT(1))>>?
        #pred  IsGlobalVar   <<isGlobalVar(LATEXT(1))>>?
        #pred  IsVar         <<isVar(LATEXT(1))>>?       IsLocalVar || IsGlobalVar
        #pred  IsScoped      <<isScoped(LATEXT(1))>>?    IsLabel || IsLocalVar
378
379      I hope that the use of EBNF notation to describe the syntax of the
380      #pred statement will not cause problems for my readers (joke).
381
382        predStatement : "#pred"
383                            CapitalizedName
384                              (
385                                  "<<predicate_literal>>?"
386                                | "<<predicate_literal>>?"  predOrExpr
387                                | predOrExpr
388                              )
389                      ;
390
391        predOrExpr    : predAndExpr ( "||" predAndExpr ) * ;
392
393        predAndExpr   : predPrimary ( "&&" predPrimary ) * ;
394
395        predPrimary   : CapitalizedName
396                      | "!" predPrimary
397                      | "(" predOrExpr ")"
398                      ;
399
400      What is the purpose of this nonsense ?
401
402      To understand how predicate symbols help, you need to realize that
403      predicate symbols are used in two different ways with two different
404      goals.
405
406        a. Allow simplification of predicates which have been combined
407           during predicate hoisting.
408
409        b. Allow recognition of identical predicates which can't disambiguate
410           alternatives with common lookahead.
411
412      First we will discuss goal (a).  Consider the following rule:
413
414            rule0: rule1
415                 | ID
416                 | ...
417                 ;
418
419            rule1: rule2
420                 | rule3
421                 ;
422
            rule2: <<isX(LATEXT(1))>>? ID ;
            rule3: <<!isX(LATEXT(1))>>? ID ;
425
426      When the predicates in rule2 and rule3 are combined by hoisting
427      to create a prediction expression for rule1 the result is:
428
            if ( LA(1)==ID
                && ( isX(LATEXT(1)) || !isX(LATEXT(1)) ) ) { rule1(); ...
431
      This is inefficient, but more importantly, can lead to the false
      assumption that the predicate expression distinguishes the rule1
      alternative from some other alternative with lookahead ID.  In
      MR11 one can write:
436
437            #pred IsX     <<isX(LATEXT(1))>>?
438
439            ...
440
441            rule2: <<IsX>>? ID  ;
442            rule3: <<!IsX>>? ID ;
443
444      During hoisting MR11 recognizes this as a special case and
445      eliminates the predicates.  The result is a prediction
446      expression like the following:
447
448            if ( LA(1)==ID ) { rule1(); ...
449
450      Please note that the following cases which appear to be equivalent
451      *cannot* be simplified by MR11 during hoisting because the hoisting
452      logic only checks for a "!" in the predicate action, not in the
453      predicate expression for a predicate symbol.
454
455        *Not* equivalent and is not simplified during hoisting:
456
457            #pred IsX      <<isX(LATEXT(1))>>?
458            #pred NotX     <<!isX(LATEXT(1))>>?
459            ...
460            rule2: <<IsX>>? ID  ;
461            rule3: <<NotX>>? ID ;
462
463        *Not* equivalent and is not simplified during hoisting:
464
465            #pred IsX      <<isX(LATEXT(1))>>?
466            #pred NotX     !IsX
467            ...
468            rule2: <<IsX>>? ID  ;
469            rule3: <<NotX>>? ID ;
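
      The restriction just described (only a "!" written in the
      predicate action itself is recognized) can be illustrated with a
      small stand-alone sketch.  PredRef, parseAction and orIsAlwaysTrue
      are hypothetical names and this is not the antlr source.  The
      sketch only models the rule that the check looks at the action
      text and never at the definition behind a predicate symbol.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A reference to a predicate symbol as written in a predicate action,
// for example <<IsX>>? or <<!IsX>>?.
struct PredRef {
    std::string symbol;
    bool negated;
};

// Split the action text into an optional leading "!" and a symbol
// name.  The #pred definition of the symbol is deliberately ignored.
PredRef parseAction(const std::string& text) {
    assert(!text.empty());
    if (text[0] == '!') return PredRef{text.substr(1), true};
    return PredRef{text, false};
}

// An OR of hoisted predicates is vacuous if some symbol occurs both
// plain and negated: (IsX || !IsX) is always true.
bool orIsAlwaysTrue(const std::vector<std::string>& actions) {
    std::set<std::string> plain, negated;
    for (const std::string& a : actions) {
        PredRef r = parseAction(a);
        (r.negated ? negated : plain).insert(r.symbol);
    }
    for (const std::string& s : plain)
        if (negated.count(s)) return true;
    return false;
}
```

      In this model the pair "IsX" and "!IsX" is recognized as always
      true, while the pair "IsX" and "NotX" is not, because NotX is
      just another symbol name as far as the action text is concerned.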
470
471      Now we will discuss goal (b).
472
473      When antlr discovers that there is a lookahead ambiguity between
474      two alternatives it attempts to resolve the ambiguity by searching
475      for predicates in both alternatives.  In the past any predicate
476      would do, even if the same one appeared in both alternatives:
477
478            rule: <<p(LATEXT(1))>>? X
479                | <<p(LATEXT(1))>>? X
480                ;
481
482      The #pred statement is a start towards solving this problem.
483      During ambiguity resolution (*not* predicate hoisting) the
484      predicates for the two alternatives are expanded and compared.
485      Consider the following example:
486
487            #pred Upper     <<isUpper(LATEXT(1))>>?
488            #pred Lower     <<isLower(LATEXT(1))>>?
489            #pred Alpha     <<isAlpha(LATEXT(1))>>?  Upper || Lower
490
491            rule0: rule1
492                 | <<Alpha>>? ID
493                 ;
494
495            rule1:
496                 | rule2
497                 | rule3
498                 ...
499                 ;
500
501            rule2: <<Upper>>? ID;
502            rule3: <<Lower>>? ID;
503
504      The definition of #pred Alpha expresses:
505
            a. to test the predicate use the C code "isAlpha(LATEXT(1))"

            b. to analyze the predicate use the information that
               Alpha is equivalent to the union of Upper and Lower.
510
511      During ambiguity resolution the definition of Alpha is expanded
512      into "Upper || Lower" and compared with the predicate in the other
513      alternative, which is also "Upper || Lower".  Because they are
514      identical MR11 will report a problem.
515
516    -------------------------------------------------------------------------
517      t10.g, line 5: warning: the predicates used to disambiguate rule rule0
518             (file t10.g alt 1 line 5 and alt 2 line 6)
519             are identical when compared without context and may have no
520             resolving power for some lookahead sequences.
521    -------------------------------------------------------------------------
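
      The expand-and-compare step described above can be sketched in a
      few lines.  PredTable, expand, expandAlt and
      identicalAfterExpansion are hypothetical names.  Only "||"
      expressions are modelled (no "&&" or "!") and the definitions are
      assumed not to be circular, so this is an illustration of the
      idea rather than the routine used by antlr.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Each predicate symbol maps to the symbols of its "||" expression.
// A symbol without an entry (or with an empty list) is a leaf.
using PredTable = std::map<std::string, std::vector<std::string>>;

// Expand a symbol into the set of leaf symbols it stands for.
std::set<std::string> expand(const PredTable& table,
                             const std::string& name) {
    std::set<std::string> leaves;
    auto it = table.find(name);
    if (it == table.end() || it->second.empty()) {
        leaves.insert(name);
        return leaves;
    }
    for (const std::string& part : it->second) {
        std::set<std::string> sub = expand(table, part);
        leaves.insert(sub.begin(), sub.end());
    }
    return leaves;
}

// Expand the OR of the predicates hoisted for one alternative.
std::set<std::string> expandAlt(const PredTable& table,
                                const std::vector<std::string>& preds) {
    std::set<std::string> leaves;
    for (const std::string& p : preds) {
        std::set<std::string> sub = expand(table, p);
        leaves.insert(sub.begin(), sub.end());
    }
    return leaves;
}

// True if two alternatives carry the same predicate after expansion,
// in which case the predicate cannot tell them apart.
bool identicalAfterExpansion(const PredTable& table,
                             const std::vector<std::string>& alt1,
                             const std::vector<std::string>& alt2) {
    return expandAlt(table, alt1) == expandAlt(table, alt2);
}
```

      With Alpha defined as "Upper || Lower", the first alternative
      (Upper and Lower hoisted from rule1) and the second alternative
      (Alpha) expand to the same set, which corresponds to the warning
      shown above.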
522
523      If you use the "-info p" option the output file will contain:
524
525    +----------------------------------------------------------------------+
526    |#if 0                                                                 |
527    |                                                                      |
528    |The following predicates are identical when compared without          |
529    |  lookahead context information.  For some ambiguous lookahead        |
530    |  sequences they may not have any power to resolve the ambiguity.     |
531    |                                                                      |
532    |Choice 1: rule0/1  alt 1  line 5  file t10.g                          |
533    |                                                                      |
534    |  The original predicate for choice 1 with available context          |
535    |    information:                                                      |
536    |                                                                      |
537    |    OR expr                                                           |
538    |                                                                      |
539    |      pred  <<  Upper>>?                                              |
540    |                        depth=k=1  rule rule2  line 14  t10.g         |
541    |        set context:                                                  |
542    |           ID                                                         |
543    |                                                                      |
544    |      pred  <<  Lower>>?                                              |
545    |                        depth=k=1  rule rule3  line 15  t10.g         |
546    |        set context:                                                  |
547    |           ID                                                         |
548    |                                                                      |
549    |  The predicate for choice 1 after expansion (but without context     |
550    |    information):                                                     |
551    |                                                                      |
552    |    OR expr                                                           |
553    |                                                                      |
554    |      pred  <<  isUpper(LATEXT(1))>>?                                 |
555    |                        depth=k=1  rule   line 1  t10.g               |
556    |                                                                      |
557    |      pred  <<  isLower(LATEXT(1))>>?                                 |
558    |                        depth=k=1  rule   line 2  t10.g               |
559    |                                                                      |
560    |                                                                      |
561    |Choice 2: rule0/2  alt 2  line 6  file t10.g                          |
562    |                                                                      |
563    |  The original predicate for choice 2 with available context          |
564    |    information:                                                      |
565    |                                                                      |
566    |  pred  <<  Alpha>>?                                                  |
567    |                    depth=k=1  rule rule0  line 6  t10.g              |
568    |    set context:                                                      |
569    |       ID                                                             |
570    |                                                                      |
571    |  The predicate for choice 2 after expansion (but without context     |
572    |    information):                                                     |
573    |                                                                      |
574    |  OR expr                                                             |
575    |                                                                      |
576    |    pred  <<  isUpper(LATEXT(1))>>?                                   |
577    |                      depth=k=1  rule   line 1  t10.g                 |
578    |                                                                      |
579    |    pred  <<  isLower(LATEXT(1))>>?                                   |
580    |                      depth=k=1  rule   line 2  t10.g                 |
581    |                                                                      |
582    |                                                                      |
583    |#endif                                                                |
584    +----------------------------------------------------------------------+
585
586      The comparison of the predicates for the two alternatives takes
587      place without context information, which means that in some cases
588      the predicates will be considered identical even though they operate
589      on disjoint lookahead sets.  Consider:
590
591            #pred Alpha
592
593            rule1: <<Alpha>>? ID
594                 | <<Alpha>>? Label
595                 ;
596
597      Because the comparison of predicates takes place without context
598      these will be considered identical.  The reason for comparing
599      without context is that otherwise it would be necessary to re-evaluate
600      the entire predicate expression for each possible lookahead sequence.
601      This would require more code to be written and more CPU time during
602      grammar analysis, and it is not yet clear whether anyone will even make
603      use of the new #pred facility.
604
605      A temporary workaround might be to use different #pred statements
606      for predicates you know have different context.  This would avoid
607      extraneous warnings.
608
609      The above example might be termed a "false positive".  Comparison
610      without context will also lead to "false negatives".  Consider the
611      following example:
612
613            #pred Alpha
614            #pred Beta
615
616            rule1: <<Alpha>>? A
617                 | rule2
618                 ;
619
620            rule2: <<Alpha>>? A
621                 | <<Beta>>?  B
622                 ;
623
624      The predicate used for alt 2 of rule1 is (Alpha || Beta).  This
625      appears to be different than the predicate Alpha used for alt1.
626      However, the context of Beta is B.  Thus when the lookahead is A
627      Beta will have no resolving power and Alpha will be used for both
628      alternatives.  Using the same predicate for both alternatives isn't
629      very helpful, but this will not be detected with 1.33MR11.
630
631      To properly handle this the predicate expression would have to be
632      evaluated for each distinct lookahead context.
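
      What "evaluated for each distinct lookahead context" would mean
      can be sketched as follows.  This is not something 1.33MR11 does.
      ContextPred, effective and identicalFor are hypothetical names,
      and the context is simplified to a set of single tokens.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A predicate symbol together with the lookahead tokens (its context)
// for which it is relevant.
struct ContextPred {
    std::string symbol;
    std::set<std::string> context;
};

// The predicates of one alternative that remain relevant when the
// next token is `lookahead`.
std::set<std::string> effective(const std::vector<ContextPred>& alt,
                                const std::string& lookahead) {
    std::set<std::string> out;
    for (const ContextPred& p : alt)
        if (p.context.count(lookahead)) out.insert(p.symbol);
    return out;
}

// True if, for this lookahead token, both alternatives end up with
// the same non-empty set of predicates and so cannot be told apart.
bool identicalFor(const std::vector<ContextPred>& alt1,
                  const std::vector<ContextPred>& alt2,
                  const std::string& lookahead) {
    std::set<std::string> e1 = effective(alt1, lookahead);
    return !e1.empty() && e1 == effective(alt2, lookahead);
}
```

      For the grammar above, alt 1 carries Alpha with context A and
      alt 2 carries Alpha with context A plus Beta with context B.
      Compared as a whole they differ, but for lookahead A both reduce
      to Alpha alone, which is the "false negative" described.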
633
634      To determine whether two predicate expressions are identical is
635      difficult.  The routine may fail to identify identical predicates.
636
      The #pred feature also compares predicates to see whether a choice
      between alternatives is resolved by a predicate which makes the
      second choice unreachable.  Consider the following example:
640
            #pred A         <<A(LATEXT(1))>>?
            #pred B         <<B(LATEXT(1))>>?
            #pred A_or_B    A || B
644
645            r   : s
646                | t
647                ;
648            s   : <<A_or_B>>? ID
649                ;
650            t   : <<A>>? ID
651                ;
652
653        ----------------------------------------------------------------------------
654        t11.g, line 5: warning: the predicate used to disambiguate the
655               first choice of  rule r
656             (file t11.g alt 1 line 5 and alt 2 line 6)
657             appears to "cover" the second predicate when compared without context.
658             The second predicate may have no resolving power for some lookahead
659               sequences.
660        ----------------------------------------------------------------------------
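
      When each predicate is reduced to the set of leaf predicates in
      its "||" expansion, the "cover" test amounts to a subset test.
      The helper below is a hypothetical sketch under that
      simplification (no "&&", no "!", no context), not the check
      implemented in antlr.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Each predicate is given as the set of leaf predicates in its "||"
// expansion.  The first predicate covers the second when every leaf
// of the second also appears in the first, so whenever the second
// predicate is true the first is true as well.
bool covers(const std::set<std::string>& first,
            const std::set<std::string>& second) {
    // std::set iterates in sorted order, as std::includes requires.
    return std::includes(first.begin(), first.end(),
                         second.begin(), second.end());
}
```

      In the example, A_or_B expands to the leaves A and B, which
      covers the predicate A used by rule t, so the second alternative
      of rule r can never be chosen when the lookahead is ID.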
661
662#132. (Changed in 1.33MR11) Recognition of identical predicates in alts
663
664      Prior to 1.33MR11, there would be no ambiguity warning when the
665      very same predicate was used to disambiguate both alternatives:
666
        test: ref B
            | ref C
            ;

        ref : <<pred(LATEXT(1))>>? A ;
672
673      In 1.33MR11 this will cause the warning:
674
675        warning: the predicates used to disambiguate rule test
676            (file v98.g alt 1 line 1 and alt 2 line 2)
677             are identical and have no resolving power
678
679        -----------------  Note  -----------------
680
681          This is different than the following case
682
                test: <<pred(LATEXT(1))>>? A B
                    | <<pred(LATEXT(1))>>? A C
                    ;
686
687          In this case there are two distinct predicates
688          which have exactly the same text.  In the first
689          example there are two references to the same
690          predicate.  The problem represented by this
691          grammar will be addressed later.
692
693
694#127. (Changed in 1.33MR11)
695
696                    Count Syntax Errors     Count DLG Errors
697                    -------------------     ----------------
698
699       C++ mode     ANTLRParser::           DLGLexerBase::
700                      syntaxErrCount          lexErrCount
701       C mode       zzSyntaxErrCount        zzLexErrCount
702
703       The C mode variables are global and initialized to 0.
704       They are *not* reset to 0 automatically when antlr is
705       restarted.
706
707       The C++ mode variables are public.  They are initialized
708       to 0 by the constructors.  They are *not* reset to 0 by the
709       ANTLRParser::init() method.
710
711       Suggested by Reinier van den Born (reinier@vnet.ibm.com).
712
713#126. (Changed in 1.33MR11) Addition of #first <<...>>
714
715       The #first <<...>> inserts the specified text in the output
716       files before any other #include statements required by pccts.
717       The only things before the #first text are comments and
718       a #define ANTLR_VERSION.
719
       Requested by Esa Pulkkinen (esap@cs.tut.fi) and Alexin
       Zoltan (alexin@inf.u-szeged.hu).
722
723#124. A Note on the New "&&" Style Guarded Predicates
724
725        I've been asked several times, "What is the difference between
726        the old "=>" style guard predicates and the new style "&&" guard
727        predicates, and how do you choose one over the other" ?
728
729        The main difference is that the "=>" does not apply the
730        predicate if the context guard doesn't match, whereas
731        the && form always does.  What is the significance ?
732
        If you have a predicate which is not on the "leading edge"
        it cannot be hoisted.  Suppose you need a predicate that
        looks at LA(2).  You must introduce it manually.  The
        classic example is:
737
738            castExpr :
739                     LP typeName RP
740                     | ....
741                     ;
742
743            typeName : <<isTypeName(LATEXT(1))>>?  ID
744                     | STRUCT ID
745                     ;
746
747        The problem  is that isTypeName() isn't on the leading edge
748        of typeName, so it won't be hoisted into castExpr to help
749        make a decision on which production to choose.
750
751        The *first* attempt to fix it is this:
752
753            castExpr :
754                     <<isTypeName(LATEXT(2))>>?
755                                        LP typeName RP
756                     | ....
757                     ;
758
759        Unfortunately, this won't work because it ignores
760        the problem of STRUCT.  The solution is to apply
761        isTypeName() in castExpr if LA(2) is an ID and
762        don't apply it when LA(2) is STRUCT:
763
764            castExpr :
765                     (LP ID)? => <<isTypeName(LATEXT(2))>>?
766                                        LP typeName RP
767                     | ....
768                     ;
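
        As a rough illustration, the decision requested by the guarded
        form above can be written out by hand in C.  This is only a
        sketch: the token codes, the lookahead struct, and
        cast_is_type_name() are hypothetical stand-ins, not code that
        antlr generates.

```c
#include <string.h>

/* Hypothetical token codes; a real parser takes these from antlr. */
enum cast_token { CAST_LP, CAST_RP, CAST_ID, CAST_STRUCT };

/* Hypothetical snapshot of LA(1), LA(2) and LATEXT(2). */
struct cast_lookahead {
    enum cast_token la1;
    enum cast_token la2;
    const char *latext2;
};

/* Stand-in for the user's symbol table lookup. */
int cast_is_type_name(const char *text)
{
    return strcmp(text, "size_t") == 0;
}

/* Models (LP ID)? => <<isTypeName(LATEXT(2))>>? LP typeName RP.
   The predicate is consulted only when the guard (LP ID) matches. */
int cast_predict(const struct cast_lookahead *la)
{
    int syntactic = la->la1 == CAST_LP &&
                    (la->la2 == CAST_ID || la->la2 == CAST_STRUCT);
    int guard = la->la1 == CAST_LP && la->la2 == CAST_ID;

    return syntactic && (!guard || cast_is_type_name(la->latext2));
}
```

        With LP ID the result depends on the symbol table lookup.  With
        LP STRUCT the guard does not match, so the predicate is never
        consulted and the alternative remains viable.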
769
770        In conclusion, the "=>" style guarded predicate is
771        useful when:
772
773            a. the tokens required for the predicate
774               are not on the leading edge
775            b. there are alternatives in the expression
776               selected by the predicate for which the
777               predicate is inappropriate
778
779        If (b) were false, then one could use a simple
780        predicate (assuming "-prc on"):
781
782            castExpr :
783                     <<isTypeName(LATEXT(2))>>?
784                                        LP typeName RP
785                     | ....
786                     ;
787
788            typeName : <<isTypeName(LATEXT(1))>>?  ID
789                     ;
790
791        So, when do you use the "&&" style guarded predicate ?
792
793        The new-style "&&" predicate should always be used with
794        predicate context.  The context guard is in ADDITION to
795        the automatically computed context.  Thus it is useful for
796        predicates which depend on the token type for reasons
797        other than context.
798
799        The following example is contributed by Reinier van den Born
800        (reinier@vnet.ibm.com).
801
802 +-------------------------------------------------------------------------+
803 | This grammar has two ways to call functions:                            |
804 |                                                                         |
805 |  - a "standard" call syntax with parens and comma separated args        |
806 |  - a shell command like syntax (no parens and spacing separated args)   |
807 |                                                                         |
808 | The former also allows a variable to hold the name of the function,     |
809 | the latter can also be used to call external commands.                  |
810 |                                                                         |
811 | The grammar (simplified) looks like this:                               |
812 |                                                                         |
813 |   fun_call   :     ID "(" { expr ("," expr)* } ")"                      |
814 |                                  /* ID is function name */              |
815 |              | "@" ID "(" { expr ("," expr)* } ")"                      |
816 |                                  /* ID is var containing fun name */    |
817 |              ;                                                          |
818 |                                                                         |
819 |   command    : ID expr*          /* ID is function name */              |
820 |              | path expr*        /* path is external command name */    |
821 |              ;                                                          |
822 |                                                                         |
823 |   path       : ID                /* left out slashes and such */        |
824 |              | "@" ID            /* ID is environment var */            |
825 |              ;                                                          |
826 |                                                                         |
827 |   expr       : ....                                                     |
828 |              | "(" expr ")";                                            |
829 |                                                                         |
830 |   call       : fun_call                                                 |
831 |              | command                                                  |
832 |              ;                                                          |
833 |                                                                         |
834 | Obviously the call is wildly ambiguous. This is more or less how this   |
835 | is to be resolved:                                                      |
836 |                                                                         |
837 |    A call begins with an ID or an @ followed by an ID.                  |
838 |                                                                         |
839 |    If it is an ID and if it is an ext. command name  -> command         |
840 |                       if followed by a paren         -> fun_call        |
841 |                       otherwise                      -> command         |
842 |                                                                         |
843 |    If it is an @  and if the ID is a var name        -> fun_call        |
844 |                       otherwise                      -> command         |
845 |                                                                         |
846 | One can implement these rules quite neatly using && predicates:         |
847 |                                                                         |
848 |   call       : ("@" ID)? && <<isVarName(LT(2))>>? fun_call              |
849 |              | (ID)?     && <<isExtCmdName>>?     command               |
850 |              | (ID "(")?                          fun_call              |
851 |              |                                    command               |
852 |              ;                                                          |
853 |                                                                         |
854 | This can be done better, so it is not an ideal example, but it          |
855 | conveys the principle.                                                  |
856 +-------------------------------------------------------------------------+
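
        The resolution rules in the box can also be read as an ordered
        series of tests.  The C sketch below mirrors the four
        alternatives of rule "call" in order.  The token codes and the
        two flag arguments (standing in for the isVarName and
        isExtCmdName predicates) are assumptions made for illustration.

```c
/* Hypothetical codes for the first two tokens of a call. */
enum call_tok { CALL_AT, CALL_ID, CALL_LPAREN, CALL_OTHER };
enum call_choice { CALL_FUN_CALL, CALL_COMMAND };

/* Alternatives are tried in the order they appear in rule "call".
   is_var_name and is_ext_cmd stand in for the user's predicates. */
enum call_choice call_choose(enum call_tok la1, enum call_tok la2,
                             int is_var_name, int is_ext_cmd)
{
    if (la1 == CALL_AT && la2 == CALL_ID && is_var_name)
        return CALL_FUN_CALL;       /* ("@" ID)? && isVarName    */
    if (la1 == CALL_ID && is_ext_cmd)
        return CALL_COMMAND;        /* (ID)?     && isExtCmdName */
    if (la1 == CALL_ID && la2 == CALL_LPAREN)
        return CALL_FUN_CALL;       /* (ID "(")?                 */
    return CALL_COMMAND;            /* last alternative          */
}
```

        Because each "&&" guard must match before its alternative is
        considered, an "@" followed by an ID that is not a variable name
        falls through to the final "command" alternative.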
857
858#122. (Changed in 1.33MR11)  Member functions to reset DLG in C++ mode
859
860         void DLGFileReset(FILE *f) { input = f; found_eof = 0; }
861         void DLGStringReset(DLGChar *s) { input = s; p = &input[0]; }
862
863        Supplied by R.A. Nelson (cowboy@VNET.IBM.COM)
864
865#119. (Changed in 1.33MR11) Ambiguity aid for grammars
866
867      The user can ask for additional information on ambiguities reported
868      by antlr to stdout.  At the moment, only one ambiguity report can
869      be created in an antlr run.
870
871      This feature is enabled using the "-aa" (Ambiguity Aid)  option.
872
873      The following options control the reporting of ambiguities:
874
875          -aa ruleName       Selects reporting by name of rule
876          -aa lineNumber     Selects reporting by line number
877                               (file name not compared)
878
879          -aam               Selects "multiple" reporting for a token
880                             in the intersection set of the
881                             alternatives.
882
883                             For instance, the token ID may appear dozens
884                             of times in various paths as the program
885                             explores the rules which are reachable from
886                             the point of an ambiguity. With option -aam
887                             every possible path the search program
888                             encounters is reported.
889
890                             Without -aam only the first encounter is
891                             reported.  This may result in incomplete
892                             information, but the information may be
893                             sufficient and much shorter.
894
895          -aad depth         Selects the depth of the search.
896                             The default value is 1.
897
898                             The number of paths to be searched, and the
899                             size of the report, can grow geometrically
900                             with the -ck value if a full search for all
901                             contributions to the source of the ambiguity
902                             is performed.
903
904                             The depth represents the number of tokens
905                             in the lookahead set which are matched against
906                             the set of ambiguous tokens.  A depth of 1
907                             means that the search stops when a lookahead
908                             sequence of just one token is matched.
909
910                             A k=1 ck=6 grammar might generate 5,000 items
911                             in a report if a full depth 6 search is made
912                             with the Ambiguity Aid.  The source of the
913                             problem may be in the first token and obscured
914                             by the volume of data - I hesitate to call
915                             it information.
916
917                             When the user selects a depth > 1, the search
918                             is first performed at depth=1 for both
919                             alternatives, then depth=2 for both alternatives,
920                             etc.
921
922      Sample output for rule grammar in antlr.g itself:
923
924  +---------------------------------------------------------------------+
925  | Ambiguity Aid                                                       |
926  |                                                                     |
927  |   Choice 1: grammar/70                 line 632  file a.g           |
928  |   Choice 2: grammar/82                 line 644  file a.g           |
929  |                                                                     |
930  |   Intersection of lookahead[1] sets:                                |
931  |                                                                     |
932  |      "\}"             "class"          "#errclass"      "#tokclass" |
933  |                                                                     |
934  |    Choice:1  Depth:1  Group:1  ("#errclass")                        |
935  |  1 in (...)* block                grammar/70       line 632   a.g   |
936  |  2 to error                       grammar/73       line 635   a.g   |
937  |  3 error                          error/1          line 894   a.g   |
938  |  4 #token "#errclass"             error/2          line 895   a.g   |
939  |                                                                     |
940  |    Choice:1  Depth:1  Group:2  ("#tokclass")                        |
941  |  2 to tclass                      grammar/74       line 636   a.g   |
942  |  3 tclass                         tclass/1         line 937   a.g   |
943  |  4 #token "#tokclass"             tclass/2         line 938   a.g   |
944  |                                                                     |
945  |    Choice:1  Depth:1  Group:3  ("class")                            |
946  |  2 to class_def                   grammar/75       line 637   a.g   |
947  |  3 class_def                      class_def/1      line 669   a.g   |
948  |  4 #token "class"                 class_def/3      line 671   a.g   |
949  |                                                                     |
950  |    Choice:1  Depth:1  Group:4  ("\}")                               |
951  |  2 #token "\}"                    grammar/76       line 638   a.g   |
952  |                                                                     |
953  |    Choice:2  Depth:1  Group:5  ("#errclass")                        |
954  |  1 in (...)* block                grammar/83       line 645   a.g   |
955  |  2 to error                       grammar/93       line 655   a.g   |
956  |  3 error                          error/1          line 894   a.g   |
957  |  4 #token "#errclass"             error/2          line 895   a.g   |
958  |                                                                     |
959  |    Choice:2  Depth:1  Group:6  ("#tokclass")                        |
960  |  2 to tclass                      grammar/94       line 656   a.g   |
961  |  3 tclass                         tclass/1         line 937   a.g   |
962  |  4 #token "#tokclass"             tclass/2         line 938   a.g   |
963  |                                                                     |
964  |    Choice:2  Depth:1  Group:7  ("class")                            |
965  |  2 to class_def                   grammar/95       line 657   a.g   |
966  |  3 class_def                      class_def/1      line 669   a.g   |
967  |  4 #token "class"                 class_def/3      line 671   a.g   |
968  |                                                                     |
969  |    Choice:2  Depth:1  Group:8  ("\}")                               |
970  |  2 #token "\}"                    grammar/96       line 658   a.g   |
971  +---------------------------------------------------------------------+
972
973      For a linear lookahead set ambiguity (where k=1, or where k>1
974      but all lookahead sets [i] with i<k have degree one) the
975      reports appear in the following order:
976
977        for (depth=1 ; depth <= "-aad depth" ; depth++) {
978          for (alternative=1; alternative <=2 ; alternative++) {
979            while (matches-are-found) {
980              group++;
981              print-report
982            };
983          };
984        };
985
986      For reporting a k-tuple ambiguity, the reports appear in the
987      following order:
988
989        for (depth=1 ; depth <= "-aad depth" ; depth++) {
990          while (matches-are-found) {
991            for (alternative=1; alternative <=2 ; alternative++) {
992              group++;
993              print-report
994            };
995          };
996        };
997
998      This is because matches are generated in different ways for
999      linear lookahead and k-tuples.
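
      The two orderings can be made concrete with a small C sketch.
      The match counts passed in are hypothetical inputs; the Ambiguity
      Aid discovers matches by searching the grammar.  The struct and
      function names are invented for this illustration.

```c
#include <stddef.h>

struct aa_report { int depth; int alternative; int group; };

/* Hypothetical number of matches found per alternative at one depth. */
struct aa_matches { int alt[2]; };

/* Linear lookahead: at each depth every match of alternative 1 is
   reported before any match of alternative 2.  Returns the number of
   reports stored in out (at most cap). */
size_t aa_order_linear(int max_depth, const struct aa_matches *found,
                       struct aa_report *out, size_t cap)
{
    size_t n = 0;
    int group = 0;
    int depth, alt, m;

    for (depth = 1; depth <= max_depth; depth++) {
        for (alt = 1; alt <= 2; alt++) {
            for (m = 0; m < found[depth - 1].alt[alt - 1]; m++) {
                group++;
                if (n < cap) {
                    out[n].depth = depth;
                    out[n].alternative = alt;
                    out[n].group = group;
                    n++;
                }
            }
        }
    }
    return n;
}

/* k-tuple: each match is reported for alternative 1 and then for
   alternative 2 before the next match is taken up.  tuples[d-1] is a
   hypothetical count of matches at depth d. */
size_t aa_order_ktuple(int max_depth, const int *tuples,
                       struct aa_report *out, size_t cap)
{
    size_t n = 0;
    int group = 0;
    int depth, alt, m;

    for (depth = 1; depth <= max_depth; depth++) {
        for (m = 0; m < tuples[depth - 1]; m++) {
            for (alt = 1; alt <= 2; alt++) {
                group++;
                if (n < cap) {
                    out[n].depth = depth;
                    out[n].alternative = alt;
                    out[n].group = group;
                    n++;
                }
            }
        }
    }
    return n;
}
```

      With four matches per alternative at depth 1, the linear ordering
      yields groups 1-4 for choice 1 and groups 5-8 for choice 2, which
      is the layout of the sample report shown earlier.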
1000
1001#117. (Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code
1002
1003      The hoisting of predicates into rules to create prediction
1004      expressions is a problem in antlr.  Consider the following
1005      example (k=1 with -prc on):
1006
1007        start   : (a)* "@" ;
1008        a       : b | c ;
1009        b       : <<isUpper(LATEXT(1))>>? A ;
1010        c       : A ;
1011
1012      Prior to 1.33MR10 the code generated for "start" would resemble:
1013
1014        while {
1015            if (LA(1)==A &&
1016                    (!(LA(1)==A) || isUpper())) {
1017              a();
1018            }
1019        };
1020
1021      This code is wrong because it makes rule "c" unreachable from
1022      "start".  The essence of the problem is that antlr fails to
1023      recognize that there can be a valid alternative within "a" even
1024      when the predicate <<isUpper(LATEXT(1))>>? is false.
1025
1026      In 1.33MR10 with -mrhoist the hoisting of the predicate into
1027      "start" is suppressed because it recognizes that "c" can
1028      cover all the cases where the predicate is false:
1029
1030        while {
1031            if (LA(1)==A) {
1032              a();
1033            }
1034        };
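
      The difference between the two prediction expressions can be
      checked with a few lines of C.  This is a hand-written model of
      the expressions shown above, not antlr output; the token codes
      and function names are hypothetical, and is_upper stands in for
      the result of isUpper().

```c
/* Hypothetical token codes for the example grammar. */
enum hoist_tok { HOIST_A, HOIST_AT };

/* Before 1.33MR10: the predicate from "b" is hoisted and applied to
   every A, so "c" cannot be reached when the predicate is false. */
int hoist_enter_a_old(enum hoist_tok la1, int is_upper)
{
    return la1 == HOIST_A && (!(la1 == HOIST_A) || is_upper);
}

/* With -mrhoist: "c" matches A without a predicate, so the hoisted
   predicate is suppressed. */
int hoist_enter_a_mrhoist(enum hoist_tok la1)
{
    return la1 == HOIST_A;
}
```

      For LA(1)==A with a false predicate the old expression refuses
      to enter "a" while the -mrhoist expression enters it, leaving the
      choice between "b" and "c" to rule "a" itself.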
1035
1036      With the antlr "-info p" switch the user will receive information
1037      about the predicate suppression in the generated file:
1038
1039      --------------------------------------------------------------
1040        #if 0
1041
1042        Hoisting of predicate suppressed by alternative without predicate.
1043        The alt without the predicate includes all cases where
1044            the predicate is false.
1045
1046           WITH predicate: line 7  v1.g
1047           WITHOUT predicate: line 7  v1.g
1048
1049        The context set for the predicate:
1050
1051             A
1052
1053        The lookahead set for the alt WITHOUT the semantic predicate:
1054
1055             A
1056
1057        The predicate:
1058
1059          pred <<  isUpper(LATEXT(1))>>?
1060                          depth=k=1  rule b  line 9  v1.g
1061            set context:
1062               A
1063            tree context: null
1064
1065        Chain of referenced rules:
1066
1067            #0  in rule start (line 5 v1.g) to rule a
1068            #1  in rule a (line 7 v1.g)
1069
1070        #endif
1071      --------------------------------------------------------------
1072
1073      A predicate can be suppressed by a combination of alternatives
1074      which, taken together, cover a predicate:
1075
1076        start   : (a)* "@" ;
1077
1078        a       : b | ca | cb | cc ;
1079
1080        b       : <<isUpper(LATEXT(1))>>? ( A | B | C ) ;
1081
1082        ca      : A ;
1083        cb      : B ;
1084        cc      : C ;
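
      One way to picture the coverage test is with token sets held as
      bit masks.  The sketch below is an assumption about how such a
      check could be written, not a description of antlr's internal
      code; the token bits and function names are invented.

```c
/* One bit per token; the codes are hypothetical. */
#define COV_A (1u << 0)
#define COV_B (1u << 1)
#define COV_C (1u << 2)

/* Union of the lookahead sets of the alternatives without predicates. */
unsigned cov_union(const unsigned *sets, int count)
{
    unsigned all = 0;
    int i;

    for (i = 0; i < count; i++)
        all |= sets[i];
    return all;
}

/* Nonzero when every token in the predicate context is matched by
   some alternative that carries no predicate. */
int cov_suppresses(unsigned context, const unsigned *sets, int count)
{
    return (context & ~cov_union(sets, count)) == 0;
}
```

      Here the context of the predicate in "b" is {A, B, C}.  The sets
      for "ca", "cb" and "cc" together cover it, while any two of them
      alone do not.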
1085
1086      Consider a more complex example in which "c" covers only part of
1087      a predicate:
1088
1089        start   : (a)* "@" ;
1090
1091        a       : b
1092                | c
1093                ;
1094
1095        b       : <<isUpper(LATEXT(1))>>?
1096                    ( A
1097                    | X
1098                    );
1099
1100        c       : A
1101                ;
1102
1103      Prior to 1.33MR10 the code generated for "start" would resemble:
1104
1105        while {
1106            if ( (LA(1)==A || LA(1)==X) &&
1107                    (! (LA(1)==A || LA(1)==X) || isUpper()) ) {
1108              a();
1109            }
1110        };
1111
1112      With 1.33MR10 and -mrhoist the predicate context is restricted to
1113      the non-covered lookahead.  The code resembles:
1114
1115        while {
1116            if ( (LA(1)==A || LA(1)==X) &&
1117                  (! (LA(1)==X) || isUpper()) ) {
1118              a();
1119            }
1120        };
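
      The restriction amounts to removing the covered tokens from the
      predicate context before the prediction expression is formed.  A
      minimal C model follows, with hypothetical token bits and
      function names; is_upper stands in for the result of isUpper().

```c
/* One bit per token; the codes are hypothetical. */
#define RC_A (1u << 0)
#define RC_X (1u << 1)

/* Context left for the predicate once the tokens matched by the
   alternative without a predicate have been removed. */
unsigned rc_restrict(unsigned context, unsigned covered)
{
    return context & ~covered;
}

/* Prediction for (a)*.  la1 is the bit of the LA(1) token, lookahead
   is the set of tokens that can start "a", and pred_context is the
   set of tokens for which the predicate is applied. */
int rc_enter_a(unsigned la1, unsigned lookahead,
               unsigned pred_context, int is_upper)
{
    return (la1 & lookahead) != 0 &&
           ((la1 & pred_context) == 0 || is_upper);
}
```

      With the original context {A, X} an A with a false predicate is
      rejected.  With the restricted context {X} the same A is
      accepted, while an X still requires the predicate to be true.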
1121
1122      With the antlr "-info p" switch the user will receive information
1123      about the predicate restriction in the generated file:
1124
1125      --------------------------------------------------------------
1126        #if 0
1127
1128        Restricting the context of a predicate because of overlap
1129          in the lookahead set between the alternative with the
1130          semantic predicate and one without
1131        Without this restriction the alternative without the predicate
1132          could not be reached when input matched the context of the
1133          predicate and the predicate was false.
1134
1135           WITH predicate: line 11  v4.g
1136           WITHOUT predicate: line 12  v4.g
1137
1138        The original context set for the predicate:
1139
1140             A                X
1141
1142        The lookahead set for the alt WITHOUT the semantic predicate:
1143
1144             A
1145
1146        The intersection of the two sets
1147
1148             A
1149
1150        The original predicate:
1151
1152          pred <<  isUpper(LATEXT(1))>>?
1153                          depth=k=1  rule b  line 15  v4.g
1154            set context:
1155               A                X
1156            tree context: null
1157
1158        The new (modified) form of the predicate:
1159
1160          pred <<  isUpper(LATEXT(1))>>?
1161                          depth=k=1  rule b  line 15  v4.g
1162            set context:
1163               X
1164            tree context: null
1165
1166        #endif
1167      --------------------------------------------------------------
1168
1169      The bad news about -mrhoist:
1170
1171        (a) -mrhoist does not analyze predicates with lookahead
1172            depth > 1.
1173
1174        (b) -mrhoist does not look past a guarded predicate to
1175            find context which might cover other predicates.
1176
1177      For these cases you might want to use syntactic predicates.
1178      When a semantic predicate fails during guess mode the guess
1179      fails and the next alternative is tried.
1180
1181      Limitation (a) is illustrated by the following example:
1182
1183        start    : (stmt)* EOF ;
1184
1185        stmt     : cast
1186                 | expr
1187                 ;
1188        cast     : <<isTypename(LATEXT(2))>>? LP ID RP ;
1189
1190        expr     : LP ID RP ;
1191
1192      This is not much different from the first example, except that
1193      it requires two tokens of lookahead context to determine what
1194      to do.  This predicate is NOT suppressed because the current version
1195      is unable to handle predicates with depth > 1.
1196
1197      A predicate can be combined with other predicates during hoisting.
1198      In those cases the depth=1 predicates are still handled.  Thus,
1199      in the following example the isUpper() predicate will be suppressed
1200      by line #4 when hoisted from "bizarre" into "start", but will still
1201      be present in "bizarre" in order to predict "stmt".
1202
1203        start    : (bizarre)* EOF ;     // #1
1204                                        // #2
1205        bizarre  : stmt                 // #3
1206                 | A                    // #4
1207                 ;
1208
1209        stmt     : cast
1210                 | expr
1211                 ;
1212
1213        cast     : <<isTypename(LATEXT(2))>>? LP ID RP ;
1214
1215        expr     : LP ID RP
1216                 | <<isUpper(LATEXT(1))>>? A ;
1217
1218      Limitation (b) is illustrated by the following example of a
1219      context guarded predicate:
1220
1221        rule : (A)? => <<p>>?       // #1
1222                     (A             // #2
1223                     |B             // #3
1224                     )              // #4
1225             | <<q>>? B             // #5
1226             ;
1227
1228      Recall that this means that when the lookahead is NOT A then
1229      the predicate "p" is ignored and it attempts to match "A|B".
1230      Ideally, the "B" at line #3 should suppress predicate "q".
1231      However, the current version does not attempt to look past
1232      the guard predicate to find context which might suppress other
1233      predicates.
1234
1235      In some cases -mrhoist will lead to the reporting of ambiguities
1236      which were not visible before:
1237
1238        start   : (a)* "@";
1239        a       : bc | d;
1240        bc      : b  | c ;
1241
1242        b       : <<isUpper(LATEXT(1))>>? A;
1243        c       : A ;
1244
1245        d       : A ;
1246
1247      In this case there is a true ambiguity in "a" between "bc" and "d"
1248      which can both match "A".  Without -mrhoist the predicate in "b"
1249      is hoisted into "a" and there is no ambiguity reported.  However,
1250      with -mrhoist, the predicate in "b" is suppressed by "c" (as it
1251      should be) making the ambiguity in "a" apparent.
1252
1253      The motivations for these changes were hoisting problems reported
1254      by Reinier van den Born (reinier@vnet.ibm.com) and several others.
1255
1256#113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr
1257
1258      The existing context guarded predicate:
1259
1260            rule : (guard)? => <<p>>? expr
1261                 | next_alternative
1262                 ;
1263
1264      generates code which resembles:
1265
1266            if (lookahead(expr) && (!guard || pred)) {
1267              expr();
1268            } else ...
1269
1270      This is not suitable for some applications because it allows
1271      expr() to be invoked when the predicate is false.  This is
1272      intentional because it is meant to mimic automatically computed
1273      predicate context.
1274
1275      The new context guarded predicate uses the guard information
1276      differently because it has a different goal.  Consider:
1277
1278            rule : (guard)? && <<p>>? expr
1279                 | next_alternative
1280                 ;
1281
1282      The new style of context guarded predicate is equivalent to:
1283
1284            rule : <<guard==true && pred>>? expr
1285                 | next_alternative
1286                 ;
1287
1288      It generates code which resembles:
1289
1290            if (lookahead(expr) && guard && pred) {
1291                expr();
1292            } else ...
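
      The two forms differ only in how the guard enters the test.  The
      C functions below restate the two expressions with plain integer
      flags so the difference can be checked directly.  The function
      and parameter names are hypothetical.

```c
/* (guard)? => <<p>>? expr
   The predicate applies only when the guard matches. */
int guard_arrow(int lookahead_ok, int guard, int pred)
{
    return lookahead_ok && (!guard || pred);
}

/* (guard)? && <<p>>? expr
   The guard and the predicate must both hold. */
int guard_and(int lookahead_ok, int guard, int pred)
{
    return lookahead_ok && guard && pred;
}
```

      The forms agree whenever the guard matches.  When it does not,
      the "=>" form still selects expr and the "&&" form never does.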
1293
1294      Both forms of guarded predicates severely restrict the form of
1295      the context guard: it can contain no rule references, no
1296      (...)*, no (...)+, and no {...}.  It may contain token and
1297      token class references, and alternation ("|").
1298
1299      Addition for 1.33MR11: in the token expression all tokens must
1300      be at the same height of the token tree:
1301
1302            (A ( B | C))? && ...            is ok (all height 2)
1303            (A ( B |  ))? && ...            is not ok (some 1, some 2)
1304            (A B C D | E F G H)? && ...     is ok (all height 4)
1305            (A B C D | E )? && ...          is not ok (some 4, some 1)
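
      The height rule can be checked mechanically.  The sketch below
      assumes a simple token tree in which "down" leads to the next
      token in sequence and "right" to the next alternative at the same
      position, with tok == 0 marking an empty alternative.  The node
      layout and names are assumptions for illustration and need not
      match antlr's own data structures.

```c
#include <stddef.h>

/* Hypothetical token tree node. */
struct tt_node {
    int tok;                      /* 0 marks an empty alternative   */
    const struct tt_node *down;   /* next token in the sequence     */
    const struct tt_node *right;  /* next alternative at this level */
};

/* Shortest and longest token sequence starting at t or a sibling. */
void tt_heights(const struct tt_node *t, int *min_h, int *max_h)
{
    int lo = -1, hi = 0;

    for (; t != NULL; t = t->right) {
        int sub_lo = 0, sub_hi = 0;
        int own = (t->tok != 0);

        if (t->down != NULL)
            tt_heights(t->down, &sub_lo, &sub_hi);
        if (lo < 0 || own + sub_lo < lo)
            lo = own + sub_lo;
        if (own + sub_hi > hi)
            hi = own + sub_hi;
    }
    *min_h = (lo < 0) ? 0 : lo;
    *max_h = hi;
}

/* A guard is acceptable when all of its paths have the same height. */
int tt_guard_ok(const struct tt_node *t)
{
    int lo, hi;

    tt_heights(t, &lo, &hi);
    return lo == hi;
}
```

      For (A ( B | C)) every path has height 2 and the check succeeds.
      For (A ( B |  )) the paths have heights 1 and 2 and it fails.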
1306
1307      This restriction is required in order to properly compute the lookahead
1308      set for expressions like:
1309
1310            rule1 : (A B C)? && <<pred>>? rule2 ;
1311            rule2 : (A|X) (B|Y) (C|Z);
1312
1313      This addition was suggested by Reinier van den Born (reinier@vnet.ibm.com).
1314
1315#109. (Changed in 1.33MR10) improved trace information
1316
1317      The quality of the trace information provided by the "-gd"
1318      switch has been improved significantly.  Here is an example
1319      of the output from a test program.  It shows the rule name,
1320      the first token of lookahead, the call depth, and the guess
1321      status:
1322
1323        exit rule gusxx {"?"} depth 2
1324        enter rule gusxx {"?"} depth 2
1325        enter rule gus1 {"o"} depth 3 guessing
1326        guess done - returning to rule gus1 {"o"} at depth 3
1327                    (guess mode continues - an enclosing guess is still active)
1328        guess done - returning to rule gus1 {"Z"} at depth 3
1329                    (guess mode continues - an enclosing guess is still active)
1330        exit rule gus1 {"Z"} depth 3 guessing
1331        guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends)
1332        enter rule gus1 {"o"} depth 3
1333        guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends)
1334        guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends)
1335        exit rule gus1 {"Z"} depth 3
1336        line 1: syntax error at "Z" missing SC
1337            ...
1338
1339      Rule trace reporting is controlled by the value of the integer
1340      [zz]traceOptionValue:  when it is positive tracing is enabled,
1341      otherwise it is disabled.  Tracing during guess mode is controlled
1342      by the value of the integer [zz]traceGuessOptionValue.  When
1343      it is positive AND [zz]traceOptionValue is positive rule trace
1344      is reported in guess mode.
1345
1346      The values of [zz]traceOptionValue and [zz]traceGuessOptionValue
1347      can be adjusted by subroutine calls listed below.
1348
1349      Depending on the presence or absence of the antlr -gd switch
1350      the variable [zz]traceOptionValueDefault is set to 0 or 1.  When
1351      the parser is initialized or [zz]traceReset() is called the
1352      value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue.
1353      The value of [zz]traceGuessOptionValue is always initialized to 1,
1354      but, as noted earlier, nothing will be reported unless
1355      [zz]traceOptionValue is also positive.
1356
1357      When the parser state is saved/restored the values of the trace
1358      variables are also saved/restored.  If a restore causes a change in
1359      reporting behavior from on to off or vice versa this will be reported.
1360
1361      When the -gd option is selected, the macro "#define zzTRACE_RULES"
1362      is added to appropriate output files.
1363
1364        C++ mode
1365        --------
1366        int     traceOption(int delta)
1367        int     traceGuessOption(int delta)
1368        void    traceReset()
1369        int     traceOptionValueDefault
1370
1371        C mode
1372        --------
1373        int     zzTraceOption(int delta)
1374        int     zzTraceGuessOption(int delta)
1375        void    zzTraceReset()
1376        int     zzTraceOptionValueDefault
1377
1378      The argument "delta" is added to the traceOptionValue.  To
1379      turn on trace when inside a particular rule one can write:
1380
1381        rule : <<traceOption(+1);>>
1382               (
1383                rest-of-rule
1384               )
1385               <<traceOption(-1);>>
1386       ;  /* fail clause */ <<traceOption(-1);>>
1387
1388      One can use the same idea to turn *off* tracing within a
1389      rule by using a delta of (-1).
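
      The counter behavior described above can be modeled in a few
      lines of C.  The struct and function names are hypothetical
      stand-ins for the [zz]trace variables and routines, and returning
      the previous value from the adjust routine is an assumption of
      this sketch.

```c
/* Stand-ins for [zz]traceOptionValue and [zz]traceGuessOptionValue. */
struct trace_state {
    int option_value;
    int guess_option_value;
};

/* Adds delta to the trace counter.  Returning the previous value is
   an assumption made for this sketch. */
int trace_adjust(struct trace_state *ts, int delta)
{
    int previous = ts->option_value;

    ts->option_value += delta;
    return previous;
}

/* Rule trace is reported when the counter is positive; in guess mode
   the guess counter must be positive as well. */
int trace_enabled(const struct trace_state *ts, int guessing)
{
    if (ts->option_value <= 0)
        return 0;
    return !guessing || ts->guess_option_value > 0;
}
```

      Because the value is a counter rather than a flag, paired calls
      with +1 and -1 nest correctly when one traced rule invokes
      another.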
1390
1391      An improvement in the rule trace was suggested by Sramji
1392      Ramanathan (ps@kumaran.com).
1393
1394#108. A Note on Deallocation of Variables Allocated in Guess Mode
1395
1396                            NOTE
1397        ------------------------------------------------------
1398        This mechanism only works for heap allocated variables
1399        ------------------------------------------------------
1400
1401      The rewrite of the trace provides the machinery necessary
1402      to properly free variables or undo actions following a
1403      failed guess.
1404
1405      The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded
1406      as part of the zzGUESS macro.  When a guess is opened
1407      the value of zzrv is 0.  When a longjmp() is executed to
1408      undo the guess, the value of zzrv will be 1.
1409
1410      The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded
1411      as part of the zzGUESS_DONE macro.  This is executed
1412      whether the guess succeeds or fails as part of closing
1413      the guess.
1414
1415      The guessSeq is a sequence number which is assigned to each
1416      guess and is incremented by 1 for each guess which becomes
1417      active.  It is needed by the user to associate the start of
1418      a guess with the failure and/or completion (closing) of a
1419      guess.
1420
1421      Guesses are nested.  They must be closed in the reverse
1422      of the order that they are opened.
1423
1424      In order to free memory used by a variable during a guess
1425      a user must write a routine which can be called to
1426      register the variable along with the current guess sequence
1427      number provided by the zzUSER_GUESS_HOOK macro. If the guess
1428      fails, all variables tagged with the corresponding guess
1429      sequence number should be released.  This is ugly, but
1430      it would require a major rewrite of antlr 1.33 to use
1431      some mechanism other than setjmp()/longjmp().
1432
1433      The order of calls for a *successful* guess would be:
1434
1435        zzUSER_GUESS_HOOK(guessSeq,0);
1436        zzUSER_GUESS_DONE_HOOK(guessSeq);
1437
1438      The order of calls for a *failed* guess would be:
1439
1440        zzUSER_GUESS_HOOK(guessSeq,0);
1441        zzUSER_GUESS_HOOK(guessSeq,1);
1442        zzUSER_GUESS_DONE_HOOK(guessSeq);
1443
1444      The default definitions of these macros are empty strings.
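
      The registration routine described above might look something
      like the following C sketch.  Everything here is hypothetical:
      the registry layout, the function names, the fixed capacity, and
      the choice to leave blocks with their owner when a guess closes
      without failing.  The third argument follows the zzrv values
      (0 when a guess opens, 1 when it fails), with 2 used here as an
      extra code for the DONE hook.

```c
#include <stdlib.h>

#define GR_MAX 32

/* Hypothetical registry of heap blocks tagged by guess number. */
struct gr_entry { void *ptr; int guess_seq; };
struct gr_registry { struct gr_entry items[GR_MAX]; int count; };

/* Records a heap pointer under a guess; returns 0 if the table is full. */
int gr_register(struct gr_registry *r, void *ptr, int guess_seq)
{
    if (r->count >= GR_MAX)
        return 0;
    r->items[r->count].ptr = ptr;
    r->items[r->count].guess_seq = guess_seq;
    r->count++;
    return 1;
}

/* Meant to be called from the user hooks.  On failure (zzrv == 1) the
   blocks tagged with guess_seq are freed.  On close (zzrv == 2) any
   remaining tags for guess_seq are dropped and the blocks stay with
   their owner.  Returns the number of blocks freed. */
int gr_guess_hook(struct gr_registry *r, int guess_seq, int zzrv)
{
    int released = 0, kept = 0;
    int i;

    if (zzrv == 0)
        return 0;               /* nothing to undo when a guess opens */
    for (i = 0; i < r->count; i++) {
        if (r->items[i].guess_seq == guess_seq) {
            if (zzrv == 1) {
                free(r->items[i].ptr);
                released++;
            }
        } else {
            r->items[kept++] = r->items[i];
        }
    }
    r->count = kept;
    return released;
}
```

      A rule action would call gr_register() right after each
      allocation made while guessing, passing the sequence number most
      recently seen by the hook.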
1445
1446      Here is an example in C++ mode.  The zzUSER_GUESS_HOOK and
1447      zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine
1448      can be used without change in both C and C++ versions.
1449
1450      ----------------------------------------------------------------------
1451        <<
1452
1453        #include "AToken.h"
1454
1455        typedef ANTLRCommonToken ANTLRToken;
1456
1457        #include "DLGLexer.h"
1458
1459        int main() {
1460
1461          {
1462            DLGFileInput     in(stdin);
1463            DLGLexer         lexer(&in,2000);
1464            ANTLRTokenBuffer pipe(&lexer,1);
1465            ANTLRCommonToken aToken;
1466            P                parser(&pipe);
1467
1468            lexer.setToken(&aToken);
1469            parser.init();
1470            parser.start();
1471          };
1472
1473          fclose(stdin);
1474          fclose(stdout);
1475          return 0;
1476        }
1477
1478        >>
1479
1480        <<
1481        char *s=NULL;
1482
1483        #undef zzUSER_GUESS_HOOK
1484        #define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv);
1485        #undef zzUSER_GUESS_DONE_HOOK
1486        #define zzUSER_GUESS_DONE_HOOK(guessSeq)   myGuessHook(guessSeq,2);
1487
1488        void myGuessHook(int guessSeq,int zzrv) {
1489          if (zzrv == 0) {
1490            fprintf(stderr,"User hook: starting guess #%d\n",guessSeq);
1491          } else if (zzrv == 1) {
1492            free (s);
1493            s=NULL;
1494            fprintf(stderr,"User hook: failed guess #%d\n",guessSeq);
1495          } else if (zzrv == 2) {
1496            free (s);
1497            s=NULL;
1498            fprintf(stderr,"User hook: ending guess #%d\n",guessSeq);
1499          };
1500        }
1501
1502        >>
1503
1504        #token A    "a"
1505        #token      "[\t \ \n]"     <<skip();>>
1506
1507        class P {
1508
1509        start : (top)+
1510              ;
1511
1512        top   : (which) ?   <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >>
1513              | other       <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >>
1514              ; <<if (s != NULL) free(s); s=NULL; >>
1515
1516        which : which2
1517              ;
1518
1519        which2 : which3
1520              ;
1521        which3
1522              : (label)?         <<fprintf(stderr,"%s is a label\n",s);>>
1523              | (global)?        <<fprintf(stderr,"%s is a global\n",s);>>
1524              | (exclamation)?   <<fprintf(stderr,"%s is an exclamation\n",s);>>
1525              ;
1526
1527        label :       <<s=strdup(LT(1)->getText());>> A ":" ;
1528
1529        global :      <<s=strdup(LT(1)->getText());>> A "::" ;
1530
1531        exclamation : <<s=strdup(LT(1)->getText());>> A "!" ;
1532
1533        other :       <<s=strdup(LT(1)->getText());>> "other" ;
1534
1535        }
1536      ----------------------------------------------------------------------
1537
      This is a silly example, but it illustrates the idea.  For the
      input "a ::" with tracing enabled the output begins:
1540
1541      ----------------------------------------------------------------------
1542        enter rule "start" depth 1
1543        enter rule "top" depth 2
1544        User hook: starting guess #1
1545        enter rule "which" depth 3 guessing
1546        enter rule "which2" depth 4 guessing
1547        enter rule "which3" depth 5 guessing
1548        User hook: starting guess #2
1549        enter rule "label" depth 6 guessing
1550        guess failed
1551        User hook: failed guess #2
1552        guess done - returning to rule "which3" at depth 5 (guess mode continues
1553                                                 - an enclosing guess is still active)
1554        User hook: ending guess #2
1555        User hook: starting guess #3
1556        enter rule "global" depth 6 guessing
1557        exit rule "global" depth 6 guessing
1558        guess done - returning to rule "which3" at depth 5 (guess mode continues
1559                                                 - an enclosing guess is still active)
1560        User hook: ending guess #3
1561        enter rule "global" depth 6 guessing
1562        exit rule "global" depth 6 guessing
1563        exit rule "which3" depth 5 guessing
1564        exit rule "which2" depth 4 guessing
1565        exit rule "which" depth 3 guessing
1566        guess done - returning to rule "top" at depth 2 (guess mode ends)
1567        User hook: ending guess #1
1568        enter rule "which" depth 3
1569        .....
1570      ----------------------------------------------------------------------
1571
1572      Remember:
1573
1574        (a) Only init-actions are executed during guess mode.
1575        (b) A rule can be invoked multiple times during guess mode.
1576        (c) If the guess succeeds the rule will be called once more
1577              without guess mode so that normal actions will be executed.
1578            This means that the init-action might need to distinguish
1579              between guess mode and non-guess mode using the variable
1580              [zz]guessing.
1581
1582#101. (Changed in 1.33MR10) antlr -info command line switch
1583
1584        -info
1585
1586            p   - extra predicate information in generated file
1587
1588            t   - information about tnode use:
1589                    at the end of each rule in generated file
1590                    summary on stderr at end of program
1591
1592            m   - monitor progress
1593                    prints name of each rule as it is started
1594                    flushes output at start of each rule
1595
1596            f   - first/follow set information to stdout
1597
1598            0   - no operation (added in 1.33MR11)
1599
1600      The options may be combined and may appear in any order.
1601      For example:
1602
1603        antlr -info ptm -CC -gt -mrhoist on mygrammar.g
1604
1605#100a. (Changed in 1.33MR10) Predicate tree simplification
1606
      When the same predicates can be referenced in more than one
      alternative of a block, large predicate trees can be formed.

      The difference that these optimizations make is so dramatic
      that I have decided to use them even when -mrhoist is not selected.
1612
1613      Consider the following grammar:
1614
1615        start : ( all )* ;
1616
1617        all   : a
1618              | d
1619              | e
1620              | f
1621              ;
1622
1623        a     : c A B
1624              | c A C
1625              ;
1626
1627        c     : <<AAA(LATEXT(2))>>?
1628              ;
1629
1630        d     : <<BBB(LATEXT(2))>>? B C
1631              ;
1632
1633        e     : <<CCC(LATEXT(2))>>? B C
1634              ;
1635
1636        f     : e X Y
1637              ;
1638
1639      In rule "a" there is a reference to rule "c" in both alternatives.
1640      The length of the predicate AAA is k=2 and it can be followed in
1641      alternative 1 only by (A B) while in alternative 2 it can be
1642      followed only by (A C).  Thus they do not have identical context.
1643
1644      In rule "all" the alternatives which refer to rules "e" and "f" allow
1645      elimination of the duplicate reference to predicate CCC.
1646
      The table below summarizes the kind of simplification performed by
      1.33MR10.  In the table, X, Y, and Z stand for single predicates
      (not trees).
1650
1651        (OR X (OR Y (OR Z)))  => (OR X Y Z)
1652        (AND X (AND Y (AND Z)))  => (AND X Y Z)
1653
1654        (OR X  (... (OR  X Y) ... ))     => (OR X (... Y ... ))
1655        (AND X (... (AND X Y) ... ))     => (AND X (... Y ... ))
1656        (OR X  (... (AND X Y) ... ))     => (OR X (...  ... ))
1657        (AND X (... (OR  X Y) ... ))     => (AND X (...  ... ))
1658
1659        (AND X)               => X
1660        (OR X)                => X
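
      Two of the rewrites in the table, flattening a nested operator
      of the same kind and collapsing a node with a single operand,
      can be sketched in a few lines.  This is a minimal illustration
      only: the PredNode type and the helper names are hypothetical
      and do not reflect antlr's internal predicate representation,
      and the rules that remove a repeated predicate X from an inner
      subtree are not attempted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical predicate tree node (antlr's own representation differs).
struct PredNode {
    enum Kind { LEAF, AND, OR };
    Kind kind;
    std::string name;              // used only by leaves
    std::vector<PredNode> kids;    // used only by AND/OR nodes
};

PredNode leaf(const std::string& name) {
    PredNode n;
    n.kind = PredNode::LEAF;
    n.name = name;
    return n;
}

PredNode node(PredNode::Kind kind, const std::vector<PredNode>& kids) {
    PredNode n;
    n.kind = kind;
    n.kids = kids;
    return n;
}

// Apply the flattening and single-operand rules bottom-up.
PredNode simplify(const PredNode& in) {
    if (in.kind == PredNode::LEAF) return in;
    assert(!in.kids.empty());      // an operator needs at least one operand
    PredNode out;
    out.kind = in.kind;
    for (std::size_t i = 0; i < in.kids.size(); ++i) {
        PredNode kid = simplify(in.kids[i]);
        if (kid.kind == out.kind) {
            // (OR X (OR Y Z)) => (OR X Y Z), and likewise for AND
            out.kids.insert(out.kids.end(), kid.kids.begin(), kid.kids.end());
        } else {
            out.kids.push_back(kid);
        }
    }
    if (out.kids.size() == 1) {
        PredNode only = out.kids[0];   // (AND X) => X, (OR X) => X
        return only;
    }
    return out;
}

// Render a tree in the notation used by the table.
std::string toString(const PredNode& n) {
    if (n.kind == PredNode::LEAF) return n.name;
    std::string s = (n.kind == PredNode::AND) ? "(AND" : "(OR";
    for (std::size_t i = 0; i < n.kids.size(); ++i) {
        s += " " + toString(n.kids[i]);
    }
    return s + ")";
}
```

      Working bottom-up means the innermost "(OR Z)" is collapsed to Z
      before its parent is flattened, so the first row of the table
      falls out of the two rules applied together.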
1661
1662      In a test with a complex grammar for a real application, a predicate
1663      tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)".
1664
1665      In 1.33MR10 there is a greater effort to release memory used
1666      by predicates once they are no longer in use.
1667
1668#100b. (Changed in 1.33MR10) Suppression of extra predicate tests
1669
1670      The following optimizations require that -mrhoist be selected.
1671
1672      It is relatively easy to optimize the code generated for predicate
1673      gates when they are of the form:
1674
1675            (AND X Y Z ...)
1676        or  (OR  X Y Z ...)
1677
1678      where X, Y, Z, and "..." represent individual predicates (leaves) not
1679      predicate trees.
1680
1681      If the predicate is an AND the contexts of the X, Y, Z, etc. are
1682      ANDed together to create a single Tree context for the group and
1683      context tests for the individual predicates are suppressed:
1684
1685            --------------------------------------------------
1686            Note: This was incorrect.  The contexts should be
1687            ORed together.  This has been fixed.  A more
1688            complete description is available in item #152.
1689            ---------------------------------------------------
1690
1691      Optimization 1:  (AND X Y Z ...)
1692
1693        Suppose the context for Xtest is LA(1)==LP and the context for
1694        Ytest is LA(1)==LP && LA(2)==ID.
1695
1696            Without the optimization the code would resemble:
1697
                if (lookaheadContext &&
                    (!(LA(1)==LP && LA(1)==LP && LA(2)==ID) ||
                        ( (!(LA(1)==LP) || Xtest) &&
                          (!(LA(1)==LP && LA(2)==ID) || Ytest)
                        ))) {...
1703
1704            With the -mrhoist optimization the code would resemble:
1705
                if (lookaheadContext &&
                    (!(LA(1)==LP && LA(2)==ID) || (Xtest && Ytest))) {...
1708
1709      Optimization 2: (OR X Y Z ...) with identical contexts
1710
1711        Suppose the context for Xtest is LA(1)==ID and for Ytest
1712        the context is also LA(1)==ID.
1713
1714            Without the optimization the code would resemble:
1715
                if (lookaheadContext &&
                    (!(LA(1)==ID || LA(1)==ID) ||
                        (LA(1)==ID && Xtest) ||
                        (LA(1)==ID && Ytest))) {...
1720
1721            With the -mrhoist optimization the code would resemble:
1722
                if (lookaheadContext &&
                    (!(LA(1)==ID) || (Xtest || Ytest))) {...
1725
1726      Optimization 3: (OR X Y Z ...) with distinct contexts
1727
1728        Suppose the context for Xtest is LA(1)==ID and for Ytest
1729        the context is LA(1)==LP.
1730
1731            Without the optimization the code would resemble:
1732
                if (lookaheadContext &&
                    (!(LA(1)==ID || LA(1)==LP) ||
                        (LA(1)==ID && Xtest) ||
                        (LA(1)==LP && Ytest))) {...
1737
1738            With the -mrhoist optimization the code would resemble:
1739
                if (lookaheadContext &&
                        (zzpf=0,
                            (LA(1)==ID && (zzpf=1) && Xtest) ||
                            (LA(1)==LP && (zzpf=1) && Ytest) ||
                            !zzpf)) {...
1745
1746            These may appear to be of similar complexity at first,
1747            but the non-optimized version contains two tests of each
1748            context while the optimized version contains only one
1749            such test, as well as eliminating some of the inverted
1750            logic (" !(...) || ").
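
      That the optimized gates of Optimizations 2 and 3 accept exactly
      the same inputs as the unoptimized ones can be checked by brute
      force.  The sketch below is an assumption-laden model, not antlr
      output: the context tests LA(1)==ID and LA(1)==LP are replaced
      by the booleans isID and isLP, the predicates by x and y, the
      function names are hypothetical, and everything after
      "lookaheadContext &&" is assumed to form one parenthesized group.

```cpp
#include <cassert>

// Optimization 2, unoptimized form: both predicates share context ID.
bool or2Plain(bool isID, bool x, bool y) {
    return !(isID || isID) || (isID && x) || (isID && y);
}

// Optimization 2, hoisted form: the shared context is tested once.
bool or2Hoisted(bool isID, bool x, bool y) {
    return !isID || (x || y);
}

// Optimization 3, unoptimized form: distinct contexts ID and LP.
bool or3Plain(bool isID, bool isLP, bool x, bool y) {
    return !(isID || isLP) || (isID && x) || (isLP && y);
}

// Optimization 3, hoisted form: zzpf records whether any context matched.
bool or3Hoisted(bool isID, bool isLP, bool x, bool y) {
    int zzpf = 0;
    return (zzpf = 0,
            (isID && (zzpf = 1) && x) ||
            (isLP && (zzpf = 1) && y) ||
            !zzpf);
}

// Compare both pairs of gates over every combination of inputs.
bool gatesAgree() {
    for (int bits = 0; bits < 16; ++bits) {
        bool isID = (bits & 1) != 0;
        bool isLP = (bits & 2) != 0;
        bool x    = (bits & 4) != 0;
        bool y    = (bits & 8) != 0;
        if (or2Plain(isID, x, y) != or2Hoisted(isID, x, y)) return false;
        if (or3Plain(isID, isLP, x, y) != or3Hoisted(isID, isLP, x, y)) return false;
    }
    return true;
}

// Convenience wrapper that aborts if the two forms ever disagree.
void checkGates() {
    assert(gatesAgree());
}
```

      The loop also covers the case where isID and isLP are both true,
      which cannot occur for a real LA(1) but costs nothing to include.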
1751
1752      Optimization 4: Computation of predicate gate trees
1753
1754        When generating code for the gates of predicate expressions
1755        antlr 1.33 vanilla uses a recursive procedure to generate
1756        "&&" and "||" expressions for testing the lookahead. As each
1757        layer of the predicate tree is exposed a new set of "&&" and
1758        "||" expressions on the lookahead are generated.  In many
1759        cases the lookahead being tested has already been tested.
1760
1761        With -mrhoist a lookahead tree is computed for the entire
1762        lookahead expression.  This means that predicates with identical
1763        context or context which is a subset of another predicate's
1764        context disappear.
1765
1766        This is especially important for predicates formed by rules
1767        like the following:
1768
            upperCaseVowel  : <<isUpperCase(LATEXT(1))>>?  vowel;
            vowel           : <<isVowel(LATEXT(1))>>? LETTERS;
1771
1772        These predicates are combined using AND since both must be
1773        satisfied for rule upperCaseVowel.  They have identical
1774        context which makes this optimization very effective.
1775
      The effect of Items #100a and #100b together can be dramatic.  In
1777      a very large (but real world) grammar one particular predicate
1778      expression was reduced from an (unreadable) 50 predicate leaves,
1779      195 LA(1) terms, and 5500 characters to an (easily comprehensible)
1780      3 predicate leaves (all different) and a *single* LA(1) term.
1781
1782#98.  (Changed in 1.33MR10) Option "-info p"
1783
1784      When the user selects option "-info p" the program will generate
1785      detailed information about predicates.  If the user selects
1786      "-mrhoist on" additional detail will be provided explaining
1787      the promotion and suppression of predicates.  The output is part
1788      of the generated file and sandwiched between #if 0/#endif statements.
1789
1790      Consider the following k=1 grammar:
1791
1792        start : ( all ) * ;
1793
1794        all   : ( a
1795                | b
1796                )
1797                ;
1798
1799        a     : c B
1800              ;
1801
1802        c     : <<LATEXT(1)>>?
1803              | B
1804              ;
1805
1806        b     : <<LATEXT(1)>>? X
1807              ;
1808
1809      Below is an excerpt of the output for rule "start" for the three
1810      predicate options (off, on, and maintenance release style hoisting).
1811
1812      For those who do not wish to use the "-mrhoist on" option for code
1813      generation the option can be used in a "diagnostic" mode to provide
1814      valuable information:
1815
1816            a. where one should insert null actions to inhibit hoisting
1817            b. a chain of rule references which shows where predicates are
1818               being hoisted
1819
1820      ======================================================================
1821      Example of "-info p" with "-mrhoist on"
1822      ======================================================================
1823        #if 0
1824
1825        Hoisting of predicate suppressed by alternative without predicate.
1826        The alt without the predicate includes all cases where the
1827           predicate is false.
1828
1829           WITH predicate: line 11  v36.g
1830           WITHOUT predicate: line 12  v36.g
1831
1832        The context set for the predicate:
1833
1834             B
1835
1836        The lookahead set for alt WITHOUT the semantic predicate:
1837
1838             B
1839
1840        The predicate:
1841
1842          pred <<  LATEXT(1)>>?  depth=k=1  rule c  line 11  v36.g
1843
1844            set context:
1845               B
1846            tree context: null
1847
1848        Chain of referenced rules:
1849
1850            #0  in rule start (line 1 v36.g) to rule all
1851            #1  in rule all (line 3 v36.g) to rule a
1852            #2  in rule a (line 8 v36.g) to rule c
1853            #3  in rule c (line 11 v36.g)
1854
1855        #endif
1856        &&
1857        #if 0
1858
1859        pred <<  LATEXT(1)>>?  depth=k=1  rule b  line 15  v36.g
1860
1861          set context:
1862             X
1863          tree context: null
1864
1865        #endif
1866      ======================================================================
1867      Example of "-info p"  with the default -prc setting ( "-prc off")
1868      ======================================================================
1869        #if 0
1870
1871        OR
1872          pred <<  LATEXT(1)>>?  depth=k=1  rule c  line 11  v36.g
1873
1874            set context:
1875              nil
1876            tree context: null
1877
1878          pred <<  LATEXT(1)>>?  depth=k=1  rule b  line 15  v36.g
1879
1880            set context:
1881              nil
1882            tree context: null
1883
1884        #endif
1885      ======================================================================
1886      Example of "-info p" with "-prc on" and "-mrhoist off"
1887      ======================================================================
1888        #if 0
1889
1890        OR
1891          pred <<  LATEXT(1)>>?  depth=k=1  rule c  line 11  v36.g
1892
1893            set context:
1894               B
1895            tree context: null
1896
1897          pred <<  LATEXT(1)>>?  depth=k=1  rule b  line 15  v36.g
1898
1899            set context:
1900               X
1901            tree context: null
1902
1903        #endif
1904      ======================================================================
1905
1906#60.  (Changed in 1.33MR7) Major changes to exception handling
1907
        There were significant problems in the handling of exceptions
        in 1.33 vanilla.  The general problem is that it could process
        only one level of exception handler.  For example, a named
        exception handler, an exception handler for an alternative, or
        an exception handler for a subrule always went to the rule's
        exception handler if there was no "catch" which matched the
        exception.
1914
1915        In 1.33MR7 the exception handlers properly "nest".  If an
1916        exception handler does not have a matching "catch" then the
1917        nextmost outer exception handler is checked for an appropriate
1918        "catch" clause, and so on until an exception handler with an
1919        appropriate "catch" is found.
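
        The outward search for a matching "catch" can be modelled as
        a walk along a chain of handlers.  The sketch below is only a
        model under assumptions: the Handler type, the findHandler()
        routine, and the signal codes are hypothetical names chosen
        for illustration, and the code antlr actually generates uses
        labels and branches rather than a data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Illustrative signal codes for this sketch only.
enum Signal { SigNone = 0, SigMismatchedToken, SigNoViableAlt, SigNoSemViableAlt };

// One exception handler: where it is attached and which signals it catches.
struct Handler {
    std::string attachedTo;
    std::set<int> catches;
};

// chain[0] is the innermost handler; the last entry is the rule's handler.
// Returns the index of the first handler with a matching "catch", or -1
// if no handler in the chain catches the signal.
int findHandler(const std::vector<Handler>& chain, int signal) {
    assert(signal != SigNone);     // there must be something to handle
    for (std::size_t i = 0; i < chain.size(); ++i) {
        if (chain[i].catches.count(signal) != 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

        In this model the 1.33 vanilla behavior would correspond to
        checking chain[0] and then jumping straight to the last entry,
        skipping every handler in between.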
1920
1921        There are still undesirable features in the way exception
1922        handlers are implemented, but I do not have time to fix them
1923        at the moment:
1924
1925            The exception handlers for alternatives are outside the
1926            block containing the alternative.  This makes it impossible
1927            to access variables declared in a block or to resume the
1928            parse by "falling through".  The parse can still be easily
1929            resumed in other ways, but not in the most natural fashion.
1930
            This results in an inconsistency between named exception
1932            handlers and exception handlers for alternatives.  When
1933            an exception handler for an alternative "falls through"
1934            it goes to the nextmost outer handler - not the "normal
1935            action".
1936
1937        A major difference between 1.33MR7 and 1.33 vanilla is
1938        the default action after an exception is caught:
1939
1940            1.33 Vanilla
1941            ------------
1942            In 1.33 vanilla the signal value is set to zero ("NoSignal")
1943            and the code drops through to the code following the exception.
1944            For named exception handlers this is the "normal action".
1945            For alternative exception handlers this is the rule's handler.
1946
1947            1.33MR7
1948            -------
1949            In 1.33MR7 the signal value is NOT automatically set to zero.
1950
1951            There are two cases:
1952
1953                For named exception handlers: if the signal value has been
1954                set to zero the code drops through to the "normal action".
1955
1956                For all other cases the code branches to the nextmost outer
1957                exception handler until it reaches the handler for the rule.
1958
1959        The following macros have been defined for convenience:
1960
1961            C/C++ Mode Name
1962            --------------------
1963            (zz)suppressSignal
1964                  set signal & return signal arg to 0 ("NoSignal")
1965            (zz)setSignal(intValue)
1966                  set signal & return signal arg to some value
1967            (zz)exportSignal
1968                  copy the signal value to the return signal arg
1969
        I'm not sure why PCCTS makes a distinction between the local
        signal value and the return signal argument, but I'm loath
        to change the code.  The burden of copying the local signal
        value to the return signal argument can be given to the
        default signal handler, I suppose.
1975
#53.  (Explanation for 1.33MR6) What happens after an exception is caught?
1977
1978        The Book is silent about what happens after an exception
1979        is caught.
1980
1981        The following code fragment prints "Error Action" followed
1982        by "Normal Action".
1983
1984        test : Word ex:Number <<printf("Normal Action\n");>>
1985                exception[ex]
1986                   catch NoViableAlt:
1987                        <<printf("Error Action\n");>>
1988        ;
1989
1990        The reason for "Normal Action" is that the normal flow of the
1991        program after a user-written exception handler is to "drop through".
1992        In the case of an exception handler for a rule this results in
        the execution of a "return" statement.  In the case of an
1994        exception handler attached to an alternative, rule, or token
1995        this is the code that would have executed had there been no
1996        exception.
1997
1998        The user can achieve the desired result by using a "return"
1999        statement.
2000
2001        test : Word ex:Number <<printf("Normal Action\n");>>
2002                exception[ex]
2003                   catch NoViableAlt:
2004                        <<printf("Error Action\n"); return;>>
2005        ;
2006
2007        The most powerful mechanism for recovery from parse errors
2008        in pccts is syntactic predicates because they provide
2009        backtracking.  Exceptions allow "return", "break",
2010        "consumeUntil(...)", "goto _handler", "goto _fail", and
2011        changing the _signal value.
2012
2013#41.  (Added in 1.33MR6) antlr -stdout
2014
2015        Using "antlr -stdout ..." forces the text that would
2016        normally go to the grammar.c or grammar.cpp file to
2017        stdout.
2018
2019#40.  (Added in 1.33MR6) antlr -tab to change tab stops
2020
2021        Using "antlr -tab number ..." changes the tab stops
2022        for the grammar.c or grammar.cpp file.  The number
2023        must be between 0 and 8.  Using 0 gives tab characters,
2024        values between 1 and 8 give the appropriate number of
2025        space characters.
2026
2027#34.  (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)
2028
2029        Previously there was no public function for changing the line
2030        number maintained by the lexer.
2031
2032#28.   (Added to 1.33MR1) More control over DLG header
2033
2034        Version 1.33MR1 adds the following directives to PCCTS
2035        for C++ mode:
2036
2037          #lexprefix  <<source code>>
2038
2039                Adds source code to the DLGLexer.h file
2040                after the #include "DLexerBase.h" but
2041                before the start of the class definition.
2042
2043          #lexmember  <<source code>>
2044
2045                Adds source code to the DLGLexer.h file
2046                as part of the DLGLexer class body.  It
2047                appears immediately after the start of
                the class and a "public:" statement.
2049
2050