1 2 package java_cup.runtime; 3 4 import java.util.Stack; 5 6 /** This class implements a skeleton table driven LR parser. In general, 7 * LR parsers are a form of bottom up shift-reduce parsers. Shift-reduce 8 * parsers act by shifting input onto a parse stack until the symbols 9 * matching the right hand side of a production appear on the top of the 10 * stack. Once this occurs, a reduce is performed. This involves removing 11 * the symbols corresponding to the right hand side of the production 12 * (the so called "handle") and replacing them with the non-terminal from 13 * the left hand side of the production. <p> 14 * 15 * To control the decision of whether to shift or reduce at any given point, 16 * the parser uses a state machine (the "viable prefix recognition machine" 17 * built by the parser generator). The current state of the machine is placed 18 * on top of the parse stack (stored as part of a symbol object representing 19 * a terminal or non terminal). The parse action table is consulted 20 * (using the current state and the current lookahead token as indexes) to 21 * determine whether to shift or to reduce. When the parser shifts, it 22 * changes to a new state by pushing a new symbol (containing a new state) 23 * onto the stack. When the parser reduces, it pops the handle (right hand 24 * side of a production) off the stack. This leaves the parser in the state 25 * it was in before any of those symbols were matched. Next the reduce-goto 26 * table is consulted (using the new state and current lookahead token as 27 * indexes) to determine a new state to go to. The parser then shifts to 28 * this goto state by pushing the left hand side symbol of the production 29 * (also containing the new state) onto the stack.<p> 30 * 31 * This class actually provides four LR parsers. The methods parse() and 32 * debug_parse() provide two versions of the main parser (the only difference 33 * being that debug_parse() emits debugging trace messages as it parses). 34 * In addition to these main parsers, the error recovery mechanism uses two 35 * more. One of these is used to simulate "parsing ahead" in the input 36 * without carrying out actions (to verify that a potential error recovery 37 * has worked), and the other is used to parse through buffered "parse ahead" 38 * input in order to execute all actions and re-synchronize the actual parser 39 * configuration.<p> 40 * 41 * This is an abstract class which is normally filled out by a subclass 42 * generated by the JavaCup parser generator. In addition to supplying 43 * the actual parse tables, generated code also supplies methods which 44 * invoke various pieces of user supplied code, provide access to certain 45 * special symbols (e.g., EOF and error), etc. Specifically, the following 46 * abstract methods are normally supplied by generated code: 47 * <dl compact> 48 * <dt> short[][] production_table() 49 * <dd> Provides a reference to the production table (indicating the index of 50 * the left hand side non terminal and the length of the right hand side 51 * for each production in the grammar). 52 * <dt> short[][] action_table() 53 * <dd> Provides a reference to the parse action table. 54 * <dt> short[][] reduce_table() 55 * <dd> Provides a reference to the reduce-goto table. 56 * <dt> int start_state() 57 * <dd> Indicates the index of the start state. 58 * <dt> int start_production() 59 * <dd> Indicates the index of the starting production. 60 * <dt> int EOF_sym() 61 * <dd> Indicates the index of the EOF symbol. 62 * <dt> int error_sym() 63 * <dd> Indicates the index of the error symbol. 64 * <dt> symbol do_action() 65 * <dd> Executes a piece of user supplied action code. This always comes at 66 * the point of a reduce in the parse, so this code also allocates and 67 * fills in the left hand side non terminal symbol object that is to be 68 * pushed onto the stack for the reduce. 69 * <dt> void init_actions() 70 * <dd> Code to initialize a special object that encapsulates user supplied 71 * actions (this object is used by do_action() to actually carry out the 72 * actions). 73 * <dt> token scan() 74 * <dd> Used to get the next input token from the scanner. 75 * </dl> 76 * 77 * In addition to these routines that <i>must</i> be supplied by the 78 * generated subclass there are also a series of routines that <i>may</i> 79 * be supplied. These include: 80 * <dl> 81 * <dt> int error_sync_size() 82 * <dd> This determines how many tokens past the point of an error 83 * must be parsed without error in order to consider a recovery to 84 * be valid. This defaults to 3. Values less than 2 are not 85 * recommended. 86 * <dt> void report_error(String message, Object info) 87 * <dd> This method is called to report an error. The default implementation 88 * simply prints a message to System.err and ignores its second parameter. 89 * This method is often replaced in order to provide a more sophisticated 90 * error reporting mechanism. 91 * <dt> void report_fatal_error(String message, Object info) 92 * <dd> This method is called when a fatal error that cannot be recovered from 93 * is encountered. In the default implementation, it calls 94 * report_error() to emit a message, then throws an exception. 95 * <dt> void syntax_error(token cur_token) 96 * <dd> This method is called as soon as syntax error is detected (but 97 * before recovery is attempted). In the default implementation it 98 * invokes: report_error("Syntax error", null); 99 * <dt> void unrecovered_syntax_error(token cur_token) 100 * <dd> This method is called if syntax error recovery fails. In the default 101 * implementation it invokes:<br> 102 * report_fatal_error("Couldn't repair and continue parse", null); 103 * </dl> 104 * 105 * @see java_cup.runtime.symbol 106 * @see java_cup.runtime.token 107 * @see java_cup.runtime.virtual_parse_stack 108 * @version last updated: 11/25/95 109 * @author Scott Hudson 110 */ 111 112 public abstract class lr_parser { 113 114 /*-----------------------------------------------------------*/ 115 /*--- Constructor(s) ----------------------------------------*/ 116 /*-----------------------------------------------------------*/ 117 118 /** Simple constructor. */ lr_parser()119 public lr_parser() 120 { 121 /* nothing to do here */ 122 } 123 124 /*-----------------------------------------------------------*/ 125 /*--- (Access to) Static (Class) Variables ------------------*/ 126 /*-----------------------------------------------------------*/ 127 128 /** The default number of tokens after an error we much match to consider 129 * it recovered from. 130 */ 131 protected final static int _error_sync_size = 3; 132 133 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 134 135 /** The number of tokens after an error we much match to consider it 136 * recovered from. 137 */ error_sync_size()138 protected int error_sync_size() {return _error_sync_size; } 139 140 /*-----------------------------------------------------------*/ 141 /*--- (Access to) Instance Variables ------------------------*/ 142 /*-----------------------------------------------------------*/ 143 144 /** Table of production information (supplied by generated subclass). 145 * This table contains one entry per production and is indexed by 146 * the negative-encoded values (reduce actions) in the action_table. 147 * Each entry has two parts, the index of the non-terminal on the 148 * left hand side of the production, and the number of symbols 149 * on the right hand side. 150 */ production_table()151 public abstract short[][] production_table(); 152 153 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 154 155 /** The action table (supplied by generated subclass). This table is 156 * indexed by state and terminal number indicating what action is to 157 * be taken when the parser is in the given state (i.e., the given state 158 * is on top of the stack) and the given terminal is next on the input. 159 * States are indexed using the first dimension, however, the entries for 160 * a given state are compacted and stored in adjacent index, value pairs 161 * which are searched for rather than accessed directly (see get_action()). 162 * The actions stored in the table will be either shifts, reduces, or 163 * errors. Shifts are encoded as positive values (one greater than the 164 * state shifted to). Reduces are encoded as negative values (one less 165 * than the production reduced by). Error entries are denoted by zero. 166 * 167 * @see java_cup.runtime.lr_parser#get_action 168 */ action_table()169 public abstract short[][] action_table(); 170 171 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 172 173 /** The reduce-goto table (supplied by generated subclass). This 174 * table is indexed by state and non-terminal number and contains 175 * state numbers. States are indexed using the first dimension, however, 176 * the entries for a given state are compacted and stored in adjacent 177 * index, value pairs which are searched for rather than accessed 178 * directly (see get_reduce()). When a reduce occurs, the handle 179 * (corresponding to the RHS of the matched production) is popped off 180 * the stack. The new top of stack indicates a state. This table is 181 * then indexed by that state and the LHS of the reducing production to 182 * indicate where to "shift" to. 183 * 184 * @see java_cup.runtime.lr_parser#get_reduce 185 */ reduce_table()186 public abstract short[][] reduce_table(); 187 188 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 189 190 /** The index of the start state (supplied by generated subclass). */ start_state()191 public abstract int start_state(); 192 193 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 194 195 /** The index of the start production (supplied by generated subclass). */ start_production()196 public abstract int start_production(); 197 198 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 199 200 /** The index of the end of file terminal symbol (supplied by generated 201 * subclass). 202 */ EOF_sym()203 public abstract int EOF_sym(); 204 205 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 206 207 /** The index of the special error symbol (supplied by generated subclass). */ error_sym()208 public abstract int error_sym(); 209 210 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 211 212 /** Internal flag to indicate when parser should quit. */ 213 protected boolean _done_parsing = false; 214 215 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 216 217 /** This method is called to indicate that the parser should quit. This is 218 * normally called by an accept action, but can be used to cancel parsing 219 * early in other circumstances if desired. 220 */ done_parsing()221 public void done_parsing() 222 { 223 _done_parsing = true; 224 } 225 226 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 227 /* Global parse state shared by parse(), error recovery, and 228 * debugging routines */ 229 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 230 231 /** Indication of the index for top of stack (for use by actions). */ 232 protected int tos; 233 234 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 235 236 /** The current lookahead token. */ 237 protected token cur_token; 238 239 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 240 241 /** The parse stack itself. */ 242 protected Stack stack = new Stack(); 243 244 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 245 246 /** Direct reference to the production table. */ 247 protected short[][] production_tab; 248 249 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 250 251 /** Direct reference to the action table. */ 252 protected short[][] action_tab; 253 254 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 255 256 /** Direct reference to the reduce-goto table. */ 257 protected short[][] reduce_tab; 258 259 /*-----------------------------------------------------------*/ 260 /*--- General Methods ---------------------------------------*/ 261 /*-----------------------------------------------------------*/ 262 263 /** Perform a bit of user supplied action code (supplied by generated 264 * subclass). Actions are indexed by an internal action number assigned 265 * at parser generation time. 266 * 267 * @param act_num the internal index of the action to be performed. 268 * @param parser the parser object we are acting for. 269 * @param stack the parse stack of that object. 270 * @param top the index of the top element of the parse stack. 271 */ do_action( int act_num, lr_parser parser, Stack stack, int top)272 public abstract symbol do_action( 273 int act_num, 274 lr_parser parser, 275 Stack stack, 276 int top) 277 throws java.lang.Exception; 278 279 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 280 281 /** User code for initialization inside the parser. Typically this 282 * initializes the scanner. This is called before the parser requests 283 * the first token. Here this is just a placeholder for subclasses that 284 * might need this and we perform no action. This method is normally 285 * overridden by the generated code using this contents of the "init with" 286 * clause as its body. 287 */ user_init()288 public void user_init() throws java.lang.Exception { } 289 290 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 291 292 /** Initialize the action object. This is called before the parser does 293 * any parse actions. This is filled in by generated code to create 294 * an object that encapsulates all action code. 295 */ init_actions()296 protected abstract void init_actions() throws java.lang.Exception; 297 298 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 299 300 /** Get the next token from the input (supplied by generated subclass). 301 * Once end of file has been reached, all subsequent calls to scan 302 * should return an EOF token (which is symbol number 0). This method 303 * is supplied by the generator using using the code declared in the 304 * "scan with" clause. 305 */ scan()306 public abstract token scan() throws java.lang.Exception; 307 308 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 309 310 /** Report a fatal error. This method takes a message string and an 311 * additional object (to be used by specializations implemented in 312 * subclasses). Here in the base class a very simple implementation 313 * is provided which reports the error then throws an exception. 314 * 315 * @param message an error message. 316 * @param info an extra object reserved for use by specialized subclasses. 317 */ report_fatal_error( String message, Object info)318 public void report_fatal_error( 319 String message, 320 Object info) 321 throws java.lang.Exception 322 { 323 /* stop parsing (not really necessary since we throw an exception, but) */ 324 done_parsing(); 325 326 /* use the normal error message reporting to put out the message */ 327 report_error(message, info); 328 329 /* throw an exception */ 330 throw new Exception("Can't recover from previous error(s)"); 331 } 332 333 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 334 335 /** Report a non fatal error (or warning). This method takes a message 336 * string and an additional object (to be used by specializations 337 * implemented in subclasses). Here in the base class a very simple 338 * implementation is provided which simply prints the message to 339 * System.err. 340 * 341 * @param message an error message. 342 * @param info an extra object reserved for use by specialized subclasses. 343 */ report_error(String message, Object info)344 public void report_error(String message, Object info) 345 { 346 System.err.println(message); 347 } 348 349 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 350 351 /** This method is called when a syntax error has been detected and recovery 352 * is about to be invoked. Here in the base class we just emit a 353 * "Syntax error" error message. 354 * 355 * @param cur_token the current lookahead token. 356 */ syntax_error(token cur_token)357 public void syntax_error(token cur_token) 358 { 359 report_error("Syntax error", null); 360 } 361 362 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 363 364 /** This method is called if it is determined that syntax error recovery 365 * has been unsuccessful. Here in the base class we report a fatal error. 366 * 367 * @param cur_token the current lookahead token. 368 */ unrecovered_syntax_error(token cur_token)369 public void unrecovered_syntax_error(token cur_token) 370 throws java.lang.Exception 371 { 372 report_fatal_error("Couldn't repair and continue parse", null); 373 } 374 375 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 376 377 /** Fetch an action from the action table. The table is broken up into 378 * rows, one per state (rows are indexed directly by state number). 379 * Within each row, a list of index, value pairs are given (as sequential 380 * entries in the table), and the list is terminated by a default entry 381 * (denoted with a symbol index of -1). To find the proper entry in a row 382 * we do a linear or binary search (depending on the size of the row). 383 * 384 * @param state the state index of the action being accessed. 385 * @param sym the symbol index of the action being accessed. 386 */ get_action(int state, int sym)387 protected final short get_action(int state, int sym) 388 { 389 short tag; 390 int first, last, probe; 391 short[] row = action_tab[state]; 392 393 /* linear search if we are < 10 entries */ 394 if (row.length < 20) 395 for (probe = 0; probe < row.length; probe++) 396 { 397 /* is this entry labeled with our symbol or the default? */ 398 tag = row[probe++]; 399 if (tag == sym || tag == -1) 400 { 401 /* return the next entry */ 402 return row[probe]; 403 } 404 } 405 /* otherwise binary search */ 406 else 407 { 408 first = 0; 409 last = (row.length-1)/2 - 1; /* leave out trailing default entry */ 410 while (first <= last) 411 { 412 probe = (first+last)/2; 413 if (sym == row[probe*2]) 414 return row[probe*2+1]; 415 else if (sym > row[probe*2]) 416 first = probe+1; 417 else 418 last = probe-1; 419 } 420 421 /* not found, use the default at the end */ 422 return row[row.length-1]; 423 } 424 425 /* shouldn't happened, but if we run off the end we return the 426 default (error == 0) */ 427 return 0; 428 } 429 430 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 431 432 /** Fetch a state from the reduce-goto table. The table is broken up into 433 * rows, one per state (rows are indexed directly by state number). 434 * Within each row, a list of index, value pairs are given (as sequential 435 * entries in the table), and the list is terminated by a default entry 436 * (denoted with a symbol index of -1). To find the proper entry in a row 437 * we do a linear search. 438 * 439 * @param state the state index of the entry being accessed. 440 * @param sym the symbol index of the entry being accessed. 441 */ get_reduce(int state, int sym)442 protected final short get_reduce(int state, int sym) 443 { 444 short tag; 445 short[] row = reduce_tab[state]; 446 447 /* if we have a null row we go with the default */ 448 if (row == null) 449 return -1; 450 451 for (int probe = 0; probe < row.length; probe++) 452 { 453 /* is this entry labeled with our symbol or the default? */ 454 tag = row[probe++]; 455 if (tag == sym || tag == -1) 456 { 457 /* return the next entry */ 458 return row[probe]; 459 } 460 } 461 /* if we run off the end we return the default (error == -1) */ 462 return -1; 463 } 464 465 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 466 467 /** This method provides the main parsing routine. It returns only when 468 * done_parsing() has been called (typically because the parser has 469 * accepted, or a fatal error has been reported). See the header 470 * documentation for the class regarding how shift/reduce parsers operate 471 * and how the various tables are used. 472 */ parse()473 public void parse() throws java.lang.Exception 474 { 475 /* the current action code */ 476 int act; 477 478 /* the symbol/stack element returned by a reduce */ 479 symbol lhs_sym; 480 481 /* information about production being reduced with */ 482 short handle_size, lhs_sym_num; 483 484 /* set up direct reference to tables to drive the parser */ 485 486 production_tab = production_table(); 487 action_tab = action_table(); 488 reduce_tab = reduce_table(); 489 490 /* initialize the action encapsulation object */ 491 init_actions(); 492 493 /* do user initialization */ 494 user_init(); 495 496 /* get the first token */ 497 cur_token = scan(); 498 499 /* push dummy symbol with start state to get us underway */ 500 stack.push(new symbol(0, start_state())); 501 tos = 0; 502 503 /* continue until we are told to stop */ 504 for (_done_parsing = false; !_done_parsing; ) 505 { 506 /* current state is always on the top of the stack */ 507 508 /* look up action out of the current state with the current input */ 509 act = get_action(((symbol)stack.peek()).parse_state, cur_token.sym); 510 511 /* decode the action -- > 0 encodes shift */ 512 if (act > 0) 513 { 514 /* shift to the encoded state by pushing it on the stack */ 515 cur_token.parse_state = act-1; 516 stack.push(cur_token); 517 tos++; 518 519 /* advance to the next token */ 520 cur_token = scan(); 521 } 522 /* if its less than zero, then it encodes a reduce action */ 523 else if (act < 0) 524 { 525 /* perform the action for the reduce */ 526 lhs_sym = do_action((-act)-1, this, stack, tos); 527 528 /* look up information about the production */ 529 lhs_sym_num = production_tab[(-act)-1][0]; 530 handle_size = production_tab[(-act)-1][1]; 531 532 /* pop the handle off the stack */ 533 for (int i = 0; i < handle_size; i++) 534 { 535 stack.pop(); 536 tos--; 537 } 538 539 /* look up the state to go to from the one popped back to */ 540 act = get_reduce(((symbol)stack.peek()).parse_state, lhs_sym_num); 541 542 /* shift to that state */ 543 lhs_sym.parse_state = act; 544 stack.push(lhs_sym); 545 tos++; 546 } 547 /* finally if the entry is zero, we have an error */ 548 else if (act == 0) 549 { 550 /* call user syntax error reporting routine */ 551 syntax_error(cur_token); 552 553 /* try to error recover */ 554 if (!error_recovery(false)) 555 { 556 /* if that fails give up with a fatal syntax error */ 557 unrecovered_syntax_error(cur_token); 558 559 /* just in case that wasn't fatal enough, end parse */ 560 done_parsing(); 561 } 562 } 563 } 564 } 565 566 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 567 568 /** Write a debugging message to System.err for the debugging version 569 * of the parser. 570 * 571 * @param mess the text of the debugging message. 572 */ debug_message(String mess)573 public void debug_message(String mess) 574 { 575 System.err.println(mess); 576 } 577 578 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 579 580 /** Dump the parse stack for debugging purposes. */ dump_stack()581 public void dump_stack() 582 { 583 if (stack == null) 584 { 585 debug_message("# Stack dump requested, but stack is null"); 586 return; 587 } 588 589 debug_message("============ Parse Stack Dump ============"); 590 591 /* dump the stack */ 592 for (int i=0; i<stack.size(); i++) 593 { 594 debug_message("Symbol: " + ((symbol)stack.elementAt(i)).sym + 595 " State: " + ((symbol)stack.elementAt(i)).parse_state); 596 } 597 debug_message("=========================================="); 598 } 599 600 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 601 602 /** Do debug output for a reduce. 603 * 604 * @param prod_num the production we are reducing with. 605 * @param nt_num the index of the LHS non terminal. 606 * @param rhs_size the size of the RHS. 607 */ debug_reduce(int prod_num, int nt_num, int rhs_size)608 public void debug_reduce(int prod_num, int nt_num, int rhs_size) 609 { 610 debug_message("# Reduce with prod #" + prod_num + " [NT=" + nt_num + 611 ", " + "SZ=" + rhs_size + "]"); 612 } 613 614 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 615 616 /** Do debug output for shift. 617 * 618 * @param shift_tkn the token being shifted onto the stack. 619 */ debug_shift(token shift_tkn)620 public void debug_shift(token shift_tkn) 621 { 622 debug_message("# Shift under term #" + shift_tkn.sym + 623 " to state #" + shift_tkn.parse_state); 624 } 625 626 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 627 628 /** Perform a parse with debugging output. This does exactly the 629 * same things as parse(), except that it calls debug_shift() and 630 * debug_reduce() when shift and reduce moves are taken by the parser 631 * and produces various other debugging messages. 632 */ debug_parse()633 public void debug_parse() 634 throws java.lang.Exception 635 { 636 /* the current action code */ 637 int act; 638 639 /* the symbol/stack element returned by a reduce */ 640 symbol lhs_sym; 641 642 /* information about production being reduced with */ 643 short handle_size, lhs_sym_num; 644 645 /* set up direct reference to tables to drive the parser */ 646 production_tab = production_table(); 647 action_tab = action_table(); 648 reduce_tab = reduce_table(); 649 650 debug_message("# Initializing parser"); 651 652 /* initialize the action encapsulation object */ 653 init_actions(); 654 655 /* do user initialization */ 656 user_init(); 657 658 /* the current token */ 659 cur_token = scan(); 660 661 debug_message("# Current token is #" + cur_token.sym); 662 663 /* push dummy symbol with start state to get us underway */ 664 stack.push(new symbol(0, start_state())); 665 tos = 0; 666 667 /* continue until we are told to stop */ 668 for (_done_parsing = false; !_done_parsing; ) 669 { 670 /* current state is always on the top of the stack */ 671 672 /* look up action out of the current state with the current input */ 673 act = get_action(((symbol)stack.peek()).parse_state, cur_token.sym); 674 675 /* decode the action -- > 0 encodes shift */ 676 if (act > 0) 677 { 678 /* shift to the encoded state by pushing it on the stack */ 679 cur_token.parse_state = act-1; 680 debug_shift(cur_token); 681 stack.push(cur_token); 682 tos++; 683 684 /* advance to the next token */ 685 cur_token = scan(); 686 debug_message("# Current token is #" + cur_token.sym); 687 } 688 /* if its less than zero, then it encodes a reduce action */ 689 else if (act < 0) 690 { 691 /* perform the action for the reduce */ 692 lhs_sym = do_action((-act)-1, this, stack, tos); 693 694 /* look up information about the production */ 695 lhs_sym_num = production_tab[(-act)-1][0]; 696 handle_size = production_tab[(-act)-1][1]; 697 698 debug_reduce((-act)-1, lhs_sym_num, handle_size); 699 700 /* pop the handle off the stack */ 701 for (int i = 0; i < handle_size; i++) 702 { 703 stack.pop(); 704 tos--; 705 } 706 707 /* look up the state to go to from the one popped back to */ 708 act = get_reduce(((symbol)stack.peek()).parse_state, lhs_sym_num); 709 710 /* shift to that state */ 711 lhs_sym.parse_state = act; 712 stack.push(lhs_sym); 713 tos++; 714 715 debug_message("# Goto state #" + act); 716 } 717 /* finally if the entry is zero, we have an error */ 718 else if (act == 0) 719 { 720 /* call user syntax error reporting routine */ 721 syntax_error(cur_token); 722 723 /* try to error recover */ 724 if (!error_recovery(true)) 725 { 726 /* if that fails give up with a fatal syntax error */ 727 unrecovered_syntax_error(cur_token); 728 729 /* just in case that wasn't fatal enough, end parse */ 730 done_parsing(); 731 } 732 } 733 } 734 } 735 736 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 737 /* Error recovery code */ 738 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 739 740 /** Attempt to recover from a syntax error. This returns false if recovery 741 * fails, true if it succeeds. Recovery happens in 4 steps. First we 742 * pop the parse stack down to a point at which we have a shift out 743 * of the top-most state on the error symbol. This represents the 744 * initial error recovery configuration. If no such configuration is 745 * found, then we fail. Next a small number of "lookahead" or "parse 746 * ahead" tokens are read into a buffer. The size of this buffer is 747 * determined by error_sync_size() and determines how many tokens beyond 748 * the error must be matched to consider the recovery a success. Next, 749 * we begin to discard tokens in attempt to get past the point of error 750 * to a point where we can continue parsing. After each token, we attempt 751 * to "parse ahead" though the buffered lookahead tokens. The "parse ahead" 752 * process simulates that actual parse, but does not modify the real 753 * parser's configuration, nor execute any actions. If we can parse all 754 * the stored tokens without error, then the recovery is considered a 755 * success. Once a successful recovery point is determined, we do an 756 * actual parse over the stored input -- modifying the real parse 757 * configuration and executing all actions. Finally, we return the the 758 * normal parser to continue with the overall parse. 759 * 760 * @param debug should we produce debugging messages as we parse. 761 */ error_recovery(boolean debug)762 protected boolean error_recovery(boolean debug) 763 throws java.lang.Exception 764 { 765 if (debug) debug_message("# Attempting error recovery"); 766 767 /* first pop the stack back into a state that can shift on error and 768 do that shift (if that fails, we fail) */ 769 if (!find_recovery_config(debug)) 770 { 771 if (debug) debug_message("# Error recovery fails"); 772 return false; 773 } 774 775 /* read ahead to create lookahead we can parse multiple times */ 776 read_lookahead(); 777 778 /* repeatedly try to parse forward until we make it the required dist */ 779 for (;;) 780 { 781 /* try to parse forward, if it makes it, bail out of loop */ 782 if (debug) debug_message("# Trying to parse ahead"); 783 if (try_parse_ahead(debug)) 784 { 785 break; 786 } 787 788 /* if we are now at EOF, we have failed */ 789 if (lookahead[0].sym == EOF_sym()) 790 { 791 if (debug) debug_message("# Error recovery fails at EOF"); 792 return false; 793 } 794 795 /* otherwise, we consume another token and try again */ 796 if (debug) 797 debug_message("# Consuming token #" + cur_err_token().sym); 798 restart_lookahead(); 799 } 800 801 /* we have consumed to a point where we can parse forward */ 802 if (debug) debug_message("# Parse-ahead ok, going back to normal parse"); 803 804 /* do the real parse (including actions) across the lookahead */ 805 parse_lookahead(debug); 806 807 /* we have success */ 808 return true; 809 } 810 811 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 812 813 /** Determine if we can shift under the special error symbol out of the 814 * state currently on the top of the (real) parse stack. 815 */ shift_under_error()816 protected boolean shift_under_error() 817 { 818 /* is there a shift under error symbol */ 819 return get_action(((symbol)stack.peek()).parse_state, error_sym()) > 0; 820 } 821 822 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 823 824 /** Put the (real) parse stack into error recovery configuration by 825 * popping the stack down to a state that can shift on the special 826 * error symbol, then doing the shift. If no suitable state exists on 827 * the stack we return false 828 * 829 * @param debug should we produce debugging messages as we parse. 830 */ find_recovery_config(boolean debug)831 protected boolean find_recovery_config(boolean debug) 832 { 833 token error_token; 834 int act; 835 836 if (debug) debug_message("# Finding recovery state on stack"); 837 838 /* pop down until we can shift under error token */ 839 while (!shift_under_error()) 840 { 841 /* pop the stack */ 842 if (debug) 843 debug_message("# Pop stack by one, state was # " + 844 ((symbol)stack.peek()).parse_state); 845 stack.pop(); 846 tos--; 847 848 /* if we have hit bottom, we fail */ 849 if (stack.empty()) 850 { 851 if (debug) debug_message("# No recovery state found on stack"); 852 return false; 853 } 854 } 855 856 /* state on top of the stack can shift under error, find the shift */ 857 act = get_action(((symbol)stack.peek()).parse_state, error_sym()); 858 if (debug) 859 { 860 debug_message("# Recover state found (#" + 861 ((symbol)stack.peek()).parse_state + ")"); 862 debug_message("# Shifting on error to state #" + (act-1)); 863 } 864 865 /* build and shift a special error token */ 866 error_token = new token(error_sym()); 867 error_token.parse_state = act-1; 868 stack.push(error_token); 869 tos++; 870 871 return true; 872 } 873 874 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 875 876 /** Lookahead tokens used for attempting error recovery "parse aheads". */ 877 protected token lookahead[]; 878 879 /** Position in lookahead input buffer used for "parse ahead". */ 880 protected int lookahead_pos; 881 882 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 883 884 /** Read from input to establish our buffer of "parse ahead" lookahead 885 * symbols. 886 */ read_lookahead()887 protected void read_lookahead() throws java.lang.Exception 888 { 889 /* create the lookahead array */ 890 lookahead = new token[error_sync_size()]; 891 892 /* fill in the array */ 893 for (int i = 0; i < error_sync_size(); i++) 894 { 895 lookahead[i] = cur_token; 896 cur_token = scan(); 897 } 898 899 /* start at the beginning */ 900 lookahead_pos = 0; 901 } 902 903 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 904 905 /** Return the current lookahead in our error "parse ahead" buffer. */ cur_err_token()906 protected token cur_err_token() { return lookahead[lookahead_pos]; } 907 908 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 909 910 /** Advance to next "parse ahead" input symbol. Return true if we have 911 * input to advance to, false otherwise. 912 */ advance_lookahead()913 protected boolean advance_lookahead() 914 { 915 /* advance the input location */ 916 lookahead_pos++; 917 918 /* return true if we didn't go off the end */ 919 return lookahead_pos < error_sync_size(); 920 } 921 922 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 923 924 /** Reset the parse ahead input to one token past where we started error 925 * recovery (this consumes one new token from the real input). 926 */ restart_lookahead()927 protected void restart_lookahead() throws java.lang.Exception 928 { 929 /* move all the existing input over */ 930 for (int i = 1; i < error_sync_size(); i++) 931 lookahead[i-1] = lookahead[i]; 932 933 /* read a new token into the last spot */ 934 cur_token = scan(); 935 lookahead[error_sync_size()-1] = cur_token; 936 937 /* reset our internal position marker */ 938 lookahead_pos = 0; 939 } 940 941 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 942 943 /** Do a simulated parse forward (a "parse ahead") from the current 944 * stack configuration using stored lookahead input and a virtual parse 945 * stack. Return true if we make it all the way through the stored 946 * lookahead input without error. This basically simulates the action of 947 * parse() using only our saved "parse ahead" input, and not executing any 948 * actions. 949 * 950 * @param debug should we produce debugging messages as we parse. 951 */ try_parse_ahead(boolean debug)952 protected boolean try_parse_ahead(boolean debug) 953 throws java.lang.Exception 954 { 955 int act; 956 short lhs, rhs_size; 957 958 /* create a virtual stack from the real parse stack */ 959 virtual_parse_stack vstack = new virtual_parse_stack(stack); 960 961 /* parse until we fail or get past the lookahead input */ 962 for (;;) 963 { 964 /* look up the action from the current state (on top of stack) */ 965 act = get_action(vstack.top(), cur_err_token().sym); 966 967 /* if its an error, we fail */ 968 if (act == 0) return false; 969 970 /* > 0 encodes a shift */ 971 if (act > 0) 972 { 973 /* push the new state on the stack */ 974 vstack.push(act-1); 975 976 if (debug) debug_message("# Parse-ahead shifts token #" + 977 cur_err_token().sym + " into state #" + (act-1)); 978 979 /* advance simulated input, if we run off the end, we are done */ 980 if (!advance_lookahead()) return true; 981 } 982 /* < 0 encodes a reduce */ 983 else 984 { 985 /* if this is a reduce with the start production we are done */ 986 if ((-act)-1 == start_production()) 987 { 988 if (debug) debug_message("# Parse-ahead accepts"); 989 return true; 990 } 991 992 /* get the lhs symbol and the rhs size */ 993 lhs = production_tab[(-act)-1][0]; 994 rhs_size = production_tab[(-act)-1][1]; 995 996 /* pop handle off the stack */ 997 for (int i = 0; i < rhs_size; i++) 998 vstack.pop(); 999 1000 if (debug) 1001 debug_message("# Parse-ahead reduces: handle size = " + 1002 rhs_size + " lhs = #" + lhs + " from state #" + vstack.top()); 1003 1004 /* look up goto and push it onto the stack */ 1005 vstack.push(get_reduce(vstack.top(), lhs)); 1006 if (debug) 1007 debug_message("# Goto state #" + vstack.top()); 1008 } 1009 } 1010 } 1011 1012 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 1013 1014 /** Parse forward using stored lookahead symbols. In this case we have 1015 * already verified that parsing will make it through the stored lookahead 1016 * symbols and we are now getting back to the point at which we can hand 1017 * control back to the normal parser. Consequently, this version of the 1018 * parser performs all actions and modifies the real parse configuration. 1019 * This returns once we have consumed all the stored input or we accept. 1020 * 1021 * @param debug should we produce debugging messages as we parse. 1022 */ parse_lookahead(boolean debug)1023 protected void parse_lookahead(boolean debug) 1024 throws java.lang.Exception 1025 { 1026 /* the current action code */ 1027 int act; 1028 1029 /* the symbol/stack element returned by a reduce */ 1030 symbol lhs_sym; 1031 1032 /* information about production being reduced with */ 1033 short handle_size, lhs_sym_num; 1034 1035 /* restart the saved input at the beginning */ 1036 lookahead_pos = 0; 1037 1038 if (debug) 1039 { 1040 debug_message("# Reparsing saved input with actions"); 1041 debug_message("# Current token is #" + cur_err_token().sym); 1042 debug_message("# Current state is #" + 1043 ((symbol)stack.peek()).parse_state); 1044 } 1045 1046 /* continue until we accept or have read all lookahead input */ 1047 while(!_done_parsing) 1048 { 1049 /* current state is always on the top of the stack */ 1050 1051 /* look up action out of the current state with the current input */ 1052 act = 1053 get_action(((symbol)stack.peek()).parse_state, cur_err_token().sym); 1054 1055 /* decode the action -- > 0 encodes shift */ 1056 if (act > 0) 1057 { 1058 /* shift to the encoded state by pushing it on the stack */ 1059 cur_err_token().parse_state = act-1; 1060 if (debug) debug_shift(cur_err_token()); 1061 stack.push(cur_err_token()); 1062 tos++; 1063 1064 /* advance to the next token, if there is none, we are done */ 1065 if (!advance_lookahead()) 1066 { 1067 if (debug) debug_message("# Completed reparse"); 1068 1069 /* scan next token so we can continue parse */ 1070 cur_token = scan(); 1071 1072 /* go back to normal parser */ 1073 return; 1074 } 1075 1076 if (debug) 1077 debug_message("# Current token is #" + cur_err_token().sym); 1078 } 1079 /* if its less than zero, then it encodes a reduce action */ 1080 else if (act < 0) 1081 { 1082 /* perform the action for the reduce */ 1083 lhs_sym = do_action((-act)-1, this, stack, tos); 1084 1085 /* look up information about the production */ 1086 lhs_sym_num = production_tab[(-act)-1][0]; 1087 handle_size = production_tab[(-act)-1][1]; 1088 1089 if (debug) debug_reduce((-act)-1, lhs_sym_num, handle_size); 1090 1091 /* pop the handle off the stack */ 1092 for (int i = 0; i < handle_size; i++) 1093 { 1094 stack.pop(); 1095 tos--; 1096 } 1097 1098 /* look up the state to go to from the one popped back to */ 1099 act = get_reduce(((symbol)stack.peek()).parse_state, lhs_sym_num); 1100 1101 /* shift to that state */ 1102 lhs_sym.parse_state = act; 1103 stack.push(lhs_sym); 1104 tos++; 1105 1106 if (debug) debug_message("# Goto state #" + act); 1107 1108 } 1109 /* finally if the entry is zero, we have an error 1110 (shouldn't happen here, but...)*/ 1111 else if (act == 0) 1112 { 1113 report_fatal_error("Syntax error", null); 1114 return; 1115 } 1116 } 1117 } 1118 1119 /*-----------------------------------------------------------*/ 1120 1121 }; 1122