1=========================================== 2Kaleidoscope: Implementing a Parser and AST 3=========================================== 4 5.. contents:: 6 :local: 7 8Chapter 2 Introduction 9====================== 10 11Welcome to Chapter 2 of the "`Implementing a language with 12LLVM <index.html>`_" tutorial. This chapter shows you how to use the 13lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full 14`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope 15language. Once we have a parser, we'll define and build an `Abstract 16Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST). 17 18The parser we will build uses a combination of `Recursive Descent 19Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and 20`Operator-Precedence 21Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to 22parse the Kaleidoscope language (the latter for binary expressions and 23the former for everything else). Before we get to parsing though, lets 24talk about the output of the parser: the Abstract Syntax Tree. 25 26The Abstract Syntax Tree (AST) 27============================== 28 29The AST for a program captures its behavior in such a way that it is 30easy for later stages of the compiler (e.g. code generation) to 31interpret. We basically want one object for each construct in the 32language, and the AST should closely model the language. In 33Kaleidoscope, we have expressions, a prototype, and a function object. 34We'll start with expressions first: 35 36.. code-block:: c++ 37 38 /// ExprAST - Base class for all expression nodes. 39 class ExprAST { 40 public: 41 virtual ~ExprAST() {} 42 }; 43 44 /// NumberExprAST - Expression class for numeric literals like "1.0". 45 class NumberExprAST : public ExprAST { 46 double Val; 47 48 public: 49 NumberExprAST(double Val) : Val(Val) {} 50 }; 51 52The code above shows the definition of the base ExprAST class and one 53subclass which we use for numeric literals. The important thing to note 54about this code is that the NumberExprAST class captures the numeric 55value of the literal as an instance variable. This allows later phases 56of the compiler to know what the stored numeric value is. 57 58Right now we only create the AST, so there are no useful accessor 59methods on them. It would be very easy to add a virtual method to pretty 60print the code, for example. Here are the other expression AST node 61definitions that we'll use in the basic form of the Kaleidoscope 62language: 63 64.. code-block:: c++ 65 66 /// VariableExprAST - Expression class for referencing a variable, like "a". 67 class VariableExprAST : public ExprAST { 68 std::string Name; 69 70 public: 71 VariableExprAST(const std::string &Name) : Name(Name) {} 72 }; 73 74 /// BinaryExprAST - Expression class for a binary operator. 75 class BinaryExprAST : public ExprAST { 76 char Op; 77 std::unique_ptr<ExprAST> LHS, RHS; 78 79 public: 80 BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS, 81 std::unique_ptr<ExprAST> RHS) 82 : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} 83 }; 84 85 /// CallExprAST - Expression class for function calls. 86 class CallExprAST : public ExprAST { 87 std::string Callee; 88 std::vector<std::unique_ptr<ExprAST>> Args; 89 90 public: 91 CallExprAST(const std::string &Callee, 92 std::vector<std::unique_ptr<ExprAST>> Args) 93 : Callee(Callee), Args(std::move(Args)) {} 94 }; 95 96This is all (intentionally) rather straight-forward: variables capture 97the variable name, binary operators capture their opcode (e.g. '+'), and 98calls capture a function name as well as a list of any argument 99expressions. One thing that is nice about our AST is that it captures 100the language features without talking about the syntax of the language. 101Note that there is no discussion about precedence of binary operators, 102lexical structure, etc. 103 104For our basic language, these are all of the expression nodes we'll 105define. Because it doesn't have conditional control flow, it isn't 106Turing-complete; we'll fix that in a later installment. The two things 107we need next are a way to talk about the interface to a function, and a 108way to talk about functions themselves: 109 110.. code-block:: c++ 111 112 /// PrototypeAST - This class represents the "prototype" for a function, 113 /// which captures its name, and its argument names (thus implicitly the number 114 /// of arguments the function takes). 115 class PrototypeAST { 116 std::string Name; 117 std::vector<std::string> Args; 118 119 public: 120 PrototypeAST(const std::string &name, std::vector<std::string> Args) 121 : Name(name), Args(std::move(Args)) {} 122 }; 123 124 /// FunctionAST - This class represents a function definition itself. 125 class FunctionAST { 126 std::unique_ptr<PrototypeAST> Proto; 127 std::unique_ptr<ExprAST> Body; 128 129 public: 130 FunctionAST(std::unique_ptr<PrototypeAST> Proto, 131 std::unique_ptr<ExprAST> Body) 132 : Proto(std::move(Proto)), Body(std::move(Body)) {} 133 }; 134 135In Kaleidoscope, functions are typed with just a count of their 136arguments. Since all values are double precision floating point, the 137type of each argument doesn't need to be stored anywhere. In a more 138aggressive and realistic language, the "ExprAST" class would probably 139have a type field. 140 141With this scaffolding, we can now talk about parsing expressions and 142function bodies in Kaleidoscope. 143 144Parser Basics 145============= 146 147Now that we have an AST to build, we need to define the parser code to 148build it. The idea here is that we want to parse something like "x+y" 149(which is returned as three tokens by the lexer) into an AST that could 150be generated with calls like this: 151 152.. code-block:: c++ 153 154 auto LHS = llvm::make_unique<VariableExprAST>("x"); 155 auto RHS = llvm::make_unique<VariableExprAST>("y"); 156 auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS), 157 std::move(RHS)); 158 159In order to do this, we'll start by defining some basic helper routines: 160 161.. code-block:: c++ 162 163 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current 164 /// token the parser is looking at. getNextToken reads another token from the 165 /// lexer and updates CurTok with its results. 166 static int CurTok; 167 static int getNextToken() { 168 return CurTok = gettok(); 169 } 170 171This implements a simple token buffer around the lexer. This allows us 172to look one token ahead at what the lexer is returning. Every function 173in our parser will assume that CurTok is the current token that needs to 174be parsed. 175 176.. code-block:: c++ 177 178 179 /// LogError* - These are little helper functions for error handling. 180 std::unique_ptr<ExprAST> LogError(const char *Str) { 181 fprintf(stderr, "LogError: %s\n", Str); 182 return nullptr; 183 } 184 std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) { 185 LogError(Str); 186 return nullptr; 187 } 188 189The ``LogError`` routines are simple helper routines that our parser will 190use to handle errors. The error recovery in our parser will not be the 191best and is not particular user-friendly, but it will be enough for our 192tutorial. These routines make it easier to handle errors in routines 193that have various return types: they always return null. 194 195With these basic helper functions, we can implement the first piece of 196our grammar: numeric literals. 197 198Basic Expression Parsing 199======================== 200 201We start with numeric literals, because they are the simplest to 202process. For each production in our grammar, we'll define a function 203which parses that production. For numeric literals, we have: 204 205.. code-block:: c++ 206 207 /// numberexpr ::= number 208 static std::unique_ptr<ExprAST> ParseNumberExpr() { 209 auto Result = llvm::make_unique<NumberExprAST>(NumVal); 210 getNextToken(); // consume the number 211 return std::move(Result); 212 } 213 214This routine is very simple: it expects to be called when the current 215token is a ``tok_number`` token. It takes the current number value, 216creates a ``NumberExprAST`` node, advances the lexer to the next token, 217and finally returns. 218 219There are some interesting aspects to this. The most important one is 220that this routine eats all of the tokens that correspond to the 221production and returns the lexer buffer with the next token (which is 222not part of the grammar production) ready to go. This is a fairly 223standard way to go for recursive descent parsers. For a better example, 224the parenthesis operator is defined like this: 225 226.. code-block:: c++ 227 228 /// parenexpr ::= '(' expression ')' 229 static std::unique_ptr<ExprAST> ParseParenExpr() { 230 getNextToken(); // eat (. 231 auto V = ParseExpression(); 232 if (!V) 233 return nullptr; 234 235 if (CurTok != ')') 236 return LogError("expected ')'"); 237 getNextToken(); // eat ). 238 return V; 239 } 240 241This function illustrates a number of interesting things about the 242parser: 243 2441) It shows how we use the LogError routines. When called, this function 245expects that the current token is a '(' token, but after parsing the 246subexpression, it is possible that there is no ')' waiting. For example, 247if the user types in "(4 x" instead of "(4)", the parser should emit an 248error. Because errors can occur, the parser needs a way to indicate that 249they happened: in our parser, we return null on an error. 250 2512) Another interesting aspect of this function is that it uses recursion 252by calling ``ParseExpression`` (we will soon see that 253``ParseExpression`` can call ``ParseParenExpr``). This is powerful 254because it allows us to handle recursive grammars, and keeps each 255production very simple. Note that parentheses do not cause construction 256of AST nodes themselves. While we could do it this way, the most 257important role of parentheses are to guide the parser and provide 258grouping. Once the parser constructs the AST, parentheses are not 259needed. 260 261The next simple production is for handling variable references and 262function calls: 263 264.. code-block:: c++ 265 266 /// identifierexpr 267 /// ::= identifier 268 /// ::= identifier '(' expression* ')' 269 static std::unique_ptr<ExprAST> ParseIdentifierExpr() { 270 std::string IdName = IdentifierStr; 271 272 getNextToken(); // eat identifier. 273 274 if (CurTok != '(') // Simple variable ref. 275 return llvm::make_unique<VariableExprAST>(IdName); 276 277 // Call. 278 getNextToken(); // eat ( 279 std::vector<std::unique_ptr<ExprAST>> Args; 280 if (CurTok != ')') { 281 while (1) { 282 if (auto Arg = ParseExpression()) 283 Args.push_back(std::move(Arg)); 284 else 285 return nullptr; 286 287 if (CurTok == ')') 288 break; 289 290 if (CurTok != ',') 291 return LogError("Expected ')' or ',' in argument list"); 292 getNextToken(); 293 } 294 } 295 296 // Eat the ')'. 297 getNextToken(); 298 299 return llvm::make_unique<CallExprAST>(IdName, std::move(Args)); 300 } 301 302This routine follows the same style as the other routines. (It expects 303to be called if the current token is a ``tok_identifier`` token). It 304also has recursion and error handling. One interesting aspect of this is 305that it uses *look-ahead* to determine if the current identifier is a 306stand alone variable reference or if it is a function call expression. 307It handles this by checking to see if the token after the identifier is 308a '(' token, constructing either a ``VariableExprAST`` or 309``CallExprAST`` node as appropriate. 310 311Now that we have all of our simple expression-parsing logic in place, we 312can define a helper function to wrap it together into one entry point. 313We call this class of expressions "primary" expressions, for reasons 314that will become more clear `later in the 315tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary 316primary expression, we need to determine what sort of expression it is: 317 318.. code-block:: c++ 319 320 /// primary 321 /// ::= identifierexpr 322 /// ::= numberexpr 323 /// ::= parenexpr 324 static std::unique_ptr<ExprAST> ParsePrimary() { 325 switch (CurTok) { 326 default: 327 return LogError("unknown token when expecting an expression"); 328 case tok_identifier: 329 return ParseIdentifierExpr(); 330 case tok_number: 331 return ParseNumberExpr(); 332 case '(': 333 return ParseParenExpr(); 334 } 335 } 336 337Now that you see the definition of this function, it is more obvious why 338we can assume the state of CurTok in the various functions. This uses 339look-ahead to determine which sort of expression is being inspected, and 340then parses it with a function call. 341 342Now that basic expressions are handled, we need to handle binary 343expressions. They are a bit more complex. 344 345Binary Expression Parsing 346========================= 347 348Binary expressions are significantly harder to parse because they are 349often ambiguous. For example, when given the string "x+y\*z", the parser 350can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common 351definitions from mathematics, we expect the later parse, because "\*" 352(multiplication) has higher *precedence* than "+" (addition). 353 354There are many ways to handle this, but an elegant and efficient way is 355to use `Operator-Precedence 356Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_. 357This parsing technique uses the precedence of binary operators to guide 358recursion. To start with, we need a table of precedences: 359 360.. code-block:: c++ 361 362 /// BinopPrecedence - This holds the precedence for each binary operator that is 363 /// defined. 364 static std::map<char, int> BinopPrecedence; 365 366 /// GetTokPrecedence - Get the precedence of the pending binary operator token. 367 static int GetTokPrecedence() { 368 if (!isascii(CurTok)) 369 return -1; 370 371 // Make sure it's a declared binop. 372 int TokPrec = BinopPrecedence[CurTok]; 373 if (TokPrec <= 0) return -1; 374 return TokPrec; 375 } 376 377 int main() { 378 // Install standard binary operators. 379 // 1 is lowest precedence. 380 BinopPrecedence['<'] = 10; 381 BinopPrecedence['+'] = 20; 382 BinopPrecedence['-'] = 20; 383 BinopPrecedence['*'] = 40; // highest. 384 ... 385 } 386 387For the basic form of Kaleidoscope, we will only support 4 binary 388operators (this can obviously be extended by you, our brave and intrepid 389reader). The ``GetTokPrecedence`` function returns the precedence for 390the current token, or -1 if the token is not a binary operator. Having a 391map makes it easy to add new operators and makes it clear that the 392algorithm doesn't depend on the specific operators involved, but it 393would be easy enough to eliminate the map and do the comparisons in the 394``GetTokPrecedence`` function. (Or just use a fixed-size array). 395 396With the helper above defined, we can now start parsing binary 397expressions. The basic idea of operator precedence parsing is to break 398down an expression with potentially ambiguous binary operators into 399pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g". 400Operator precedence parsing considers this as a stream of primary 401expressions separated by binary operators. As such, it will first parse 402the leading primary expression "a", then it will see the pairs [+, b] 403[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are 404primary expressions, the binary expression parser doesn't need to worry 405about nested subexpressions like (c+d) at all. 406 407To start, an expression is a primary expression potentially followed by 408a sequence of [binop,primaryexpr] pairs: 409 410.. code-block:: c++ 411 412 /// expression 413 /// ::= primary binoprhs 414 /// 415 static std::unique_ptr<ExprAST> ParseExpression() { 416 auto LHS = ParsePrimary(); 417 if (!LHS) 418 return nullptr; 419 420 return ParseBinOpRHS(0, std::move(LHS)); 421 } 422 423``ParseBinOpRHS`` is the function that parses the sequence of pairs for 424us. It takes a precedence and a pointer to an expression for the part 425that has been parsed so far. Note that "x" is a perfectly valid 426expression: As such, "binoprhs" is allowed to be empty, in which case it 427returns the expression that is passed into it. In our example above, the 428code passes the expression for "a" into ``ParseBinOpRHS`` and the 429current token is "+". 430 431The precedence value passed into ``ParseBinOpRHS`` indicates the 432*minimal operator precedence* that the function is allowed to eat. For 433example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is 434passed in a precedence of 40, it will not consume any tokens (because 435the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS`` 436starts with: 437 438.. code-block:: c++ 439 440 /// binoprhs 441 /// ::= ('+' primary)* 442 static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, 443 std::unique_ptr<ExprAST> LHS) { 444 // If this is a binop, find its precedence. 445 while (1) { 446 int TokPrec = GetTokPrecedence(); 447 448 // If this is a binop that binds at least as tightly as the current binop, 449 // consume it, otherwise we are done. 450 if (TokPrec < ExprPrec) 451 return LHS; 452 453This code gets the precedence of the current token and checks to see if 454if is too low. Because we defined invalid tokens to have a precedence of 455-1, this check implicitly knows that the pair-stream ends when the token 456stream runs out of binary operators. If this check succeeds, we know 457that the token is a binary operator and that it will be included in this 458expression: 459 460.. code-block:: c++ 461 462 // Okay, we know this is a binop. 463 int BinOp = CurTok; 464 getNextToken(); // eat binop 465 466 // Parse the primary expression after the binary operator. 467 auto RHS = ParsePrimary(); 468 if (!RHS) 469 return nullptr; 470 471As such, this code eats (and remembers) the binary operator and then 472parses the primary expression that follows. This builds up the whole 473pair, the first of which is [+, b] for the running example. 474 475Now that we parsed the left-hand side of an expression and one pair of 476the RHS sequence, we have to decide which way the expression associates. 477In particular, we could have "(a+b) binop unparsed" or "a + (b binop 478unparsed)". To determine this, we look ahead at "binop" to determine its 479precedence and compare it to BinOp's precedence (which is '+' in this 480case): 481 482.. code-block:: c++ 483 484 // If BinOp binds less tightly with RHS than the operator after RHS, let 485 // the pending operator take RHS as its LHS. 486 int NextPrec = GetTokPrecedence(); 487 if (TokPrec < NextPrec) { 488 489If the precedence of the binop to the right of "RHS" is lower or equal 490to the precedence of our current operator, then we know that the 491parentheses associate as "(a+b) binop ...". In our example, the current 492operator is "+" and the next operator is "+", we know that they have the 493same precedence. In this case we'll create the AST node for "a+b", and 494then continue parsing: 495 496.. code-block:: c++ 497 498 ... if body omitted ... 499 } 500 501 // Merge LHS/RHS. 502 LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), 503 std::move(RHS)); 504 } // loop around to the top of the while loop. 505 } 506 507In our example above, this will turn "a+b+" into "(a+b)" and execute the 508next iteration of the loop, with "+" as the current token. The code 509above will eat, remember, and parse "(c+d)" as the primary expression, 510which makes the current pair equal to [+, (c+d)]. It will then evaluate 511the 'if' conditional above with "\*" as the binop to the right of the 512primary. In this case, the precedence of "\*" is higher than the 513precedence of "+" so the if condition will be entered. 514 515The critical question left here is "how can the if condition parse the 516right hand side in full"? In particular, to build the AST correctly for 517our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression 518variable. The code to do this is surprisingly simple (code from the 519above two blocks duplicated for context): 520 521.. code-block:: c++ 522 523 // If BinOp binds less tightly with RHS than the operator after RHS, let 524 // the pending operator take RHS as its LHS. 525 int NextPrec = GetTokPrecedence(); 526 if (TokPrec < NextPrec) { 527 RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS)); 528 if (!RHS) 529 return nullptr; 530 } 531 // Merge LHS/RHS. 532 LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), 533 std::move(RHS)); 534 } // loop around to the top of the while loop. 535 } 536 537At this point, we know that the binary operator to the RHS of our 538primary has higher precedence than the binop we are currently parsing. 539As such, we know that any sequence of pairs whose operators are all 540higher precedence than "+" should be parsed together and returned as 541"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function 542specifying "TokPrec+1" as the minimum precedence required for it to 543continue. In our example above, this will cause it to return the AST 544node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+' 545expression. 546 547Finally, on the next iteration of the while loop, the "+g" piece is 548parsed and added to the AST. With this little bit of code (14 549non-trivial lines), we correctly handle fully general binary expression 550parsing in a very elegant way. This was a whirlwind tour of this code, 551and it is somewhat subtle. I recommend running through it with a few 552tough examples to see how it works. 553 554This wraps up handling of expressions. At this point, we can point the 555parser at an arbitrary token stream and build an expression from it, 556stopping at the first token that is not part of the expression. Next up 557we need to handle function definitions, etc. 558 559Parsing the Rest 560================ 561 562The next thing missing is handling of function prototypes. In 563Kaleidoscope, these are used both for 'extern' function declarations as 564well as function body definitions. The code to do this is 565straight-forward and not very interesting (once you've survived 566expressions): 567 568.. code-block:: c++ 569 570 /// prototype 571 /// ::= id '(' id* ')' 572 static std::unique_ptr<PrototypeAST> ParsePrototype() { 573 if (CurTok != tok_identifier) 574 return LogErrorP("Expected function name in prototype"); 575 576 std::string FnName = IdentifierStr; 577 getNextToken(); 578 579 if (CurTok != '(') 580 return LogErrorP("Expected '(' in prototype"); 581 582 // Read the list of argument names. 583 std::vector<std::string> ArgNames; 584 while (getNextToken() == tok_identifier) 585 ArgNames.push_back(IdentifierStr); 586 if (CurTok != ')') 587 return LogErrorP("Expected ')' in prototype"); 588 589 // success. 590 getNextToken(); // eat ')'. 591 592 return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames)); 593 } 594 595Given this, a function definition is very simple, just a prototype plus 596an expression to implement the body: 597 598.. code-block:: c++ 599 600 /// definition ::= 'def' prototype expression 601 static std::unique_ptr<FunctionAST> ParseDefinition() { 602 getNextToken(); // eat def. 603 auto Proto = ParsePrototype(); 604 if (!Proto) return nullptr; 605 606 if (auto E = ParseExpression()) 607 return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); 608 return nullptr; 609 } 610 611In addition, we support 'extern' to declare functions like 'sin' and 612'cos' as well as to support forward declaration of user functions. These 613'extern's are just prototypes with no body: 614 615.. code-block:: c++ 616 617 /// external ::= 'extern' prototype 618 static std::unique_ptr<PrototypeAST> ParseExtern() { 619 getNextToken(); // eat extern. 620 return ParsePrototype(); 621 } 622 623Finally, we'll also let the user type in arbitrary top-level expressions 624and evaluate them on the fly. We will handle this by defining anonymous 625nullary (zero argument) functions for them: 626 627.. code-block:: c++ 628 629 /// toplevelexpr ::= expression 630 static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { 631 if (auto E = ParseExpression()) { 632 // Make an anonymous proto. 633 auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); 634 return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); 635 } 636 return nullptr; 637 } 638 639Now that we have all the pieces, let's build a little driver that will 640let us actually *execute* this code we've built! 641 642The Driver 643========== 644 645The driver for this simply invokes all of the parsing pieces with a 646top-level dispatch loop. There isn't much interesting here, so I'll just 647include the top-level loop. See `below <#full-code-listing>`_ for full code in the 648"Top-Level Parsing" section. 649 650.. code-block:: c++ 651 652 /// top ::= definition | external | expression | ';' 653 static void MainLoop() { 654 while (1) { 655 fprintf(stderr, "ready> "); 656 switch (CurTok) { 657 case tok_eof: 658 return; 659 case ';': // ignore top-level semicolons. 660 getNextToken(); 661 break; 662 case tok_def: 663 HandleDefinition(); 664 break; 665 case tok_extern: 666 HandleExtern(); 667 break; 668 default: 669 HandleTopLevelExpression(); 670 break; 671 } 672 } 673 } 674 675The most interesting part of this is that we ignore top-level 676semicolons. Why is this, you ask? The basic reason is that if you type 677"4 + 5" at the command line, the parser doesn't know whether that is the 678end of what you will type or not. For example, on the next line you 679could type "def foo..." in which case 4+5 is the end of a top-level 680expression. Alternatively you could type "\* 6", which would continue 681the expression. Having top-level semicolons allows you to type "4+5;", 682and the parser will know you are done. 683 684Conclusions 685=========== 686 687With just under 400 lines of commented code (240 lines of non-comment, 688non-blank code), we fully defined our minimal language, including a 689lexer, parser, and AST builder. With this done, the executable will 690validate Kaleidoscope code and tell us if it is grammatically invalid. 691For example, here is a sample interaction: 692 693.. code-block:: bash 694 695 $ ./a.out 696 ready> def foo(x y) x+foo(y, 4.0); 697 Parsed a function definition. 698 ready> def foo(x y) x+y y; 699 Parsed a function definition. 700 Parsed a top-level expr 701 ready> def foo(x y) x+y ); 702 Parsed a function definition. 703 Error: unknown token when expecting an expression 704 ready> extern sin(a); 705 ready> Parsed an extern 706 ready> ^D 707 $ 708 709There is a lot of room for extension here. You can define new AST nodes, 710extend the language in many ways, etc. In the `next 711installment <LangImpl3.html>`_, we will describe how to generate LLVM 712Intermediate Representation (IR) from the AST. 713 714Full Code Listing 715================= 716 717Here is the complete code listing for this and the previous chapter. 718Note that it is fully self-contained: you don't need LLVM or any 719external libraries at all for this. (Besides the C and C++ standard 720libraries, of course.) To build this, just compile with: 721 722.. code-block:: bash 723 724 # Compile 725 clang++ -g -O3 toy.cpp 726 # Run 727 ./a.out 728 729Here is the code: 730 731.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp 732 :language: c++ 733 734`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_ 735 736