1=========================================== 2Kaleidoscope: Implementing a Parser and AST 3=========================================== 4 5.. contents:: 6 :local: 7 8Chapter 2 Introduction 9====================== 10 11Welcome to Chapter 2 of the "`Implementing a language with 12LLVM <index.html>`_" tutorial. This chapter shows you how to use the 13lexer, built in `Chapter 1 <LangImpl01.html>`_, to build a full 14`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope 15language. Once we have a parser, we'll define and build an `Abstract 16Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST). 17 18The parser we will build uses a combination of `Recursive Descent 19Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and 20`Operator-Precedence 21Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to 22parse the Kaleidoscope language (the latter for binary expressions and 23the former for everything else). Before we get to parsing though, let's 24talk about the output of the parser: the Abstract Syntax Tree. 25 26The Abstract Syntax Tree (AST) 27============================== 28 29The AST for a program captures its behavior in such a way that it is 30easy for later stages of the compiler (e.g. code generation) to 31interpret. We basically want one object for each construct in the 32language, and the AST should closely model the language. In 33Kaleidoscope, we have expressions, a prototype, and a function object. 34We'll start with expressions first: 35 36.. code-block:: c++ 37 38 /// ExprAST - Base class for all expression nodes. 39 class ExprAST { 40 public: 41 virtual ~ExprAST() {} 42 }; 43 44 /// NumberExprAST - Expression class for numeric literals like "1.0". 45 class NumberExprAST : public ExprAST { 46 double Val; 47 48 public: 49 NumberExprAST(double Val) : Val(Val) {} 50 }; 51 52The code above shows the definition of the base ExprAST class and one 53subclass which we use for numeric literals. The important thing to note 54about this code is that the NumberExprAST class captures the numeric 55value of the literal as an instance variable. This allows later phases 56of the compiler to know what the stored numeric value is. 57 58Right now we only create the AST, so there are no useful accessor 59methods on them. It would be very easy to add a virtual method to pretty 60print the code, for example. Here are the other expression AST node 61definitions that we'll use in the basic form of the Kaleidoscope 62language: 63 64.. code-block:: c++ 65 66 /// VariableExprAST - Expression class for referencing a variable, like "a". 67 class VariableExprAST : public ExprAST { 68 std::string Name; 69 70 public: 71 VariableExprAST(const std::string &Name) : Name(Name) {} 72 }; 73 74 /// BinaryExprAST - Expression class for a binary operator. 75 class BinaryExprAST : public ExprAST { 76 char Op; 77 std::unique_ptr<ExprAST> LHS, RHS; 78 79 public: 80 BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS, 81 std::unique_ptr<ExprAST> RHS) 82 : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} 83 }; 84 85 /// CallExprAST - Expression class for function calls. 86 class CallExprAST : public ExprAST { 87 std::string Callee; 88 std::vector<std::unique_ptr<ExprAST>> Args; 89 90 public: 91 CallExprAST(const std::string &Callee, 92 std::vector<std::unique_ptr<ExprAST>> Args) 93 : Callee(Callee), Args(std::move(Args)) {} 94 }; 95 96This is all (intentionally) rather straight-forward: variables capture 97the variable name, binary operators capture their opcode (e.g. '+'), and 98calls capture a function name as well as a list of any argument 99expressions. One thing that is nice about our AST is that it captures 100the language features without talking about the syntax of the language. 101Note that there is no discussion about precedence of binary operators, 102lexical structure, etc. 103 104For our basic language, these are all of the expression nodes we'll 105define. Because it doesn't have conditional control flow, it isn't 106Turing-complete; we'll fix that in a later installment. The two things 107we need next are a way to talk about the interface to a function, and a 108way to talk about functions themselves: 109 110.. code-block:: c++ 111 112 /// PrototypeAST - This class represents the "prototype" for a function, 113 /// which captures its name, and its argument names (thus implicitly the number 114 /// of arguments the function takes). 115 class PrototypeAST { 116 std::string Name; 117 std::vector<std::string> Args; 118 119 public: 120 PrototypeAST(const std::string &name, std::vector<std::string> Args) 121 : Name(name), Args(std::move(Args)) {} 122 123 const std::string &getName() const { return Name; } 124 }; 125 126 /// FunctionAST - This class represents a function definition itself. 127 class FunctionAST { 128 std::unique_ptr<PrototypeAST> Proto; 129 std::unique_ptr<ExprAST> Body; 130 131 public: 132 FunctionAST(std::unique_ptr<PrototypeAST> Proto, 133 std::unique_ptr<ExprAST> Body) 134 : Proto(std::move(Proto)), Body(std::move(Body)) {} 135 }; 136 137In Kaleidoscope, functions are typed with just a count of their 138arguments. Since all values are double precision floating point, the 139type of each argument doesn't need to be stored anywhere. In a more 140aggressive and realistic language, the "ExprAST" class would probably 141have a type field. 142 143With this scaffolding, we can now talk about parsing expressions and 144function bodies in Kaleidoscope. 145 146Parser Basics 147============= 148 149Now that we have an AST to build, we need to define the parser code to 150build it. The idea here is that we want to parse something like "x+y" 151(which is returned as three tokens by the lexer) into an AST that could 152be generated with calls like this: 153 154.. code-block:: c++ 155 156 auto LHS = llvm::make_unique<VariableExprAST>("x"); 157 auto RHS = llvm::make_unique<VariableExprAST>("y"); 158 auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS), 159 std::move(RHS)); 160 161In order to do this, we'll start by defining some basic helper routines: 162 163.. code-block:: c++ 164 165 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current 166 /// token the parser is looking at. getNextToken reads another token from the 167 /// lexer and updates CurTok with its results. 168 static int CurTok; 169 static int getNextToken() { 170 return CurTok = gettok(); 171 } 172 173This implements a simple token buffer around the lexer. This allows us 174to look one token ahead at what the lexer is returning. Every function 175in our parser will assume that CurTok is the current token that needs to 176be parsed. 177 178.. code-block:: c++ 179 180 181 /// LogError* - These are little helper functions for error handling. 182 std::unique_ptr<ExprAST> LogError(const char *Str) { 183 fprintf(stderr, "LogError: %s\n", Str); 184 return nullptr; 185 } 186 std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) { 187 LogError(Str); 188 return nullptr; 189 } 190 191The ``LogError`` routines are simple helper routines that our parser will 192use to handle errors. The error recovery in our parser will not be the 193best and is not particular user-friendly, but it will be enough for our 194tutorial. These routines make it easier to handle errors in routines 195that have various return types: they always return null. 196 197With these basic helper functions, we can implement the first piece of 198our grammar: numeric literals. 199 200Basic Expression Parsing 201======================== 202 203We start with numeric literals, because they are the simplest to 204process. For each production in our grammar, we'll define a function 205which parses that production. For numeric literals, we have: 206 207.. code-block:: c++ 208 209 /// numberexpr ::= number 210 static std::unique_ptr<ExprAST> ParseNumberExpr() { 211 auto Result = llvm::make_unique<NumberExprAST>(NumVal); 212 getNextToken(); // consume the number 213 return std::move(Result); 214 } 215 216This routine is very simple: it expects to be called when the current 217token is a ``tok_number`` token. It takes the current number value, 218creates a ``NumberExprAST`` node, advances the lexer to the next token, 219and finally returns. 220 221There are some interesting aspects to this. The most important one is 222that this routine eats all of the tokens that correspond to the 223production and returns the lexer buffer with the next token (which is 224not part of the grammar production) ready to go. This is a fairly 225standard way to go for recursive descent parsers. For a better example, 226the parenthesis operator is defined like this: 227 228.. code-block:: c++ 229 230 /// parenexpr ::= '(' expression ')' 231 static std::unique_ptr<ExprAST> ParseParenExpr() { 232 getNextToken(); // eat (. 233 auto V = ParseExpression(); 234 if (!V) 235 return nullptr; 236 237 if (CurTok != ')') 238 return LogError("expected ')'"); 239 getNextToken(); // eat ). 240 return V; 241 } 242 243This function illustrates a number of interesting things about the 244parser: 245 2461) It shows how we use the LogError routines. When called, this function 247expects that the current token is a '(' token, but after parsing the 248subexpression, it is possible that there is no ')' waiting. For example, 249if the user types in "(4 x" instead of "(4)", the parser should emit an 250error. Because errors can occur, the parser needs a way to indicate that 251they happened: in our parser, we return null on an error. 252 2532) Another interesting aspect of this function is that it uses recursion 254by calling ``ParseExpression`` (we will soon see that 255``ParseExpression`` can call ``ParseParenExpr``). This is powerful 256because it allows us to handle recursive grammars, and keeps each 257production very simple. Note that parentheses do not cause construction 258of AST nodes themselves. While we could do it this way, the most 259important role of parentheses are to guide the parser and provide 260grouping. Once the parser constructs the AST, parentheses are not 261needed. 262 263The next simple production is for handling variable references and 264function calls: 265 266.. code-block:: c++ 267 268 /// identifierexpr 269 /// ::= identifier 270 /// ::= identifier '(' expression* ')' 271 static std::unique_ptr<ExprAST> ParseIdentifierExpr() { 272 std::string IdName = IdentifierStr; 273 274 getNextToken(); // eat identifier. 275 276 if (CurTok != '(') // Simple variable ref. 277 return llvm::make_unique<VariableExprAST>(IdName); 278 279 // Call. 280 getNextToken(); // eat ( 281 std::vector<std::unique_ptr<ExprAST>> Args; 282 if (CurTok != ')') { 283 while (1) { 284 if (auto Arg = ParseExpression()) 285 Args.push_back(std::move(Arg)); 286 else 287 return nullptr; 288 289 if (CurTok == ')') 290 break; 291 292 if (CurTok != ',') 293 return LogError("Expected ')' or ',' in argument list"); 294 getNextToken(); 295 } 296 } 297 298 // Eat the ')'. 299 getNextToken(); 300 301 return llvm::make_unique<CallExprAST>(IdName, std::move(Args)); 302 } 303 304This routine follows the same style as the other routines. (It expects 305to be called if the current token is a ``tok_identifier`` token). It 306also has recursion and error handling. One interesting aspect of this is 307that it uses *look-ahead* to determine if the current identifier is a 308stand alone variable reference or if it is a function call expression. 309It handles this by checking to see if the token after the identifier is 310a '(' token, constructing either a ``VariableExprAST`` or 311``CallExprAST`` node as appropriate. 312 313Now that we have all of our simple expression-parsing logic in place, we 314can define a helper function to wrap it together into one entry point. 315We call this class of expressions "primary" expressions, for reasons 316that will become more clear `later in the 317tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary 318primary expression, we need to determine what sort of expression it is: 319 320.. code-block:: c++ 321 322 /// primary 323 /// ::= identifierexpr 324 /// ::= numberexpr 325 /// ::= parenexpr 326 static std::unique_ptr<ExprAST> ParsePrimary() { 327 switch (CurTok) { 328 default: 329 return LogError("unknown token when expecting an expression"); 330 case tok_identifier: 331 return ParseIdentifierExpr(); 332 case tok_number: 333 return ParseNumberExpr(); 334 case '(': 335 return ParseParenExpr(); 336 } 337 } 338 339Now that you see the definition of this function, it is more obvious why 340we can assume the state of CurTok in the various functions. This uses 341look-ahead to determine which sort of expression is being inspected, and 342then parses it with a function call. 343 344Now that basic expressions are handled, we need to handle binary 345expressions. They are a bit more complex. 346 347Binary Expression Parsing 348========================= 349 350Binary expressions are significantly harder to parse because they are 351often ambiguous. For example, when given the string "x+y\*z", the parser 352can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common 353definitions from mathematics, we expect the later parse, because "\*" 354(multiplication) has higher *precedence* than "+" (addition). 355 356There are many ways to handle this, but an elegant and efficient way is 357to use `Operator-Precedence 358Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_. 359This parsing technique uses the precedence of binary operators to guide 360recursion. To start with, we need a table of precedences: 361 362.. code-block:: c++ 363 364 /// BinopPrecedence - This holds the precedence for each binary operator that is 365 /// defined. 366 static std::map<char, int> BinopPrecedence; 367 368 /// GetTokPrecedence - Get the precedence of the pending binary operator token. 369 static int GetTokPrecedence() { 370 if (!isascii(CurTok)) 371 return -1; 372 373 // Make sure it's a declared binop. 374 int TokPrec = BinopPrecedence[CurTok]; 375 if (TokPrec <= 0) return -1; 376 return TokPrec; 377 } 378 379 int main() { 380 // Install standard binary operators. 381 // 1 is lowest precedence. 382 BinopPrecedence['<'] = 10; 383 BinopPrecedence['+'] = 20; 384 BinopPrecedence['-'] = 20; 385 BinopPrecedence['*'] = 40; // highest. 386 ... 387 } 388 389For the basic form of Kaleidoscope, we will only support 4 binary 390operators (this can obviously be extended by you, our brave and intrepid 391reader). The ``GetTokPrecedence`` function returns the precedence for 392the current token, or -1 if the token is not a binary operator. Having a 393map makes it easy to add new operators and makes it clear that the 394algorithm doesn't depend on the specific operators involved, but it 395would be easy enough to eliminate the map and do the comparisons in the 396``GetTokPrecedence`` function. (Or just use a fixed-size array). 397 398With the helper above defined, we can now start parsing binary 399expressions. The basic idea of operator precedence parsing is to break 400down an expression with potentially ambiguous binary operators into 401pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g". 402Operator precedence parsing considers this as a stream of primary 403expressions separated by binary operators. As such, it will first parse 404the leading primary expression "a", then it will see the pairs [+, b] 405[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are 406primary expressions, the binary expression parser doesn't need to worry 407about nested subexpressions like (c+d) at all. 408 409To start, an expression is a primary expression potentially followed by 410a sequence of [binop,primaryexpr] pairs: 411 412.. code-block:: c++ 413 414 /// expression 415 /// ::= primary binoprhs 416 /// 417 static std::unique_ptr<ExprAST> ParseExpression() { 418 auto LHS = ParsePrimary(); 419 if (!LHS) 420 return nullptr; 421 422 return ParseBinOpRHS(0, std::move(LHS)); 423 } 424 425``ParseBinOpRHS`` is the function that parses the sequence of pairs for 426us. It takes a precedence and a pointer to an expression for the part 427that has been parsed so far. Note that "x" is a perfectly valid 428expression: As such, "binoprhs" is allowed to be empty, in which case it 429returns the expression that is passed into it. In our example above, the 430code passes the expression for "a" into ``ParseBinOpRHS`` and the 431current token is "+". 432 433The precedence value passed into ``ParseBinOpRHS`` indicates the 434*minimal operator precedence* that the function is allowed to eat. For 435example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is 436passed in a precedence of 40, it will not consume any tokens (because 437the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS`` 438starts with: 439 440.. code-block:: c++ 441 442 /// binoprhs 443 /// ::= ('+' primary)* 444 static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, 445 std::unique_ptr<ExprAST> LHS) { 446 // If this is a binop, find its precedence. 447 while (1) { 448 int TokPrec = GetTokPrecedence(); 449 450 // If this is a binop that binds at least as tightly as the current binop, 451 // consume it, otherwise we are done. 452 if (TokPrec < ExprPrec) 453 return LHS; 454 455This code gets the precedence of the current token and checks to see if 456if is too low. Because we defined invalid tokens to have a precedence of 457-1, this check implicitly knows that the pair-stream ends when the token 458stream runs out of binary operators. If this check succeeds, we know 459that the token is a binary operator and that it will be included in this 460expression: 461 462.. code-block:: c++ 463 464 // Okay, we know this is a binop. 465 int BinOp = CurTok; 466 getNextToken(); // eat binop 467 468 // Parse the primary expression after the binary operator. 469 auto RHS = ParsePrimary(); 470 if (!RHS) 471 return nullptr; 472 473As such, this code eats (and remembers) the binary operator and then 474parses the primary expression that follows. This builds up the whole 475pair, the first of which is [+, b] for the running example. 476 477Now that we parsed the left-hand side of an expression and one pair of 478the RHS sequence, we have to decide which way the expression associates. 479In particular, we could have "(a+b) binop unparsed" or "a + (b binop 480unparsed)". To determine this, we look ahead at "binop" to determine its 481precedence and compare it to BinOp's precedence (which is '+' in this 482case): 483 484.. code-block:: c++ 485 486 // If BinOp binds less tightly with RHS than the operator after RHS, let 487 // the pending operator take RHS as its LHS. 488 int NextPrec = GetTokPrecedence(); 489 if (TokPrec < NextPrec) { 490 491If the precedence of the binop to the right of "RHS" is lower or equal 492to the precedence of our current operator, then we know that the 493parentheses associate as "(a+b) binop ...". In our example, the current 494operator is "+" and the next operator is "+", we know that they have the 495same precedence. In this case we'll create the AST node for "a+b", and 496then continue parsing: 497 498.. code-block:: c++ 499 500 ... if body omitted ... 501 } 502 503 // Merge LHS/RHS. 504 LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), 505 std::move(RHS)); 506 } // loop around to the top of the while loop. 507 } 508 509In our example above, this will turn "a+b+" into "(a+b)" and execute the 510next iteration of the loop, with "+" as the current token. The code 511above will eat, remember, and parse "(c+d)" as the primary expression, 512which makes the current pair equal to [+, (c+d)]. It will then evaluate 513the 'if' conditional above with "\*" as the binop to the right of the 514primary. In this case, the precedence of "\*" is higher than the 515precedence of "+" so the if condition will be entered. 516 517The critical question left here is "how can the if condition parse the 518right hand side in full"? In particular, to build the AST correctly for 519our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression 520variable. The code to do this is surprisingly simple (code from the 521above two blocks duplicated for context): 522 523.. code-block:: c++ 524 525 // If BinOp binds less tightly with RHS than the operator after RHS, let 526 // the pending operator take RHS as its LHS. 527 int NextPrec = GetTokPrecedence(); 528 if (TokPrec < NextPrec) { 529 RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS)); 530 if (!RHS) 531 return nullptr; 532 } 533 // Merge LHS/RHS. 534 LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), 535 std::move(RHS)); 536 } // loop around to the top of the while loop. 537 } 538 539At this point, we know that the binary operator to the RHS of our 540primary has higher precedence than the binop we are currently parsing. 541As such, we know that any sequence of pairs whose operators are all 542higher precedence than "+" should be parsed together and returned as 543"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function 544specifying "TokPrec+1" as the minimum precedence required for it to 545continue. In our example above, this will cause it to return the AST 546node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+' 547expression. 548 549Finally, on the next iteration of the while loop, the "+g" piece is 550parsed and added to the AST. With this little bit of code (14 551non-trivial lines), we correctly handle fully general binary expression 552parsing in a very elegant way. This was a whirlwind tour of this code, 553and it is somewhat subtle. I recommend running through it with a few 554tough examples to see how it works. 555 556This wraps up handling of expressions. At this point, we can point the 557parser at an arbitrary token stream and build an expression from it, 558stopping at the first token that is not part of the expression. Next up 559we need to handle function definitions, etc. 560 561Parsing the Rest 562================ 563 564The next thing missing is handling of function prototypes. In 565Kaleidoscope, these are used both for 'extern' function declarations as 566well as function body definitions. The code to do this is 567straight-forward and not very interesting (once you've survived 568expressions): 569 570.. code-block:: c++ 571 572 /// prototype 573 /// ::= id '(' id* ')' 574 static std::unique_ptr<PrototypeAST> ParsePrototype() { 575 if (CurTok != tok_identifier) 576 return LogErrorP("Expected function name in prototype"); 577 578 std::string FnName = IdentifierStr; 579 getNextToken(); 580 581 if (CurTok != '(') 582 return LogErrorP("Expected '(' in prototype"); 583 584 // Read the list of argument names. 585 std::vector<std::string> ArgNames; 586 while (getNextToken() == tok_identifier) 587 ArgNames.push_back(IdentifierStr); 588 if (CurTok != ')') 589 return LogErrorP("Expected ')' in prototype"); 590 591 // success. 592 getNextToken(); // eat ')'. 593 594 return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames)); 595 } 596 597Given this, a function definition is very simple, just a prototype plus 598an expression to implement the body: 599 600.. code-block:: c++ 601 602 /// definition ::= 'def' prototype expression 603 static std::unique_ptr<FunctionAST> ParseDefinition() { 604 getNextToken(); // eat def. 605 auto Proto = ParsePrototype(); 606 if (!Proto) return nullptr; 607 608 if (auto E = ParseExpression()) 609 return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); 610 return nullptr; 611 } 612 613In addition, we support 'extern' to declare functions like 'sin' and 614'cos' as well as to support forward declaration of user functions. These 615'extern's are just prototypes with no body: 616 617.. code-block:: c++ 618 619 /// external ::= 'extern' prototype 620 static std::unique_ptr<PrototypeAST> ParseExtern() { 621 getNextToken(); // eat extern. 622 return ParsePrototype(); 623 } 624 625Finally, we'll also let the user type in arbitrary top-level expressions 626and evaluate them on the fly. We will handle this by defining anonymous 627nullary (zero argument) functions for them: 628 629.. code-block:: c++ 630 631 /// toplevelexpr ::= expression 632 static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { 633 if (auto E = ParseExpression()) { 634 // Make an anonymous proto. 635 auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); 636 return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); 637 } 638 return nullptr; 639 } 640 641Now that we have all the pieces, let's build a little driver that will 642let us actually *execute* this code we've built! 643 644The Driver 645========== 646 647The driver for this simply invokes all of the parsing pieces with a 648top-level dispatch loop. There isn't much interesting here, so I'll just 649include the top-level loop. See `below <#full-code-listing>`_ for full code in the 650"Top-Level Parsing" section. 651 652.. code-block:: c++ 653 654 /// top ::= definition | external | expression | ';' 655 static void MainLoop() { 656 while (1) { 657 fprintf(stderr, "ready> "); 658 switch (CurTok) { 659 case tok_eof: 660 return; 661 case ';': // ignore top-level semicolons. 662 getNextToken(); 663 break; 664 case tok_def: 665 HandleDefinition(); 666 break; 667 case tok_extern: 668 HandleExtern(); 669 break; 670 default: 671 HandleTopLevelExpression(); 672 break; 673 } 674 } 675 } 676 677The most interesting part of this is that we ignore top-level 678semicolons. Why is this, you ask? The basic reason is that if you type 679"4 + 5" at the command line, the parser doesn't know whether that is the 680end of what you will type or not. For example, on the next line you 681could type "def foo..." in which case 4+5 is the end of a top-level 682expression. Alternatively you could type "\* 6", which would continue 683the expression. Having top-level semicolons allows you to type "4+5;", 684and the parser will know you are done. 685 686Conclusions 687=========== 688 689With just under 400 lines of commented code (240 lines of non-comment, 690non-blank code), we fully defined our minimal language, including a 691lexer, parser, and AST builder. With this done, the executable will 692validate Kaleidoscope code and tell us if it is grammatically invalid. 693For example, here is a sample interaction: 694 695.. code-block:: bash 696 697 $ ./a.out 698 ready> def foo(x y) x+foo(y, 4.0); 699 Parsed a function definition. 700 ready> def foo(x y) x+y y; 701 Parsed a function definition. 702 Parsed a top-level expr 703 ready> def foo(x y) x+y ); 704 Parsed a function definition. 705 Error: unknown token when expecting an expression 706 ready> extern sin(a); 707 ready> Parsed an extern 708 ready> ^D 709 $ 710 711There is a lot of room for extension here. You can define new AST nodes, 712extend the language in many ways, etc. In the `next 713installment <LangImpl03.html>`_, we will describe how to generate LLVM 714Intermediate Representation (IR) from the AST. 715 716Full Code Listing 717================= 718 719Here is the complete code listing for our running example. Because this 720uses the LLVM libraries, we need to link them in. To do this, we use the 721`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform 722our makefile/command line about which options to use: 723 724.. code-block:: bash 725 726 # Compile 727 clang++ -g -O3 toy.cpp `llvm-config --cxxflags` 728 # Run 729 ./a.out 730 731Here is the code: 732 733.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp 734 :language: c++ 735 736`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_ 737 738