1========================================
2Kaleidoscope: Code generation to LLVM IR
3========================================
4
5.. contents::
6   :local:
7
8Chapter 3 Introduction
9======================
10
11Welcome to Chapter 3 of the "`Implementing a language with
12LLVM <index.html>`_" tutorial. This chapter shows you how to transform
13the `Abstract Syntax Tree <OCamlLangImpl2.html>`_, built in Chapter 2,
14into LLVM IR. This will teach you a little bit about how LLVM does
15things, as well as demonstrate how easy it is to use. It's much more
16work to build a lexer and parser than it is to generate LLVM IR code. :)
17
18**Please note**: the code in this chapter and later require LLVM 2.3 or
19LLVM SVN to work. LLVM 2.2 and before will not work with it.
20
21Code Generation Setup
22=====================
23
24In order to generate LLVM IR, we want some simple setup to get started.
25First we define virtual code generation (codegen) methods in each AST
26class:
27
28.. code-block:: ocaml
29
30    let rec codegen_expr = function
31      | Ast.Number n -> ...
32      | Ast.Variable name -> ...
33
34The ``Codegen.codegen_expr`` function says to emit IR for that AST node
35along with all the things it depends on, and they all return an LLVM
36Value object. "Value" is the class used to represent a "`Static Single
37Assignment
38(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
39register" or "SSA value" in LLVM. The most distinct aspect of SSA values
40is that their value is computed as the related instruction executes, and
41it does not get a new value until (and if) the instruction re-executes.
42In other words, there is no way to "change" an SSA value. For more
43information, please read up on `Static Single
44Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
45- the concepts are really quite natural once you grok them.
46
47The second thing we want is an "Error" exception like we used for the
48parser, which will be used to report errors found during code generation
49(for example, use of an undeclared parameter):
50
51.. code-block:: ocaml
52
53    exception Error of string
54
55    let context = global_context ()
56    let the_module = create_module context "my cool jit"
57    let builder = builder context
58    let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
59    let double_type = double_type context
60
61The static variables will be used during code generation.
62``Codgen.the_module`` is the LLVM construct that contains all of the
63functions and global variables in a chunk of code. In many ways, it is
64the top-level structure that the LLVM IR uses to contain code.
65
66The ``Codegen.builder`` object is a helper object that makes it easy to
67generate LLVM instructions. Instances of the
68`IRBuilder <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
69class keep track of the current place to insert instructions and has
70methods to create new instructions.
71
72The ``Codegen.named_values`` map keeps track of which values are defined
73in the current scope and what their LLVM representation is. (In other
74words, it is a symbol table for the code). In this form of Kaleidoscope,
75the only things that can be referenced are function parameters. As such,
76function parameters will be in this map when generating code for their
77function body.
78
79With these basics in place, we can start talking about how to generate
80code for each expression. Note that this assumes that the
81``Codgen.builder`` has been set up to generate code *into* something.
82For now, we'll assume that this has already been done, and we'll just
83use it to emit code.
84
85Expression Code Generation
86==========================
87
88Generating LLVM code for expression nodes is very straightforward: less
89than 30 lines of commented code for all four of our expression nodes.
90First we'll do numeric literals:
91
92.. code-block:: ocaml
93
94      | Ast.Number n -> const_float double_type n
95
96In the LLVM IR, numeric constants are represented with the
97``ConstantFP`` class, which holds the numeric value in an ``APFloat``
98internally (``APFloat`` has the capability of holding floating point
99constants of Arbitrary Precision). This code basically just creates
100and returns a ``ConstantFP``. Note that in the LLVM IR that constants
101are all uniqued together and shared. For this reason, the API uses "the
102foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".
103
104.. code-block:: ocaml
105
106      | Ast.Variable name ->
107          (try Hashtbl.find named_values name with
108            | Not_found -> raise (Error "unknown variable name"))
109
110References to variables are also quite simple using LLVM. In the simple
111version of Kaleidoscope, we assume that the variable has already been
112emitted somewhere and its value is available. In practice, the only
113values that can be in the ``Codegen.named_values`` map are function
114arguments. This code simply checks to see that the specified name is in
115the map (if not, an unknown variable is being referenced) and returns
116the value for it. In future chapters, we'll add support for `loop
117induction variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for
118`local variables <LangImpl7.html#user-defined-local-variables>`_.
119
120.. code-block:: ocaml
121
122      | Ast.Binary (op, lhs, rhs) ->
123          let lhs_val = codegen_expr lhs in
124          let rhs_val = codegen_expr rhs in
125          begin
126            match op with
127            | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
128            | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
129            | '*' -> build_fmul lhs_val rhs_val "multmp" builder
130            | '<' ->
131                (* Convert bool 0/1 to double 0.0 or 1.0 *)
132                let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
133                build_uitofp i double_type "booltmp" builder
134            | _ -> raise (Error "invalid binary operator")
135          end
136
137Binary operators start to get more interesting. The basic idea here is
138that we recursively emit code for the left-hand side of the expression,
139then the right-hand side, then we compute the result of the binary
140expression. In this code, we do a simple switch on the opcode to create
141the right LLVM instruction.
142
143In the example above, the LLVM builder class is starting to show its
144value. IRBuilder knows where to insert the newly created instruction,
145all you have to do is specify what instruction to create (e.g. with
146``Llvm.create_add``), which operands to use (``lhs`` and ``rhs`` here)
147and optionally provide a name for the generated instruction.
148
149One nice thing about LLVM is that the name is just a hint. For instance,
150if the code above emits multiple "addtmp" variables, LLVM will
151automatically provide each one with an increasing, unique numeric
152suffix. Local value names for instructions are purely optional, but it
153makes it much easier to read the IR dumps.
154
155`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict
156rules: for example, the Left and Right operators of an `add
157instruction <../LangRef.html#add-instruction>`_ must have the same type, and the
158result type of the add must match the operand types. Because all values
159in Kaleidoscope are doubles, this makes for very simple code for add,
160sub and mul.
161
162On the other hand, LLVM specifies that the `fcmp
163instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a
164one bit integer). The problem with this is that Kaleidoscope wants the
165value to be a 0.0 or 1.0 value. In order to get these semantics, we
166combine the fcmp instruction with a `uitofp
167instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its
168input integer into a floating point value by treating the input as an
169unsigned value. In contrast, if we used the `sitofp
170instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator
171would return 0.0 and -1.0, depending on the input value.
172
173.. code-block:: ocaml
174
175      | Ast.Call (callee, args) ->
176          (* Look up the name in the module table. *)
177          let callee =
178            match lookup_function callee the_module with
179            | Some callee -> callee
180            | None -> raise (Error "unknown function referenced")
181          in
182          let params = params callee in
183
184          (* If argument mismatch error. *)
185          if Array.length params == Array.length args then () else
186            raise (Error "incorrect # arguments passed");
187          let args = Array.map codegen_expr args in
188          build_call callee args "calltmp" builder
189
190Code generation for function calls is quite straightforward with LLVM.
191The code above initially does a function name lookup in the LLVM
192Module's symbol table. Recall that the LLVM Module is the container that
193holds all of the functions we are JIT'ing. By giving each function the
194same name as what the user specifies, we can use the LLVM symbol table
195to resolve function names for us.
196
197Once we have the function to call, we recursively codegen each argument
198that is to be passed in, and create an LLVM `call
199instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C
200calling conventions by default, allowing these calls to also call into
201standard library functions like "sin" and "cos", with no additional
202effort.
203
204This wraps up our handling of the four basic expressions that we have so
205far in Kaleidoscope. Feel free to go in and add some more. For example,
206by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
207several other interesting instructions that are really easy to plug into
208our basic framework.
209
210Function Code Generation
211========================
212
213Code generation for prototypes and functions must handle a number of
214details, which make their code less beautiful than expression code
215generation, but allows us to illustrate some important points. First,
216lets talk about code generation for prototypes: they are used both for
217function bodies and external function declarations. The code starts
218with:
219
220.. code-block:: ocaml
221
222    let codegen_proto = function
223      | Ast.Prototype (name, args) ->
224          (* Make the function type: double(double,double) etc. *)
225          let doubles = Array.make (Array.length args) double_type in
226          let ft = function_type double_type doubles in
227          let f =
228            match lookup_function name the_module with
229
230This code packs a lot of power into a few lines. Note first that this
231function returns a "Function\*" instead of a "Value\*" (although at the
232moment they both are modeled by ``llvalue`` in ocaml). Because a
233"prototype" really talks about the external interface for a function
234(not the value computed by an expression), it makes sense for it to
235return the LLVM Function it corresponds to when codegen'd.
236
237The call to ``Llvm.function_type`` creates the ``Llvm.llvalue`` that
238should be used for a given Prototype. Since all function arguments in
239Kaleidoscope are of type double, the first line creates a vector of "N"
240LLVM double types. It then uses the ``Llvm.function_type`` method to
241create a function type that takes "N" doubles as arguments, returns one
242double as a result, and that is not vararg (that uses the function
243``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued
244just like ``Constant``'s are, so you don't "new" a type, you "get" it.
245
246The final line above checks if the function has already been defined in
247``Codegen.the_module``. If not, we will create it.
248
249.. code-block:: ocaml
250
251            | None -> declare_function name ft the_module
252
253This indicates the type and name to use, as well as which module to
254insert into. By default we assume a function has
255``Llvm.Linkage.ExternalLinkage``. "`external
256linkage <../LangRef.html#linkage>`_" means that the function may be defined
257outside the current module and/or that it is callable by functions
258outside the module. The "``name``" passed in is the name the user
259specified: this name is registered in "``Codegen.the_module``"s symbol
260table, which is used by the function call code above.
261
262In Kaleidoscope, I choose to allow redefinitions of functions in two
263cases: first, we want to allow 'extern'ing a function more than once, as
264long as the prototypes for the externs match (since all arguments have
265the same type, we just have to check that the number of arguments
266match). Second, we want to allow 'extern'ing a function and then
267defining a body for it. This is useful when defining mutually recursive
268functions.
269
270.. code-block:: ocaml
271
272            (* If 'f' conflicted, there was already something named 'name'. If it
273             * has a body, don't allow redefinition or reextern. *)
274            | Some f ->
275                (* If 'f' already has a body, reject this. *)
276                if Array.length (basic_blocks f) == 0 then () else
277                  raise (Error "redefinition of function");
278
279                (* If 'f' took a different number of arguments, reject. *)
280                if Array.length (params f) == Array.length args then () else
281                  raise (Error "redefinition of function with different # args");
282                f
283          in
284
285In order to verify the logic above, we first check to see if the
286pre-existing function is "empty". In this case, empty means that it has
287no basic blocks in it, which means it has no body. If it has no body, it
288is a forward declaration. Since we don't allow anything after a full
289definition of the function, the code rejects this case. If the previous
290reference to a function was an 'extern', we simply verify that the
291number of arguments for that definition and this one match up. If not,
292we emit an error.
293
294.. code-block:: ocaml
295
296          (* Set names for all arguments. *)
297          Array.iteri (fun i a ->
298            let n = args.(i) in
299            set_value_name n a;
300            Hashtbl.add named_values n a;
301          ) (params f);
302          f
303
304The last bit of code for prototypes loops over all of the arguments in
305the function, setting the name of the LLVM Argument objects to match,
306and registering the arguments in the ``Codegen.named_values`` map for
307future use by the ``Ast.Variable`` variant. Once this is set up, it
308returns the Function object to the caller. Note that we don't check for
309conflicting argument names here (e.g. "extern foo(a b a)"). Doing so
310would be very straight-forward with the mechanics we have already used
311above.
312
313.. code-block:: ocaml
314
315    let codegen_func = function
316      | Ast.Function (proto, body) ->
317          Hashtbl.clear named_values;
318          let the_function = codegen_proto proto in
319
320Code generation for function definitions starts out simply enough: we
321just codegen the prototype (Proto) and verify that it is ok. We then
322clear out the ``Codegen.named_values`` map to make sure that there isn't
323anything in it from the last function we compiled. Code generation of
324the prototype ensures that there is an LLVM Function object that is
325ready to go for us.
326
327.. code-block:: ocaml
328
329          (* Create a new basic block to start insertion into. *)
330          let bb = append_block context "entry" the_function in
331          position_at_end bb builder;
332
333          try
334            let ret_val = codegen_expr body in
335
336Now we get to the point where the ``Codegen.builder`` is set up. The
337first line creates a new `basic
338block <http://en.wikipedia.org/wiki/Basic_block>`_ (named "entry"),
339which is inserted into ``the_function``. The second line then tells the
340builder that new instructions should be inserted into the end of the new
341basic block. Basic blocks in LLVM are an important part of functions
342that define the `Control Flow
343Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
344don't have any control flow, our functions will only contain one block
345at this point. We'll fix this in `Chapter 5 <OCamlLangImpl5.html>`_ :).
346
347.. code-block:: ocaml
348
349            let ret_val = codegen_expr body in
350
351            (* Finish off the function. *)
352            let _ = build_ret ret_val builder in
353
354            (* Validate the generated code, checking for consistency. *)
355            Llvm_analysis.assert_valid_function the_function;
356
357            the_function
358
359Once the insertion point is set up, we call the ``Codegen.codegen_func``
360method for the root expression of the function. If no error happens,
361this emits code to compute the expression into the entry block and
362returns the value that was computed. Assuming no error, we then create
363an LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the
364function. Once the function is built, we call
365``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This
366function does a variety of consistency checks on the generated code, to
367determine if our compiler is doing everything right. Using this is
368important: it can catch a lot of bugs. Once the function is finished and
369validated, we return it.
370
371.. code-block:: ocaml
372
373          with e ->
374            delete_function the_function;
375            raise e
376
377The only piece left here is handling of the error case. For simplicity,
378we handle this by merely deleting the function we produced with the
379``Llvm.delete_function`` method. This allows the user to redefine a
380function that they incorrectly typed in before: if we didn't delete it,
381it would live in the symbol table, with a body, preventing future
382redefinition.
383
384This code does have a bug, though. Since the ``Codegen.codegen_proto``
385can return a previously defined forward declaration, our code can
386actually delete a forward declaration. There are a number of ways to fix
387this bug, see what you can come up with! Here is a testcase:
388
389::
390
391    extern foo(a b);     # ok, defines foo.
392    def foo(a b) c;      # error, 'c' is invalid.
393    def bar() foo(1, 2); # error, unknown function "foo"
394
395Driver Changes and Closing Thoughts
396===================================
397
398For now, code generation to LLVM doesn't really get us much, except that
399we can look at the pretty IR calls. The sample code inserts calls to
400Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM
401IR. This gives a nice way to look at the LLVM IR for simple functions.
402For example:
403
404::
405
406    ready> 4+5;
407    Read top-level expression:
408    define double @""() {
409    entry:
410            %addtmp = fadd double 4.000000e+00, 5.000000e+00
411            ret double %addtmp
412    }
413
414Note how the parser turns the top-level expression into anonymous
415functions for us. This will be handy when we add `JIT
416support <OCamlLangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that
417the code is very literally transcribed, no optimizations are being
418performed. We will `add
419optimizations <OCamlLangImpl4.html#trivial-constant-folding>`_ explicitly in the
420next chapter.
421
422::
423
424    ready> def foo(a b) a*a + 2*a*b + b*b;
425    Read function definition:
426    define double @foo(double %a, double %b) {
427    entry:
428            %multmp = fmul double %a, %a
429            %multmp1 = fmul double 2.000000e+00, %a
430            %multmp2 = fmul double %multmp1, %b
431            %addtmp = fadd double %multmp, %multmp2
432            %multmp3 = fmul double %b, %b
433            %addtmp4 = fadd double %addtmp, %multmp3
434            ret double %addtmp4
435    }
436
437This shows some simple arithmetic. Notice the striking similarity to the
438LLVM builder calls that we use to create the instructions.
439
440::
441
442    ready> def bar(a) foo(a, 4.0) + bar(31337);
443    Read function definition:
444    define double @bar(double %a) {
445    entry:
446            %calltmp = call double @foo(double %a, double 4.000000e+00)
447            %calltmp1 = call double @bar(double 3.133700e+04)
448            %addtmp = fadd double %calltmp, %calltmp1
449            ret double %addtmp
450    }
451
452This shows some function calls. Note that this function will take a long
453time to execute if you call it. In the future we'll add conditional
454control flow to actually make recursion useful :).
455
456::
457
458    ready> extern cos(x);
459    Read extern:
460    declare double @cos(double)
461
462    ready> cos(1.234);
463    Read top-level expression:
464    define double @""() {
465    entry:
466            %calltmp = call double @cos(double 1.234000e+00)
467            ret double %calltmp
468    }
469
470This shows an extern for the libm "cos" function, and a call to it.
471
472::
473
474    ready> ^D
475    ; ModuleID = 'my cool jit'
476
477    define double @""() {
478    entry:
479            %addtmp = fadd double 4.000000e+00, 5.000000e+00
480            ret double %addtmp
481    }
482
483    define double @foo(double %a, double %b) {
484    entry:
485            %multmp = fmul double %a, %a
486            %multmp1 = fmul double 2.000000e+00, %a
487            %multmp2 = fmul double %multmp1, %b
488            %addtmp = fadd double %multmp, %multmp2
489            %multmp3 = fmul double %b, %b
490            %addtmp4 = fadd double %addtmp, %multmp3
491            ret double %addtmp4
492    }
493
494    define double @bar(double %a) {
495    entry:
496            %calltmp = call double @foo(double %a, double 4.000000e+00)
497            %calltmp1 = call double @bar(double 3.133700e+04)
498            %addtmp = fadd double %calltmp, %calltmp1
499            ret double %addtmp
500    }
501
502    declare double @cos(double)
503
504    define double @""() {
505    entry:
506            %calltmp = call double @cos(double 1.234000e+00)
507            ret double %calltmp
508    }
509
510When you quit the current demo, it dumps out the IR for the entire
511module generated. Here you can see the big picture with all the
512functions referencing each other.
513
514This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
515we'll describe how to `add JIT codegen and optimizer
516support <OCamlLangImpl4.html>`_ to this so we can actually start running
517code!
518
519Full Code Listing
520=================
521
522Here is the complete code listing for our running example, enhanced with
523the LLVM code generator. Because this uses the LLVM libraries, we need
524to link them in. To do this, we use the
525`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
526our makefile/command line about which options to use:
527
528.. code-block:: bash
529
530    # Compile
531    ocamlbuild toy.byte
532    # Run
533    ./toy.byte
534
535Here is the code:
536
537\_tags:
538    ::
539
540        <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
541        <*.{byte,native}>: g++, use_llvm, use_llvm_analysis
542
543myocamlbuild.ml:
544    .. code-block:: ocaml
545
546        open Ocamlbuild_plugin;;
547
548        ocaml_lib ~extern:true "llvm";;
549        ocaml_lib ~extern:true "llvm_analysis";;
550
551        flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
552
553token.ml:
554    .. code-block:: ocaml
555
556        (*===----------------------------------------------------------------------===
557         * Lexer Tokens
558         *===----------------------------------------------------------------------===*)
559
560        (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
561         * these others for known things. *)
562        type token =
563          (* commands *)
564          | Def | Extern
565
566          (* primary *)
567          | Ident of string | Number of float
568
569          (* unknown *)
570          | Kwd of char
571
572lexer.ml:
573    .. code-block:: ocaml
574
575        (*===----------------------------------------------------------------------===
576         * Lexer
577         *===----------------------------------------------------------------------===*)
578
579        let rec lex = parser
580          (* Skip any whitespace. *)
581          | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
582
583          (* identifier: [a-zA-Z][a-zA-Z0-9] *)
584          | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
585              let buffer = Buffer.create 1 in
586              Buffer.add_char buffer c;
587              lex_ident buffer stream
588
589          (* number: [0-9.]+ *)
590          | [< ' ('0' .. '9' as c); stream >] ->
591              let buffer = Buffer.create 1 in
592              Buffer.add_char buffer c;
593              lex_number buffer stream
594
595          (* Comment until end of line. *)
596          | [< ' ('#'); stream >] ->
597              lex_comment stream
598
599          (* Otherwise, just return the character as its ascii value. *)
600          | [< 'c; stream >] ->
601              [< 'Token.Kwd c; lex stream >]
602
603          (* end of stream. *)
604          | [< >] -> [< >]
605
606        and lex_number buffer = parser
607          | [< ' ('0' .. '9' | '.' as c); stream >] ->
608              Buffer.add_char buffer c;
609              lex_number buffer stream
610          | [< stream=lex >] ->
611              [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
612
613        and lex_ident buffer = parser
614          | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
615              Buffer.add_char buffer c;
616              lex_ident buffer stream
617          | [< stream=lex >] ->
618              match Buffer.contents buffer with
619              | "def" -> [< 'Token.Def; stream >]
620              | "extern" -> [< 'Token.Extern; stream >]
621              | id -> [< 'Token.Ident id; stream >]
622
623        and lex_comment = parser
624          | [< ' ('\n'); stream=lex >] -> stream
625          | [< 'c; e=lex_comment >] -> e
626          | [< >] -> [< >]
627
628ast.ml:
629    .. code-block:: ocaml
630
631        (*===----------------------------------------------------------------------===
632         * Abstract Syntax Tree (aka Parse Tree)
633         *===----------------------------------------------------------------------===*)
634
635        (* expr - Base type for all expression nodes. *)
636        type expr =
637          (* variant for numeric literals like "1.0". *)
638          | Number of float
639
640          (* variant for referencing a variable, like "a". *)
641          | Variable of string
642
643          (* variant for a binary operator. *)
644          | Binary of char * expr * expr
645
646          (* variant for function calls. *)
647          | Call of string * expr array
648
649        (* proto - This type represents the "prototype" for a function, which captures
650         * its name, and its argument names (thus implicitly the number of arguments the
651         * function takes). *)
652        type proto = Prototype of string * string array
653
654        (* func - This type represents a function definition itself. *)
655        type func = Function of proto * expr
656
657parser.ml:
658    .. code-block:: ocaml
659
660        (*===---------------------------------------------------------------------===
661         * Parser
662         *===---------------------------------------------------------------------===*)
663
664        (* binop_precedence - This holds the precedence for each binary operator that is
665         * defined *)
666        let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
667
668        (* precedence - Get the precedence of the pending binary operator token. *)
669        let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
670
671        (* primary
672         *   ::= identifier
673         *   ::= numberexpr
674         *   ::= parenexpr *)
675        let rec parse_primary = parser
676          (* numberexpr ::= number *)
677          | [< 'Token.Number n >] -> Ast.Number n
678
679          (* parenexpr ::= '(' expression ')' *)
680          | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
681
682          (* identifierexpr
683           *   ::= identifier
684           *   ::= identifier '(' argumentexpr ')' *)
685          | [< 'Token.Ident id; stream >] ->
686              let rec parse_args accumulator = parser
687                | [< e=parse_expr; stream >] ->
688                    begin parser
689                      | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
690                      | [< >] -> e :: accumulator
691                    end stream
692                | [< >] -> accumulator
693              in
694              let rec parse_ident id = parser
695                (* Call. *)
696                | [< 'Token.Kwd '(';
697                     args=parse_args [];
698                     'Token.Kwd ')' ?? "expected ')'">] ->
699                    Ast.Call (id, Array.of_list (List.rev args))
700
701                (* Simple variable ref. *)
702                | [< >] -> Ast.Variable id
703              in
704              parse_ident id stream
705
706          | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
707
708        (* binoprhs
709         *   ::= ('+' primary)* *)
710        and parse_bin_rhs expr_prec lhs stream =
711          match Stream.peek stream with
712          (* If this is a binop, find its precedence. *)
713          | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
714              let token_prec = precedence c in
715
716              (* If this is a binop that binds at least as tightly as the current binop,
717               * consume it, otherwise we are done. *)
718              if token_prec < expr_prec then lhs else begin
719                (* Eat the binop. *)
720                Stream.junk stream;
721
722                (* Parse the primary expression after the binary operator. *)
723                let rhs = parse_primary stream in
724
725                (* Okay, we know this is a binop. *)
726                let rhs =
727                  match Stream.peek stream with
728                  | Some (Token.Kwd c2) ->
729                      (* If BinOp binds less tightly with rhs than the operator after
730                       * rhs, let the pending operator take rhs as its lhs. *)
731                      let next_prec = precedence c2 in
732                      if token_prec < next_prec
733                      then parse_bin_rhs (token_prec + 1) rhs stream
734                      else rhs
735                  | _ -> rhs
736                in
737
738                (* Merge lhs/rhs. *)
739                let lhs = Ast.Binary (c, lhs, rhs) in
740                parse_bin_rhs expr_prec lhs stream
741              end
742          | _ -> lhs
743
744        (* expression
745         *   ::= primary binoprhs *)
746        and parse_expr = parser
747          | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
748
749        (* prototype
750         *   ::= id '(' id* ')' *)
751        let parse_prototype =
752          let rec parse_args accumulator = parser
753            | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
754            | [< >] -> accumulator
755          in
756
757          parser
758          | [< 'Token.Ident id;
759               'Token.Kwd '(' ?? "expected '(' in prototype";
760               args=parse_args [];
761               'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
762              (* success. *)
763              Ast.Prototype (id, Array.of_list (List.rev args))
764
765          | [< >] ->
766              raise (Stream.Error "expected function name in prototype")
767
768        (* definition ::= 'def' prototype expression *)
769        let parse_definition = parser
770          | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
771              Ast.Function (p, e)
772
773        (* toplevelexpr ::= expression *)
774        let parse_toplevel = parser
775          | [< e=parse_expr >] ->
776              (* Make an anonymous proto. *)
777              Ast.Function (Ast.Prototype ("", [||]), e)
778
779        (*  external ::= 'extern' prototype *)
780        let parse_extern = parser
781          | [< 'Token.Extern; e=parse_prototype >] -> e
782
783codegen.ml:
784    .. code-block:: ocaml
785
786        (*===----------------------------------------------------------------------===
787         * Code Generation
788         *===----------------------------------------------------------------------===*)
789
790        open Llvm
791
792        exception Error of string
793
794        let context = global_context ()
795        let the_module = create_module context "my cool jit"
796        let builder = builder context
797        let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
798        let double_type = double_type context
799
800        let rec codegen_expr = function
801          | Ast.Number n -> const_float double_type n
802          | Ast.Variable name ->
803              (try Hashtbl.find named_values name with
804                | Not_found -> raise (Error "unknown variable name"))
805          | Ast.Binary (op, lhs, rhs) ->
806              let lhs_val = codegen_expr lhs in
807              let rhs_val = codegen_expr rhs in
808              begin
809                match op with
810                | '+' -> build_add lhs_val rhs_val "addtmp" builder
811                | '-' -> build_sub lhs_val rhs_val "subtmp" builder
812                | '*' -> build_mul lhs_val rhs_val "multmp" builder
813                | '<' ->
814                    (* Convert bool 0/1 to double 0.0 or 1.0 *)
815                    let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
816                    build_uitofp i double_type "booltmp" builder
817                | _ -> raise (Error "invalid binary operator")
818              end
819          | Ast.Call (callee, args) ->
820              (* Look up the name in the module table. *)
821              let callee =
822                match lookup_function callee the_module with
823                | Some callee -> callee
824                | None -> raise (Error "unknown function referenced")
825              in
826              let params = params callee in
827
828              (* If argument mismatch error. *)
829              if Array.length params == Array.length args then () else
830                raise (Error "incorrect # arguments passed");
831              let args = Array.map codegen_expr args in
832              build_call callee args "calltmp" builder
833
834        let codegen_proto = function
835          | Ast.Prototype (name, args) ->
836              (* Make the function type: double(double,double) etc. *)
837              let doubles = Array.make (Array.length args) double_type in
838              let ft = function_type double_type doubles in
839              let f =
840                match lookup_function name the_module with
841                | None -> declare_function name ft the_module
842
843                (* If 'f' conflicted, there was already something named 'name'. If it
844                 * has a body, don't allow redefinition or reextern. *)
845                | Some f ->
846                    (* If 'f' already has a body, reject this. *)
847                    if block_begin f <> At_end f then
848                      raise (Error "redefinition of function");
849
850                    (* If 'f' took a different number of arguments, reject. *)
851                    if element_type (type_of f) <> ft then
852                      raise (Error "redefinition of function with different # args");
853                    f
854              in
855
856              (* Set names for all arguments. *)
857              Array.iteri (fun i a ->
858                let n = args.(i) in
859                set_value_name n a;
860                Hashtbl.add named_values n a;
861              ) (params f);
862              f
863
864        let codegen_func = function
865          | Ast.Function (proto, body) ->
866              Hashtbl.clear named_values;
867              let the_function = codegen_proto proto in
868
869              (* Create a new basic block to start insertion into. *)
870              let bb = append_block context "entry" the_function in
871              position_at_end bb builder;
872
873              try
874                let ret_val = codegen_expr body in
875
876                (* Finish off the function. *)
877                let _ = build_ret ret_val builder in
878
879                (* Validate the generated code, checking for consistency. *)
880                Llvm_analysis.assert_valid_function the_function;
881
882                the_function
883              with e ->
884                delete_function the_function;
885                raise e
886
887toplevel.ml:
888    .. code-block:: ocaml
889
890        (*===----------------------------------------------------------------------===
891         * Top-Level parsing and JIT Driver
892         *===----------------------------------------------------------------------===*)
893
894        open Llvm
895
896        (* top ::= definition | external | expression | ';' *)
897        let rec main_loop stream =
898          match Stream.peek stream with
899          | None -> ()
900
901          (* ignore top-level semicolons. *)
902          | Some (Token.Kwd ';') ->
903              Stream.junk stream;
904              main_loop stream
905
906          | Some token ->
907              begin
908                try match token with
909                | Token.Def ->
910                    let e = Parser.parse_definition stream in
911                    print_endline "parsed a function definition.";
912                    dump_value (Codegen.codegen_func e);
913                | Token.Extern ->
914                    let e = Parser.parse_extern stream in
915                    print_endline "parsed an extern.";
916                    dump_value (Codegen.codegen_proto e);
917                | _ ->
918                    (* Evaluate a top-level expression into an anonymous function. *)
919                    let e = Parser.parse_toplevel stream in
920                    print_endline "parsed a top-level expr";
921                    dump_value (Codegen.codegen_func e);
922                with Stream.Error s | Codegen.Error s ->
923                  (* Skip token for error recovery. *)
924                  Stream.junk stream;
925                  print_endline s;
926              end;
927              print_string "ready> "; flush stdout;
928              main_loop stream
929
930toy.ml:
931    .. code-block:: ocaml
932
933        (*===----------------------------------------------------------------------===
934         * Main driver code.
935         *===----------------------------------------------------------------------===*)
936
937        open Llvm
938
939        let main () =
940          (* Install standard binary operators.
941           * 1 is the lowest precedence. *)
942          Hashtbl.add Parser.binop_precedence '<' 10;
943          Hashtbl.add Parser.binop_precedence '+' 20;
944          Hashtbl.add Parser.binop_precedence '-' 20;
945          Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
946
947          (* Prime the first token. *)
948          print_string "ready> "; flush stdout;
949          let stream = Lexer.lex (Stream.of_channel stdin) in
950
951          (* Run the main "interpreter loop" now. *)
952          Toplevel.main_loop stream;
953
954          (* Print out all the generated code. *)
955          dump_module Codegen.the_module
956        ;;
957
958        main ()
959
960`Next: Adding JIT and Optimizer Support <OCamlLangImpl4.html>`_
961
962