<!--===- docs/C++17.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# C++14/17 features used in f18

```eval_rst
.. contents::
   :local:
```

The C++ dialect used in this project constitutes a subset of the
standard C++ programming language and library features.
We want our dialect to be compatible with the LLVM C++ language
subset that will be in use at the time that we integrate with that
project.
We also want to maximize portability, future-proofing,
compile-time error checking, and use of best practices.

To that end, we have a C++ style guide (q.v.) that lays
out the details of how our C++ code should look and gives
guidance about feature usage.

We have chosen to use some features of the recent C++17
language standard in f18.
The most important of these are:
* sum types (discriminated unions) in the form of `std::variant`
* `using` template parameter packs
* generic lambdas with `auto` argument types
* product types in the form of `std::tuple`
* `std::optional`

(`std::tuple` is actually a C++11 feature, but I include it
in this list because it's not particularly well known.)

## Sum types

First, some background information to explain the need for sum types
in f18.

Fortran is notoriously problematic to lex and parse, as tokenization
depends on the state of the partial parse;
the language has no reserved words in the sense that C++ does.
Fortran parsers implemented with distinct lexing and parsing phases
(generated by hand or with tools) need to implement them as
coroutines with complicated state, and experience has shown that
it's hard to get them right and harder to extend them as the language
evolves.

Alternatively, with the use of backtracking, one can parse Fortran with
a unified lexer/parser.
We have chosen to do so because it is simpler and should reduce
both initial bugs and long-term maintenance.

Specifically, f18's parser uses the technique of recursive descent with
backtracking.
It is constructed as the incremental composition of pure parsing functions
that each, when given a context (location in the input stream plus some state),
either _succeeds_ or _fails_ to recognize some piece of Fortran.
On success, they return a new state and some semantic value, and this is
usually an instance of a C++ `struct` type that encodes the semantic
content of a production in the Fortran grammar.

This technique allows us to specify both the Fortran grammar and the
representation of successfully parsed programs with C++ code
whose functions and data structures correspond closely to the productions
of Fortran.

The specification of Fortran uses a form of BNF with alternatives,
optional elements, sequences, and lists. Each of these constructs
in the Fortran grammar maps directly in the f18 parser to both
the means of combining other parsers as alternatives, &c., and to
the declarations of the parse tree data structures that represent
the results of successful parses.
Move semantics are used in the parsing functions to acquire and
combine the results of sub-parses into the result of a larger
parse.

To represent nodes in the Fortran parse tree, we need a means of
handling sum types for productions that have multiple alternatives.
The bounded polymorphism supplied by the C++17 `std::variant` fits
those needs exactly.
For example, production R502 in Fortran defines the top-level
program unit of Fortran as being a function, subroutine, module, &c.
The `struct ProgramUnit` in the f18 parse tree header file
represents each program unit with a member that is a `std::variant`
over the six possibilities.
Similarly, the parser for that type in the f18 grammar has six alternatives,
each of which constructs an instance of `ProgramUnit` upon the result of
parsing a `Module`, `FunctionSubprogram`, and so on.

Code that performs semantic analysis on the result of a successful
parse is typically implemented with overloaded functions.
A function instantiated on `ProgramUnit` will use `std::visit` to
identify the right alternative and perform the right actions.
The call to `std::visit` must pass a visitor that can handle all
of the possibilities, and f18 will fail to build if one is missing.

Were we unable to use `std::variant` directly, we would likely
have chosen to implement a local `SumType` replacement; in the
absence of C++17's ability to expand a template parameter pack in a
`using` declaration and of C++14's generic lambdas with `auto`
parameters, it would be less convenient to use.

The other options for polymorphism in C++ at the level of C++11
would be to:
* loosen up compile-time type safety and use a unified parse tree node
  representation with an enumeration type for an operator and generic
  subtree pointers, or
* define the sum types for the parse tree as abstract base classes from
  which each particular alternative would derive, and then use virtual
  functions (or the forbidden `dynamic_cast`) to identify alternatives
  during analysis.

## Product types

Many productions in the Fortran grammar describe a sequence of various
sub-parses.
For example, R504 defines the things that may appear in the "specification
part" of a subprogram in the order in which they are allowed: `USE`
statements, then `IMPORT` statements, and so on.

The parse tree node that represents such a thing needs to incorporate
the representations of those parses, of course.
It turns out to be convenient to allow these data members to be anonymous
components of a `std::tuple` product type.
This type facilitates the automation of code that walks over all of the
members in a type-safe fashion and avoids the need to invent and remember
needless member names -- the components of a `std::tuple` instance can
be identified and accessed in terms of their types, and those tend to be
distinct.

So we use `std::tuple` for such things.
It has also been handy for template metaprogramming that needs to work
with lists of types.

## `std::optional`

This simple little type is used wherever a value might or might not be
present.
It is especially useful for function results and
rvalue reference arguments.
It corresponds directly to the optional elements in the productions
of the Fortran grammar.
It is also used as a wrapper around a parse tree node type to define the
results of the various parsing functions, where presence of a value
signifies a successful recognition and absence denotes a failed parse.
It is used in data structures in place of nullable pointers to
avoid indirection as well as the possible confusion over whether a pointer
is allowed to be null.
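
As a rough illustration of the two uses described above, the following sketch
shows a parsing function whose `std::optional` result signals success or
failure, and a node whose `std::optional` member models an optional grammar
element.
Every name here (`ParseState`, `IntLiteral`, `SignedInt`, `ParseIntLiteral`,
`ParseSignedInt`, `Demo`) is hypothetical and chosen for this example; none of
it is taken from f18's actual parser, and the toy grammar (an optional sign
followed by digits) is not Fortran.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical parsing context: just the text that remains to be parsed.
struct ParseState {
  std::string_view input;
};

// Hypothetical parse tree node for a run of decimal digits.
struct IntLiteral {
  std::string digits;
};

// Success yields a node and advances the state; failure yields
// std::nullopt and leaves the state unchanged.
std::optional<IntLiteral> ParseIntLiteral(ParseState &state) {
  std::size_t n{0};
  while (n < state.input.size() &&
      std::isdigit(static_cast<unsigned char>(state.input[n]))) {
    ++n;
  }
  if (n == 0) {
    return std::nullopt;
  }
  IntLiteral result{std::string{state.input.substr(0, n)}};
  state.input.remove_prefix(n);
  return result;
}

// Node for the toy production "[sign] digits"; the optional grammar
// element becomes a std::optional member rather than a nullable pointer.
struct SignedInt {
  std::optional<char> sign;
  IntLiteral magnitude;
};

std::optional<SignedInt> ParseSignedInt(ParseState &state) {
  ParseState attempt{state};  // parse on a copy so that failure backtracks
  std::optional<char> sign;
  if (!attempt.input.empty() &&
      (attempt.input.front() == '+' || attempt.input.front() == '-')) {
    sign = attempt.input.front();
    attempt.input.remove_prefix(1);
  }
  std::optional<IntLiteral> magnitude{ParseIntLiteral(attempt)};
  if (!magnitude) {
    return std::nullopt;
  }
  state = attempt;  // commit the consumed input only on success
  return SignedInt{sign, std::move(*magnitude)};
}

// Exercises both outcomes; call it from main() to check the sketch.
void Demo() {
  ParseState ok{"-42 rest"};
  auto parsed{ParseSignedInt(ok)};
  assert(parsed && parsed->sign == '-' && parsed->magnitude.digits == "42");
  assert(ok.input == " rest");

  ParseState bad{"+x"};
  assert(!ParseSignedInt(bad));
  assert(bad.input == "+x");  // the failed parse consumed nothing
}
```

Parsing on a copy of the state is only one simple way to get backtracking in
a sketch like this; it is not meant to describe how f18 manages its parse
state.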