<!--===- docs/Semantics.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Semantic Analysis

```eval_rst
.. contents::
   :local:
```

The semantic analysis pass determines if a syntactically correct Fortran
program is legal by enforcing the constraints of the language.

The input is a parse tree with a `Program` node at the root
and a "cooked" character stream, a contiguous stream of characters
containing a normalized form of the Fortran source.

If the program is not legal, the result of the semantic pass is a list of
errors associated with the program.

If the program is legal, the semantic pass produces a (possibly modified)
parse tree for the semantically correct program with each name mapped to a symbol
and each expression fully analyzed.

All user errors are detected either prior to or during semantic analysis.
After it completes successfully the program should compile with no error messages.
There may still be warnings or informational messages.

## Phases of Semantic Analysis

1. [Validate labels](#validate-labels) -
   Check all constraints on labels and branches
2. [Rewrite DO loops](#rewrite-do-loops) -
   Convert all occurrences of `LabelDoStmt` to `DoConstruct`.
3. [Name resolution](#name-resolution) -
   Analyze names and declarations, build a tree of Scopes containing Symbols,
   and fill in the `Name::symbol` data member in the parse tree
4. [Rewrite parse tree](#rewrite-parse-tree) -
   Fix incorrect parses based on symbol information
5. [Expression analysis](#expression-analysis) -
   Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and
   `Variable::typedExpr` with analyzed expressions; fix incorrect parses
   based on the result of this analysis
6. [Statement semantics](#statement-semantics) -
   Perform remaining semantic checks on the execution parts of subprograms
7. [Write module files](#write-module-files) -
   If no errors have occurred, write out `.mod` files for modules and submodules

If phase 1 or phase 2 encounters an error in any of the program units,
compilation terminates. Otherwise, phases 3-6 are all performed even if
errors occur.
Module files are written (phase 7) only if there are no errors.

### Validate labels

Perform semantic checks related to labels and branches:
- check that any labels that are referenced are defined and in scope
- check branches into loop bodies
- check that labeled `DO` loops are properly nested
- check labels in data transfer statements

### Rewrite DO loops

This phase normalizes the parse tree by removing all unstructured `DO` loops
and replacing them with `DO` constructs.

### Name resolution

The name resolution phase walks the parse tree and constructs the symbol table.

The symbol table consists of a tree of `Scope` objects rooted at the global scope.
The global scope is owned by the `SemanticsContext` object.
It contains a `Scope` for each program unit in the compilation.

Each `Scope` in the scope tree contains child scopes representing other scopes
lexically nested in it.
Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names
declared in that scope. (All names in the symbol table are represented as
`CharBlock` objects, i.e. as substrings of the cooked character stream.)

All `Symbol` objects are owned by the symbol table data structures.
They should be accessed as `Symbol *` or `Symbol &` outside of the symbol
table classes as they can't be created, copied, or moved.
The `Symbol` class has functions and data common across all symbols, and a
`details` field that contains more information specific to that type of symbol.
Many symbols also have types, represented by `DeclTypeSpec`.
Types are also owned by scopes.

Name resolution happens on the parse tree in this order:
1. Process the specification of a program unit:
   1. Create a new scope for the unit
   2. Create a symbol for each contained subprogram containing just the name
   3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.)
   4. Process the specification part of the unit
2. Apply the same process recursively to nested subprograms
3. Process the execution part of the program unit
4. Process the execution parts of nested subprograms recursively

After the completion of this phase, every `Name` corresponds to a `Symbol`
unless an error occurred.

### Rewrite parse tree

The parser cannot build a completely correct parse tree without symbol information.
This phase corrects mis-parses based on symbols:
- Array element assignments may be parsed as statement functions: `a(i) = ...`
- Namelist group names without `NML=` may be parsed as format expressions
- A file unit number expression may be parsed as a character variable

This phase also produces an internal error if it finds a `Name` that does not
have its `symbol` data member filled in. This error is suppressed if other
errors have occurred because in that case a `Name` corresponding to an erroneous
symbol may not be resolved.

### Expression analysis

Expressions that occur in the specification part are analyzed during name
resolution, for example, initial values, array bounds, type parameters.
Any remaining expressions are analyzed in this phase.

For each `Variable` and top-level `Expr` (i.e. one that is not nested below
another `Expr` in the parse tree) the analyzed form of the expression is saved
in the `typedExpr` data member. After this phase has completed, the analyzed
expression can be accessed using `semantics::GetExpr()`.

This phase also corrects mis-parses based on the result of expression analysis:
- An expression like `a(b)` is parsed as a function reference but may need
  to be rewritten to an array element reference (if `a` is an object entity)
  or to a structure constructor (if `a` is a derived type)
- An expression like `a(b:c)` is parsed as an array section but may need to be
  rewritten as a substring if `a` is an object with type `CHARACTER`

### Statement semantics

Multiple independent checkers driven by the `SemanticsVisitor` framework
perform the remaining semantic checks.
By this phase, all names and expressions that can be successfully resolved
have been resolved. However, there may be names without symbols or expressions
without analyzed form if errors occurred earlier.

### Initialization processing

Fortran supports many means of specifying static initializers for variables,
object pointers, and procedure pointers, as well as default initializers for
derived type object components, pointers, and type parameters.

Non-pointer static initializers of variables and named constants are
scanned, analyzed, folded, scalar-expanded, and validated as they are
traversed during declaration processing in name resolution.
So are the default initializers of non-pointer object components in
non-parameterized derived types.
Named constant arrays with implied shapes take their actual shape from
the initialization expression.

Default initializers of non-pointer components and type parameters
in distinct parameterized
derived type instantiations are processed similarly as those instances
are created, as their expressions may depend on the values of type
parameters.
Error messages produced during parameterized derived type instantiation
are decorated with contextual attachments that point to the declarations
or other type specifications that caused the instantiation.

Static initializations in `DATA` statements are collected, validated,
and converted into static initialization in the symbol table, as if
the initialized objects had used the newer style of static initialization
in their entity declarations.

All statically initialized pointers, and default component initializers for
pointers, are processed late in name resolution after all specification parts
have been traversed.
This allows for forward references even in the presence of `IMPLICIT NONE`.
Object pointer initializers in parameterized derived type instantiations are
also cloned and folded at this late stage.
Validation of pointer initializers takes place later in declaration
checking (below).

### Declaration checking

Whenever possible, the enforcement of constraints and "shalls" pertaining to
properties of symbols is deferred to a single read-only pass over the symbol table
that takes place after all name resolution and typing is complete.

### Write module files

Separate compilation information is written out on successful compilation
of modules and submodules. These module files are used as input to name resolution
in program units that `USE` the modules.

Module files are stripped-down Fortran source for the module.
Parts that aren't needed to compile dependent program units (e.g. action statements)
are omitted.

The module file for module `m` is named `m.mod` and the module file for
submodule `s` of module `m` is named `m-s.mod`.
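
The naming rule above can be illustrated with a minimal C++ sketch. `ModFileName` is a hypothetical helper written for this example; it is not claimed to be the function the compiler actually uses, and it assumes the names passed in are already in the form that should appear in the file name.

```cpp
#include <optional>
#include <string>

// Hypothetical helper for illustration only, not the compiler's actual API.
// Builds the module file name for module `module`, or for submodule
// `submodule` of that module when one is given.
std::string ModFileName(const std::string &module,
    const std::optional<std::string> &submodule = std::nullopt) {
  std::string result{module};
  if (submodule) {
    // Submodule files are named after the parent module and the submodule,
    // joined with a hyphen.
    result += '-';
    result += *submodule;
  }
  return result + ".mod";
}
```

Under these assumptions, `ModFileName("m")` yields `m.mod` and `ModFileName("m", std::string{"s"})` yields `m-s.mod`.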