<!--===- docs/IORuntimeInternals.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Fortran I/O Runtime Library Internal Design

```eval_rst
.. contents::
   :local:
```

This note is meant to be an overview of the design of the *implementation*
of the f18 Fortran compiler's runtime support library for I/O statements.

The *interface* to the I/O runtime support library is defined in the
C++ header file `runtime/io-api.h`.
This interface was designed to minimize the amount of complexity exposed
to its clients, which are of course the sequences of calls generated by
the compiler to implement each I/O statement.
By keeping this interface as simple as possible, we hope that we have
lowered the risk of future incompatible changes that would necessitate
recompilation of Fortran codes in order to link with later versions of
the runtime library.
As one will see in `io-api.h`, the interface is also directly callable
from C and C++ programs.

The I/O facilities of the Fortran 2018 language are specified in the
language standard in its clauses 12 (I/O statements) and 13 (`FORMAT`).
They form a complicated collection of language features:
 * Files can comprise *records* or *streams*.
 * Records can be fixed-length or variable-length.
 * Record files can be accessed sequentially or directly (random access).
 * Files can be *formatted*, or *unformatted* raw bits.
 * `CHARACTER` scalars and arrays can be used as if they were
fixed-length formatted sequential record files.
 * Formatted I/O can be under control of a `FORMAT` statement
or `FMT=` specifier, *list-directed* with default formatting chosen
by the runtime, or `NAMELIST`, in which a collection of variables
can be given a name and passed as a group to the runtime library.
 * Sequential records of a file can be partially processed by one
or more *non-advancing* I/O statements and eventually completed by
another.
 * `FORMAT` strings can manipulate the position in the current
record arbitrarily, causing re-reading or overwriting.
 * Floating-point output formatting supports more rounding modes
than the IEEE standard for floating-point arithmetic.

The Fortran I/O runtime support library is written in C++17, and
uses some C++17 standard library facilities, but it is intended
not to have any link-time dependences on the C++ runtime support
library or any LLVM libraries.
This is important because there are at least two C++ runtime support
libraries, and we don't want Fortran application builders to have to
build multiple versions of their codes; neither do we want to require
them to ship LLVM libraries along with their products.

Consequently, dynamic memory allocation in the Fortran runtime
uses only C's `malloc()` and `free()` functions, and the few
C++ standard class templates that we instantiate in the library have been
modified with optional template arguments that override their
allocators and deallocators.
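
To illustrate the allocator override mentioned above, here is a minimal sketch, not the library's actual code. `MallocAllocator` and `SumFirst` are hypothetical names, and `std::vector` appears only to demonstrate the mechanism; the sketch makes no attempt to avoid link-time dependences on a C++ runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical allocator that obtains storage only from malloc()/free().
template <typename T> struct MallocAllocator {
  using value_type = T;
  MallocAllocator() = default;
  template <typename U> MallocAllocator(const MallocAllocator<U> &) {}
  T *allocate(std::size_t n) {
    // Request at least one byte so that a null result always means failure.
    void *p{std::malloc(n ? n * sizeof(T) : 1)};
    if (!p) {
      std::abort(); // crash instead of throwing std::bad_alloc
    }
    return static_cast<T *>(p);
  }
  void deallocate(T *p, std::size_t) { std::free(p); }
};
template <typename T, typename U>
bool operator==(const MallocAllocator<T> &, const MallocAllocator<U> &) {
  return true; // stateless: any instance can free another's storage
}
template <typename T, typename U>
bool operator!=(const MallocAllocator<T> &, const MallocAllocator<U> &) {
  return false;
}

// Sums 1..n with a vector whose storage comes from malloc().
inline int SumFirst(int n) {
  std::vector<int, MallocAllocator<int>> values;
  for (int j{1}; j <= n; ++j) {
    values.push_back(j);
  }
  int sum{0};
  for (int v : values) {
    sum += v;
  }
  return sum;
}
```

Aborting on allocation failure keeps the sketch free of C++ exceptions, which seems in keeping with the constraints described above.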

Conversions between the many binary floating-point formats supported
by f18 and their decimal representations are performed with the same
template library of fast conversion algorithms used to interpret
floating-point values in Fortran source programs and to emit them
to module files.

## Overview of Classes

A suite of C++ classes and class templates are composed to construct
the Fortran I/O runtime support library.
They (mostly) reside in the C++ namespace `Fortran::runtime::io`.
They are summarized here in a bottom-up order of dependence.

The header and C++ implementation source file names of these
classes are in the process of being vigorously rearranged and
modified; use `grep` or an IDE to discover these classes in
the source for now.  (Sorry!)

### `Terminator`

A general facility for the entire library, `Terminator` latches a
source program statement location in terms of an unowned pointer to
its source file path name and line number and uses them to construct
a fatal error message if needed.
It is used for both user program errors and internal runtime library crashes.

### `IoErrorHandler`

When I/O error conditions arise at runtime that the Fortran program
might have the privilege to handle itself via `ERR=`, `END=`, or
`EOR=` labels and/or by an `IOSTAT=` variable, this subclass of
`Terminator` is used either to latch the error indication or to crash.
It sorts out priorities in the case of multiple errors and determines
the final `IOSTAT=` value at the end of an I/O statement.
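
The relationship between these two classes might be sketched as follows. This is an illustration under stated assumptions rather than the real code: the member names, the flag bits, the `Iostat...` values, and the `Rank()` helper are all hypothetical. The only priority rule assumed is the standard's: an error outranks end-of-file, which outranks end-of-record.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in: latches a source location for fatal messages.
class Terminator {
public:
  Terminator(const char *sourceFile, int sourceLine)
      : sourceFile_{sourceFile}, sourceLine_{sourceLine} {}
  [[noreturn]] void Crash(const char *message) const {
    std::fprintf(stderr, "fatal Fortran runtime error(%s:%d): %s\n",
        sourceFile_ ? sourceFile_ : "?", sourceLine_, message);
    std::abort();
  }

private:
  const char *sourceFile_; // unowned; must outlive this object
  int sourceLine_;
};

// Assumed encodings: errors are positive, END and EOR are negative.
constexpr int IostatOk{0}, IostatEnd{-1}, IostatEor{-2};
enum Handled : unsigned { hasIoStat = 1, hasErr = 2, hasEnd = 4, hasEor = 8 };

class IoErrorHandler : public Terminator {
public:
  using Terminator::Terminator;
  void SetHandled(unsigned flags) { flags_ = flags; }
  void Signal(int iostat) {
    if (iostat == IostatOk) {
      return;
    }
    bool canHandle = (flags_ & hasIoStat) != 0 ||
        (iostat > 0 && (flags_ & hasErr) != 0) ||
        (iostat == IostatEnd && (flags_ & hasEnd) != 0) ||
        (iostat == IostatEor && (flags_ & hasEor) != 0);
    if (!canHandle) {
      Crash("unhandled I/O condition"); // the program gave us no way out
    }
    if (Rank(iostat) > Rank(ioStat_)) {
      ioStat_ = iostat; // latch only a higher-priority condition
    }
  }
  int GetIoStat() const { return ioStat_; }

private:
  static int Rank(int iostat) {
    return iostat > 0 ? 3 : iostat == IostatEnd ? 2 : iostat == IostatEor ? 1 : 0;
  }
  unsigned flags_{0};
  int ioStat_{IostatOk};
};

// END outranks EOR, whatever the order of arrival.
inline int EorThenEnd() {
  IoErrorHandler handler{"hello.f90", 3};
  handler.SetHandled(hasIoStat);
  handler.Signal(IostatEor);
  handler.Signal(IostatEnd);
  handler.Signal(IostatEor);
  return handler.GetIoStat();
}

// The first error is kept; later conditions do not displace it.
inline int FirstErrorWins() {
  IoErrorHandler handler{"hello.f90", 4};
  handler.SetHandled(hasIoStat);
  handler.Signal(5);
  handler.Signal(IostatEnd);
  handler.Signal(7);
  return handler.GetIoStat();
}
```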

### `MutableModes`

Fortran's formatted I/O statements are affected by a suite of
modes that can be configured by `OPEN` statements, overridden by
data transfer I/O statement control lists, and further overridden
between data items with control edit descriptors in a `FORMAT` string.
These modes are represented with a `MutableModes` instance, and these
are instantiated and copied where one would expect them to be in
order to properly isolate their modifications.
The modes in force at the time each data item is processed constitute
a member of each `DataEdit`.

### `DataEdit`

Represents a single data edit descriptor from a `FORMAT` statement
or `FMT=` character value, with some hidden extensions to also
support formatting of list-directed transfers.
It holds an instance of `MutableModes`, and also has a repetition
count for when an array appears as a data item in the *io-list*.
For simplicity and efficiency, each data edit descriptor is
encoded in the `DataEdit` as a simple capitalized character
(or two) and some optional field widths.
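
A rough sketch of how these two structures might relate is shown here. The field names and the particular selection of modes are assumptions made for illustration; the real definitions hold more state. The point is only that a `DataEdit` carries its own copy of the modes, so later changes to the statement's modes cannot affect it.

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of the modes that control edit descriptors can change.
struct MutableModes {
  bool blankZero{false}; // BZ rather than BN
  bool decimalComma{false}; // DC rather than DP
  bool signPlus{false}; // SP rather than SS
  int scale{0}; // kP
  char delim{'\0'}; // DELIM= quote character, if any
};

// Hypothetical encoding of one data edit descriptor such as E20.14.
struct DataEdit {
  char descriptor{'\0'}; // capitalized edit letter, e.g. 'E'
  std::optional<int> width; // w, when present
  std::optional<int> digits; // d, when present
  MutableModes modes; // a copy of the modes in force for this item
  int repeat{1}; // repetition count for array items
};

inline DataEdit MakeE20_14(const MutableModes &current) {
  DataEdit edit;
  edit.descriptor = 'E';
  edit.width = 20;
  edit.digits = 14;
  edit.modes = current; // copied, not referenced
  return edit;
}

// A later mode change must not leak into a DataEdit already produced.
inline bool ModesAreIsolated() {
  MutableModes modes;
  modes.decimalComma = true;
  DataEdit edit = MakeE20_14(modes);
  modes.decimalComma = false; // e.g. a subsequent DP control edit descriptor
  return edit.modes.decimalComma && *edit.width == 20 && *edit.digits == 14;
}
```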

### `FormatControl<>`

This class template traverses a `FORMAT` statement's contents (or `FMT=`
character value) to extract data edit descriptors like `E20.14` to
serve each item in an I/O data transfer statement's *io-list*,
making callbacks to an instance of its class template argument
along the way to effect character literal output and record
positioning.
The Fortran language standard defines formatted I/O as if the `FORMAT`
string were driving the traversal of the data items in the *io-list*,
but our implementation reverses that perspective to allow a more
convenient (for the compiler) I/O runtime support library API design
in which each data item is presented to the library with a distinct
type-dependent call.

Clients of `FormatControl` instantiations call its `GetNextDataEdit()`
member function to acquire the next data edit descriptor to be processed
from the format, and `FinishOutput()` to flush out any remaining
output strings or record positionings at the end of the *io-list*.

The `DefaultFormatControlCallbacks` structure summarizes the API
expected by `FormatControl` from its class template actual arguments.
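
The reversed perspective can be illustrated with a toy version of the idea. Everything in this sketch is hypothetical (`MiniFormatControl`, `MiniDataEdit`, `StringContext`, and the single `Emit()` callback), and it understands only quoted literals and simple descriptors of the form `Lw` or `Lw.d`. Each data item pulls its descriptor from the format, and literals met along the way go to the context.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

struct MiniDataEdit {
  char descriptor{'\0'};
  int width{0};
  int digits{0};
};

// Hypothetical callback context: collects character literal output.
struct StringContext {
  std::string out;
  void Emit(const char *data, std::size_t bytes) { out.append(data, bytes); }
};

template <typename CONTEXT> class MiniFormatControl {
public:
  explicit MiniFormatControl(std::string format) : format_{std::move(format)} {}

  // Called once per data item: emits literals, returns the next descriptor.
  MiniDataEdit GetNextDataEdit(CONTEXT &context) {
    while (at_ < format_.size()) {
      char ch{format_[at_]};
      if (ch == '\'') {
        EmitLiteral(context);
      } else if (std::isalpha(static_cast<unsigned char>(ch))) {
        return ParseDataEdit();
      } else {
        ++at_; // skip parentheses, commas, and blanks
      }
    }
    return {}; // format exhausted; reversion is not modeled here
  }

  // Called after the last item: emits literals up to the next descriptor.
  void FinishOutput(CONTEXT &context) {
    while (at_ < format_.size()) {
      char ch{format_[at_]};
      if (ch == '\'') {
        EmitLiteral(context);
      } else if (std::isalpha(static_cast<unsigned char>(ch))) {
        break; // an unused data edit descriptor ends format processing
      } else {
        ++at_;
      }
    }
  }

private:
  void EmitLiteral(CONTEXT &context) {
    std::size_t start{++at_}; // skip the opening quote
    while (at_ < format_.size() && format_[at_] != '\'') {
      ++at_;
    }
    context.Emit(format_.data() + start, at_ - start);
    if (at_ < format_.size()) {
      ++at_; // skip the closing quote
    }
  }
  int ParseInt() {
    int n{0};
    while (at_ < format_.size() &&
        std::isdigit(static_cast<unsigned char>(format_[at_]))) {
      n = 10 * n + (format_[at_++] - '0');
    }
    return n;
  }
  MiniDataEdit ParseDataEdit() {
    MiniDataEdit edit;
    edit.descriptor = static_cast<char>(
        std::toupper(static_cast<unsigned char>(format_[at_++])));
    edit.width = ParseInt();
    if (at_ < format_.size() && format_[at_] == '.') {
      ++at_;
      edit.digits = ParseInt();
    }
    return edit;
  }

  std::string format_;
  std::size_t at_{0};
};

// Two data items drive the traversal; the literals land in the context.
inline std::string FormatExample() {
  StringContext context;
  MiniFormatControl<StringContext> control{"('N=',I5,' X=',E20.14,' done')"};
  auto first = control.GetNextDataEdit(context);
  auto second = control.GetNextDataEdit(context);
  control.FinishOutput(context);
  return context.out + "|" + first.descriptor + std::to_string(first.width) +
      "|" + second.descriptor + std::to_string(second.width) + "." +
      std::to_string(second.digits);
}
```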

### `OpenFile`

This class encapsulates all (I hope) the operating system interfaces
used to interact with the host's filesystems for operations on
external units.
Asynchronous I/O interfaces are faked for now with synchronous
operations and deferred results.

### `ConnectionState`

An active connection to an external or internal unit maintains
the common parts of its state in this subclass of `ConnectionAttributes`.
The base class holds state that should not change during the
lifetime of the connection, while the subclass maintains state
that may change during I/O statement execution.

### `InternalDescriptorUnit`

When I/O is being performed from/to a Fortran `CHARACTER` array
rather than an external file, this class manages the standard
interoperable descriptor used to access its elements as records.
It has the necessary interfaces to serve as an actual argument
to the `FormatControl` class template.

### `FileFrame<>`

This CRTP class template isolates all of the complexity involved between
an external unit's `OpenFile` and the buffering requirements
imposed by the capabilities of Fortran `FORMAT` control edit
descriptors that allow repositioning within the current record.
Its interface enables its clients to define a "frame" (my term,
not Fortran's) that is a contiguous range of bytes that are
or may soon be in the file.
This frame is defined as a file offset and a byte size.
The `FileFrame` instance manages an internal circular buffer
with two essential guarantees:

1. The most recently requested frame is present in the buffer
and contiguous in memory.
1. Any extra data after the frame that may have been read from
the external unit will be preserved, so that it's safe to
read from a socket, pipe, or tape and not have to worry about
repositioning and rereading.

In end-of-file situations, it's possible that a request to read
a frame may come up short.

As a CRTP class template, `FileFrame` accesses the raw filesystem
facilities it needs from `*this`.
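
The CRTP arrangement can be sketched in miniature as below. This is a deliberately simplified illustration: `MiniFileFrame` and `StringFile` are hypothetical, the buffer is a plain `std::vector` rather than a circular buffer, and the only raw facility assumed from the derived class is a `Read()` member that may return fewer bytes than requested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

template <typename STORE> class MiniFileFrame {
public:
  // Makes file bytes [at, at+bytes) contiguous in memory if they exist.
  // Returns the number of frame bytes actually available.
  std::size_t ReadFrame(std::int64_t at, std::size_t bytes) {
    std::int64_t end{fileOffset_ + static_cast<std::int64_t>(buffer_.size())};
    if (at < fileOffset_ || at > end) {
      buffer_.clear(); // nothing buffered is useful for this frame
    } else {
      // Drop what precedes the frame; keep everything after its start.
      buffer_.erase(buffer_.begin(), buffer_.begin() + (at - fileOffset_));
    }
    fileOffset_ = at;
    while (buffer_.size() < bytes) {
      std::size_t have{buffer_.size()};
      buffer_.resize(bytes);
      std::size_t got{Store().Read(fileOffset_ + static_cast<std::int64_t>(have),
          buffer_.data() + have, bytes - have)};
      buffer_.resize(have + got);
      if (got == 0) {
        break; // end of file: the frame comes up short
      }
    }
    return std::min(bytes, buffer_.size());
  }
  const char *Frame() const { return buffer_.data(); }

private:
  // CRTP: the raw I/O facilities come from the derived class via *this.
  STORE &Store() { return static_cast<STORE &>(*this); }
  std::int64_t fileOffset_{0};
  std::vector<char> buffer_;
};

// Hypothetical "file" backed by a string, with short reads like a pipe's.
class StringFile : public MiniFileFrame<StringFile> {
public:
  explicit StringFile(std::string contents) : contents_{std::move(contents)} {}
  std::size_t Read(std::int64_t at, char *to, std::size_t bytes) {
    if (at < 0 || static_cast<std::size_t>(at) >= contents_.size()) {
      return 0;
    }
    std::size_t offset{static_cast<std::size_t>(at)};
    std::size_t n{std::min(bytes, contents_.size() - offset)};
    n = std::min<std::size_t>(n, 4); // at most 4 bytes per call
    std::memcpy(to, contents_.data() + offset, n);
    return n;
  }

private:
  std::string contents_;
};

inline std::string FrameText(StringFile &file, std::int64_t at, std::size_t bytes) {
  std::size_t got{file.ReadFrame(at, bytes)};
  return std::string(file.Frame(), got);
}
```

Because the base class is parameterized on its own derived class, the call to `Read()` is resolved at compile time and no `virtual` function is needed.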

### `ExternalFileUnit`

This class mixes in `ConnectionState`, `OpenFile`, and
`FileFrame<ExternalFileUnit>` to represent the state of an open
(or soon to be opened) external file descriptor as a Fortran
I/O unit.
It has the contextual APIs required to serve as a template actual
argument to `FormatControl`.
And it contains a `std::variant<>` suitable for holding the
state of the active I/O statement in progress on the unit
(see below).

`ExternalFileUnit` instances reside in a `Map` that is allocated
as a static variable and indexed by Fortran unit number.
Static member functions `LookUp()`, `LookUpOrCrash()`, and `LookUpOrCreate()`
probe the map to convert Fortran `UNIT=` numbers from I/O statements
into references to active units.
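
The three lookup functions might behave roughly as in the sketch below. `MiniUnit` is hypothetical, the parameter lists are guesses made for illustration, and `std::map` stands in for the library's own `Map` purely for brevity.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <map>

// Hypothetical stand-in for a unit kept in a static map by unit number.
struct MiniUnit {
  explicit MiniUnit(int n) : unitNumber{n} {}
  int unitNumber;

  // Returns nullptr when the unit number is not in the map.
  static MiniUnit *LookUp(int unit) {
    auto iter{GetMap().find(unit)};
    return iter == GetMap().end() ? nullptr : &iter->second;
  }
  // For statements that require an existing unit.
  static MiniUnit &LookUpOrCrash(int unit) {
    if (MiniUnit * p{LookUp(unit)}) {
      return *p;
    }
    std::fprintf(stderr, "not an open Fortran unit: %d\n", unit);
    std::abort();
  }
  // For statements like OPEN that may bring a unit into existence.
  static MiniUnit &LookUpOrCreate(int unit, bool &wasExtant) {
    auto result{GetMap().try_emplace(unit, unit)};
    wasExtant = !result.second;
    return result.first->second;
  }

private:
  static std::map<int, MiniUnit> &GetMap() {
    static std::map<int, MiniUnit> map; // initialized on first use
    return map;
  }
};
```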

### `IoStatementBase`

The subclasses of `IoStatementBase` each encapsulate and maintain
the state of one active Fortran I/O statement across the several
I/O runtime library API function calls it may comprise.
The subclasses handle the distinctions between internal vs. external I/O,
formatted vs. list-directed vs. unformatted I/O, input vs. output,
and so on.

`IoStatementBase` inherits default `FORMAT` processing callbacks and
an `IoErrorHandler`.
Each of the `IoStatementBase` subclasses that pertains to formatted I/O
supports the contextual callback interfaces needed by `FormatControl`,
overriding the default callbacks of the base class, which crash if
called inappropriately (e.g., if a `CLOSE` statement somehow
passes a data item from an *io-list*).

The lifetimes of these subclasses' instances each begin with a user
program call to an I/O API routine with a name like `BeginExternalListOutput()`
and persist until `EndIoStatement()` is called.

To reduce dynamic memory allocation, *external* I/O statements allocate
their per-statement state class instances in space reserved in the
`ExternalFileUnit` instance.
Internal I/O statements currently use dynamic allocation, but
the I/O API supports a means whereby the code generated for the Fortran
program may supply stack space to the I/O runtime support library
for this purpose.

### `IoStatementState`

F18's Fortran I/O runtime support library defines and implements an API
that uses a sequence of function calls to implement each Fortran I/O
statement.
The state of each I/O statement in progress is maintained in some
subclass of `IoStatementBase`, as noted above.
The purpose of `IoStatementState` is to provide generic access
to the specific state classes without recourse to C++ `virtual`
functions or function pointers, language features that may not be
available to us in some important execution environments.
`IoStatementState` comprises a `std::variant<>` of wrapped references
to the various possibilities, and uses `std::visit()` to
access them as needed by the I/O API calls that process each specifier
in the I/O *control-list* and each item in the *io-list*.

Pointers to `IoStatementState` instances are the `Cookie` type returned
in the I/O API for `Begin...` I/O statement calls, passed back for
the *control-list* specifiers and *io-list* data items, and consumed
by the `EndIoStatement()` call at the end of the statement.
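
The dispatch technique can be shown in a few lines. In this sketch the two statement classes and all of the `Mini...` names are hypothetical, and only two alternatives appear in the variant; the mechanism of `std::reference_wrapper` alternatives visited with a generic lambda is the part that corresponds to the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <variant>

// Two hypothetical per-statement state classes with the same interface.
struct MiniListOutput {
  std::string out;
  bool Emit(const char *data, std::size_t bytes) {
    out.append(data, bytes);
    return true;
  }
  int EndIoStatement() { return 0; }
};
struct MiniClose {
  bool Emit(const char *, std::size_t) {
    return false; // a real runtime would crash: CLOSE has no io-list
  }
  int EndIoStatement() { return 0; }
};

// Generic access without virtual functions or function pointers.
class MiniIoStatementState {
public:
  template <typename A> explicit MiniIoStatementState(A &x) : u_{std::ref(x)} {}
  bool Emit(const char *data, std::size_t bytes) {
    return std::visit(
        [=](auto &x) { return x.get().Emit(data, bytes); }, u_);
  }
  int EndIoStatement() {
    return std::visit([](auto &x) { return x.get().EndIoStatement(); }, u_);
  }

private:
  std::variant<std::reference_wrapper<MiniListOutput>,
      std::reference_wrapper<MiniClose>>
      u_;
};

using MiniCookie = MiniIoStatementState *; // what API clients hold

inline std::string EmitThroughCookie() {
  MiniListOutput statement;
  MiniIoStatementState state{statement};
  MiniCookie cookie{&state};
  cookie->Emit(" HELLO", 6);
  cookie->EndIoStatement();
  return statement.out;
}

inline bool CloseRejectsData() {
  MiniClose statement;
  MiniIoStatementState state{statement};
  return !state.Emit("X", 1);
}
```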

Storage for `IoStatementState` is reserved in `ExternalFileUnit` for
external I/O units, and in the various final subclasses for internal
I/O statement states otherwise.

Since Fortran permits a `CLOSE` statement to reference a nonexistent
unit, the library has to treat that (expected to be rare) situation
as a weird variation of internal I/O since there's no `ExternalFileUnit`
available to hold its `IoStatementBase` subclass or `IoStatementState`.

## A Narrative Overview Of `PRINT *, 'HELLO, WORLD'`

1. When the compiled Fortran program begins execution at the `main()`
entry point exported from its main program, it calls `ProgramStart()`
with its arguments and environment.
1. The generated code calls `BeginExternalListOutput()` to
start the sequence of calls that implement the `PRINT` statement.
Since the Fortran runtime I/O library has not yet been used in
this process, its data structures are initialized on this
first call, and Fortran I/O units 5 and 6 are connected with
the standard input and output file descriptors (respectively).
The default unit code is converted to 6 and passed to
`ExternalFileUnit::LookUpOrCrash()`, which returns a reference to
unit 6's instance.
1. We check that the unit was opened for formatted I/O.
1. `ExternalFileUnit::BeginIoStatement<>()` is called to initialize
an instance of `ExternalListIoStatementState<false>` in the unit,
point to it with an `IoStatementState`, and return a reference to
that object whose address will be the `Cookie` for this statement.
1. The generated code calls `OutputAscii()` with that cookie and the
address and length of the string.
1. `OutputAscii()` confirms that the cookie corresponds to an output
statement and determines that it's list-directed.
1. `ListDirectedStatementState<false>::EmitLeadingSpaceOrAdvance()`
emits the required initial space on the new current output record
by calling `IoStatementState::GetConnectionState()` to locate
the connection state, determining from the record position state
that the space is necessary, and calling `IoStatementState::Emit()`
to cough it out.  That call is redirected to `ExternalFileUnit::Emit()`,
which calls `FileFrame<ExternalFileUnit>::WriteFrame()` to extend
the frame of the current record and then `memcpy()` to fill its
first byte with the space.
1. Back in `OutputAscii()`, the mutable modes and connection state
of the `IoStatementState` are queried to see whether we're in a
`WRITE(UNIT=,FMT=,DELIM=)` statement whose `DELIM=` specifier
selects a delimiter.
If we were, the library would emit the appropriate quote marks,
double up any instances of that character in the text, and split the
text over multiple records if it's long.
1. But we don't have a delimiter, so `OutputAscii()` just carves
up the text into record-sized chunks and emits them.  There's just
one chunk for our short `CHARACTER` string value in this example.
It's passed to `IoStatementState::Emit()`, which (as above) is
redirected to `ExternalFileUnit::Emit()`, which interacts with the
frame to extend it and `memcpy()` the data into the buffer.
1. A flag is set in `ListDirectedStatementState<false>` to remember
that the last item emitted in this list-directed output statement
was an undelimited `CHARACTER` value, so that if the next item is
also an undelimited `CHARACTER`, no interposing space will be emitted
between them.
1. `OutputAscii()` returns `true` to its caller.
1. The generated code calls `EndIoStatement()`, which is redirected to
`ExternalIoStatementState<false>`'s override of that function.
As this is not a non-advancing I/O statement, `ExternalFileUnit::AdvanceRecord()`
is called to end the record.  Since this is a sequential formatted
file, a newline is emitted.
1. If unit 6 is connected to a terminal, the buffer is flushed.
`FileFrame<ExternalFileUnit>::Flush()` drives `ExternalFileUnit::Write()`
to push out the data in maximal contiguous chunks, dealing with any
short writes that might occur, and collecting I/O errors along the way.
This statement has no `ERR=` label or `IOSTAT=` specifier, so errors
arriving at `IoErrorHandler::SignalErrno()` will cause an immediate
crash.
1. `ExternalIoStatementBase::EndIoStatement()` is called.
It gets the final `IOSTAT=` value from `IoStatementBase::EndIoStatement()`,
tells the `ExternalFileUnit` that no I/O statement remains active, and
returns the I/O status value back to the program.
1. Eventually, the program calls `ProgramEndStatement()`, which
calls `ExternalFileUnit::CloseAll()`, which flushes and closes all
open files.  If the standard output were not a terminal, the output
would be written now with the same sequence of calls as above.
1. `exit(EXIT_SUCCESS)`.
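
From the generated code's point of view, the narrative above reduces to three calls. The sketch below mimics that sequence with mock functions; every `Mock...` name is hypothetical, the signatures are simplified guesses rather than those in `io-api.h`, and the "unit" is just a string that collects the output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Mock per-statement state; its address plays the role of the cookie.
struct MockStatement {
  std::string *unit{nullptr};
  bool atRecordStart{true};
};
using MockCookie = MockStatement *;

// Mock of unit 6: collects everything "written" to standard output.
inline std::string &MockUnit6() {
  static std::string text;
  return text;
}

inline MockCookie MockBeginExternalListOutput() {
  static MockStatement statement; // space reserved "in the unit"
  statement = MockStatement{&MockUnit6(), true};
  return &statement;
}

inline bool MockOutputAscii(MockCookie cookie, const char *text, std::size_t length) {
  if (cookie->atRecordStart) {
    cookie->unit->push_back(' '); // list-directed leading space
    cookie->atRecordStart = false;
  }
  cookie->unit->append(text, length); // no delimiter, one chunk
  return true;
}

inline int MockEndIoStatement(MockCookie cookie) {
  cookie->unit->push_back('\n'); // advancing statement: end the record
  return 0; // the IOSTAT= value for success
}

// What the code generated for PRINT *, 'HELLO, WORLD' might resemble.
inline int PrintHelloWorld() {
  static const char text[]{"HELLO, WORLD"};
  MockCookie cookie{MockBeginExternalListOutput()};
  MockOutputAscii(cookie, text, std::strlen(text));
  return MockEndIoStatement(cookie);
}
```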