1 // This file does not contain any code; it just contains additional text and formatting
2 // for doxygen.
3 
4 
5 //===----------------------------------------------------------------------===//
6 //
7 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
8 // See https://llvm.org/LICENSE.txt for license information.
9 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
10 //
11 //===----------------------------------------------------------------------===//
12 
13 
14 /*! @mainpage LLVM  OpenMP* Runtime Library Interface
15 @section sec_intro Introduction
16 
17 This document describes the interface provided by the
18 LLVM  OpenMP\other runtime library to the compiler.
19 Routines that are directly called as simple functions by user code are
20 not currently described here, since their definition is in the OpenMP
21 specification available from http://openmp.org
22 
23 The aim here is to explain the interface from the compiler to the runtime.
24 
25 The overall design is described, and each function in the interface
26 has its own description. (At least, that's the ambition, we may not be there yet).
27 
28 @section sec_building Quickly Building the Runtime
29 For the impatient, we cover building the runtime as the first topic here.
30 
31 CMake is used to build the OpenMP runtime.  For details and a full list of options for the CMake build system,
32 see <tt>README.rst</tt> in the source code repository.  These instructions will provide the most typical build.
33 
34 In-LLVM-tree build:.
35 @code
36 $ cd where-you-want-to-live
37 Check out openmp into llvm/projects
38 $ cd where-you-want-to-build
39 $ mkdir build && cd build
40 $ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
41 $ make omp
42 @endcode
43 Out-of-LLVM-tree build:
44 @code
45 $ cd where-you-want-to-live
46 Check out openmp
47 $ cd where-you-want-to-live/openmp
48 $ mkdir build && cd build
49 $ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
50 $ make
51 @endcode
52 
53 @section sec_supported Supported RTL Build Configurations
54 
55 The architectures supported are IA-32 architecture, Intel&reg;&nbsp; 64, and
56 Intel&reg;&nbsp; Many Integrated Core Architecture.  The build configurations
57 supported are shown in the table below.
58 
59 <table border=1>
60 <tr><th> <th>icc/icl<th>gcc<th>clang
61 <tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7)
62 <tr><td>FreeBSD\other<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7,8)
63 <tr><td>OS X\other<td>Yes(1,3,4)<td>No<td>Yes(4,6,7)
64 <tr><td>Windows\other OS<td>Yes(1,4)<td>No<td>No
65 </table>
66 (1) On IA-32 architecture and Intel&reg;&nbsp; 64, icc/icl versions 12.x
67     are supported (12.1 is recommended).<br>
68 (2) gcc version 4.7 is supported.<br>
69 (3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
70 (4) Intel&reg;&nbsp; Many Integrated Core Architecture not supported.<br>
71 (5) On Intel&reg;&nbsp; Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.<br>
72 (6) Clang\other version 3.3 is supported.<br>
73 (7) Clang\other currently does not offer a software-implemented 128 bit extended
74     precision type.  Thus, all entry points reliant on this type are removed
75     from the library and cannot be called in the user program.  The following
76     functions are not available:
77 @code
78     __kmpc_atomic_cmplx16_*
79     __kmpc_atomic_float16_*
80     __kmpc_atomic_*_fp
81 @endcode
82 (8) Community contribution provided AS IS, not tested by Intel.
83 
84 Supported Architectures: IBM(R) Power 7 and Power 8
85 <table border=1>
86 <tr><th> <th>gcc<th>clang
87 <tr><td>Linux\other OS<td>Yes(1,2)<td>Yes(3,4)
88 </table>
89 (1) On Power 7, gcc version 4.8.2 is supported.<br>
90 (2) On Power 8, gcc version 4.8.2 is supported.<br>
91 (3) On Power 7, clang version 3.7 is supported.<br>
92 (4) On Power 8, clang version 3.7 is supported.<br>
93 
94 @section sec_frontend Front-end Compilers that work with this RTL
95 
96 The following compilers are known to do compatible code generation for
97 this RTL: icc/icl, gcc.  Code generation is discussed in more detail
98 later in this document.
99 
100 @section sec_outlining Outlining
101 
102 The runtime interface is based on the idea that the compiler
103 "outlines" sections of code that are to run in parallel into separate
104 functions that can then be invoked in multiple threads.  For instance,
105 simple code like this
106 
107 @code
108 void foo()
109 {
110 #pragma omp parallel
111     {
112         ... do something ...
113     }
114 }
115 @endcode
116 is converted into something that looks conceptually like this (where
117 the names used are merely illustrative; the real library function
118 names will be used later after we've discussed some more issues...)
119 
120 @code
121 static void outlinedFooBody()
122 {
123     ... do something ...
124 }
125 
126 void foo()
127 {
128     __OMP_runtime_fork(outlinedFooBody, (void*)0);   // Not the real function name!
129 }
130 @endcode
131 
132 @subsection SEC_SHAREDVARS Addressing shared variables
133 
134 In real uses of the OpenMP\other API there are normally references
135 from the outlined code  to shared variables that are in scope in the containing function.
136 Therefore the containing function must be able to address
137 these variables. The runtime supports two alternate ways of doing
138 this.
139 
140 @subsubsection SEC_SEC_OT Current Technique
141 The technique currently supported by the runtime library is to receive
142 a separate pointer to each shared variable that can be accessed from
143 the outlined function.  This is what is shown in the example below.
144 
145 We hope soon to provide an alternative interface to support the
146 alternate implementation described in the next section. The
147 alternative implementation has performance advantages for small
148 parallel regions that have many shared variables.
149 
150 @subsubsection SEC_SEC_PT Future Technique
151 The idea is to treat the outlined function as though it
152 were a lexically nested function, and pass it a single argument which
153 is the pointer to the parent's stack frame. Provided that the compiler
154 knows the layout of the parent frame when it is generating the outlined
155 function it can then access the up-level variables at appropriate
156 offsets from the parent frame.  This is a classical compiler technique
157 from the 1960s to support languages like Algol (and its descendants)
158 that support lexically nested functions.
159 
160 The main benefit of this technique is that there is no code required
161 at the fork point to marshal the arguments to the outlined function.
162 Since the runtime knows statically how many arguments must be passed to the
163 outlined function, it can easily copy them to the thread's stack
164 frame.  Therefore the performance of the fork code is independent of
165 the number of shared variables that are accessed by the outlined
166 function.
167 
168 If it is hard to determine the stack layout of the parent while generating the
169 outlined code, it is still possible to use this approach by collecting all of
170 the variables in the parent that are accessed from outlined functions into
171 a single `struct` which is placed on the stack, and whose address is passed
172 to the outlined functions. In this way the offsets of the shared variables
173 are known (since they are inside the struct) without needing to know
174 the complete layout of the parent stack-frame. From the point of view
175 of the runtime either of these techniques is equivalent, since in either
176 case it only has to pass a single argument to the outlined function to allow
177 it to access shared variables.
178 
179 A scheme like this is how gcc\other generates outlined functions.
180 
181 @section SEC_INTERFACES Library Interfaces
182 The library functions used for specific parts of the OpenMP\other language implementation
183 are documented in different modules.
184 
185  - @ref BASIC_TYPES fundamental types used by the runtime in many places
186  - @ref DEPRECATED  functions that are in the library but are no longer required
187  - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
188  - @ref PARALLEL functions for implementing `omp parallel`
189  - @ref THREAD_STATES functions for supporting thread state inquiries
190  - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
191  - @ref THREADPRIVATE functions to support thread private data, copyin etc
192  - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc
193  - @ref ATOMIC_OPS functions to support atomic operations
194  - @ref STATS_GATHERING macros to support developer profiling of libomp
195  - Documentation on tasking has still to be written...
196 
197 @section SEC_EXAMPLES Examples
198 @subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
199 This example shows the code generated for a parallel for with reduction and dynamic scheduling.
200 
201 @code
202 extern float foo( void );
203 
204 int main () {
205     int i;
206     float r = 0.0;
207     #pragma omp parallel for schedule(dynamic) reduction(+:r)
208     for ( i = 0; i < 10; i ++ ) {
209         r += foo();
210     }
211 }
212 @endcode
213 
214 The transformed code looks like this.
215 @code
216 extern float foo( void );
217 
218 int main () {
219     static int zero = 0;
220     auto int gtid;
221     auto float r = 0.0;
222     __kmpc_begin( & loc3, 0 );
223     // The gtid is not actually required in this example so could be omitted;
224     // We show its initialization here because it is often required for calls into
225     // the runtime and should be locally cached like this.
226     gtid = __kmpc_global thread num( & loc3 );
227     __kmpc_fork call( & loc7, 1, main_7_parallel_3, & r );
228     __kmpc_end( & loc0 );
229     return 0;
230 }
231 
232 struct main_10_reduction_t_5 { float r_10_rpr; };
233 
234 static kmp_critical_name lck = { 0 };
235 static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
236                       // if compiler has generated an atomic reduction.
237 
238 void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
239     auto int i_7_pr;
240     auto int lower, upper, liter, incr;
241     auto struct main_10_reduction_t_5 reduce;
242     reduce.r_10_rpr = 0.F;
243     liter = 0;
244     __kmpc_dispatch_init_4( & loc7,*gtid, 35, 0, 9, 1, 1 );
245     while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
246         for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
247           reduce.r_10_rpr += foo();
248     }
249     switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
250         case 1:
251            *r_7_shp += reduce.r_10_rpr;
252            __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
253            break;
254         case 2:
255            __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
256            break;
257         default:;
258     }
259 }
260 
261 void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
262                        struct main_10_reduction_t_5 *reduce_rhs )
263 {
264     reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
265 }
266 @endcode
267 
268 @defgroup BASIC_TYPES Basic Types
269 Types that are used throughout the runtime.
270 
271 @defgroup DEPRECATED Deprecated Functions
272 Functions in this group are for backwards compatibility only, and
273 should not be used in new code.
274 
275 @defgroup STARTUP_SHUTDOWN Startup and Shutdown
276 These functions are for library initialization and shutdown.
277 
278 @defgroup PARALLEL Parallel (fork/join)
279 These functions are used for implementing <tt>\#pragma omp parallel</tt>.
280 
281 @defgroup THREAD_STATES Thread Information
282 These functions return information about the currently executing thread.
283 
284 @defgroup WORK_SHARING Work Sharing
285 These functions are used for implementing
286 <tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
287 <tt>\#pragma omp master</tt> constructs.
288 
289 When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types
290 which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
291 so they are only described once.
292 
293 Static loop scheduling is handled by  @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
294 since the iterations to be executed by any give thread can be determined as soon as the loop parameters are known.
295 
296 Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
297 The init function is called once in each thread outside the loop, while the next function is called each
298 time that the previous chunk of work has been exhausted.
299 
300 @defgroup SYNCHRONIZATION Synchronization
301 These functions are used for implementing barriers.
302 
303 @defgroup THREADPRIVATE Thread private data support
304 These functions support copyin/out and thread private data.
305 
306 @defgroup STATS_GATHERING Statistics Gathering from OMPTB
307 These macros support profiling the libomp library.  Use --stats=on when building with build.pl to enable
308 and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.
309 
310 @section sec_stats_env_vars Environment Variables
311 
312 This section describes the environment variables relevant to stats-gathering in libomp
313 
314 @code
315 KMP_STATS_FILE
316 @endcode
317 This environment variable is set to an output filename that will be appended *NOT OVERWRITTEN* if it exists.  If this environment variable is undefined, the statistics will be output to stderr
318 
319 @code
320 KMP_STATS_THREADS
321 @endcode
322 This environment variable indicates to print thread-specific statistics as well as aggregate statistics.  Each thread's statistics will be shown as well as the collective sum of all threads.  The values "true", "on", "1", "yes" will all indicate to print per thread statistics.
323 
324 @defgroup TASKING Tasking support
325 These functions support tasking constructs.
326 
327 @defgroup USER User visible functions
328 These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.
329 
330 */
331 
332