
C/C++ with LLVM. It is aimed at both users who want to compile CUDA with LLVM

CUDA support is still in development and works best with the trunk version
of LLVM. Below is a quick summary of downloading and building the trunk
The command line for compilation is similar to what you would use for C++.

``<CUDA install path>`` is the root directory where you installed the CUDA SDK,
typically ``/usr/local/cuda``. ``<GPU arch>`` is `the compute capability of
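Putting the pieces together, a typical invocation looks like the sketch below. The file name ``axpy.cu`` and the architecture ``sm_35`` are placeholders for illustration; substitute your own source file, GPU architecture, and CUDA install path.

```shell
# Compile a CUDA source file with clang for a GPU of compute capability 3.5.
# axpy.cu, sm_35, and /usr/local/cuda are placeholders -- adjust for your setup.
clang++ axpy.cu -o axpy \
    --cuda-path=/usr/local/cuda \
    --cuda-gpu-arch=sm_35 \
    -L/usr/local/cuda/lib64 \
    -lcudart_static -ldl -lrt -pthread
```

The ``-lcudart_static -ldl -lrt -pthread`` group links the static CUDA runtime and its system-library dependencies.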
Although clang's CUDA implementation is largely compatible with NVCC's, you may

This is tricky, because NVCC may invoke clang as part of its own compilation

When clang is actually compiling CUDA code -- rather than being used as a
subtool of NVCC's -- it defines the ``__CUDA__`` macro. ``__CUDA_ARCH__`` is
defined only in device mode (but will be defined if NVCC is using clang as a
preprocessor).
equivalents, but because the intermediate result in an fma is not rounded,
* ``-fcuda-flush-denormals-to-zero`` (default: off) When this is enabled,

* ``-fcuda-approx-transcendentals`` (default: off) When this is enabled, the

  This is implied by ``-ffast-math``.
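Both flags can be passed together on an ordinary compile line; a sketch, with ``kernel.cu`` and ``sm_35`` as placeholder names:

```shell
# Trade accuracy for speed in device code: flush denormals to zero and use
# fast approximate transcendentals (both are also implied by -ffast-math).
clang++ kernel.cu -o kernel --cuda-gpu-arch=sm_35 \
    -fcuda-flush-denormals-to-zero -fcuda-approx-transcendentals
```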
typical CPU has branch prediction, out-of-order execution, and is superscalar,

customizable target-independent optimization pipeline.
control flow transfer on a GPU is more expensive. They also promote other

code by over 10x. An empirical inline threshold for GPUs is 1100. This
configuration has yet to be upstreamed with a target-specific optimization
pipeline.
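Until a target-specific pipeline carries this default, the threshold can be raised by hand. A sketch, assuming ``-mllvm -inline-threshold`` (an internal LLVM option, not a stable interface) and placeholder names ``axpy.cu``/``sm_35``:

```shell
# Raise LLVM's inline threshold to the empirical GPU value of 1100.
# -mllvm forwards the option to LLVM's internals; it may change across releases.
clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -mllvm -inline-threshold=1100
```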
<http://llvm.org/docs/doxygen/html/SpeculativeExecution_8cpp_source.html>`_ is

effective on code along dominator paths.
target-specific alias analysis.