Name

    MESA_shader_integer_functions

Name Strings

    GL_MESA_shader_integer_functions

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors

    All the contributors of GL_ARB_gpu_shader5

Status

    Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later

Version

    Version 3, March 31, 2017

Number

    OpenGL Extension #495

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    Specification.

    This extension is written against Version 1.50 (Revision 09) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL) or GLSL ES 3.00 (OpenGL ES) is required.

    This extension interacts with ARB_gpu_shader5.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with NV_gpu_shader5.

Overview

    GL_ARB_gpu_shader5 extends GLSL in a number of useful ways.  Much of this
    added functionality requires significant hardware support.  There are many
    aspects, however, that can be easily implemented on any GPU with "real"
    integer support (as opposed to simulating integers using floating point
    calculations).

    This extension provides a set of new features to the OpenGL Shading
    Language to support capabilities of these GPUs, extending the
    capabilities of version 1.30 of the OpenGL Shading Language and version
    3.00 of the OpenGL ES Shading Language.  Shaders using the new
    functionality provided by this extension should enable this
    functionality via the construct

      #extension GL_MESA_shader_integer_functions : require   (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for implicitly converting signed integer types to unsigned
        types, as well as more general implicit conversion and function
        overloading infrastructure to support new data types introduced by
        other extensions;

      * new built-in functions supporting:

        * splitting a floating-point number into a significand and exponent
          (frexp), or building a floating-point number from a significand and
          exponent (ldexp);

        * integer bitfield manipulation, including functions to find the
          position of the most or least significant set bit, count the number
          of one bits, and bitfield insertion, extraction, and reversal;

        * extended integer precision math, including add with carry, subtract
          with borrow, and extended multiplication.

    The resulting extension is a strict subset of GL_ARB_gpu_shader5.

IP Status

    No known IP claims.

New Procedures and Functions

    None

New Tokens

    None

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    None.

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.

Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_MESA_shader_integer_functions : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_MESA_shader_integer_functions        1


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        ---------------------   -----------------
        int                     uint, float
        ivec2                   uvec2, vec2
        ivec3                   uvec3, vec3
        ivec4                   uvec4, vec4

        uint                    float
        uvec2                   vec2
        uvec3                   vec3
        uvec4                   vec4

    (modify second paragraph of the section) No implicit conversions are
    provided to convert from unsigned to signed integer types or from
    floating-point to integer types.  There are no implicit array or structure
    conversions.

    (insert before the final paragraph of the section) When performing
    implicit conversion for binary operators, there may be multiple data types
    to which the two operands can be converted.  For example, when adding an
    int value to a uint value, both values can be implicitly converted to uint
    and float.  In such cases, a floating-point type is chosen if either
    operand has a floating-point type.  Otherwise, an unsigned integer type is
    chosen if either operand has an unsigned integer type.  Otherwise, a
    signed integer type is chosen.
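
    As a non-normative illustration, the selection rule for binary operators
    can be modeled in C.  This is only a sketch: the enum and the function
    name are hypothetical and are not part of this extension or of GLSL.

```c
/* Hypothetical tags for the three scalar base types discussed above. */
typedef enum { BASE_INT, BASE_UINT, BASE_FLOAT } base_type;

/* Base type that both operands of a binary operator are converted to:
 * floating-point wins over unsigned, unsigned wins over signed. */
static base_type binary_result_type(base_type a, base_type b)
{
    if (a == BASE_FLOAT || b == BASE_FLOAT)
        return BASE_FLOAT;
    if (a == BASE_UINT || b == BASE_UINT)
        return BASE_UINT;
    return BASE_INT;
}
```

    Under this model, adding an int to a uint yields BASE_UINT, matching the
    example in the text.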


    Modify Section 5.9, Expressions, p. 57

    (modify bulleted list as follows, adding support for implicit conversion
    between signed and unsigned types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, uint, float, all vector types, and all
      matrix types.

    ...

    * The operator modulus (%) operates on signed or unsigned integer scalars
      or vectors.  If the fundamental types of the operands do not match, the
      conversions from Section 4.1.10 "Implicit Conversions" are applied to
      produce matching types.  ...


    Modify Section 6.1, Function Definitions, p. 63

    (modify description of overloading, beginning at the top of p. 64)

     Function names can be overloaded.  The same function name can be used for
     multiple functions, as long as the parameter types differ.  If a function
     name is declared twice with the same parameter types, then the return
     types and all qualifiers must also match, and it is the same function
     being declared.  For example,

       vec4 f(in vec4 x, out vec4  y);   // (A)
       vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
       vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type

       int  f(in vec4 x, out ivec4 y);  // error, only return type differs
       vec4 f(in vec4 x, in  vec4  y);  // error, only qualifier differs
       vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs

     When function calls are resolved, an exact type match for all the
     arguments is sought.  If an exact match is found, all other functions are
     ignored, and the exact match is used.  If no exact match is found, then
     the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
     applied to find a match.  Mismatched types on input parameters (in or
     inout or default) must have a conversion from the calling argument type
     to the formal parameter type.  Mismatched types on output parameters (out
     or inout) must have a conversion from the formal parameter type to the
     calling argument type.

     If implicit conversions can be used to find more than one matching
     function, a single best-matching function is sought.  To determine a best
     match, the conversions between calling argument and formal parameter
     types are compared for each function argument and pair of matching
     functions.  After these comparisons are performed, each pair of matching
     functions is compared.  A function definition A is considered a better
     match than function definition B if:

       * for at least one function argument, the conversion for that argument
         in A is better than the corresponding conversion in B; and

       * there is no function argument for which the conversion in B is better
         than the corresponding conversion in A.

     If a single function definition is considered a better match than every
     other matching function definition, it will be used.  Otherwise, a
     semantic error occurs and the shader will fail to compile.

     To determine whether the conversion for a single argument in one match is
     better than that for another match, the following rules are applied, in
     order:

       1. An exact match is better than a match involving any implicit
          conversion.

       2. A match involving an implicit conversion from float to double is
          better than a match involving any other implicit conversion.

       3. A match involving an implicit conversion from either int or uint to
          float is better than a match involving an implicit conversion from
          either int or uint to double.

     If none of the rules above apply to a particular pair of conversions,
     neither conversion is considered better than the other.

     For the function prototypes (A), (B), and (C) above, the following
     examples show how the rules apply to different sets of calling argument
     types:

       f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
       f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
       f(vec4, ivec4);       // NOT matched.  (C) not relevant, can't convert
                             //   vec4 to ivec4.  (A) and (B) not relevant,
                             //   can't convert vec4 or uvec4 out parameter
                             //   to ivec4.
       f(ivec4, vec4);       // NOT matched.  All three match by implicit
                             //   conversion.  (C) is better than (A) and (B)
                             //   on the first argument.  (A) is better than
                             //   (B) and (C) on the second argument.
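
    The resolution procedure can be sketched as a small C model (not
    normative, and not how any particular compiler implements it).  All
    names below are hypothetical.  Only rule 1 is modeled, since rules 2 and
    3 concern "double", which this extension does not provide, and only the
    four-component vector types used by prototypes (A), (B), and (C) are
    represented.

```c
#include <stddef.h>

typedef enum { T_IVEC4, T_UVEC4, T_VEC4 } glsl_type;

typedef struct {
    glsl_type type;
    int is_out;          /* 1 for an "out" parameter, 0 for "in" */
} param;

/* Prototypes (A), (B), and (C) from the text. */
static const param overloads[3][2] = {
    { { T_VEC4,  0 }, { T_VEC4,  1 } },   /* (A) */
    { { T_VEC4,  0 }, { T_UVEC4, 1 } },   /* (B) */
    { { T_IVEC4, 0 }, { T_UVEC4, 1 } },   /* (C) */
};

/* Implicit conversions: int -> uint, int -> float, uint -> float. */
static int converts(glsl_type from, glsl_type to)
{
    return (from == T_IVEC4 && to != T_IVEC4) ||
           (from == T_UVEC4 && to == T_VEC4);
}

/* 0 = exact match, 1 = implicit conversion, -1 = no match.  Output
 * parameters convert from the formal type to the argument type. */
static int rank(param formal, glsl_type arg)
{
    if (formal.type == arg)
        return 0;
    if (formal.is_out)
        return converts(formal.type, arg) ? 1 : -1;
    return converts(arg, formal.type) ? 1 : -1;
}

/* Returns the index of the single best match, -1 if no candidate
 * matches, or -2 if the call is ambiguous. */
static int resolve(const param cands[][2], size_t n, const glsl_type args[2])
{
    int result = -1;
    for (size_t a = 0; a < n; a++) {
        int ra0 = rank(cands[a][0], args[0]);
        int ra1 = rank(cands[a][1], args[1]);
        if (ra0 < 0 || ra1 < 0)
            continue;
        result = -2;                 /* at least one candidate matches */
        int beats_all = 1;
        for (size_t b = 0; b < n; b++) {
            int rb0 = rank(cands[b][0], args[0]);
            int rb1 = rank(cands[b][1], args[1]);
            if (b == a || rb0 < 0 || rb1 < 0)
                continue;
            int a_better = ra0 < rb0 || ra1 < rb1;
            int b_better = rb0 < ra0 || rb1 < ra1;
            if (!a_better || b_better)
                beats_all = 0;
        }
        if (beats_all)
            return (int)a;
    }
    return result;
}
```

    With this model, f(vec4, vec4) resolves to index 0 (A), f(vec4, uvec4)
    resolves to index 1 (B), and f(ivec4, vec4) is reported as ambiguous.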


    Modify Section 8.3, Common Functions, p. 84

    (add support for single-precision frexp and ldexp functions)

    Syntax:

      genType frexp(genType x, out genIType exp);
      genType ldexp(genType x, in genIType exp);

    The function frexp() splits each single-precision floating-point number in
    <x> into a binary significand, a floating-point number in the range [0.5,
    1.0), and an integral exponent of two, such that:

      x = significand * 2 ^ exponent

    The significand is returned by the function; the exponent is returned in
    the parameter <exp>.  For a floating-point value of zero, the significand
    and exponent are both zero.  For a floating-point value that is an
    infinity or is not a number, the results of frexp() are undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value returned by the function and the value
    written to <exp> are vectors with the same number of components as <x>.
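
    A scalar reference model of frexp() in C may help make the definition
    concrete.  This is a non-normative sketch with a hypothetical function
    name; it scales by exact powers of two rather than reflecting how a GPU
    would implement the function, and it simply passes infinities and NaNs
    through, since their results are undefined.

```c
/* Scalar model of frexp(): returns the significand, with magnitude in
 * [0.5, 1.0), and writes the power-of-two exponent. */
static float glsl_frexp(float x, int *exponent)
{
    *exponent = 0;
    /* Zero yields (0, 0).  Infinity and NaN are undefined in the
     * extension; this model returns them unchanged. */
    if (x != x || x * 0.5f == x)
        return x;
    while (x >= 1.0f || x <= -1.0f) {   /* magnitude too large */
        x *= 0.5f;
        (*exponent)++;
    }
    while (x < 0.5f && x > -0.5f) {     /* magnitude too small */
        x *= 2.0f;
        (*exponent)--;
    }
    return x;
}
```

    For example, under this model 8.0 splits into significand 0.5 and
    exponent 4, and -3.0 splits into significand -0.75 and exponent 2.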

    The function ldexp() builds a single-precision floating-point number from
    each significand component in <x> and the corresponding integral exponent
    of two in <exp>, returning:

      significand * 2 ^ exponent

    If this product is too large to be represented as a single-precision
    floating-point value, the result is considered undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value passed in <exp> and returned by the
    function are vectors with the same number of components as <x>.
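
    The corresponding scalar model of ldexp() is sketched below, again with
    a hypothetical name and without normative force.  It multiplies by two
    (or one half) once per exponent step, which is exact while the result
    stays in the normalized range; overflow is not detected, consistent with
    the result being undefined in that case.

```c
/* Scalar model of ldexp(): returns x * 2^exponent. */
static float glsl_ldexp(float x, int exponent)
{
    for (; exponent > 0; exponent--)
        x *= 2.0f;
    for (; exponent < 0; exponent++)
        x *= 0.5f;
    return x;
}
```

    Under this model, ldexp(0.75, 4) yields 12.0 and ldexp(-0.75, 2) yields
    -3.0, reversing the split performed by frexp().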


    (add support for new integer built-in functions)

    Syntax:

      genIType bitfieldExtract(genIType value, int offset, int bits);
      genUType bitfieldExtract(genUType value, int offset, int bits);

      genIType bitfieldInsert(genIType base, genIType insert, int offset,
                              int bits);
      genUType bitfieldInsert(genUType base, genUType insert, int offset,
                              int bits);

      genIType bitfieldReverse(genIType value);
      genUType bitfieldReverse(genUType value);

      genIType bitCount(genIType value);
      genIType bitCount(genUType value);

      genIType findLSB(genIType value);
      genIType findLSB(genUType value);

      genIType findMSB(genIType value);
      genIType findMSB(genUType value);

    The function bitfieldExtract() extracts bits <offset> through
    <offset>+<bits>-1 from each component in <value>, returning them in the
    least significant bits of the corresponding component of the result.  For
    unsigned data types, the most significant bits of the result will be set
    to zero.  For signed data types, the most significant bits will be set to
    the value of bit <offset>+<bits>-1.  If <bits> is zero, the result will be
    zero.  The result will be undefined if <offset> or <bits> is negative, or
    if the sum of <offset> and <bits> is greater than the number of bits used
    to store the operand.  Note that for vector versions of bitfieldExtract(),
    a single pair of <offset> and <bits> values is shared for all components.
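
    The zero-extension and sign-extension behavior can be illustrated with a
    scalar C model.  This is a non-normative sketch with hypothetical
    function names; the caller is assumed to keep <offset> and <bits> within
    the defined range, since out-of-range values are undefined.

```c
#include <stdint.h>

/* Unsigned form: the extracted field is zero-extended. */
static uint32_t bitfield_extract_u(uint32_t value, int offset, int bits)
{
    if (bits == 0)
        return 0;
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (value >> offset) & mask;
}

/* Signed form: the field is sign-extended from its top bit, which is
 * bit offset+bits-1 of the original value. */
static int32_t bitfield_extract_i(int32_t value, int offset, int bits)
{
    if (bits == 0)
        return 0;
    uint32_t field = bitfield_extract_u((uint32_t)value, offset, bits);
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)((field ^ sign) - sign);
}
```

    For example, extracting 4 bits at offset 8 from 0x00000F00 gives 15 in
    the unsigned form and -1 in the signed form.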

    The function bitfieldInsert() inserts the <bits> least significant bits of
    each component of <insert> into the corresponding component of <base>.
    The result will have bits numbered <offset> through <offset>+<bits>-1
    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
    directly from the corresponding bits of <base>.  If <bits> is zero, the
    result will simply be <base>.  The result will be undefined if <offset> or
    <bits> is negative, or if the sum of <offset> and <bits> is greater than
    the number of bits used to store the operand.  Note that for vector
    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
    is shared for all components.
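
    A scalar C model of the unsigned form is sketched below (non-normative,
    hypothetical name).  The signed form operates on the same bit pattern.
    As before, the caller is assumed to pass in-range <offset> and <bits>.

```c
#include <stdint.h>

/* Replaces bits offset..offset+bits-1 of base with the low bits of insert. */
static uint32_t bitfield_insert_u(uint32_t base, uint32_t insert,
                                  int offset, int bits)
{
    if (bits == 0)
        return base;
    uint32_t field = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t mask = field << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```

    For example, inserting 8 zero bits at offset 8 into 0xFFFFFFFF gives
    0xFFFF00FF.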

    The function bitfieldReverse() reverses the bits of <value>.  The bit
    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
    <value>, where <bits> is the total number of bits used to represent
    <value>.
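
    The bit numbering can be checked against a direct C transcription for
    32-bit values (a non-normative sketch with a hypothetical name):

```c
#include <stdint.h>

/* Bit n of the result is taken from bit 31-n of value. */
static uint32_t bitfield_reverse(uint32_t value)
{
    uint32_t result = 0;
    for (int n = 0; n < 32; n++) {
        uint32_t bit = (value >> (31 - n)) & 1u;
        result |= bit << n;
    }
    return result;
}
```

    For example, reversing 0x00000001 gives 0x80000000.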

    The function bitCount() returns the number of one bits in the binary
    representation of <value>.
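
    A scalar C model (non-normative, hypothetical name) is shown below.  A
    signed input is assumed to be counted through its 32-bit two's
    complement bit pattern, so a value of -1 has 32 one bits.

```c
#include <stdint.h>

/* Counts one bits by repeatedly clearing the lowest set bit. */
static int bit_count(uint32_t value)
{
    int count = 0;
    while (value != 0) {
        value &= value - 1u;
        count++;
    }
    return count;
}
```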

    The function findLSB() returns the bit number of the least significant one
    bit in the binary representation of <value>.  If <value> is zero, -1 will
    be returned.
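
    A scalar C model (non-normative, hypothetical name):

```c
#include <stdint.h>

/* Bit number of the least significant one bit, or -1 for zero. */
static int find_lsb(uint32_t value)
{
    if (value == 0)
        return -1;
    int n = 0;
    while (((value >> n) & 1u) == 0)
        n++;
    return n;
}
```

    For example, findLSB(12) is 2, since 12 is binary 1100.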

    The function findMSB() returns the bit number of the most significant bit
    in the binary representation of <value>.  For positive integers, the
    result will be the bit number of the most significant one bit.  For
    negative integers, the result will be the bit number of the most
    significant zero bit.  For a <value> of zero or negative one, -1 will be
    returned.
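
    The difference between the signed and unsigned forms can be seen in a
    scalar C model (non-normative, hypothetical names).  The signed form
    inverts negative inputs so that the most significant zero bit becomes
    the most significant one bit.

```c
#include <stdint.h>

/* Unsigned form: most significant one bit, or -1 for zero. */
static int find_msb_u(uint32_t value)
{
    int n = 31;
    while (n >= 0 && ((value >> n) & 1u) == 0)
        n--;
    return n;
}

/* Signed form: most significant one bit for non-negative inputs, most
 * significant zero bit for negative inputs. */
static int find_msb_i(int32_t value)
{
    uint32_t bits = (uint32_t)value;
    if (value < 0)
        bits = ~bits;
    return find_msb_u(bits);
}
```

    For example, findMSB(-256) is 7, because bit 7 is the highest zero bit
    of 0xFFFFFF00.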


    (support for unsigned integer add/subtract with carry-out)

    Syntax:

      genUType uaddCarry(genUType x, genUType y, out genUType carry);
      genUType usubBorrow(genUType x, genUType y, out genUType borrow);

    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
    <y>, returning the sum modulo 2^32.  The value <carry> is set to zero if
    the sum was less than 2^32, or one otherwise.

    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
    <y> from <x>, returning the difference if non-negative, or 2^32 plus the
    difference otherwise.  The value <borrow> is set to zero if x >= y, or
    one otherwise.
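
    Both functions map directly onto modular unsigned arithmetic, as the
    following scalar C model shows (non-normative, hypothetical names):

```c
#include <stdint.h>

/* Returns (x + y) mod 2^32; carry is 1 if the true sum needs a 33rd bit. */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;
    *carry = (sum < x) ? 1u : 0u;
    return sum;
}

/* Returns (x - y) mod 2^32; borrow is 1 if y is greater than x. */
static uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = (x >= y) ? 0u : 1u;
    return x - y;
}
```

    For example, adding 1 to 0xFFFFFFFF returns 0 with a carry of 1, and
    subtracting 2 from 1 returns 0xFFFFFFFF with a borrow of 1.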


    (support for signed and unsigned multiplies, with 32-bit inputs and a
     64-bit result spanning two 32-bit outputs)

    Syntax:

      void umulExtended(genUType x, genUType y, out genUType msb,
                        out genUType lsb);
      void imulExtended(genIType x, genIType y, out genIType msb,
                        out genIType lsb);

    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
    or signed integers or vectors <x> and <y>, producing a 64-bit result.  The
    32 least significant bits are returned in <lsb>; the 32 most significant
    bits are returned in <msb>.
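
    On a host with 64-bit integers the semantics reduce to a widening
    multiply followed by a split, as in this scalar C model (non-normative,
    hypothetical names):

```c
#include <stdint.h>

/* Unsigned 32 x 32 -> 64 multiply, split into two 32-bit halves. */
static void umul_extended(uint32_t x, uint32_t y,
                          uint32_t *msb, uint32_t *lsb)
{
    uint64_t product = (uint64_t)x * y;
    *msb = (uint32_t)(product >> 32);
    *lsb = (uint32_t)product;
}

/* Signed 32 x 32 -> 64 multiply; both halves carry the two's complement
 * bit pattern of the 64-bit product. */
static void imul_extended(int32_t x, int32_t y, int32_t *msb, int32_t *lsb)
{
    uint64_t bits = (uint64_t)((int64_t)x * y);
    *msb = (int32_t)(uint32_t)(bits >> 32);
    *lsb = (int32_t)(uint32_t)bits;
}
```

    For example, 0xFFFFFFFF times 0xFFFFFFFF gives <msb> 0xFFFFFFFE and
    <lsb> 0x00000001, while -2 times 3 gives <msb> -1 and <lsb> -6.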


GLX Protocol

    None.

Dependencies on ARB_gpu_shader_fp64

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language.  If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
    is not supported, the function overloading rule in the GLSL specification
    preferring promotion of an input parameter from a smaller type to a larger
    type is never applicable, as all data types are of the same size.  That
    rule and the example referring to "double" should be removed.


Dependencies on NV_gpu_shader5

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language.  If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If NV_gpu_shader5 is supported, integer data types are supported with four
    different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
    types are supported with three different precisions (16-, 32-, and
    64-bit).  That extension adds the following rule for output parameters,
    which is similar to the one present in this extension for input
    parameters:

       5. If the formal parameters in both matches are output parameters, a
          conversion from a type with a larger number of bits per component is
          better than a conversion from a type with a smaller number of bits
          per component.  For example, a conversion from an "int16_t" formal
          parameter type to "int" is better than one from an "int8_t" formal
          parameter type to "int".

    Such a rule is not provided in this extension because there is no
    combination of types in this extension and ARB_gpu_shader_fp64 where this
    rule has any effect.


Errors

    None


New State

    None

New Implementation Dependent State

    None

Issues

    (1) What should this extension be called?

      UNRESOLVED.  This extension borrows from GL_ARB_gpu_shader5, so creating
      some sort of a play on that name would be viable.  However, nothing in
      this extension should require SM5 hardware, so such a name would be a
      little misleading and weird.

      Since the primary purpose is to add integer-related functions from
      GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
      for now.

    (2) Why is some of the formatting in this extension weird?

      RESOLVED: This extension is formatted to minimize the differences (as
      reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
      specification.

    (3) Should ldexp and frexp be included?

      RESOLVED: Yes.  Few GPUs have native instructions to implement these
      functions.  These are generally implemented using existing GLSL built-in
      functions and the other functions provided by this extension.
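
      As a rough illustration of that approach, frexp() can be built from
      bit reinterpretation plus bitfield operations.  The C sketch below is
      an assumption about one possible lowering, not a description of any
      driver; the function name is hypothetical, and denormals, infinities,
      and NaNs are passed through unhandled.

```c
#include <stdint.h>
#include <string.h>

/* frexp() for zero and normalized floats using only bit operations. */
static float frexp_bits(float x, int *exponent)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);           /* reinterpret float as uint */
    uint32_t biased = (bits >> 23) & 0xFFu;   /* 8-bit exponent field */
    if (biased == 0u || biased == 0xFFu) {
        /* Zero gives (0, 0) as required.  Denormals, infinities, and
         * NaNs are not handled by this sketch. */
        *exponent = 0;
        return x;
    }
    *exponent = (int)biased - 126;
    /* Overwrite the exponent field with 126 so the magnitude lands in
     * [0.5, 1.0); sign and mantissa bits are kept. */
    bits = (bits & 0x807FFFFFu) | (126u << 23);
    float significand;
    memcpy(&significand, &bits, sizeof significand);
    return significand;
}
```

      In GLSL terms, the memcpy() calls would correspond to the bit
      encoding built-ins, and the shift-and-mask steps to bitfieldExtract()
      and bitfieldInsert().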

    (4) Should umulExtended and imulExtended be included?

      RESOLVED: Yes.  These functions should be implementable on any GPU that
      can support the rest of this extension, but the implementation may be
      complex.  The implementation on a GPU that only supports 32bit x 32bit =
      32bit multiplication would be quite expensive.  However, many GPUs
      (including OpenGL 4.0 GPUs that already support this function) have a
      32bit x 16bit = 48bit multiplier.  The implementation there is only
      trivially more expensive than regular 32bit multiplication.
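
      To give a sense of the cost on hardware limited to 32bit x 32bit =
      32bit multiplication, the C sketch below computes the unsigned
      extended product from four 16-bit partial products.  It is one
      possible decomposition, assumed here for illustration only, and the
      function name is hypothetical.

```c
#include <stdint.h>

/* umulExtended() using only 32-bit multiplies, adds, shifts, and masks. */
static void umul_extended_32(uint32_t x, uint32_t y,
                             uint32_t *msb, uint32_t *lsb)
{
    uint32_t x_lo = x & 0xFFFFu, x_hi = x >> 16;
    uint32_t y_lo = y & 0xFFFFu, y_hi = y >> 16;

    /* Each partial product of two 16-bit values fits in 32 bits. */
    uint32_t ll = x_lo * y_lo;
    uint32_t lh = x_lo * y_hi;
    uint32_t hl = x_hi * y_lo;
    uint32_t hh = x_hi * y_hi;

    /* Sum of everything that lands in bits 16..31, with its carries. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    *lsb = (ll & 0xFFFFu) | (mid << 16);
    *msb = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}
```

      This takes four multiplies and several adds and shifts per
      component, compared with a single multiply for a regular 32bit
      product.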

    (5) Should the pack and unpack functions be included?

      RESOLVED: No.  These functions are already available via
      GL_ARB_shading_language_packing.

    (6) Should the "BitsTo" functions be included?

      RESOLVED: No.  These functions are already available via
      GL_ARB_shader_bit_encoding.

Revision History

    Rev.      Date     Author    Changes
    ----  -----------  --------  -----------------------------------------
     3    31-Mar-2017  Jon Leech Add ES support (OpenGL-Registry/issues/3)
     2     7-Jul-2016  idr       Fix typo in #extension line
     1    20-Jun-2016  idr       Initial version based on GL_ARB_gpu_shader5.
