<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="shaping-and-shape-plans">
  <title>Shaping and shape plans</title>
  <para>
    Once you have your face and font objects configured as desired and
    your input buffer is filled with the characters you need to shape,
    all you need to do is call <function>hb_shape()</function>.
  </para>
  <para>
    HarfBuzz will return the shaped version of the text in the same
    buffer that you provided, but it will be in output mode. At that
    point, you can iterate through the glyphs in the buffer, drawing
    each one at the specified position or handing them off to the
    appropriate graphics library.
  </para>
  <para>
    For the most part, HarfBuzz's shaping step is straightforward from
    the outside. But that doesn't mean there will never be cases where
    you want to look under the hood and see what is happening on the
    inside. HarfBuzz provides facilities for doing that, too.
  </para>

  <section id="shaping-buffer-output">
    <title>Shaping and buffer output</title>
    <para>
      The <function>hb_shape()</function> function call takes four arguments: the font
      object to use, the buffer of characters to shape, an array of
      user-specified features to apply, and the length of that feature
      array. The feature array can be NULL, so for the sake of
      simplicity we will start with that case.
    </para>
    <para>
      Internally, HarfBuzz looks at the tables of the font file to
      determine where glyph classes, substitutions, and positioning
      are defined, using that information to decide which
      <emphasis>shaper</emphasis> to use (<literal>ot</literal> for
      OpenType fonts, <literal>aat</literal> for Apple Advanced
      Typography fonts, and so on). It also looks at the direction,
      script, and language properties of the segment to figure out
      which script-specific shaping model is needed (at least, in
      shapers that support multiple options).
    </para>
    <para>
      If a font has a GDEF table, then that is used for
      glyph classes; if not, HarfBuzz will fall back to Unicode
      categorization by code point. If a font has an AAT "morx" table,
      then it is used for substitutions; if not, but there is a GSUB
      table, then the GSUB table is used. If the font has an AAT
      "kerx" table, then it is used for positioning; if not, but
      there is a GPOS table, then the GPOS table is used. If neither
      table is found, but there is a "kern" table, then HarfBuzz will
      use the "kern" table. If there is no "kerx", no GPOS, and no
      "kern", HarfBuzz will fall back to positioning marks itself.
    </para>
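    <para>
      The fallback order described above can be sketched as three
      small decision functions. This is only an illustration of the
      order stated in the text, not HarfBuzz's internal code: the
      <type>font_tables_t</type> structure, the enumerations, and the
      function names are all hypothetical, and a real program would
      not need to make these decisions itself.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of which layout tables a font contains. */
typedef struct {
  bool gdef, morx, gsub, kerx, gpos, kern;
} font_tables_t;

typedef enum { CLASSES_GDEF, CLASSES_UNICODE } class_source_t;
typedef enum { SUBST_MORX, SUBST_GSUB, SUBST_NONE } subst_source_t;
typedef enum { POS_KERX, POS_GPOS, POS_KERN, POS_FALLBACK } pos_source_t;

/* Glyph classes: GDEF if present, otherwise Unicode categorization. */
class_source_t glyph_class_source(const font_tables_t *t)
{
  assert(t != NULL);
  return t->gdef ? CLASSES_GDEF : CLASSES_UNICODE;
}

/* Substitutions: "morx" takes precedence over GSUB. */
subst_source_t substitution_source(const font_tables_t *t)
{
  assert(t != NULL);
  if (t->morx) return SUBST_MORX;
  if (t->gsub) return SUBST_GSUB;
  return SUBST_NONE;
}

/* Positioning: "kerx", then GPOS, then "kern", then fallback mark positioning. */
pos_source_t positioning_source(const font_tables_t *t)
{
  assert(t != NULL);
  if (t->kerx) return POS_KERX;
  if (t->gpos) return POS_GPOS;
  if (t->kern) return POS_KERN;
  return POS_FALLBACK;
}
```
]]></programlisting>
    <para>
      For a font that carries GDEF, GSUB, GPOS, and a legacy "kern"
      table, these functions select GDEF, GSUB, and GPOS; the "kern"
      table is only reached when both "kerx" and GPOS are absent.
    </para>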
    <para>
      With a well-behaved OpenType font, you expect GDEF, GSUB, and
      GPOS tables to all be applied. HarfBuzz implements the
      script-specific shaping models in internal functions, rather
      than in the public API.
    </para>
    <para>
      The algorithms
      used for complex scripts can be quite involved; HarfBuzz tries
      to be compatible with the OpenType Layout specification
      and, wherever there is any ambiguity, HarfBuzz attempts to replicate the
      output of Microsoft's Uniscribe engine. See the <ulink
      url="https://docs.microsoft.com/en-us/typography/script-development/standard">Microsoft
      Typography pages</ulink> for more detail.
    </para>
    <para>
      In general, though, all that you need to know is that
      <function>hb_shape()</function> returns the results of shaping
      in the same buffer that you provided. The buffer's content type
      will now be set to
      <literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal>, indicating
      that it contains shaped output, rather than input text. You can
      now extract the glyph information and positioning arrays:
    </para>
    <programlisting language="C">
      unsigned int glyph_count;
      hb_glyph_info_t *glyph_info    = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
      hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
    </programlisting>
    <para>
      The glyph information array holds a <type>hb_glyph_info_t</type>
      for each output glyph. The two fields of interest here are
      <parameter>codepoint</parameter> and
      <parameter>cluster</parameter>. Whereas, in the input buffer,
      the <parameter>codepoint</parameter> field contained the Unicode
      code point, it now contains the glyph ID of the corresponding
      glyph in the font. The <parameter>cluster</parameter> field is
      an integer that you can use to help identify when shaping has
      reordered, split, or combined code points; we will say more
      about that in the next chapter.
    </para>
    <para>
      The glyph positions array holds a corresponding
      <type>hb_glyph_position_t</type> for each output glyph,
      containing four fields: <parameter>x_advance</parameter>,
      <parameter>y_advance</parameter>,
      <parameter>x_offset</parameter>, and
      <parameter>y_offset</parameter>. The advances tell you how far
      you need to move the drawing point after drawing this glyph,
      depending on whether you are setting horizontal text (in which
      case you will have x advances) or vertical text (for which you
      will have y advances). The x and y offsets tell you where to
      move to start drawing the glyph; usually you will have both an
      x and a y offset, regardless of the text direction.
    </para>
    <para>
      Most of the time, you will rely on a font-rendering library or
      other graphics library to do the actual drawing of glyphs, so
      you will need to iterate through the glyphs in the buffer and
      pass the corresponding values off.
    </para>
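    <para>
      The way advances and offsets combine while iterating can be
      sketched as follows. To keep the example compilable without
      HarfBuzz installed, it uses a stand-in structure,
      <type>glyph_pos_t</type>, with the same four fields as
      <type>hb_glyph_position_t</type>; the
      <type>point_t</type> type and the
      <function>layout_glyphs()</function> function are hypothetical
      names introduced for illustration. Each glyph is drawn at the
      current pen position plus its offsets, and only the advances
      move the pen.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same four fields as hb_glyph_position_t. */
typedef struct {
  int32_t x_advance, y_advance, x_offset, y_offset;
} glyph_pos_t;

typedef struct { int32_t x, y; } point_t;

/* Writes the drawing origin of each glyph into origins[] and returns
 * the pen position after the last glyph. Offsets shift a single glyph;
 * they do not move the pen. */
point_t layout_glyphs(const glyph_pos_t *pos, size_t count, point_t *origins)
{
  point_t pen = { 0, 0 };

  assert(count == 0 || (pos != NULL && origins != NULL));
  for (size_t i = 0; i < count; i++) {
    origins[i].x = pen.x + pos[i].x_offset;
    origins[i].y = pen.y + pos[i].y_offset;
    pen.x += pos[i].x_advance;
    pen.y += pos[i].y_advance;
  }
  return pen;
}
```
]]></programlisting>
    <para>
      In a real program, the origins computed this way, together with
      the glyph IDs from the <parameter>codepoint</parameter> fields,
      are what you would hand to your rendering library.
    </para>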
  </section>

  <section id="shaping-opentype-features">
    <title>OpenType features</title>
    <para>
      OpenType features enable fonts to include smart behavior,
      implemented as "lookup" rules stored in the GSUB and GPOS
      tables. The OpenType specification defines a long list of
      standard features that fonts can use for these behaviors; each
      feature has a four-character reserved name and a well-defined
      semantic meaning.
    </para>
    <para>
      Some OpenType features are defined for the purpose of supporting
      complex-script shaping, and are automatically activated, but
      only when a buffer's script property is set to a script that the
      feature supports.
    </para>
    <para>
      Other features are more generic and can apply to several (or
      all) scripts, and shaping engines are expected to implement
      them. By default, HarfBuzz activates several of these features
      on every text run. They include <literal>abvm</literal>,
      <literal>blwm</literal>, <literal>ccmp</literal>,
      <literal>locl</literal>, <literal>mark</literal>,
      <literal>mkmk</literal>, and <literal>rlig</literal>.
    </para>
    <para>
      In addition, if the text direction is horizontal, HarfBuzz
      also applies the <literal>calt</literal>,
      <literal>clig</literal>, <literal>curs</literal>,
      <literal>dist</literal>, <literal>kern</literal>,
      <literal>liga</literal>, and <literal>rclt</literal>
      features. The <literal>frac</literal> feature is also applied,
      but only to the runs of digits surrounding a fraction slash
      (U+2044).
    </para>
    <para>
      If the text direction is vertical, HarfBuzz applies
      the <literal>vert</literal> feature by default.
    </para>
    <para>
      Still other features are designed to be purely optional and left
      up to the application or the end user to enable or disable as desired.
    </para>
    <para>
      You can adjust the set of features that HarfBuzz applies to a
      buffer by supplying an array of <type>hb_feature_t</type>
      features as the third argument to
      <function>hb_shape()</function>. For a simple case, let's just
      enable the <literal>dlig</literal> feature, which turns on any
      "discretionary" ligatures in the font:
    </para>
    <programlisting language="C">
      hb_feature_t userfeatures[1];
      userfeatures[0].tag = HB_TAG('d','l','i','g');
      userfeatures[0].value = 1;
      userfeatures[0].start = HB_FEATURE_GLOBAL_START;
      userfeatures[0].end = HB_FEATURE_GLOBAL_END;
    </programlisting>
    <para>
      <literal>HB_FEATURE_GLOBAL_START</literal> and
      <literal>HB_FEATURE_GLOBAL_END</literal> are macros we can use
      to indicate that the features will be applied to the entire
      buffer. We could also have used a literal <literal>0</literal>
      for the start and a <literal>-1</literal> to indicate the end of
      the buffer (or have selected other start and end positions, if needed).
    </para>
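    <para>
      The following sketch shows what the fields of a feature setting
      encode. It does not include the HarfBuzz headers:
      <literal>MAKE_TAG</literal>, <literal>RANGE_GLOBAL_START</literal>,
      <literal>RANGE_GLOBAL_END</literal>, <type>feature_t</type>, and
      <function>feature_covers()</function> are hypothetical stand-ins
      that mirror <literal>HB_TAG</literal>, the two global-range
      macros, and <type>hb_feature_t</type>. It assumes, following the
      HarfBuzz API reference, that <parameter>start</parameter> and
      <parameter>end</parameter> are cluster values, with the start
      inclusive and the end exclusive.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for HB_TAG(): packs four ASCII characters into a 32-bit
 * value, first character in the most significant byte. */
#define MAKE_TAG(a, b, c, d)                 \
  ((uint32_t) (((uint32_t) (uint8_t) (a) << 24) | \
               ((uint32_t) (uint8_t) (b) << 16) | \
               ((uint32_t) (uint8_t) (c) << 8)  | \
                (uint32_t) (uint8_t) (d)))

/* Stand-ins for HB_FEATURE_GLOBAL_START and HB_FEATURE_GLOBAL_END. */
#define RANGE_GLOBAL_START 0u
#define RANGE_GLOBAL_END   ((unsigned int) -1)

/* Stand-in with the same four fields as hb_feature_t. */
typedef struct {
  uint32_t     tag;
  uint32_t     value;
  unsigned int start;
  unsigned int end;
} feature_t;

/* True if the setting applies at the given cluster value:
 * start is inclusive, end is exclusive (an assumption, see the text). */
bool feature_covers(const feature_t *f, unsigned int cluster)
{
  assert(f != NULL);
  return f->start <= cluster && cluster < f->end;
}
```
]]></programlisting>
    <para>
      With these definitions, a setting whose range is 3 to 7 covers
      clusters 3 through 6, while a setting that uses the two global
      values covers the whole buffer.
    </para>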
    <para>
      When we pass the <varname>userfeatures</varname> array to
      <function>hb_shape()</function>, any discretionary ligature
      substitutions from our font that match the text in our buffer
      will get performed:
    </para>
    <programlisting language="C">
      unsigned int num_features = 1;  /* length of userfeatures */
      hb_shape(font, buf, userfeatures, num_features);
    </programlisting>
    <para>
      Just like we enabled the <literal>dlig</literal> feature by
      setting its <parameter>value</parameter> to
      <literal>1</literal>, you would disable a feature by setting its
      <parameter>value</parameter> to <literal>0</literal>. Some
      features can take other <parameter>value</parameter> settings;
      be sure you read the full specification of each feature tag to
      understand what it does and how to control it.
    </para>
  </section>

  <section id="shaping-shaper-selection">
    <title>Shaper selection</title>
    <para>
      The basic version of <function>hb_shape()</function> determines
      its shaping strategy based on examining the capabilities of the
      font file. OpenType font tables cause HarfBuzz to try the
      <literal>ot</literal> shaper, while AAT font tables cause HarfBuzz to try the
      <literal>aat</literal> shaper.
    </para>
    <para>
      In the real world, however, a font might include some unusual
      mix of tables, or one of the tables might simply be broken for
      the script you need to shape. So, sometimes, you might not
      want to rely on HarfBuzz's process for deciding what to do, and
      would rather tell HarfBuzz explicitly which shapers to try.
    </para>
    <para>
      <function>hb_shape_full()</function> is an alternate shaping
      function that lets you supply a list of shapers for HarfBuzz to
      try, in order, when shaping your buffer. For example, if you
      have determined that HarfBuzz's attempts to work around broken
      tables give you better results than the AAT shaper itself does,
      you might move the AAT shaper to the end of your list of
      preferences and call <function>hb_shape_full()</function> to
      get results you are happier with. The list is an array of
      shaper names terminated by a <literal>NULL</literal> entry:
    </para>
    <programlisting language="C">
      const char *shaperprefs[] = {"ot", "default", "aat", NULL};
      ...
      hb_shape_full(font, buf, userfeatures, num_features, shaperprefs);
    </programlisting>
    <para>
      You may also want to call
      <function>hb_shape_list_shapers()</function> to get a list of
      the shapers that were built at compile time in your copy of HarfBuzz.
    </para>
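    <para>
      One possible use of that list is to drop any preferred shaper
      that is not available in the current build before calling
      <function>hb_shape_full()</function>. The sketch below works on
      plain <literal>NULL</literal>-terminated string arrays so that
      it compiles without HarfBuzz; the
      <function>filter_shapers()</function> helper is hypothetical,
      and in a real program the <parameter>available</parameter>
      argument would be the array returned by
      <function>hb_shape_list_shapers()</function>.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears in the NULL-terminated list, else 0. */
static int list_contains(const char *const *list, const char *name)
{
  for (; *list != NULL; list++)
    if (strcmp(*list, name) == 0)
      return 1;
  return 0;
}

/* Copies into out[] the entries of prefs[] that also appear in
 * available[], preserving preference order, and NULL-terminates out[].
 * out[] must have room for every entry of prefs[] plus the terminator.
 * Returns the number of entries kept. */
size_t filter_shapers(const char *const *prefs,
                      const char *const *available,
                      const char **out)
{
  size_t kept = 0;

  assert(prefs != NULL && available != NULL && out != NULL);
  for (; *prefs != NULL; prefs++)
    if (list_contains(available, *prefs))
      out[kept++] = *prefs;
  out[kept] = NULL;
  return kept;
}
```
]]></programlisting>
    <para>
      The resulting array has the shape that
      <function>hb_shape_full()</function> expects for its final
      argument: shaper names in order of preference, followed by a
      <literal>NULL</literal> entry.
    </para>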
  </section>

  <section id="shaping-plans-and-caching">
    <title>Plans and caching</title>
    <para>
      Internally, HarfBuzz uses a structure called a shape plan to
      track its decisions about how to shape the contents of a
      buffer. The <function>hb_shape()</function> function builds up the shape plan by
      examining segment properties and by inspecting the contents of
      the font.
    </para>
    <para>
      This process can involve some decision-making and
      trade-offs. For example, HarfBuzz inspects the GSUB and GPOS
      lookups for the script and language tags set on the segment
      properties, but it falls back on the lookups under the
      <literal>DFLT</literal> tag (and sometimes other common tags)
      if there are no lookups for the tag requested.
    </para>
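    <para>
      The tag fallback just described can be illustrated with a small
      lookup function. This is a simplified sketch rather than
      HarfBuzz's actual selection logic: it handles only the
      <literal>DFLT</literal> fallback and ignores the other common
      tags mentioned above, and the names
      <literal>MAKE_TAG</literal>, <literal>TAG_DFLT</literal>, and
      <function>choose_script()</function> are hypothetical.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for HB_TAG(): four ASCII characters packed into 32 bits. */
#define MAKE_TAG(a, b, c, d)                 \
  ((uint32_t) (((uint32_t) (uint8_t) (a) << 24) | \
               ((uint32_t) (uint8_t) (b) << 16) | \
               ((uint32_t) (uint8_t) (c) << 8)  | \
                (uint32_t) (uint8_t) (d)))

#define TAG_DFLT MAKE_TAG('D', 'F', 'L', 'T')

/* Returns the index of the script entry to use: the requested tag if
 * the font lists it, otherwise the DFLT entry, otherwise -1. */
int choose_script(const uint32_t *font_scripts, size_t count, uint32_t requested)
{
  int dflt = -1;

  assert(count == 0 || font_scripts != NULL);
  for (size_t i = 0; i < count; i++) {
    if (font_scripts[i] == requested)
      return (int) i;
    if (font_scripts[i] == TAG_DFLT && dflt < 0)
      dflt = (int) i;
  }
  return dflt;
}
```
]]></programlisting>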
    <para>
      HarfBuzz also includes some work-arounds for
      handling well-known older font conventions that do not follow
      OpenType or Unicode specifications, for buggy system fonts, and for
      peculiarities of Microsoft Uniscribe. All of that means that a
      shape plan, while not something that you should edit directly in
      client code, still might be an object that you want to
      inspect. Furthermore, if resources are tight, you might want to
      cache the shape plan that HarfBuzz builds for your buffer and
      font, so that you do not have to rebuild it for every shaping call.
    </para>
    <para>
      You can create a cacheable shape plan with
      <function>hb_shape_plan_create_cached(face, props,
      user_features, num_user_features, shaper_list)</function>, where
      <parameter>face</parameter> is a face object (not a font object,
      notably), <parameter>props</parameter> is an
      <type>hb_segment_properties_t</type>,
      <parameter>user_features</parameter> is an array of
      <type>hb_feature_t</type>s (with length
      <parameter>num_user_features</parameter>), and
      <parameter>shaper_list</parameter> is a list of shapers to try.
    </para>
    <para>
      Shape plans are objects in HarfBuzz, so there are
      reference-counting functions and user-data attachment functions
      you can
      use. <function>hb_shape_plan_reference(shape_plan)</function>
      increases the reference count on a shape plan, while
      <function>hb_shape_plan_destroy(shape_plan)</function> decreases
      the reference count, destroying the shape plan when the last
      reference is dropped.
    </para>
    <para>
      You can attach user data to a shape plan (with a key) using the
      <function>hb_shape_plan_set_user_data(shape_plan, key, data, destroy, replace)</function>
      function, optionally supplying a <function>destroy</function>
      callback to use. You can then fetch the user data attached to a
      shape plan with
      <function>hb_shape_plan_get_user_data(shape_plan, key)</function>.
    </para>
  </section>

</chapter>