PDF Theory of Operation
=======================

<!--
PRE-GIT DOCUMENT VERSION HISTORY
    2012-06-25 Steve VanDeBogart
               * Original version
    2015-01-14 Hal Canary.
               * Add section "Using the PDF backend"
               * Markdown formatting
-->


Internally, SkPDFDocument and SkPDFDevice represent PDF documents and
pages. This document describes how the backend operates, but **these
interfaces are not part of the public API and are subject to perpetual
change.**

See [Using Skia's PDF Backend](../../user/sample/pdf) to find out how
to use SkPDF as a client calling Skia's public API.

* * *

### Contents ###

* [Typical usage of the PDF backend](#Typical_usage_of_the_PDF_backend)
* [PDF Objects and Document Structure](#PDF_Objects_and_Document_Structure)
* [PDF drawing](#PDF_drawing)
* [Interned objects](#Interned_objects)
* [Graphic States](#Graphic_States)
* [Clip and Transform](#Clip_and_Transform)
* [Generating a content stream](#Generating_a_content_stream)
* [Drawing details](#Drawing_details)
    + [Layers](#Layers)
    + [Fonts](#Fonts)
    + [Shaders](#Shaders)
    + [Xfer modes](#Xfer_modes)
* [Known issues](#Known_issues)


<span id="Typical_usage_of_the_PDF_backend">Typical usage of the PDF backend</span>
-----------------------------------------------------------------------------------

SkPDFDevice is the main interface to the PDF backend. This child of
SkDevice can be set on an SkPDFCanvas and drawn to. Once drawing to
the canvas is complete (SkDocument::onEndPage() is called), the
device's content and resources are added to the SkPDFDocument that owns
the device. A new SkPDFDevice should be created for each page or
layer desired in the document. After all the pages have been added to
the document, `SkPDFDocument::onClose()` is called to finish
serializing the PDF file.


<span id="PDF_Objects_and_Document_Structure">PDF Objects and Document Structure</span>
---------------------------------------------------------------------------------------

![PDF Logical Document Structure](/dev/design/PdfLogicalDocumentStructure.png)

**Background**: The PDF file format has a header, a set of objects and
then a footer that contains a table of contents for all of the objects
in the document (the cross-reference table). The table of contents
lists the specific byte position for each object. The objects may have
references to other objects and the ASCII size of those references is
dependent on the object number assigned to the referenced object;
therefore we can’t calculate the table of contents until the size of
objects is known, which requires assignment of object numbers. The
document uses SkWStream::bytesWritten() to query the offsets of each
object and build the cross-reference table.

Furthermore, PDF files can support a *linearized* mode, where objects
are in a specific order so that PDF viewers can more easily retrieve
just the objects they need to display a specific page, e.g. by
byte-range requests over the web. Linearization also requires that all
objects used or referenced on the first page of the PDF have object
numbers before the rest of the objects. Consequently, before
generating a linearized PDF, all objects, their sizes, and object
references must be known. Skia has no plans to implement linearized
PDFs.

    %PDF-1.4
    …objects...
    xref
    0 31  % Total number of entries in the table of contents.
    0000000000 65535 f
    0000210343 00000 n
    …
    0000117055 00000 n
    trailer
    <</Size 31 /Root 1 0 R>>
    startxref
    210399  % Byte offset to the start of the table of contents.
    %%EOF

The class SkPDFObjNumMap and the virtual class SkPDFObject are used to
manage the needs of the file format.
Any object that will represent a
PDF object must inherit from SkPDFObject and implement the methods to
generate the binary representation and report any other SkPDFObjects
used as resources. SkPDFTypes.h defines most of the basic PDF object
types: bool, int, scalar, string, name, array, dictionary, and stream.
(A stream is a dictionary containing at least a Length entry followed
by the data of the stream.)

All of these PDF object types except the stream type can be used in
both a direct and an indirect fashion, i.e. an array can have an int
or a dictionary as an inline entry, which does not require an object
number. The stream type cannot be inlined and must be referred to
with an object reference. Most of the time, other object types can be
referred to with an object reference, but there are specific rules in
the PDF specification that require an inline reference in some places
or an indirect reference in other places. All indirect objects must
have an object number assigned.

* **bools**: `true` `false`
* **ints**: `42` `0` `-1`
* **scalars**: `0.001`
* **strings**: `(strings are in parentheses or byte encoded)` `<74657374>`
* **name**: `/Name` `/Name#20with#20spaces`
* **array**: `[/Foo 42 (arrays can contain multiple types)]`
* **dictionary**: `<</Key1 (value1) /key2 42>>`
* **indirect object**:
  `5 0 obj
  (An indirect string. Indirect objects have an object number and a
  generation number, Skia always uses generation 0 objects)
  endobj`
* **object reference**: `5 0 R`
* **stream**: `<</Length 56>>
  stream
  ...stream contents can be arbitrary, including binary...
  endstream`

The PDF backend requires all indirect objects used in a PDF to be
added to the SkPDFObjNumMap of the SkPDFDocument. The catalog is
responsible for assigning object numbers and generating the table of
contents required at the end of PDF files.
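
As a rough illustration of the bookkeeping just described, the sketch
below assigns object numbers in list order, records each object's byte
offset as it is written, and then emits the cross-reference table and
trailer. It is not Skia's implementation: `write_pdf`, `objects`, and
`root_index` are hypothetical names, and `io.BytesIO.tell()` stands in
for `SkWStream::bytesWritten()`.

```python
import io


def write_pdf(objects, root_index):
    """Serialize `objects` (a list of bytes bodies) as indirect objects.

    Hypothetical helper: object numbers are assigned in list order,
    starting at 1; `root_index` is the list position of the catalog.
    """
    out = io.BytesIO()
    out.write(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(out.tell())  # byte position of this object
        out.write(b"%d 0 obj\n" % number)
        out.write(body)
        out.write(b"\nendobj\n")
    xref_start = out.tell()
    out.write(b"xref\n0 %d\n" % (len(objects) + 1))
    out.write(b"0000000000 65535 f \n")  # entry 0 heads the free list
    for offset in offsets:
        out.write(b"%010d 00000 n \n" % offset)  # generation 0, in use
    out.write(b"trailer\n<</Size %d /Root %d 0 R>>\n"
              % (len(objects) + 1, root_index + 1))
    out.write(b"startxref\n%d\n%%%%EOF\n" % xref_start)
    return out.getvalue(), offsets


# References can be written up front because numbers follow list order.
objects = [
    b"<</Type /Catalog /Pages 2 0 R>>",
    b"<</Type /Pages /Kids [3 0 R] /Count 1>>",
    b"<</Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]>>",
]
pdf, offsets = write_pdf(objects, root_index=0)
```

Numbering by list position keeps the sketch short; it sidesteps the
ordering problem that a real object-number map has to solve when
references are discovered while walking the object graph.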
In some sense, generating a
PDF is a three-step process. In the first step all the objects and
references among them are created (mostly done by SkPDFDevice). In the
second step, SkPDFObjNumMap assigns and remembers object numbers.
Finally, in the third step, the header is printed, each object is
printed, and then the table of contents and trailer are printed.
SkPDFDocument takes care of collecting all the objects from the
various SkPDFDevice instances, adding them to an SkPDFObjNumMap,
iterating through the objects once to set their file positions, and
iterating again to generate the final PDF.

As an optimization, many leaf nodes in the directed graph of indirect
objects can be assigned object numbers and serialized early.

    %PDF-1.4
    2 0 obj <<
      /Type /Catalog
      /Pages 1 0 R
    >>
    endobj
    3 0 obj <<
      /Type /Page
      /Parent 1 0 R
      /Resources <<>>
      /MediaBox [0 0 612 792]
      /Contents 4 0 R
    >>
    endobj
    4 0 obj <</Length 0>> stream
    endstream
    endobj
    1 0 obj <<
      /Type /Pages
      /Kids [3 0 R]
      /Count 1
    >>
    endobj
    xref
    0 5
    0000000000 65535 f
    0000000236 00000 n
    0000000009 00000 n
    0000000062 00000 n
    0000000190 00000 n
    trailer
    <</Size 5 /Root 2 0 R>>
    startxref
    299
    %%EOF


<span id="PDF_drawing">PDF drawing</span>
-----------------------------------------

Most drawing in PDF is specified by the text of a stream, referred to
as a content stream. The syntax of the content stream is different
from the syntax of the file format described above and is much closer
to PostScript in nature. The commands in the content stream tell the
PDF interpreter to draw things, like a rectangle (`x y w h re`), an
image, or text, or to do meta operations like set the drawing color,
apply a transform to the drawing coordinates, or clip future drawing
operations.
The page object that references a content stream has a
list of resources that can be used in the content stream using the
dictionary name to reference the resources. Resources are things like
font objects, image objects, and graphic state objects (a set of meta
operations like miter limit, line width, etc). Because of a mismatch
between Skia and PDF’s support for transparency (which will be
explained later), SkPDFDevice records each drawing operation into an
internal structure (ContentEntry) and only when the content stream is
needed does it flatten that list of structures into the final content
stream.

    4 0 obj <<
      /Type /Page
      /Resources <<
        /Font <</F1 9 0 R>>
        /XObject <</Image1 22 0 R /Image2 73 0 R>>
      >>
      /Contents 5 0 R
    >> endobj

    5 0 obj <</Length 227>> stream
      % In the font specified in object 9 and a height
      % of 12 points, at (72, 96) draw ‘Hello World.’
      BT
        /F1 12 Tf
        72 96 Td
        (Hello World) Tj
      ET
      % Draw a filled and stroked rectangle.
      200 96 72 72 re B
      ...
    endstream
    endobj

<span id="Interned_objects">Interned objects</span>
---------------------------------------------------

There are a number of high level PDF objects (like fonts, graphic
states, etc) that are likely to be referenced multiple times in a
single PDF. To ensure that there is only one copy of each object
instance, these objects are implemented with an
[interning pattern](http://en.wikipedia.org/wiki/String_interning).
As such, the classes representing these objects (like
SkPDFGraphicState) have private constructors and static methods to
retrieve an instance of the class.

The SkPDFCanon object owns the interned objects. For obvious reasons,
the returned instance should not be modified. A mechanism to ensure
that interned classes are immutable is needed. See
[issue 2683](https://bug.skia.org/2683).
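
The interning pattern itself can be sketched in a few lines. This is
only an illustration under assumptions: `Canon`, `GraphicState`, and
the field names are hypothetical stand-ins, not the SkPDFCanon or
SkPDFGraphicState interfaces. A frozen dataclass is used here as one
possible way to get the immutability mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphicState:
    """Immutable bundle of drawing parameters (hypothetical fields)."""
    line_width: float
    line_cap: int
    line_join: int
    miter_limit: float
    alpha: float
    blend_mode: str


class Canon:
    """Owns one shared instance per distinct set of parameters."""

    def __init__(self):
        self._graphic_states = {}

    def graphic_state(self, **params):
        key = GraphicState(**params)
        # setdefault returns the stored instance when an equal one exists.
        return self._graphic_states.setdefault(key, key)


canon = Canon()
params = dict(line_width=2, line_cap=0, line_join=0,
              miter_limit=6, alpha=1.0, blend_mode="Normal")
a = canon.graphic_state(**params)
b = canon.graphic_state(**params)                        # same object as a
c = canon.graphic_state(**{**params, "line_width": 3})   # distinct object
```

Because equal parameters always yield the same instance, callers can
compare interned objects by identity, and a PDF writer only has to
serialize each one once.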

<span id="Graphic_States">Graphic States</span>
-----------------------------------------------

PDF has a number of parameters that affect how things are drawn. The
ones that correspond to drawing options in Skia are: color, alpha,
line cap, line join type, line width, miter limit, and xfer/blend mode
(see later section for xfer modes). With the exception of color, these
can all be specified in a single PDF object, represented by the
SkPDFGraphicState class. A simple command in the content stream can
then set the drawing parameters to the values specified in that
graphic state object. PDF does not allow specifying color in the
graphic state object; instead it must be specified directly in the
content stream. Similarly, the current font and font size are set
directly in the content stream.

    6 0 obj <<
      /Type /ExtGState
      /CA 1  % Opaque - alpha = 1
      /LC 0  % Butt linecap
      /LJ 0  % Miter line-join
      /LW 2  % Line width of 2
      /ML 6  % Miter limit of 6
      /BM /Normal  % Blend mode is normal i.e. source over
    >>
    endobj

<span id="Clip_and_Transform">Clip and Transform</span>
-------------------------------------------------------

Similar to Skia, PDF allows drawing to be clipped or
transformed. However, there are a few caveats that affect the design
of the PDF backend. PDF does not support perspective transforms
(perspective transforms are treated as identity transforms). Clips,
however, have more issues to contend with. PDF clips cannot be directly
unapplied or expanded, i.e. once an area has been clipped off, there
is no way to draw to it. However, PDF provides a limited depth stack
for the PDF graphic state (which includes the drawing parameters
mentioned above in the Graphic States section as well as the clip and
transform).
Therefore, to undo a clip, the PDF graphic state must be
pushed before the clip is applied, then popped to revert to the state
of the graphic state before the clip was applied.

As the canvas makes drawing calls into SkPDFDevice, the active
transform, clip region, and clip stack are stored in a ContentEntry
structure. Later, when the ContentEntry structures are flattened into
a valid PDF content stream, the transforms and clips are compared to
decide on an efficient set of operations to transition between the
states needed. Currently, a local optimization is used to figure out
the best transition from one state to the next. A global optimization
could improve things by more effectively using the graphics state
stack provided in the PDF format.

<span id="Generating_a_content_stream">Generating a content stream</span>
-------------------------------------------------------------------------

For each draw call on an SkPDFDevice, a new ContentEntry is created,
which stores the matrix, clip region, and clip stack as well as the
paint parameters. Most of the paint parameters are bundled into an
SkPDFGraphicState (interned) with the rest (color, font size, etc)
explicitly stored in the ContentEntry. After populating the
ContentEntry with all the relevant context, it is compared to the
most recently used ContentEntry. If the context matches, then the
previous one is appended to instead of using the new one. In either
case, with the context populated into the ContentEntry, the
appropriate draw call is allowed to append to the content stream
snippet in the ContentEntry to effect the core of the drawing call,
i.e. drawing a shape, an image, text, etc.

When all drawing is complete, SkPDFDocument::onEndPage() will call
SkPDFDevice::content() to request the complete content stream for the
page.
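
The two ideas above, reusing the previous entry when the drawing
context is unchanged and wrapping clips in a graphic state push and
pop, can be sketched as follows. This is a deliberately naive model
and not SkPDFDevice: the `Device` and `ContentEntry` classes, their
fields, and the single-rectangle clip are simplifying assumptions, and
every entry is wrapped in its own `q`/`Q` pair rather than using an
optimized transition between states.

```python
from dataclasses import dataclass, field


@dataclass
class ContentEntry:
    matrix: tuple  # PDF matrix (a, b, c, d, e, f)
    clip: tuple    # clip simplified to one rectangle (x, y, w, h)
    color: tuple   # fill color (r, g, b), each in 0..1
    ops: list = field(default_factory=list)  # drawing operators

    def context(self):
        return (self.matrix, self.clip, self.color)


class Device:
    def __init__(self):
        self.entries = []

    def set_up_content_entry(self, matrix, clip, color):
        entry = ContentEntry(matrix, clip, color)
        if self.entries and self.entries[-1].context() == entry.context():
            return self.entries[-1]  # same context: append to previous entry
        self.entries.append(entry)
        return entry

    def draw_rect(self, matrix, clip, color, x, y, w, h):
        entry = self.set_up_content_entry(matrix, clip, color)
        entry.ops.append("%g %g %g %g re f" % (x, y, w, h))

    def content(self):
        lines = []
        for entry in self.entries:
            lines.append("q")  # push the graphic state
            lines.append("%g %g %g %g re W n" % entry.clip)
            lines.append("%g %g %g %g %g %g cm" % entry.matrix)
            lines.append("%g %g %g rg" % entry.color)
            lines.extend(entry.ops)
            lines.append("Q")  # pop: undoes the clip and transform
        return "\n".join(lines) + "\n"


device = Device()
identity = (1, 0, 0, 1, 0, 0)
page_clip = (0, 0, 612, 792)
red = (1, 0, 0)
device.draw_rect(identity, page_clip, red, 10, 10, 5, 5)
device.draw_rect(identity, page_clip, red, 20, 20, 5, 5)  # merged
device.draw_rect(identity, (0, 0, 100, 100), red, 30, 30, 5, 5)  # new entry
stream = device.content()
```

The operators used (`q`, `Q`, `re`, `W`, `n`, `cm`, `rg`, `f`) are
standard PDF content stream operators; everything else in the sketch
is illustrative.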
The first thing done is to apply the initial transform specified
in part in the constructor. This transform takes care of changing the
coordinate space from an origin in the lower left (PDF default) to the
upper left (Skia default) as well as any translation or scaling
requested by the user (e.g. to achieve a margin or scale the
canvas). Next (well almost next, see the next section), a clip is
applied to restrict drawing to the content area (the part of the page
inside the margins) of the page. Then, each ContentEntry is applied to
the content stream with the help of a helper class, GraphicStackState,
which tracks the state of the PDF graphics stack and optimizes the
output. For each ContentEntry, commands are emitted to the final
content stream to update the clip from its current state to the state
specified in the ContentEntry. Similarly, the matrix and drawing state
(color, line joins, etc) are updated, then the content entry fragment
(the actual drawing operation) is appended.

<span id="Drawing_details">Drawing details</span>
-------------------------------------------------

Certain objects have specific properties that need to be dealt
with. Images, layers (see below), and fonts assume the standard PDF
coordinate system, so we have to undo any flip to the Skia coordinate
system before drawing these entities. We don’t currently support
inverted paths, so filling an inverted path will give the wrong result
([issue 241](https://bug.skia.org/241)). PDF doesn’t draw zero length
lines that have butt or square caps, so that is emulated.

### <span id="Layers">Layers</span> ###

PDF has a higher level object called a form x-object (form external
object) that is basically a PDF page, with resources and a content
stream, but can be transformed and drawn on an existing page. This is
used to implement layers.
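
The coordinate flip performed by the initial transform, which also
matters for layers, can be written as a PDF matrix. The sketch below
is only an illustration: the helper names `flip_y`, `apply`, `concat`,
and `cm_operator` are hypothetical, and margins and scaling are left
out. It shows the flip for a 792-point-high page and that applying the
flip twice yields the identity matrix, which is one way to see why
nested flips need to be accounted for.

```python
def flip_y(page_height):
    """PDF matrix (a, b, c, d, e, f) mapping top-left, y-down coordinates
    to PDF's bottom-left, y-up coordinates."""
    return (1, 0, 0, -1, 0, page_height)


def apply(m, point):
    """Transform a point: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def concat(m, n):
    """Matrix that applies m first and then n."""
    ma, mb, mc, md, me, mf = m
    na, nb, nc, nd, ne, nf = n
    return (ma * na + mb * nc, ma * nb + mb * nd,
            mc * na + md * nc, mc * nb + md * nd,
            me * na + mf * nc + ne, me * nb + mf * nd + nf)


def cm_operator(m):
    """Format a matrix as a content stream `cm` operator."""
    return "%g %g %g %g %g %g cm" % m


flip = flip_y(792)
top_left = apply(flip, (0, 0))   # (0, 792): top of the page in PDF space
twice = concat(flip, flip)       # (1, 0, 0, 1, 0, 0): the flips cancel
operator = cm_operator(flip)     # "1 0 0 -1 0 792 cm"
```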
SkPDFDevice has a method,
makeFormXObjectFromDevice(), which uses the SkPDFDevice::content()
method to construct a form x-object from the
device. SkPDFDevice::drawDevice() works by creating a form x-object of
the passed device and then drawing that form x-object in the root
device. There are a couple of things to be aware of in this process. As
noted previously, we have to be aware of any flip to the coordinate
system - flipping it an even number of times will lead to the wrong
result unless it is corrected for. The SkClipStack passed to drawing
commands includes the entire clip stack, including the clipping
operations done on the base layer. Since the form x-object will be
drawn as a single operation onto the base layer, we can assume that
all of those clips are in effect and need not apply them within the
layer.

### <span id="Fonts">Fonts</span> ###

There are many details for dealing with fonts, so this document will
only talk about some of the more important ones. A couple of short
details:

* We can’t assume that an arbitrary font will be available at PDF view
  time, so we embed all fonts in accordance with modern PDF
  guidelines.
* Most fonts these days are TrueType fonts, so this is where most of
  the effort has been concentrated.
* Because Skia may only be given a glyph-id encoding of the text to
  render and there is no perfect way to reverse the encoding, the
  PDF backend always uses the glyph-id encoding of the text.

#### *Type1/Type3 fonts* ####

Linux supports Type1 fonts, but Windows and Mac seem to lack the
functionality required to extract the required information from the
font without parsing the font file. When a non-TrueType font is used
on any platform (except for Type1 on Linux), it is encoded as a Type3
font. In this context, a Type3 font is an array of form x-objects
(content streams) that draw each glyph of the font.
No hinting or
kerning information is included in a Type3 font, just the shape of
each glyph. Any font that has the do-not-embed copy protection bit set
will also get embedded as a Type3 font. From what I understand, shapes
are not copyrightable, but programs are, so by stripping all the
programmatic information and only embedding the shape of the glyphs we
are honoring the do-not-embed bit as much as required by law.

PDF only supports an 8-bit encoding for Type1 or Type3 fonts. However,
they can contain more than 256 glyphs. The PDF backend handles this by
segmenting the glyphs into groups of 255 (glyph id 0 is always the
unknown glyph) and presenting the font as multiple fonts, each with up
to 255 glyphs.

#### *Font subsetting* ####

Many fonts, especially fonts with CJK support, are fairly large, so it
is desirable to subset them. Chrome uses the SFNTLY package to provide
subsetting support to Skia for TrueType fonts.

### <span id="Shaders">Shaders</span> ###

Skia has two types of predefined shaders: image shaders and gradient
shaders. In both cases, shaders are effectively positioned absolutely,
so the initial position and bounds of where they are visible is part
of the immutable state of the shader object. Each of Skia’s tile
modes needs to be considered and handled explicitly. The image shader
we generate will be tiled, so tiling is handled by default. To support
mirroring, we draw the image, reversed, on the appropriate axis, or on
both axes plus a fourth in the vacant quadrant. For clamp mode, we
extract the pixels along the appropriate edge and stretch the single
pixel wide/long image to fill the bounds. For both x and y in clamp
mode, we fill the corners with a rectangle of the appropriate
color. The composed shader is then rotated or scaled as appropriate
for the request.

Gradient shaders are handled purely mathematically.
First, the matrix
is transformed so that specific points in the requested gradient are
at pre-defined locations, for example, the linear distance of the
gradient is always normalized to one. Then, a type 4 PDF function is
created that achieves the desired gradient. A type 4 function is a
function defined by a restricted PostScript language. The generated
functions clamp at the edges, so if the desired tiling mode is tile or
mirror, we have to add a bit more PostScript code to map any input
parameter into the 0-1 range appropriately. The code to generate the
PostScript code is somewhat obscure, since it is trying to generate
optimized (for space) PostScript code, but there is a significant
number of comments to explain the intent.

### <span id="Xfer_modes">Xfer modes</span> ###

PDF supports some of the xfer modes used in Skia directly. For those,
it is simply a matter of setting the blend mode in the graphic state
to the appropriate value (Normal/SrcOver, Multiply, Screen, Overlay,
Darken, Lighten, ColorDodge, ColorBurn, HardLight, SoftLight,
Difference, Exclusion). Aside from the standard SrcOver mode, PDF does
not directly support the Porter-Duff xfer modes though. Most of them
(Clear, SrcMode, DstMode, DstOver, SrcIn, DstIn, SrcOut, DstOut) can
be emulated by various means, mostly by creating form x-objects out of
part of the content and drawing it with another form x-object as a
mask. I have not figured out how to emulate the following modes:
SrcATop, DstATop, Xor, Plus.

At the time of writing [2012-06-25], I have a [CL outstanding to fix a
misunderstanding I had about the meaning of some of the emulated
modes](https://codereview.appspot.com/4631078/).
I will describe the system with this change applied.

First, a bit of terminology and definition.
When drawing something
with an emulated xfer mode, what’s already drawn to the device is
called the destination or Dst, and what’s about to be drawn is the
source or Src. Src (and Dst) can have regions where it is transparent
(alpha equals zero), but it also has an inherent shape. For most kinds
of drawn objects, the shape is the same as where alpha is not
zero. However, for things like images and layers, the shape is the
bounds of the item, not where the alpha is non-zero. For example, a
10x10 image that is transparent except for a 1x1 dot in the center
has a shape that is 10x10. The xfermodes gm test demonstrates the
interaction between shape and alpha in combination with the Porter-Duff
xfer modes.

The clear xfer mode removes any part of Dst that is within Src’s
shape. This is accomplished by bundling the current content of the
device (Dst) into a single entity and then drawing that with the
inverse of Src’s shape used as a mask (we want Dst where Src
isn’t). The implementation of that takes a couple more steps. You may
have to refer back to [the content stream section](#Generating_a_content_stream).
For any draw call, a ContentEntry is created through a method called
SkPDFDevice::setUpContentEntry(). This method examines the xfer modes
in effect for that drawing operation and if it is an xfer mode that
needs emulation, it creates a form x-object from the device,
i.e. creates Dst, and stores it away for later use. This also clears
all of the existing ContentEntry structures on that device. The drawing
operation is then allowed to proceed as normal (in most cases, see
note about shape below), but into the now empty device. Then, when the
drawing operation is done, a complementary method is
called, SkPDFDevice::finishContentEntry(), which takes action if the
current xfer mode is emulated.
In the case of Clear, it packages what
was just drawn into another form x-object, and then uses the Src form
x-object, an invert function, and the Dst form x-object to draw Dst
with the inverse shape of Src as a mask. This works well when the
shape of Src is the same as the opaque part of the drawing, since PDF
uses the alpha channel of the mask form x-object to do masking. When
shape doesn’t match the alpha channel, additional action is
required. The drawing routines where shape and alpha don’t match set
state to indicate the shape (always rectangular), which
finishContentEntry uses. The clear xfer mode is a special case; if
shape is needed, then Src isn’t used, so there is code to not bother
drawing Src if shape is required and the xfer mode is clear.

SrcMode is clear plus Src being drawn afterward. DstMode simply omits
drawing Src. DstOver is the same as SrcOver with Src and Dst swapped -
this is accomplished by inserting the new ContentEntry at the
beginning of the list of ContentEntry structures in setUpContentEntry
instead of at the end. SrcIn, SrcOut, DstIn, DstOut are similar to each
other, the difference being an inverted or non-inverted mask and
swapping Src and Dst (or not). SrcIn is SrcMode with Src drawn with Dst
as a mask. SrcOut is like SrcMode, but with Src drawn with an inverted
Dst as a mask. DstIn is SrcMode with Dst drawn with Src as a
mask. Finally, DstOut is SrcMode with Dst drawn with an inverted Src as
a mask.

<span id="Known_issues">Known issues</span>
-------------------------------------------

* [issue 237](https://bug.skia.org/237)
  SkMaskFilter is not supported.
* [issue 238](https://bug.skia.org/238)
  SkColorFilter is not supported.
* [issue 249](https://bug.skia.org/249)
  SrcAtop, Xor, and Plus xfer modes are not supported.
* [issue 240](https://bug.skia.org/240)
  drawVertices is not implemented.
* [issue 244](https://bug.skia.org/244)
  Mostly, only TTF fonts are *directly* supported.
  (User metrics show that almost all fonts are TrueType.)
* [issue 260](https://bug.skia.org/260)
  Page rotation is accomplished by specifying a different
  size page instead of including the appropriate rotation
  annotation.

* * *
