<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}
</style>

</head>

<body>

<div id="xiphlogo">
  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
</div>

<h1>Ogg bitstream overview</h1>

<p>This document serves as a starting point for understanding the design
and implementation of the Ogg container format.  If you're new to Ogg
or merely want a high-level technical overview, start reading here.
Other documents linked from the <a href="index.html">index page</a>
give distilled technical descriptions and references of the container
mechanisms.  This document is intended to aid understanding.</p>

<h2>Container format design points</h2>

<p>Ogg is intended to be a simplest-possible container, concerned only
with framing, ordering, and interleave. It can be used as a stream delivery
mechanism, for media file storage, or as a building block toward
implementing a more complex, non-linear container (for example, see
the <a href="skeleton.html">Skeleton</a> or <a
href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).

<p>The Ogg container is not intended to be a monolithic
'kitchen-sink'.  It exists only to frame and deliver in-order stream
data and as such is vastly simpler than most other containers.
Elementary and multiplexed streams are both constructed entirely from a
single building block (an Ogg page) composed of eight fields
totalling twenty-seven bytes (the page header), a list of packet lengths
(up to 255 bytes), and payload data (up to 65025 bytes).  The structure
of every page is the same.  There are no optional fields or alternate
encodings.
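<p>As a rough illustration of how uniform the page layout is, the
sketch below unpacks the fixed header, the list of lengths and the
payload of a single page in Python.  The function name
<code>parse_page</code> and the dictionary keys are this example's
own; the field order and widths are taken from the framing
specification, and the checksum is read but not verified here.</p>

```python
import struct

# Fixed part of an Ogg page header, little-endian, eight fields:
# capture pattern, version, header type flags, granule position,
# stream serial number, page sequence number, checksum, segment count.
HEADER = struct.Struct("<4sBBqIIIB")

def parse_page(buf, pos=0):
    """Parse one page starting at buf[pos]; return (fields, payload, next_pos)."""
    magic, version, flags, granpos, serial, seq, crc, nsegs = HEADER.unpack_from(buf, pos)
    if magic != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    table_start = pos + HEADER.size
    lengths = buf[table_start:table_start + nsegs]   # up to 255 one-byte lengths
    body_start = table_start + nsegs
    body_end = body_start + sum(lengths)
    if len(lengths) != nsegs or body_end > len(buf):
        raise ValueError("truncated page")
    fields = {"flags": flags, "granpos": granpos, "serial": serial,
              "sequence": seq, "crc": crc, "lengths": list(lengths)}
    return fields, buf[body_start:body_end], body_end

# Largest possible page: header + full length list + full payload.
MAX_PAGE_SIZE = HEADER.size + 255 + 255 * 255
```

<p>With these figures the largest possible page is 27 + 255 + 65025 =
65307 bytes, which is the 'just under 64kB' bound used elsewhere in
this document.</p>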

<p>Stream and media metadata is contained in Ogg and not built into
the Ogg container itself.  Metadata is thus compartmentalized and
layered rather than part of a monolithic design, an especially good
idea as no two groups seem able to agree on what a complete or
complete-enough metadata set should be. In this way, the container and
container implementation are isolated from unnecessary design flux.

<h3>Streaming</h3>

<p>The Ogg container is primarily a streaming format,
encapsulating chronological, time-linear mixed media into a single
delivery stream or file. The design is such that an application can
always encode and/or decode all features of a bitstream in one pass
with no seeking and minimal buffering.  Seeking to provide optimized
encoding (such as two-pass encoding) or interactive decoding (such as
scrubbing or instant replay) is neither disallowed nor discouraged;
however, no container feature requires nonlinear access to the bitstream.

<h3>Variable Bit Rate, Variable Payload Size</h3>

<p>Ogg is designed to contain any size data payload with bounded,
predictable efficiency.  Ogg packets have no maximum size and a
zero-byte minimum size.  There is no restriction on size changes from
packet to packet. Variable size packets do not require the use of any
optional or additional container features.  There is no optimal
suggested packet size, though special consideration was paid to make
sure 50-200 byte packets were no less efficient than larger packet
sizes.  The original design criterion was a 2% overhead at 50-byte
packets, dropping to a maximum working overhead of 1% with larger
packets, and a typical working overhead of 0.5-0.7% for most practical
uses.
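<p>The overhead figures can be checked with a little arithmetic.  The
sketch below assumes the length encoding described in the framing
specification (a packet of <em>n</em> bytes takes <em>n</em> // 255 + 1
one-byte entries in the list of lengths) and pages that hold only
whole packets; the function names are this example's own, and real
muxers choose page sizes by their own policy.</p>

```python
HEADER_BYTES = 27      # fixed page header
MAX_SEGMENTS = 255     # entries available in one page's list of lengths

def lacing_values(packet_len):
    """Entries needed for one packet: a run of 255s plus a final value below 255."""
    return packet_len // 255 + 1

def page_overhead(packet_len, packets_per_page):
    """Framing bytes divided by payload bytes for a page of equal-sized packets."""
    if packet_len <= 0:
        raise ValueError("overhead ratio is undefined for empty packets")
    segments = lacing_values(packet_len) * packets_per_page
    if segments > MAX_SEGMENTS:
        raise ValueError("too many packets for a single page")
    framing = HEADER_BYTES + segments
    return framing / (packet_len * packets_per_page)
```

<p>For example, <code>page_overhead(50, 255)</code> is 282 framing
bytes over 12750 payload bytes, or about 2.2%, in line with the design
target for 50-byte packets.</p>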

<h3>Simple pagination</h3>

<p>Ogg is a byte-aligned container with no context-dependent, optional
or variable-length fields.  Ogg requires no repacking of codec data.
The page structure is written out in-line as packet data is submitted
to the streaming abstraction.  In addition, it is possible to
implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
is done in the Tremor sourcebase).

<h3>Capture</h3>

<p>Ogg is designed for efficient and immediate stream capture with
high confidence.  Although packets have no size limit in Ogg, pages
are a maximum of just under 64kB, meaning that any Ogg stream can be
captured with confidence after seeing 128kB of data or less (the worst
case; a typical figure is 6kB) from any random starting point in the
stream.
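<p>One way to picture capture is as a scan for the page capture
pattern followed by a checksum test.  The sketch below is an
illustration under stated assumptions, not the reference
implementation: the names <code>ogg_crc</code> and
<code>capture</code> are this example's own, the header offsets and
CRC parameters are taken from the framing specification, and the CRC
is computed bit by bit for clarity rather than speed.</p>

```python
def ogg_crc(data):
    """CRC-32 with polynomial 0x04c11db7, zero initial value,
    no bit reflection and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def capture(buf, start=0):
    """Return the offset of the first checksum-verified page at or after start, else -1."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        if pos + 27 <= len(buf):
            nsegs = buf[pos + 26]                      # number of length entries
            table_end = pos + 27 + nsegs
            end = table_end + sum(buf[pos + 27:table_end])
            if end <= len(buf):
                page = bytearray(buf[pos:end])
                stored = int.from_bytes(page[22:26], "little")
                page[22:26] = b"\x00\x00\x00\x00"      # checksum covers the page with this field zeroed
                if ogg_crc(page) == stored:
                    return pos
        pos = buf.find(b"OggS", pos + 1)               # false sync; keep scanning
    return -1
```

<p>A capture pattern that happens to occur inside payload data fails
the checksum test and is skipped, which is what allows capture from an
arbitrary starting point.</p>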

<h3>Seeking</h3>

<p>Ogg implements simple coarse- and fine-grained seeking by design.

<p>Coarse seeking may be performed by simply 'moving the tone arm' to a
new position and 'dropping the needle'.  Rapid capture with
accompanying timecode from any location in an Ogg file is guaranteed
by the stream design.  From the acquisition of the first timecode,
all data needed to play back from that timecode forward is ahead of
the stream cursor.

<p>Ogg implements full sample-granularity seeking using an
interpolated bisection search built on the capture and timecode
mechanisms used by coarse seeking.  As above, once a search finds
the desired timecode, all data needed to play back from that timecode
forward is ahead of the stream cursor.
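<p>The search can be sketched as follows.  This is a simplified model
and not the algorithm of any particular library: the callback
<code>next_page_at(offset)</code> is a hypothetical stand-in for
'capture from this byte offset and report the page found' and returns
an <code>(offset, granule_position)</code> pair or <code>None</code>.
Averaging the interpolated guess with the midpoint is this example's
own simplification to guarantee that the interval shrinks on every
probe.</p>

```python
import bisect

def seek(next_page_at, file_size, last_granpos, target):
    """Return (offset, granpos) of the first page whose granpos >= target, or None."""
    lo, hi = 0, file_size              # candidate pages start in [lo, hi)
    lo_gran, hi_gran = 0, last_granpos
    best = None
    while lo < hi:
        span = hi_gran - lo_gran
        frac = (target - lo_gran) / span if span > 0 else 0.5
        frac = min(max(frac, 0.0), 1.0)
        interp = lo + frac * (hi - lo)                 # interpolated byte position
        guess = int((interp + (lo + hi) / 2) / 2)      # blended with plain bisection
        guess = min(max(guess, lo), hi - 1)
        page = next_page_at(guess)
        if page is None or page[0] >= hi:
            hi = guess                                 # no page starts in [guess, hi)
        elif page[1] < target:
            lo, lo_gran = page[0] + 1, page[1]         # this page and all before it are too early
        else:
            best, hi, hi_gran = page, guess, page[1]   # candidate; look for an earlier one
    return best

# A toy 'file': page start offsets with the granule position stamped on each page.
PAGES = [(0, 100), (4000, 200), (8100, 300), (12000, 400)]

def next_page_at(offset):
    i = bisect.bisect_left(PAGES, (offset,))
    return PAGES[i] if i < len(PAGES) else None
```

<p>With the toy data above, <code>seek(next_page_at, 16000, 400,
250)</code> lands on the page at offset 8100.  A practical
implementation would typically stop bisecting once the interval is
smaller than a page and scan forward from there.</p>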

<p>Both coarse and fine seeking use the page structure and sequencing
inherent to the Ogg format.  All Ogg streams are fully seekable from
creation; seekability is unaffected by truncation or missing data, and
is tolerant of gross corruption.  Seek operations are neither 'fuzzy' nor
heuristic.

<p>Seeking without use of an index is a major point of the Ogg
design. There are several reasons why Ogg forgoes an index:

<ul>

<li>It must be possible to create an Ogg stream in a single pass, and
an index requires either two passes to create, or the index must be
tacked onto the end of a live stream after the stream is finished.
Both methods run afoul of other design constraints.

<li>An index is only marginally useful in Ogg for the complexity
added; it adds no new functionality and seldom improves performance
noticeably.  Empirical testing shows that indexless interpolation
search does not require many more seeks in practice than using an
index would.

<li>'Optional' indexes encourage lazy implementations that can seek
only when indexes are present, or that implement indexless seeking
only by building an internal index after reading the entire file
beginning to end.  This has been the fate of other containers that
specify optional indexing.

</ul>

<h3>Simple multiplexing</h3>

<p>Ogg multiplexes streams by interleaving pages from multiple
elementary streams into a multiplexed stream in time order.  The
multiplexed pages are not altered.  Muxing an Ogg AV stream out of
separate audio, video and data streams is akin to shuffling several
decks of cards together into a single deck; the cards themselves
remain unchanged.  Demultiplexing is similarly simple (as the cards
are marked).

<p>The goal of this design is to make the mux/demux operation as
trivial as possible to allow live streaming systems to build and
rebuild streams on the fly with minimal CPU usage and no additional
storage or latency requirements.
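<p>The card-shuffling analogy maps onto a few lines of code.  In the
sketch below a page is modelled as a dictionary with hypothetical
<code>serial</code> and <code>time</code> keys (the time standing in
for whatever the codec derives from the page's granule position);
neither function inspects or alters anything else on the page.</p>

```python
import heapq
from collections import defaultdict

def mux(*streams):
    """Interleave already time-ordered page lists into one list; pages pass through untouched."""
    return list(heapq.merge(*streams, key=lambda page: page["time"]))

def demux(pages):
    """Sort pages back into per-stream lists using only the serial number stamped on each."""
    out = defaultdict(list)
    for page in pages:
        out[page["serial"]].append(page)
    return dict(out)
```

<p>Because pages are never rewritten, demultiplexing the result of
<code>mux</code> returns exactly the original elementary streams.</p>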

<h3>Continuous and Discontinuous Media</h3>

<p>Ogg streams belong to one of two categories: 'Continuous' streams and
'Discontinuous' streams.

<p>A stream that provides a gapless, time-continuous media type with a
fine-grained timebase is considered to be 'Continuous'. A continuous
stream should never be starved of data. Examples of continuous data
types include broadcast audio and video.

<p>A stream that delivers data in a potentially irregular pattern or
with widely spaced timing gaps is considered to be 'Discontinuous'. A
discontinuous stream may be best thought of as data representing
scattered events; although they happen in order, they are typically
unconnected data often located far apart. One example of a
discontinuous stream type is captioning such as <a
href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
possible to design captions as a continuous stream type, it's most
natural to think of captions as widely spaced pieces of text with
little happening between.

<p>The fundamental reason for the distinction between continuous and
discontinuous streams concerns buffering.

<h3>Buffering</h3>

<p>A continuous stream is, by definition, gapless. Ogg buffering is based
on the simple premise of never allowing an active continuous stream
to starve for data during decode; buffering works ahead until all
continuous streams in a physical stream have data ready and no further.

<p>Discontinuous stream data is not assumed to be predictable. The
buffering design takes discontinuous data 'as it comes' rather than
working ahead to look for future discontinuous data for a potentially
unbounded period. Thus, the buffering process makes no attempt to fill
discontinuous stream buffers; their pages simply 'fall out' of the
stream when continuous streams are handled properly.

<p>Buffering requirements in this design need not be explicitly
declared or managed in the encoded stream. The decoder simply reads as
much data as is necessary to keep all continuous stream types gapless
and no more, with discontinuous data processed as it arrives in the
continuous data. Buffering is implicitly optimal for the given
stream. Because all pages of all data types are stamped with absolute
timing information within the stream, inter-stream synchronization
timing is always maintained without the need for explicitly declared
buffer-ahead hinting.
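<p>The buffering rule can be stated in a few lines.  The following is
a minimal sketch assuming pages arrive as dictionaries with a
hypothetical <code>serial</code> key and that the decoder already
knows which serial numbers belong to continuous streams; it is not
taken from any existing decoder.</p>

```python
from collections import deque

def fill_buffers(page_iter, queues, continuous):
    """Read pages until every continuous stream has data queued, and no further.

    queues maps every serial number to a deque of pending pages.
    Returns False if the physical stream ran out first.
    """
    while any(not queues[serial] for serial in continuous):
        page = next(page_iter, None)
        if page is None:
            return False
        # Discontinuous pages are queued as they come; they never drive the loop.
        queues[page["serial"]].append(page)
    return True
```

<p>Note that the loop condition looks only at continuous streams.  A
caption page is picked up when it happens to precede the data a
continuous stream is waiting for, but the decoder never reads ahead in
search of one.</p>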

<h3>Codec metadata</h3>

<p>Ogg does not replicate codec-specific metadata into the mux layer,
as some containers do in an attempt to make the mux and codec layer
implementations 'fully separable'.  Things like specific timebase,
keyframing strategy, frame duration, etc., do not appear in the Ogg
container.  The mux layer is, instead, expected to query a codec
through a standardized interface, left to the implementation, for this
data when it is needed.

<p>Though modern design wisdom usually prefers to predict all possible
needs of current and future codecs then embed these dependencies and
the required metadata into the container itself, this strategy
increases container specification complexity, fragility, and rigidity.
The mux and codec implementations become more independent, but the
specifications become less independent. A codec can't do what a
container hasn't already provided for.  New codecs are harder to
support, and you can do fewer useful things with the ones you've
already got (e.g., try to make a good splitter without using any codecs.
You're stuck splitting at keyframes only, or building yet another new
mechanism into the container layer to mark what frames to skip
displaying).

<p>Ogg's design goes the opposite direction, where the specification
is to be as simple, easy to understand, and 'proofed' against novel
codecs as possible.  When an Ogg mux layer requires codec-specific
information, it queries the codec (or a codec stub).  This trades a
more complex implementation for a simpler, more flexible
specification.

<h3>Stream structure metadata</h3>

<p>The Ogg container itself does not define a metadata system for
declaring the structure and interrelations between multiple media
types in a muxed stream.  That is, the Ogg container itself does not
specify data like 'which stream is the subtitle stream?' or 'which
video stream is the primary angle?'.  This metadata still exists, but
is stored in the Ogg container rather than being built into the Ogg
container.  Xiph specifies the 'Skeleton' metadata format for Ogg
streams, but this decoupling of container and stream structure
metadata means it is possible to use Ogg with any metadata
specification without altering the container itself, or without stream
structure metadata at all.

<h3>Frame accurate absolute position</h3>

<p>Every Ogg page is stamped with a 64-bit 'granule position' that
serves as an absolute timestamp for mux and seeking.  A few nifty
little tricks are usually also embedded in the granpos state, but
we'll leave those aside for the moment (strictly speaking, they're
part of each codec's mapping, not Ogg).

<p>As mentioned above, granule positions are mapped into
absolute timestamps by the codec, rather than being a hard timestamp.
This allows maximally efficient use of the available 64 bits to
address every sample/frame position without approximation while
supporting new and previously unknown timebase encodings without
needing to extend or update the mux layer.  When a codec needs a novel
timebase, it simply brings the code for that mapping along with it.
This is not a theoretical curiosity; new, wholly novel timebases were
deployed with the adoption of both Theora and Dirac.  "Rolling INTRA"
(keyframeless video) also benefits from novel use of the granule
position.
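<p>To make the idea of a codec-supplied mapping concrete, here are two
illustrative mappings from granule position to time.  Both are
sketches with hypothetical names and parameters, not the normative
mapping of any codec: the first treats the granule position as a
running sample count, and the second splits the 64 bits into a
keyframe index and a count of frames since that keyframe, loosely in
the spirit of the video timebases mentioned above.</p>

```python
from fractions import Fraction

def audio_time(granpos, sample_rate):
    """Sample-count mapping: the granule position is the number of samples so far."""
    return Fraction(granpos, sample_rate)

def split_granpos(granpos, shift):
    """Split mapping: high bits index the last keyframe, low bits count frames since it."""
    keyframe = granpos >> shift
    since_keyframe = granpos & ((1 << shift) - 1)
    return keyframe, since_keyframe

def video_time(granpos, shift, frames_per_second):
    """Exact presentation time for the split mapping (shift is an assumed codec parameter)."""
    keyframe, since_keyframe = split_granpos(granpos, shift)
    return Fraction(keyframe + since_keyframe) / frames_per_second
```

<p>The mux layer only ever compares the resulting times; how the 64
bits are carved up stays entirely inside the codec's mapping.</p>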

<h2>Ogg stream arrangement</h2>

<h3>Packets, pages, and bitstreams</h3>

<p>Ogg codecs use <em>packets</em>.  Packets are octet payloads of
raw, compressed data, containing the data needed for a single
decompressed unit, e.g., one video frame. Packets have no maximum size
and may be zero length. They do not have any high-level structure or
boundary information; strung together, the unframed packets form a
<em>logical bitstream</em> of apparently random bytes with no internal
landmarks.

<p>Logical bitstream packets are grouped and framed into Ogg pages
along with a unique stream <em>serial number</em> to produce a
<em>physical bitstream</em>.  An <em>elementary stream</em> is a
physical bitstream containing only the pages framing a single logical
bitstream. Each page is a self-contained entity, although a packet may
be split and encoded across one or more pages. The page decode
mechanism is designed to recognize, verify and handle single pages at
a time from the overall bitstream.

<p><a href="framing.html">Ogg Bitstream Framing</a> specifies
the page format of an Ogg bitstream, the packet coding process
and elementary bitstreams in detail.

<h3>Multiplexed bitstreams</h3>

<p>Multiple logical/elementary bitstreams can be combined into a single
<em>multiplexed bitstream</em> by interleaving whole pages from each
contributing elementary stream in time order. The result is a single
physical stream that multiplexes and frames multiple logical streams.
Each logical stream is identified by the unique stream serial number
stamped in its pages.  A physical stream may include a 'meta-header'
(such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
own Ogg page at the beginning of the physical stream. A decoder
recovers the original logical/elementary bitstreams out of the
physical bitstream by taking the pages in order from the physical
bitstream and redirecting them into the appropriate logical decoding
entity.

<p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
proper multiplexing of an Ogg bitstream in detail.

<h3>Chaining</h3>

<p>Multiple Ogg physical bitstreams may be concatenated into a single new
stream; this is <em>chaining</em>. The bitstreams do not overlap; the
final page of a given logical bitstream is immediately followed by the
initial page of the next.</p>

<p>Each logical bitstream in a chain must have a unique serial number
within the scope of the full physical bitstream, not only within a
particular <em>link</em> or <em>segment</em> of the chain.</p>

<h3>Continuous and discontinuous streams</h3>

<p>Within Ogg, each stream must be declared (by the codec) to be
continuous- or discontinuous-time.  Most codecs treat all streams they
use as either inherently continuous- or discontinuous-time, although
this is not a requirement. A codec may, as part of its mapping, choose
according to data in the initial header.

<p>Continuous-time pages are stamped by end-time; discontinuous pages
are stamped by begin-time.  Pages in a multiplexed stream are
interleaved in order of the time stamp regardless of stream type.
Both continuous and discontinuous logical streams are used to seek
within a physical stream; however, only continuous streams are used to
determine buffering depth.  Because discontinuous streams are stamped
by start time, they will always 'fall out' in time when buffering
tracks only the continuous streams.  See 'Examples' for an
illustration of the buffering mechanism.

<h2>Mapping Requirements</h2>

<p>Each codec is allowed some freedom in deciding how its logical
bitstream is encapsulated into an Ogg bitstream (even if it is a
trivial mapping, e.g., 'plop the packets in and go'). This is the
codec's <em>mapping</em>. Ogg imposes a few mapping requirements
on any codec.

<p>The <a href="framing.html">framing specification</a> defines
'beginning of stream' and 'end of stream' page markers via a header
flag (it is possible for a stream to consist of a single page). A
correct stream always consists of an integer number of pages, an easy
requirement given the variable size nature of pages.</p>

<p>The first page of an elementary Ogg bitstream consists of a single,
small 'initial header' packet that must include sufficient information
to identify the exact codec type. From this initial header, the codec
must also be able to determine its timebase and whether it is a
continuous- or discontinuous-time stream.  The initial header must fit
on a single page. If a codec makes use of auxiliary headers (for
example, Vorbis uses two auxiliary headers), these headers must follow
the initial header immediately.  The last header finishes its page;
data begins on a fresh page.

<p>As an example, Ogg Vorbis places the name and revision of the
Vorbis codec, the audio rate and the audio quality into this initial
header.  Comments and detailed codec setup appear in the larger
auxiliary headers.</p>

<h2>Multiplexing Requirements</h2>

<p>Multiplexing requirements within Ogg are straightforward. When
constructing a single-link (unchained) physical bitstream consisting
of multiple elementary streams:

<ol>

<li>The initial header for each stream appears in sequence, each
header on a single page.  All initial headers must appear with no
intervening data (no auxiliary header pages or packets, no data pages
or packets).  Order of the initial headers is unspecified. The
'beginning of stream' flag is set on each initial header.

<li>All auxiliary headers for all streams must follow.  Order
is unspecified.  The final auxiliary header of each stream must flush
its page.

<li>Data pages for each stream follow, interleaved in time order.

<li>The final page of each stream sets the 'end of stream' flag.
Unlike initial pages, terminal pages for the logical bitstreams need
not occur contiguously; indeed it may not be possible for them to do so.
</ol>

<p>Each grouped bitstream must have a unique serial number within the
scope of the physical bitstream.</p>
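<p>The page-ordering rules above lend themselves to a simple
mechanical check.  The sketch below validates one unchained link given
only each page's serial number and header flags; the function name
<code>check_link</code> is this example's own, the flag values are
those of the framing specification, and the rules about auxiliary
headers and time ordering are not checked because they need
codec knowledge.</p>

```python
BOS = 0x02   # 'beginning of stream' header flag
EOS = 0x04   # 'end of stream' header flag

def check_link(pages):
    """pages: iterable of (serial, flags). Return True or raise ValueError."""
    started, ended = set(), set()
    headers_done = False
    for serial, flags in pages:
        if flags & BOS:
            if headers_done:
                raise ValueError("initial header after other pages")
            if serial in started:
                raise ValueError("serial number reused")
            started.add(serial)
        else:
            headers_done = True          # first non-initial page closes the header group
            if serial not in started:
                raise ValueError("page from a stream with no initial header")
        if serial in ended:
            raise ValueError("page after end of stream")
        if flags & EOS:
            ended.add(serial)
    if started != ended:
        raise ValueError("stream without an end-of-stream page")
    return True
```

<p>Terminal pages may appear anywhere after the header group, so the
check tracks them per stream rather than expecting them together.</p>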

<h3>Chaining and multiplexing</h3>

<p>Multiplexed and/or unmultiplexed bitstreams may be chained
consecutively. Such a physical bitstream obeys all the rules of both
chained and multiplexed streams.  Each link, when unchained, must
stand on its own as a valid physical bitstream.  Chained streams do
not mix; a new segment may not begin until all streams in the
preceding segment have terminated.</p>

<h2>Examples</h2>

<p><em>[More to come shortly; this section is currently being revised and expanded]</em></p>

<p>Below, we present an example of a multiplexed and chained bitstream:</p>

<p><img src="stream.png" alt="stream"/></p>

<p>In this example, we see pages from five total logical bitstreams
multiplexed into a physical bitstream. Note the following
characteristics:</p>

<ol>
<li>Multiplexed bitstreams in a given link begin together; all of the
initial pages must appear before any data pages. When concurrently
multiplexed groups are chained, the new group does not begin until all
the bitstreams in the previous group have terminated.</li>

<li>The ordering of pages of concurrently multiplexed bitstreams is
governed by timestamp (not shown here); there is no regular
interleaving order.  Pages within a logical bitstream appear in
sequence order.</li>
</ol>

<div id="copyright">
  The Xiph Fish Logo is a
  trademark (&trade;) of Xiph.Org.<br/>

  These pages &copy; 1994 - 2010 Xiph.Org. All rights reserved.
</div>

</body>
</html>
