Topic:

Sample granularity editing of a Vorbis file; inferred arbitrary sample
length starting offsets / PCM stream lengths

Overview:

Vorbis, like mp3, is a frame-based* audio compression where audio is
broken up into discrete short time segments.  These segments are
'atomic'; that is, one must recover the entire short time segment from
the frame packet.  There's no way to recover only a part of the PCM time
segment from part of the coded packet without expanding the entire
packet and then discarding a portion of the resulting PCM audio.

* In mp3, the data segment representing a given time period is called
  a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.

Thus, when we edit a Vorbis stream, the finest physical editing
granularity is on these packet boundaries (the mp3 case is actually
somewhat more complex: mp3 editing is more complicated than just
snipping on a frame boundary because time data can be spread
backward or forward over frames.  In Vorbis, packets are all
stand-alone).  Thus, at the physical packet level, Vorbis is still
limited to streams that contain an integral number of packets.

However, Vorbis streams may still exactly represent and be edited to a
PCM stream of arbitrary length and starting offset without padding the
beginning or end of the decoded stream or requiring that the desired
edit points be packet aligned.  Vorbis makes use of Ogg stream
framing, and this framing provides time-stamping data, called a
'granule position'; our starting offset and finished stream length may
be inferred from correct usage of the granule position data.

Time stamping mechanism:

Vorbis packets are bundled into Ogg pages (note that pages do not
necessarily contain integral numbers of packets, but that isn't
important in this discussion.  More about Ogg framing can be found in
ogg/doc/framing.html).
Each page that contains a packet boundary is
stamped with the absolute sample-granularity offset of the data, that
is, 'complete samples-to-date' up to the last completed packet of that
page.  (The same mechanism is used for, e.g., video, where the number
represents complete 2-D frames, and so on).

(It's possible but rare for a packet to span more than two pages such
that page[s] in the middle have no packet boundary; these pages have
a granule position of '-1'.)

This granule position mechanism in Ogg is used by Vorbis to indicate
when the PCM data intended to be represented in a Vorbis segment
begins a number of samples into the data represented by the first
packet[s] and/or ends before the physical PCM data represented in the
last packet[s].

File length a non-integral number of frames:

A file to be encoded in Vorbis will probably not encode into an
integral number of packets; such a file is encoded with the last
packet containing 'extra'* samples.  These samples are not padding; they
will be discarded in decode.

*(For best results, the encoder should use extra samples that preserve
the character of the last frame.  Simply setting them to zero will
introduce a 'cliff' that's hard to encode, resulting in spread-frame
noise.  Libvorbis extrapolates the last frame past the end of data to
produce the extra samples.  Even simply duplicating the last value is
better than clamping the signal to zero).

The encoder indicates to the decoder that the file is actually shorter
than all of the samples ('original' + 'extra') by setting the granule
position in the last page to a short value, that is, the last
timestamp is the original length of the file discarding extra samples.
The decoder will see that the number of samples it has decoded in the
last page is too many; it is 'original' + 'extra', whereas the
granulepos says that through the last packet we only have 'original'
number of samples.
The decoder then ignores the 'extra' samples.
This behavior is to occur only when the end-of-stream bit is set in
the page (indicating last page of the logical stream).

Note that it is not legal for the granule position of the last page to
indicate that there are more samples in the file than actually exist;
however, implementations should handle such an illegal file gracefully
in the interests of robust programming.

Beginning point not on integral packet boundary:

It is possible that we will want the PCM data represented by a Vorbis
stream to begin at a position later than where the decoded PCM data
really begins after an integral packet boundary, a situation analogous
to the above description where the PCM data does not end at an
integral packet boundary.  The easiest example is taking a clip out of
a larger Vorbis stream, and choosing a beginning point of the clip
that is not on a packet boundary; we need to ignore a few samples to
get the desired beginning point.

The process of marking the desired beginning point is similar to
marking an arbitrary ending point.  If the encoder wishes sample zero
to be some location past the actual beginning of data, it associates a
'short' granule position value with the completion of the second*
audio packet.  The granule position is associated with the second
packet simply by making sure the second packet completes its page.

*(We associate the short value with the second packet for two reasons.
 a) The first packet only primes the overlap/add buffer.  No data is
 returned before decoding the second packet; this places the decision
 information at the point of decision.
 b) Placing the short value on
 the first packet would make the value negative (as the first packet
 normally represents position zero); a negative value would break the
 requirement that granule positions increase; the headers have
 position values of zero)

The decoder sees that on the first page that will return
data from the overlap/add queue, we have more samples than the granule
position accounts for, and discards the 'surplus' from the beginning
of the queue.

Note that short granule values (indicating less than the actually
returned amount of data) are not legal in the Vorbis spec outside of
indicating beginning and ending sample positions.  However, decoders
should, at minimum, tolerate inadvertent short values elsewhere in the
stream (just as they should tolerate out-of-order/non-increasing
granulepos values, although this too is illegal).

Beginning point at arbitrary positive timestamp (no 'zero' sample):

It's also possible that the granule position of the first page of an
audio stream is a 'long value', that is, a value larger than the
amount of PCM audio decoded.  This implies only that we are starting
playback at some point into the logical stream, a potentially common
occurrence in streaming applications where the decoder may be
connecting into a live stream.  The decoder should not treat the long
value specially.

A long value elsewhere in the stream would normally occur only when a
page is lost or out of sequence, as indicated by the page's sequence
number.  A long value under any other situation is not legal; however,
a decoder should tolerate both possibilities.
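
Illustration (not part of any Vorbis library):

The granule position arithmetic for short and long values can be
sketched in a few lines of Python.  This is only a minimal model under
stated assumptions: the Page class, its fields and the plan_trim
function are hypothetical names invented for this note, every page is
assumed to contain a packet boundary (no '-1' pages), and the stream is
assumed to have at least two data pages.  It is not how libvorbis is
structured internally.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical model of one Ogg page carrying Vorbis audio.
    samples: int        # PCM samples the decoder returns for packets completed on this page
    granulepos: int     # granule position stamped on the page
    eos: bool = False   # end-of-stream bit


def plan_trim(pages):
    """Return (start_position, head_discard, tail_discard).

    start_position: stream position of the first sample that is kept
    head_discard:   surplus samples dropped from the front of the queue
    tail_discard:   'extra' samples dropped after the last real sample
    """
    first = pages[0]  # first page that actually returns PCM data
    head_discard = 0
    if first.granulepos < first.samples:
        # Short value on the first data page: discard the surplus from
        # the beginning so that the next kept sample is sample zero.
        head_discard = first.samples - first.granulepos
        start_position = 0
    else:
        # Exact or long value: nothing is discarded; playback simply
        # starts this far into the logical stream.
        start_position = first.granulepos - first.samples

    position = first.granulepos
    tail_discard = 0
    for page in pages[1:]:
        expected = position + page.samples
        # A short value only trims the end when the end-of-stream bit is set.
        if page.eos and page.granulepos < expected:
            tail_discard = expected - page.granulepos
        position = page.granulepos
    return start_position, head_discard, tail_discard
```

For example, with pages of (1024 samples, granulepos 1000), (2048,
3048) and (1024, 3500, end-of-stream), plan_trim returns (0, 24, 572):
24 samples are dropped from the front, 572 from the end, and the
remaining 3500 samples match the final granule position.  A first page
of (1024, 50000) is a long value; nothing is discarded and the first
sample sits at position 48976.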