Encapsulation of FLAC in ISO Base Media File Format
Version 0.0.4 (draft)

Table of Contents
1 Scope
2 Supporting Normative References
3 Design Rules of Encapsulation
    3.1 File Type Identification
    3.2 Overview of Track Structure
    3.3 Definition of a FLAC sample
        3.3.1 Sample entry format
        3.3.2 FLAC Specific Box
        3.3.3 Sample format
        3.3.4 Duration of a FLAC sample
        3.3.5 Sub-sample
        3.3.6 Random Access
            3.3.6.1 Random Access Point
    3.4 Basic Structure (informative)
        3.4.1 Initial Movie
    3.5 Example of Encapsulation (informative)
4 Acknowledgements
5 Author's Address

1 Scope

    This document specifies the normative mapping for encapsulation of
    FLAC coded audio bitstreams in ISO Base Media file format and its
    derivatives. The encapsulation of FLAC coded bitstreams in
    QuickTime file format is outside the scope of this specification.

2 Supporting Normative References

    [1] ISO/IEC 14496-12:2012 Corrected version

        Information technology — Coding of audio-visual objects — Part
        12: ISO base media file format

    [2] ISO/IEC 14496-12:2012/Amd.1:2013

        Information technology — Coding of audio-visual objects — Part
        12: ISO base media file format AMENDMENT 1: Various
        enhancements including support for large metadata

    [3] FLAC format specification

        https://xiph.org/flac/format.html

        Definition of the FLAC Audio Codec stream format

    [4] FLAC-in-Ogg mapping specification

        https://xiph.org/flac/ogg_mapping.html

        Ogg Encapsulation for the FLAC Audio Codec

    [5] Matroska specification

3 Design Rules of Encapsulation

    3.1 File Type Identification

        This specification does not define any brand to declare files
        which conform to this specification.
        Files which conform to this specification shall contain, in
        the compatible brands list of the File Type Box, at least one
        brand whose requirements can be satisfied together with the
        requirements described in this clause without contradiction.
        The minimal support of the encapsulation of FLAC bitstreams
        in ISO Base Media file format requires the 'isom' brand.

    3.2 Overview of Track Structure

        FLAC coded audio shall be encapsulated into the ISO Base
        Media File Format as media data within an audio track.

          + The handler_type field in the Handler Reference Box
            shall be set to 'soun'.

          + The Media Information Box shall contain the Sound Media
            Header Box.

          + The codingname of the sample entry is 'fLaC'.

              This specification does not define any encapsulation
              using MP4AudioSampleEntry with objectTypeIndication
              specified by the MPEG-4 Registration Authority
              (http://www.mp4ra.org/). See section 'Sample entry
              format' for the definition of the sample entry.

          + The 'dfLa' box is added to the sample entry to convey
            initializing information for the decoder.

              See section 'FLAC Specific Box' for the definition of
              the box contents.

          + A FLAC sample is exactly one FLAC frame as described
            in the format specification[3]. See section
            'Sample format' for details of the frame contents.

          + Every FLAC sample is a sync sample. No pre-roll or
            lapping is required. See section 'Random Access' for
            further details.

    3.3 Definition of a FLAC sample

        3.3.1 Sample entry format

            For any track containing one or more FLAC bitstreams, a
            sample entry describing the corresponding FLAC bitstream
            shall be present inside the Sample Table Box. This version
            of the specification defines only one sample entry format
            named FLACSampleEntry whose codingname is 'fLaC'.
            This sample entry includes exactly one FLAC Specific Box
            defined in section 'FLAC Specific Box' as a mandatory box
            and indicates that FLAC samples described by this sample
            entry are stored by the sample format described in section
            'Sample format'.

            The syntax and semantics of the FLACSampleEntry are shown
            as follows. The data fields of this box and native
            FLAC[3] structures encoded within FLAC blocks are both
            stored in big-endian format, though for purposes of the
            ISO BMFF container, FLAC native metadata and data blocks
            are treated as unstructured octet streams.

                class FLACSampleEntry() extends AudioSampleEntry ('fLaC'){
                    FLACSpecificBox();
                }

            The fields of the AudioSampleEntry portion shall be set as
            follows:

              + channelcount:

                  The channelcount field shall be set equal to the
                  channel count specified by the FLAC bitstream's native
                  METADATA_BLOCK_STREAMINFO header as described in [3].
                  Note that the FLAC FRAME_HEADER structure that begins
                  each FLAC sample redundantly encodes the channel count;
                  the number of channels declared in each FRAME_HEADER
                  MUST match the number of channels declared here and in
                  the METADATA_BLOCK_STREAMINFO header.

              + samplesize:

                  The samplesize field shall be set equal to the bits
                  per sample specified by the FLAC bitstream's native
                  METADATA_BLOCK_STREAMINFO header as described in [3].
                  Note that the FLAC FRAME_HEADER structure that begins
                  each FLAC sample redundantly encodes the number of
                  bits per sample; the bits per sample declared in each
                  FRAME_HEADER MUST match the samplesize declared here
                  and the bits per sample field declared in the
                  METADATA_BLOCK_STREAMINFO header.

              + samplerate:

                  When possible, the samplerate field shall be set
                  equal to the sample rate specified by the FLAC
                  bitstream's native METADATA_BLOCK_STREAMINFO header
                  as described in [3], left-shifted by 16 bits to
                  create the appropriate 16.16 fixed-point
                  representation.

                  When the bitstream's native sample rate is greater
                  than the maximum expressible value of 65535 Hz,
                  the samplerate field shall hold the greatest
                  expressible regular division of that rate. For
                  example, the samplerate field shall hold 48000.0 for
                  native sample rates of 96 and 192 kHz. In the
                  case of unusual sample rates which do not have
                  an expressible regular division, the maximum value
                  of 65535.0 Hz should be used.

                  High-rate FLAC bitstreams are common, and the native
                  value from the METADATA_BLOCK_STREAMINFO header in
                  the FLACSpecificBox MUST be read to determine the
                  correct sample rate of the bitstream.

                  Note that the FLAC FRAME_HEADER structure that begins
                  each FLAC sample redundantly encodes the sample rate;
                  the sample rate declared in each FRAME_HEADER MUST
                  match the sample rate declared in the
                  METADATA_BLOCK_STREAMINFO header, and here in the
                  AudioSampleEntry portion of the FLACSampleEntry
                  to the extent allowed by the encoding restrictions
                  described above.

            Finally, the FLACSpecificBox carries codec headers:

              + FLACSpecificBox

                  This box contains initializing information for the
                  decoder as defined in section 'FLAC Specific Box'.

        3.3.2 FLAC Specific Box

            Exactly one FLAC Specific Box shall be present in each
            FLACSampleEntry. This specification defines version 0
            of this box. If incompatible changes occur in future
            versions of this specification, another version number
            will be defined.
            The data fields of this box and native
            FLAC[3] structures encoded within FLAC blocks are both
            stored in big-endian format, though for purposes of the
            ISO BMFF container, FLAC native metadata and data blocks
            are treated as unstructured octet streams.

            The syntax and semantics of the FLAC Specific Box are shown
            as follows.

                class FLACMetadataBlock {
                    unsigned int(1)  LastMetadataBlockFlag;
                    unsigned int(7)  BlockType;
                    unsigned int(24) Length;
                    unsigned int(8)  BlockData[Length];
                }

                aligned(8) class FLACSpecificBox
                      extends FullBox('dfLa', version=0, 0){
                    for (i=0; ; i++) { // to end of box
                        FLACMetadataBlock();
                    }
                }

              + Version:

                  The Version field shall be set to 0.

                  In future versions of this specification, this
                  field may be set to other values. A reader that does
                  not support such a value shall not read the fields
                  that follow it within the FLACSpecificBox.

              + Flags:

                  The Flags field shall be set to 0.

            After the FullBox header, the box contains a sequence of
            FLAC[3] native-metadata block structures that fill the
            remainder of the box.

            Each FLACMetadataBlock structure consists of three fields
            filling a total of four bytes that form a FLAC[3] native
            METADATA_BLOCK_HEADER, followed by raw octets that
            comprise the FLAC[3] native METADATA_BLOCK_DATA.

              + LastMetadataBlockFlag:

                  The LastMetadataBlockFlag field maps semantically to
                  the FLAC[3] native METADATA_BLOCK_HEADER
                  Last-metadata-block flag as defined in the FLAC[3]
                  file specification.

                  The LastMetadataBlockFlag is set to 1 if this
                  FLACMetadataBlock is the last metadata block in the
                  FLACSpecificBox. It is set to 0 otherwise.

              + BlockType:

                  The BlockType field maps semantically to the FLAC[3]
                  native METADATA_BLOCK_HEADER BLOCK_TYPE field as
                  defined in the FLAC[3] file specification.

                  The BlockType is set to a valid FLAC[3] BLOCK_TYPE
                  value that identifies the type of this native metadata
                  block. The BlockType of the first FLACMetadataBlock
                  must be set to 0, signifying this is a FLAC[3] native
                  METADATA_BLOCK_STREAMINFO block.

              + Length:

                  The Length field maps semantically to the FLAC[3]
                  native METADATA_BLOCK_HEADER Length field as
                  defined in the FLAC[3] file specification.

                  The Length field specifies the number of bytes of
                  BlockData to follow.

              + BlockData:

                  The BlockData field maps semantically to the FLAC[3]
                  native METADATA_BLOCK_DATA as defined in the FLAC[3]
                  file specification.

            Taken together, the bytes of the FLACMetadataBlock form a
            complete FLAC[3] native METADATA_BLOCK structure.

            Note that a minimum of a single FLACMetadataBlock,
            consisting of a FLAC[3] native METADATA_BLOCK_STREAMINFO
            structure, is required. Should the FLACSpecificBox
            contain more than a single FLACMetadataBlock structure,
            the FLACMetadataBlock containing the FLAC[3] native
            METADATA_BLOCK_STREAMINFO must occur first in the list.

            Other containers that package FLAC audio streams, such as
            Ogg[4] and Matroska[5], wrap FLAC[3] native metadata without
            modification, as this specification does. When
            repackaging or remuxing FLAC[3] streams from another
            format that contains FLAC[3] native metadata into an ISO
            BMFF file, the complete FLAC[3] native metadata should be
            preserved in the ISO BMFF stream as described above. It
            is also allowed to parse this native metadata and include
            contextually redundant ISO BMFF-native repackagings and/or
            reparsings of FLAC[3] native metadata, so long as the
            native metadata is also preserved.
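
            As an informative illustration of the FLACSpecificBox
            layout and the samplerate rule above, the following Python
            sketch walks the FLACMetadataBlock list and derives the
            AudioSampleEntry values from STREAMINFO. The function
            names are hypothetical and not part of this specification.
            The sketch assumes the input starts right after the 8-byte
            box size/type header, and it interprets "regular division"
            as repeated halving, which is an assumption chosen to
            reproduce the 96 kHz and 192 kHz examples given above. The
            STREAMINFO bit layout is taken from the FLAC format
            specification[3].

```python
def parse_dfla_payload(payload: bytes):
    """Split the body of a 'dfLa' box into (last_flag, block_type, data) tuples.

    Assumption: `payload` begins at the FullBox version byte, i.e. the
    8-byte box size/type header has already been consumed by the caller.
    """
    if len(payload) < 4:
        raise ValueError("truncated FullBox header")
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    if version != 0:
        # Unsupported version: do not read the fields that follow.
        raise ValueError(f"unsupported dfLa version {version}")
    blocks = []
    pos = 4
    while pos < len(payload):  # blocks fill the remainder of the box
        if pos + 4 > len(payload):
            raise ValueError("truncated METADATA_BLOCK_HEADER")
        last_flag = payload[pos] >> 7          # 1 bit
        block_type = payload[pos] & 0x7F       # 7 bits
        length = int.from_bytes(payload[pos + 1:pos + 4], "big")  # 24 bits
        data = payload[pos + 4:pos + 4 + length]
        if len(data) != length:
            raise ValueError("truncated METADATA_BLOCK_DATA")
        blocks.append((last_flag, block_type, data))
        pos += 4 + length
    if not blocks or blocks[0][1] != 0:
        raise ValueError("first block must be STREAMINFO (BlockType 0)")
    return flags, blocks


def parse_streaminfo(data: bytes):
    """Extract the fields that the sample entry mirrors from STREAMINFO."""
    if len(data) != 34:
        raise ValueError("STREAMINFO must be 34 bytes")
    # Bytes 10..17 pack: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(data[10:18], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def samplerate_field(native_rate: int) -> int:
    """Return the 16.16 fixed-point samplerate value for the sample entry.

    Assumption: a "regular division" is obtained by halving the rate.
    """
    rate = native_rate
    while rate > 65535 and rate % 2 == 0:
        rate //= 2
    if rate > 65535:
        rate = 65535  # no expressible division: fall back to the maximum
    return rate << 16
```

            A reader would typically call parse_dfla_payload() first,
            pass the data of the first block to parse_streaminfo(), and
            use the native sample_rate it returns rather than the
            samplerate field of the sample entry.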

        3.3.3 Sample format

            A FLAC sample is exactly one FLAC audio FRAME (as defined
            in the FLAC[3] file specification) belonging to a FLAC
            bitstream. The FLAC sample data begins with a complete
            FLAC FRAME_HEADER, followed by one FLAC SUBFRAME per
            channel, any necessary bit padding, and ends with the
            usual FLAC FRAME_FOOTER.

            Note that the FLAC native FRAME_HEADER structure that
            begins each FLAC sample redundantly encodes channel count,
            sample rate, and sample size. The values of these fields
            must agree both with the values declared in the FLAC
            METADATA_BLOCK_STREAMINFO structure and with those in the
            FLACSampleEntry box.

        3.3.4 Duration of a FLAC sample

            The duration of any given FLAC sample is determined by
            dividing the decoded block size of a FLAC frame, as
            encoded in the FLAC FRAME's FRAME_HEADER structure, by the
            value of the timescale field in the Media Header Box.
            FLAC samples are permitted to have variable durations
            within a given audio stream. FLAC does not use padding
            values.

        3.3.5 Sub-sample

            Sub-samples are not defined for FLAC samples in this
            specification.

        3.3.6 Random Access

            This subclause describes the nature of random access
            to FLAC samples.

            3.3.6.1 Random Access Point

                All FLAC samples can be independently decoded,
                i.e., every FLAC sample is a sync sample. The Sync
                Sample Box shall not be present as long as there are
                no samples other than FLAC samples in the same
                track. The sample_is_non_sync_sample field for FLAC
                samples shall be set to 0.
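
            As an informative illustration of the rule in section
            'Duration of a FLAC sample', the following Python sketch
            computes a sample duration and run-length encodes block
            sizes into Decoding Time to Sample Box entries. The
            function names are hypothetical. The sketch assumes that
            the media timescale equals the stream sample rate, so that
            the sample_delta of a FLAC sample equals its block size;
            it also assumes the block sizes have already been decoded
            from each FRAME_HEADER.

```python
from fractions import Fraction


def sample_duration_seconds(block_size: int, timescale: int) -> Fraction:
    """Duration of one FLAC sample: decoded block size over the timescale."""
    return Fraction(block_size, timescale)


def stts_entries(block_sizes):
    """Run-length encode block sizes as (sample_count, sample_delta) pairs.

    Assumption: timescale == sample rate, so sample_delta == block size.
    """
    entries = []
    for size in block_sizes:
        if entries and entries[-1][1] == size:
            # Same delta as the previous sample: extend the current run.
            entries[-1] = (entries[-1][0] + 1, size)
        else:
            entries.append((1, size))
    return entries
```

            With the values from the example in section 3.5 (18
            samples of 1920 audio samples each at a timescale of
            48000), this yields a single entry with sample_count 18
            and sample_delta 1920, for a total of 34560 timescale
            units, which matches the Media Header Box duration shown
            there.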

    3.4 Basic Structure (informative)

        3.4.1 Initial Movie

            This subclause shows a basic structure of the Movie Box as follows:

    +----+----+----+----+----+----+----+----+------------------------------+
    |moov|    |    |    |    |    |    |    | Movie Box                    |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |mvhd|    |    |    |    |    |    | Movie Header Box             |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |trak|    |    |    |    |    |    | Track Box                    |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |tkhd|    |    |    |    |    | Track Header Box             |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |edts|*   |    |    |    |    | Edit Box                     |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |elst|*   |    |    |    | Edit List Box                |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |mdia|    |    |    |    |    | Media Box                    |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |mdhd|    |    |    |    | Media Header Box             |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |hdlr|    |    |    |    | Handler Reference Box        |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |minf|    |    |    |    | Media Information Box        |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |smhd|    |    |    | Sound Media Header Box       |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |dinf|    |    |    | Data Information Box         |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |dref|    |    | Data Reference Box           |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |    |url |    | DataEntryUrlBox              |
    +----+----+----+----+----+----+ or +----+------------------------------+
    |    |    |    |    |    |    |urn |    | DataEntryUrnBox              |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |stbl|    |    |    | Sample Table Box             |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |stsd|    |    | Sample Description Box       |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |    |fLaC|    | FLACSampleEntry              |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |    |    |dfLa| FLAC Specific Box            |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |stts|    |    | Decoding Time to Sample Box  |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |stsc|    |    | Sample To Chunk Box          |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |stsz|    |    | Sample Size Box              |
    +----+----+----+----+----+ or +----+----+------------------------------+
    |    |    |    |    |    |stz2|    |    | Compact Sample Size Box      |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |    |    |    |stco|    |    | Chunk Offset Box             |
    +----+----+----+----+----+ or +----+----+------------------------------+
    |    |    |    |    |    |co64|    |    | Chunk Large Offset Box       |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |mvex|*   |    |    |    |    |    | Movie Extends Box            |
    +----+----+----+----+----+----+----+----+------------------------------+
    |    |    |trex|*   |    |    |    |    | Track Extends Box            |
    +----+----+----+----+----+----+----+----+------------------------------+

                    Figure 1 - Basic structure of Movie Box

            It is strongly recommended that the order of boxes
            follow the above structure. Boxes marked with an asterisk
            (*) may or may not be present depending on context. For
            most boxes listed above, the definition is as given
            in ISO/IEC 14496-12 [1].
            The additional boxes and the
            additional requirements, restrictions and recommendations
            to the other boxes are described in this specification.

    3.5 Example of Encapsulation (informative)
        [File]
            size = 17790
            [ftyp: File Type Box]
                position = 0
                size = 24
                major_brand = mp42 : MP4 version 2
                minor_version = 0
                compatible_brands
                    brand[0] = mp42 : MP4 version 2
                    brand[1] = isom : ISO Base Media file format
            [moov: Movie Box]
                position = 24
                size = 757
                [mvhd: Movie Header Box]
                    position = 32
                    size = 108
                    version = 0
                    flags = 0x000000
                    creation_time = UTC 2014/12/12, 18:41:19
                    modification_time = UTC 2014/12/12, 18:41:19
                    timescale = 48000
                    duration = 33600 (00:00:00.700)
                    rate = 1.000000
                    volume = 1.000000
                    reserved = 0x0000
                    reserved = 0x00000000
                    reserved = 0x00000000
                    transformation matrix
                        | a, b, u |   | 1.000000, 0.000000, 0.000000 |
                        | c, d, v | = | 0.000000, 1.000000, 0.000000 |
                        | x, y, w |   | 0.000000, 0.000000, 1.000000 |
                    pre_defined = 0x00000000
                    pre_defined = 0x00000000
                    pre_defined = 0x00000000
                    pre_defined = 0x00000000
                    pre_defined = 0x00000000
                    pre_defined = 0x00000000
                    next_track_ID = 2
                [iods: Object Descriptor Box]
                    position = 140
                    size = 33
                    version = 0
                    flags = 0x000000
                    [tag = 0x10: MP4_IOD]
                        expandableClassSize = 16
                        ObjectDescriptorID = 1
                        URL_Flag = 0
                        includeInlineProfileLevelFlag = 0
                        reserved = 0xf
                        ODProfileLevelIndication = 0xff
                        sceneProfileLevelIndication = 0xff
                        audioProfileLevelIndication = 0xfe
                        visualProfileLevelIndication = 0xff
                        graphicsProfileLevelIndication = 0xff
                        [tag = 0x0e: ES_ID_Inc]
                            expandableClassSize = 4
                            Track_ID = 1
                [trak: Track Box]
                    position = 173
                    size = 608
                    [tkhd: Track Header Box]
                        position = 181
                        size = 92
                        version = 0
                        flags = 0x000007
                            Track enabled
                            Track in movie
                            Track in preview
                        creation_time = UTC 2014/12/12, 18:41:19
                        modification_time = UTC 2014/12/12, 18:41:19
                        track_ID = 1
                        reserved = 0x00000000
                        duration = 33600 (00:00:00.700)
                        reserved = 0x00000000
                        reserved = 0x00000000
                        layer = 0
                        alternate_group = 0
                        volume = 1.000000
                        reserved = 0x0000
                        transformation matrix
                            | a, b, u |   | 1.000000, 0.000000, 0.000000 |
                            | c, d, v | = | 0.000000, 1.000000, 0.000000 |
                            | x, y, w |   | 0.000000, 0.000000, 1.000000 |
                        width = 0.000000
                        height = 0.000000
                    [mdia: Media Box]
                        position = 273
                        size = 472
                        [mdhd: Media Header Box]
                            position = 281
                            size = 32
                            version = 0
                            flags = 0x000000
                            creation_time = UTC 2014/12/12, 18:41:19
                            modification_time = UTC 2014/12/12, 18:41:19
                            timescale = 48000
                            duration = 34560 (00:00:00.720)
                            language = und
                            pre_defined = 0x0000
                        [hdlr: Handler Reference Box]
                            position = 313
                            size = 51
                            version = 0
                            flags = 0x000000
                            pre_defined = 0x00000000
                            handler_type = soun
                            reserved = 0x00000000
                            reserved = 0x00000000
                            reserved = 0x00000000
                            name = Xiph Audio Handler
                        [minf: Media Information Box]
                            position = 364
                            size = 381
                            [smhd: Sound Media Header Box]
                                position = 372
                                size = 16
                                version = 0
                                flags = 0x000000
                                balance = 0.000000
                                reserved = 0x0000
                            [dinf: Data Information Box]
                                position = 388
                                size = 36
                                [dref: Data Reference Box]
                                    position = 396
                                    size = 28
                                    version = 0
                                    flags = 0x000000
                                    entry_count = 1
                                    [url : Data Entry Url Box]
                                        position = 412
                                        size = 12
                                        version = 0
                                        flags = 0x000001
                                            location = in the same file
                            [stbl: Sample Table Box]
                                position = 424
                                size = 321
                                [stsd: Sample Description Box]
                                    position = 432
                                    size = 79
                                    version = 0
                                    flags = 0x000000
                                    entry_count = 1
                                    [fLaC: Audio Description]
                                        position = 448
                                        size = 63
                                        reserved = 0x000000000000
                                        data_reference_index = 1
                                        reserved =
                                            0x0000
                                        reserved = 0x0000
                                        reserved = 0x00000000
                                        channelcount = 2
                                        samplesize = 16
                                        pre_defined = 0
                                        reserved = 0
                                        samplerate = 48000.000000
                                        [dfLa: FLAC Specific Box]
                                            position = 484
                                            size = 50
                                            version = 0
                                            flags = 0x000000
                                            [FLACMetadataBlock]
                                                LastMetadataBlockFlag = 1
                                                BlockType = 0
                                                Length = 34
                                                BlockData[34];
                                [stts: Decoding Time to Sample Box]
                                    position = 492
                                    size = 24
                                    version = 0
                                    flags = 0x000000
                                    entry_count = 1
                                    entry[0]
                                        sample_count = 18
                                        sample_delta = 1920
                                [stsc: Sample To Chunk Box]
                                    position = 516
                                    size = 40
                                    version = 0
                                    flags = 0x000000
                                    entry_count = 2
                                    entry[0]
                                        first_chunk = 1
                                        samples_per_chunk = 13
                                        sample_description_index = 1
                                    entry[1]
                                        first_chunk = 2
                                        samples_per_chunk = 5
                                        sample_description_index = 1
                                [stsz: Sample Size Box]
                                    position = 556
                                    size = 92
                                    version = 0
                                    flags = 0x000000
                                    sample_size = 0 (variable)
                                    sample_count = 18
                                    entry_size[0] = 977
                                    entry_size[1] = 938
                                    entry_size[2] = 939
                                    entry_size[3] = 938
                                    entry_size[4] = 934
                                    entry_size[5] = 945
                                    entry_size[6] = 948
                                    entry_size[7] = 956
                                    entry_size[8] = 955
                                    entry_size[9] = 930
                                    entry_size[10] = 933
                                    entry_size[11] = 934
                                    entry_size[12] = 972
                                    entry_size[13] = 977
                                    entry_size[14] = 958
                                    entry_size[15] = 949
                                    entry_size[16] = 962
                                    entry_size[17] = 848
                                [stco: Chunk Offset Box]
                                    position = 648
                                    size = 24
                                    version = 0
                                    flags = 0x000000
                                    entry_count = 2
                                    chunk_offset[0] = 686
                                    chunk_offset[1] = 12985
            [free: Free Space Box]
                position = 672
                size = 8
            [mdat: Media Data Box]
                position = 680
                size = 17001

4 Acknowledgements

    This spec draws heavily from the Opus-in-ISOBMFF specification
    work done by Yusuke Nakamura <muken.the.vfrmaniac |at| gmail.com>

    Thank you to Tim Terriberry, David Evans, and Yusuke Nakamura for
    valuable feedback. Thank you to Ralph Giles for editorial
    help.

5 Author's Address

    Monty Montgomery <cmontgomery@mozilla.com>