<<

Private Draft Document. Not for Publication

Interoperable Master Format (IMF) Specification

April 8, 2010 Version 0.7r DRAFT

Copyright  2009, 2010 by Entertainment Technology Center 509 West 29th Street Los Angeles, California 90007 United States of America 213-743-1600 www.etcenter.org

1 Private Draft Document. Not for Publication

NOTICE

The Entertainment Technology Center (ETC) is the author and creator of this specification for the purpose of copyright and other laws in all countries throughout the world. The ETC copyright notice must be included in all reproductions, whether in whole or in part, and may not be deleted or attributed to others. ETC hereby grants to its members and their suppliers a limited license to reproduce this specification for their own use, provided it is not sold. Others should obtain permission to reproduce this specification from Entertainment Technology Center; Attn: CEO and Executive Director; 509 West 29th Street, Los Angeles, California 90007; USA; (213) 743-1600 (voice); (213) 743-1803 facsimile.

This document is a specification developed and adopted by ETC. This document may be revised by ETC. It is intended solely as a guide for companies interested in developing products, which can be compatible with other products, developed using this document. Each ETC Member Company shall decide independently the extent to which it will utilize, or require adherence to, these specifications. ETC shall not be liable for any exemplary, incidental, proximate or consequential damages or expenses arising from the use of this document. This document defines only one approach to compatibility, and other approaches may be available to the industry.

This document is an authorized and approved publication of ETC. Only ETC has the right and authority to revise or change the material contained in this document, and any revisions by any party other than ETC are unauthorized and prohibited.

Compliance with this document may require use of one or more features covered by proprietary rights (such as features which are the subject of a patent, patent application, copyright, mask work right or trade secret right). By publication of this document, no position is taken by ETC with respect to the validity or infringement of any patent or other proprietary right. ETC hereby expressly disclaims any liability for infringement of intellectual property rights of others by virtue of the use of this document. ETC has not and does not investigate any notices or allegations of infringement prompted by publication of any ETC document, nor does ETC undertake a duty to advise users or potential users of ETC documents of such notices or allegations. ETC hereby expressly advises all users or potential users of this document to investigate and analyze any potential infringement situation, seek the advice of intellectual property counsel, and, if indicated, obtain a license under any applicable intellectual property right or take the necessary steps to avoid infringement of any intellectual property right. ETC expressly disclaims any intent to promote infringement of any intellectual property right by virtue of the evolution, adoption, or publication of this document.

2 Private Draft Document. Not for Publication

Revision History

Date Revision Description Contributors 02/9/09 v0.1 Initial draft Howard Lukk 02/13/09 V0.2 1st Rev, Name Change, revise framework chapter IMF Tech Committee, H. Lukk order and headings. Spell check and general clean up 3/13/09 V0.3 Chapter 10 change to Asset Management, Chapter 5 IMF Tech Committee, H. Lukk change to requirements instead of standard, Document clean up, Chapter 1 review and update, new Annex 6/16/09 V0.41 Update Chapter 1 along with formatting, Complete IMF Tech Committee, H. Lukk re-write of Chapter 2, re label chapters, re structure framework of document 8/4/09 V0.42 Update Figures, Add Table of Tables, Update IMF Tech Committee, H. Lukk Chapter 2., Remove “Clip” from document, Update Chapter 7 to reflect Chapter 2., Updated Glossary 11/6/09 V0.5 Update Chapter 2 and 7. Minor updates throughout IMF Tech Committee, H. Lukk document to remove Dynamic Downmixing and Audio Transitions. Inserted new Chapter 8 for Output Profile List 12/16/09 V0.5d Updates to Chapter 3 (audio draft for essence B. Vessa, A. Chang, H. Lukk and chapter + general notes), 7 (Composition), 8 (Output N. Goodkin Profile List), 9 (Packaging), and the addition of Chapter 10 (Annex), plus spelling and other minor corrections.

12/22/09 V0.6a Changes agreed during the Dec 18 document review, S. Levenson, N. Goodkin recorded by Seth Levenson during the meeting. Composition Play List (CPL) example added as Chapter 12. Moved comments out of the body and into comments. Made all "TBD" text red. Added styles for DCI Normal 1, 2 and 3 corresponding to text under paragraphs DCI Heading 1, 2, and 3. Formatted all tables to have similar headings and fonts. 12/24/09 V0.6b Changed Table Captions to "automatically N. Goodkin generated," thus automatically numbered. Changed Table of Tables to an Inserted Reference Table instead of a table generated by styles. Added box borders to all figures. 12/27/09 V0.6c Moved Image Metadata Data Elements table and text N. Goodkin from 3.2.1.1 to 3.2.3.1.

Moved SL12 text to Chapter 13, Annex - Ideal Implementation

Inserted a reference to the Essence Chapter, according to BV15.

Replaced all references to "Audio Configuration" with "Soundfield Configuration"

Rewrote the Audio section using headings and

3 Private Draft Document. Not for Publication

Date Revision Description Contributors

bullets, and using 'shall' and 'may.'

Changed all instances of "is required to" to "shall."

Minor grammatical and sentence structure changes.

01/11/10 V0.6d Updated the figure numbers and deleted the dynamic down-mixing section. 01/13/10 V0.6d-HL Section 2, System Overview figures 5, 6 H Lukk replacement.

Section 3, Essence: Changes to wording, insertions, deletions and formatting. Addition of comments.

Section 4, Data Essence: Comments, formatting and updated paragraph references.

Section 5, Dynamic Metadata: Comments, deletions and formatting.

Section 6, Wrapping: Comment, formatting, figure replacement.

Section 7, The Composition: formatting, figure 12, 14 replacement. 1/13/10 V0.6e-HL Accepted HL's formatting changes and updated N Goodkin automatic table numbering and table of tables. 1/19/10 V0.6f Comments and changes captured from the group B Vessa, S Levenson, N Goodkin review on 1-15-2010. Accepted all changes except where accompanied by a comment. Added BV's Audio Structural Metadata table, including comments. 1/20/10 V0.6g Corrected internal Word error. N Goodkin

1/26/10 V0.7a Changes from 1/15 meeting, plus DG’s comments, S Levenson, N Goodkin changed captions on figures to automatically number and made all references links to the figures.

Moved 3.4.2.4 to Metadata

Scrubbed Chapter 4 to get rid of cut & paste errors & use consistent styles.

TBD: Move Audio Section – chapter 7 from Audio Essence – HL will work with Seth off-line. 2/02/10 V0.7b Section 7.11.4.1 changes B Baggelaar 2/04/10 V0.7c Changed reference for Broadcast Wave to B Vessa Recommendation ITU-R BR.1352-3 (2007). Added comments in red under the "samples per frame" table in 3.4.1.5.

4 Private Draft Document. Not for Publication

2/04/10 V0.7d Added working document from the OPL group: A Ramamurthy overview and XML examples 2/05/10 V0.7e Stereoscopic additions for Composition, Metadata, A Chang Dynamic Metadata and OPL 2/05/10 V0.7f Remainder of Stereoscopic additions from A Chang S Levenson 2/05/10 V0.7g Miscellaneous edits to clean up the doc and N Goodkin complete changes initiated during the Tech Committee conference call today. 2/19/10 V0.7h Miscellaneous edits N Goodkin 3/02/10 V0.7i Changes made during the 2/26/10 conference call. S Levenson, N Goodkin Took out Bit-Depth, kept Quality Layers, updated fractional samples per frame, incorporated & redistributed B Vessa's comments. Saved as a Word document, type .doc. 3/08/10 V0.7j “Fractional samples only allow audio editorial N Goodkin granularity of once very five frames.” Updated the CPL section to address this as well (7.4.4) Ref: 3.3.1.6 Updated the description of stereoscopic tracks. 3/09/10 V0.7k Changes made during the Tech Meeting: Accepted S Levenson changes to Stereoscopic Content, removed Bit Depth, changed the note about JPEG 2000 supporting IMF requirements, and updated 3.3.2.2 to refer to DPX (SMPTE 268-2005). 3/18/10 V0.7m 2.2.2.5 IMF replaces DSM in the text B Vessa

2.2.2.8 ...encapsulating the essence AND data into well understood......

3.3.2.1 pulled the text regarding the exact codec out of the end of 3.3.2.1.1 (was bulleted), and put it into a separate sub section, 3.3.2.1.2, "Image Compression Codecs".

3.3.3.1 Fixed reference to "table 3".

3.4.1.4 added text for the fractional frame rates in the table that references SMPTE 382M.

3.4.1.4 Pulled out the audio group text into a comment.

3.4.1.12.1 Added comment to discuss moving to the CPL section

3.4.2 In the table, changed a word in the example for Audio Content: "Hearing Impaired" to "Visually Impaired"

5.3.6 Added comment that this needs rewording

5.5.4 Added comment

5.6.4 Added comment

5 Private Draft Document. Not for Publication

6.6 Added comment regarding sample rate

6.9.8.5 Added comment regarding fractional frame rates

6.9.8.6 Added comment regarding 382M encoding.

6.12.1 Added comment regarding rewording the section to indicate it's for a sequence

6.12.7.4.1 Added comment regarding label reference

7.3.2.1 Added comment

8.1.2 Changed to "which may invoke predefined processes or pass specific parameters to an external system" 03/24/10 V0.7n Updates agreed during the 3/19 face-to-face. S Levenson, N Goodkin, M Smith Changed Figure 7 in 3.3 Image Specification, 3.3.1.2 Pixel , 3.3.2.1.1 added image quality layers, and 3.3.2.1.2 Image Compression Codecs. Combined 3.3.1.3 "Raster Format and 3.2.2 Frame Rate" paragraphs in the Essence section, and replaced frame rate specifications in the following sections, with references to 3.2.2 Frame Rate: 3.4.1.3 Frame Rate/Audio Speed, 4.1.4.2 Data Essence, 5.1.4.2 Dynamic Metadata, and 6.2.2 Wrapping. Replaced references to "Color Correction" with "Color Transforms." 4/01/01 V0.7p Changes from Technical Committee conference call S Levenson, N Goodkin on 3/31:

Deleted "dynamic down-mixing and dynamic range control" from 2.1.1.

Moved "common file format" information into 3.2.1 Common Essence File Formats and added cross- references to 4.1.4.1 Data Essence, 5.1.4.1 Dynamic Metadata, and 6.2.1 Common File Formats. 4/07/10 V0.7r Replaced Figure 16 - Future State - Mastering & B Vessa Distribution Servicing, Figure 17 - Example IMF Workflow

Replaced Section 5, Wrapping H Lukk

Replaced OPL XML examples. Moved examples to A Ramamurthy 11. Annex - Output Profile List XML Examples.

Section 4.3 Specification (Dynamic A Chang Metadata)

6 Private Draft Document. Not for Publication Reviewed By

Review Type Reviewed By Date Reviewed

Approval / Sign Off

Name Signature Date

7 Private Draft Document. Not for Publication Table of Contents

1. IMF INTRODUCTION...... 22

1.1. INTRODUCTION...... 22 1.1. INTRODUCTION...... 22

1.2. SCOPE...... 22 1.2. SCOPE...... 22

1.3. SYSTEM OBJECTIVES...... 22 1.3. SYSTEM OBJECTIVES...... 22

1.4. DOCUMENT LANGUAGE...... 23 1.4. DOCUMENT LANGUAGE...... 23 2. SYSTEM OVERVIEW...... 24

1.5. FUNCTIONAL FRAMEWORK ...... 24 1.5. FUNCTIONAL FRAMEWORK ...... 24

1.6. OVERVIEW...... 25 1.6. OVERVIEW...... 25 1.6.1. IMF System Architecture...... 25 1.6.1. IMF SYSTEM ARCHITECTURE...... 25 1.6.2. Major System Key Concepts...... 26 1.6.2. MAJOR SYSTEM KEY CONCEPTS...... 26 1.6.2.1. Business to Business Solution...... 26 1.6.2.1. Business to Business Solution...... 26 1.6.2.2. Digital Source Master (DSM)...... 26 1.6.2.2. Digital Source Master (DSM)...... 26 1.6.2.3. Archive not in Scope...... 26 1.6.2.3. Archive not in Scope...... 26 1.6.2.4. File / Frame-Based System...... 27 1.6.2.4. File / Frame-Based System...... 27 1.6.2.5. Essence and Data Essence...... 27 1.6.2.5. Essence and Data Essence...... 27 1.6.2.6. Metadata...... 27 1.6.2.6. Metadata...... 27 1.6.2.7. Security...... 27 1.6.2.7. Security...... 27 1.6.2.8. Wrapping and Track File Creation...... 27 1.6.2.8. Wrapping and Track File Creation...... 27 1.6.2.9. Composition...... 28 1.6.2.9. Composition...... 28 1.6.2.10. Versions...... 28 1.6.2.10. Versions...... 28 1.6.2.11. Sequence...... 28 1.6.2.11. Sequence...... 28 1.6.2.12. Output Profile List...... 29 1.6.2.12. Output Profile List...... 29 1.6.2.13. Packages...... 29 1.6.2.13. Packages...... 29 1.7. IMF ELEMENTS AND PROCESSES...... 31 1.7. IMF ELEMENTS AND PROCESSES...... 31 2. ESSENCE...... 32

8 Private Draft Document. Not for Publication

2.1. OVERVIEW...... 32 2.1. OVERVIEW...... 32 2.1.1. Introduction...... 32 2.1.1. INTRODUCTION...... 32 2.1.2. Essence System Overview...... 32 2.1.2. ESSENCE SYSTEM OVERVIEW...... 32 2.2. ESSENCE FUNDAMENTAL REQUIREMENTS...... 32 2.2. ESSENCE FUNDAMENTAL REQUIREMENTS...... 32 2.2.1. Common Essence File Formats...... 32 2.2.1. COMMON ESSENCE FILE FORMATS...... 32 2.2.2. Frame Rates and Synchronization...... 32 2.2.2. FRAME RATES AND SYNCHRONIZATION...... 32 2.3. IMAGE SPECIFICATION...... 34 2.3. IMAGE SPECIFICATION...... 34 2.3.1. Image Structure...... 34 2.3.1. IMAGE STRUCTURE...... 34 2.3.1.1. Image Container, Active Image, ...... 34 2.3.1.1. Image Container, Active Image, Pixel Aspect Ratio...... 34 2.3.1.2. Pixel Aspect Ratio...... 35 2.3.1.2. Pixel Aspect Ratio...... 35 2.3.1.3. Bit Depth...... 35 2.3.1.3. Bit Depth...... 35 2.3.1.4. Chroma Subsampling...... 35 2.3.1.4. Chroma Subsampling...... 35 2.3.1.5. Color Space, Color Space Gamuts...... 35 2.3.1.5. Color Space, Color Space Gamuts...... 35 2.3.1.6. Stereoscopic Content...... 35 2.3.1.6. Stereoscopic Content...... 35 2.3.2. Compression Requirements...... 35 2.3.2. COMPRESSION REQUIREMENTS...... 35 2.3.2.1. Image Compression Requirements...... 35 2.3.2.1. Image Compression Requirements...... 35 3.3.2.1.1 Intra-Frame Compression...... 36 3.3.2.1.1 Intra-Frame Compression...... 36 3.3.2.1.2 Image Compression Codecs...... 36 3.3.2.1.2 Image Compression Codecs...... 36 2.3.2.2. Uncompressed Formats...... 36 2.3.2.2. Uncompressed Formats...... 36 2.3.3. Structural Metadata...... 37 2.3.3. STRUCTURAL METADATA...... 37 2.3.3.1. Image Metadata Required Fields...... 37 2.3.3.1. Image Metadata Required Fields...... 37 2.3.3.2. Stereoscopic Metadata Required Fields...... 37 2.3.3.2. Stereoscopic Metadata Required Fields...... 37 2.4. AUDIO SPECIFICATION...... 38 2.4. AUDIO SPECIFICATION...... 38 2.4.1. Audio Structure...... 38 2.4.1. AUDIO STRUCTURE...... 38 2.4.1.1. Audio File Format...... 38 2.4.1.1. Audio File Format...... 38 2.4.1.2. Sampling Rate...... 39 2.4.1.2. Sampling Rate...... 39 2.4.1.3. Frame Rate/Audio Speed...... 39 2.4.1.3. Frame Rate/Audio Speed...... 39

9 Private Draft Document. Not for Publication

2.4.1.4. Allowable Samples per Frame...... 39 2.4.1.4. Allowable Samples per Frame...... 39 2.4.1.5. Audio Bit Depth...... 39 2.4.1.5. Audio Bit Depth...... 39 2.4.1.6. Audio Track File Content...... 40 2.4.1.6. Audio Track File Content...... 40 2.4.1.7. Audio Track File Content Constraint...... 40 2.4.1.7. Audio Track File Content Constraint...... 40 2.4.1.8. Audio Track File Language Constraint...... 40 2.4.1.8. Audio Track File Language Constraint...... 40 2.4.1.9. Track File Loudspeaker Channel Content Constraint: ...... 40 2.4.1.9. Track File Loudspeaker Channel Content Constraint: ...... 40 2.4.1.10. Channels Per Track File Constraint...... 40 2.4.1.10. Channels Per Track File Constraint...... 40 2.4.1.11. Simultaneous Multiple Audio Track File Availability and Playout...... 40 2.4.1.11. Simultaneous Multiple Audio Track File Availability and Playout...... 40 2.4.1.12. Multiple Audio Track File routing...... 40 2.4.1.12. Multiple Audio Track File routing...... 40 2.4.1.13. Mixing of audio channels...... 40 2.4.1.13. Mixing of audio channels...... 40 2.4.1.14. Audio Elements Examples...... 41 2.4.1.14. Audio Elements Examples...... 41 2.4.1.15. Soundfield Configurations...... 42 2.4.1.15. Soundfield Configurations...... 42 2.4.1.16. Channel layout and mapping...... 42 2.4.1.16. Channel layout and mapping...... 42 2.4.2. Audio Metadata Data Elements...... 42 2.4.2. AUDIO METADATA DATA ELEMENTS...... 42 3. DATA ESSENCE...... 44

3.1. OVERVIEW...... 44 3.1. OVERVIEW...... 44 3.1.1. Introduction...... 44 3.1.1. INTRODUCTION...... 44 3.1.2. Data Essence System Overview...... 44 3.1.2. DATA ESSENCE SYSTEM OVERVIEW...... 44 3.1.3. Major Data Essence Concepts...... 44 3.1.3. MAJOR DATA ESSENCE CONCEPTS...... 44 3.1.4. Data Essence Fundamental Requirements...... 44 3.1.4. DATA ESSENCE FUNDAMENTAL REQUIREMENTS...... 44 3.1.4.1. Common File Formats...... 44 3.1.4.1. Common File Formats...... 44 3.1.4.2. Synchronization...... 44 3.1.4.2. Synchronization...... 44 3.2. SUBTITLE SPECIFICATION...... 44 3.2. SUBTITLE SPECIFICATION...... 44 3.2.1. Sub picture...... 44 3.2.1. SUB PICTURE...... 44 3.2.2. Timed Text...... 44 3.2.2. TIMED TEXT...... 44 3.3. CAPTION SPECIFICATION...... 44 3.3. CAPTION SPECIFICATION...... 44 3.3.1. Sub picture...... 44 3.3.1. SUB PICTURE...... 44 3.3.2. Timed Text...... 44

10 Private Draft Document. Not for Publication

3.3.2. TIMED TEXT...... 44 4. DYNAMIC METADATA...... 45

4.1. OVERVIEW...... 45 4.1. OVERVIEW...... 45 4.1.1. Introduction...... 45 4.1.1. INTRODUCTION...... 45 4.1.2. Dynamic Metadata System Overview...... 45 4.1.2. DYNAMIC METADATA SYSTEM OVERVIEW...... 45 4.1.3. Major Dynamic Metadata Concepts...... 45 4.1.3. MAJOR DYNAMIC METADATA CONCEPTS...... 45 4.1.4. Dynamic Metadata Fundamental Requirements...... 45 4.1.4. DYNAMIC METADATA FUNDAMENTAL REQUIREMENTS...... 45 4.1.4.1. Common File Formats...... 45 4.1.4.1. Common File Formats...... 45 4.2. TIME CODE SPECIFICATION...... 45 4.2. TIME CODE SPECIFICATION...... 45

4.3. PAN AND SCAN SPECIFICATION...... 45 4.3. PAN AND SCAN SPECIFICATION...... 45 4.3.1.1. Basic Pan and Scan Requirements...... 45 4.3.1.1. Basic Pan and Scan Requirements...... 45 4.3.1.2. Pan and Scan Metadata Required Fields...... 46 4.3.1.2. Pan and Scan Metadata Required Fields...... 46 5. WRAPPING...... 51

5.1. OVERVIEW...... 51 5.1. OVERVIEW...... 51 5.1.1. Introduction...... 51 5.1.1. INTRODUCTION...... 51 5.2. WRAPPING REQUIREMENTS...... 51 5.2. WRAPPING REQUIREMENTS...... 51 5.2.1. Introduction...... 51 5.2.1. INTRODUCTION...... 51 5.2.2. Format Information...... 52 5.2.2. FORMAT INFORMATION...... 52 5.2.3. Metadata...... 52 5.2.3. METADATA...... 52 5.2.4. Track File Replacement...... 52 5.2.4. TRACK FILE REPLACEMENT...... 52 5.2.5. Synchronization...... 52 5.2.5. SYNCHRONIZATION...... 52 5.2.6. Splicing...... 52 5.2.6. SPLICING...... 52 5.2.7. Security...... 52 5.2.7. SECURITY...... 52 5.2.8. Extensibility...... 52 5.2.8. EXTENSIBILITY...... 52 5.2.9. Simple Essence...... 52 5.2.9. SIMPLE ESSENCE...... 52 5.3. MXF TRACK FILE ENCRYPTION...... 52 5.3. MXF TRACK FILE ENCRYPTION...... 52 5.3.1. Introduction...... 52

11 Private Draft Document. Not for Publication

5.3.1. INTRODUCTION...... 52 5.3.2. Standards...... 52 5.3.2. STANDARDS...... 52 5.4. IMAGE TRACK FILE...... 53 5.4. IMAGE TRACK FILE...... 53 5.4.1. Introduction...... 53 5.4.1. INTRODUCTION...... 53 5.4.2. Frame Boundaries...... 53 5.4.2. FRAME BOUNDARIES...... 53 5.4.3. Compression...... 53 5.4.3. COMPRESSION...... 53 5.4.4. Standards...... 53 5.4.4. STANDARDS...... 53 5.5. AUDIO TRACK FILE...... 53 5.5. AUDIO TRACK FILE...... 53 5.5.1. Introduction...... 53 5.5.1. INTRODUCTION...... 53 5.5.2. Standards...... 53 5.5.2. STANDARDS...... 53 5.5.3. Metadata...... 53 5.5.3. METADATA...... 53 5.6. TIMED TEXT TRACK FILE...... 53 5.6. TIMED TEXT TRACK FILE...... 53 5.6.1. Introduction...... 53 5.6.1. INTRODUCTION...... 53 5.6.2. Standards...... 54 5.6.2. STANDARDS...... 54 5.7. TIME CODE TRACK FILES (OPTIONAL)...... 54 5.7. TIME CODE TRACK FILES (OPTIONAL)...... 54 5.7.1. Introduction...... 54 5.7.1. INTRODUCTION...... 54 5.7.2. Frame Boundaries...... 54 5.7.2. FRAME BOUNDARIES...... 54 5.7.3. Metadata...... 54 5.7.3. METADATA...... 54 7. THE COMPOSITION...... 55

5.8. INTRODUCTION...... 55 5.8. INTRODUCTION...... 55

5.9. FUNCTIONAL FRAMEWORK...... 55 5.9. FUNCTIONAL FRAMEWORK...... 55

5.10. THE COMPOSITION FUNDAMENTAL REQUIREMENTS...... 55 5.10. THE COMPOSITION FUNDAMENTAL REQUIREMENTS...... 55 5.10.1. Open Standard...... 55 5.10.1. OPEN STANDARD...... 55 5.10.2. Interoperable...... 55 5.10.2. INTEROPERABLE...... 55 5.10.3. Scalable...... 55 5.10.3. SCALABLE...... 55 5.10.4. Extensible...... 55 5.10.4. EXTENSIBLE...... 55

12 Private Draft Document. Not for Publication

5.10.5. Synchronization...... 55 5.10.5. SYNCHRONIZATION...... 55 5.10.6. Human Readable Metadata...... 55 5.10.6. HUMAN READABLE METADATA...... 55 5.10.7. File Format...... 55 5.10.7. FILE FORMAT...... 55 5.11. CPL CONSTRAINTS...... 56 5.11. CPL CONSTRAINTS...... 56 5.11.1. Constant Frame and Sample Rate...... 56 5.11.1. CONSTANT FRAME AND SAMPLE RATE...... 56 5.11.2. Minimum Track File Duration...... 56 5.11.2. MINIMUM TRACK FILE DURATION...... 56 5.11.3. Minimum Sequence Duration...... 56 5.11.3. MINIMUM SEQUENCE DURATION...... 56 5.11.4. Fractional Sample Editorial Granularity...... 56 5.11.4. FRACTIONAL SAMPLE EDITORIAL GRANULARITY...... 56 5.11.5. Audio Items...... 56 5.11.5. AUDIO ITEMS...... 56 5.11.5.1. Partitions...... 56 5.11.5.1. Partitions...... 56 5.11.5.2. Audio Editing Granularity...... 57 5.11.5.2. Audio Editing Granularity...... 57 5.11.5.3. Crossfades...... 57 5.11.5.3. Crossfades...... 57 5.11.5.4. Audio Editing and Transitions...... 57 5.11.5.4. Audio Editing and Transitions...... 57 5.11.5.5. Audio Insert Considerations...... 57 5.11.5.5. Audio Insert Considerations...... 57 5.12. STEREOSCOPIC CONTENT...... 57 5.12. STEREOSCOPIC CONTENT...... 57

5.13. TERMINOLOGY...... 57 5.13. TERMINOLOGY...... 57

5.14. SYNCHRONIZATION...... 58 5.14. SYNCHRONIZATION...... 58

5.15. COMPOSITION PLAYLIST STRUCTURE...... 60 5.15. COMPOSITION PLAYLIST STRUCTURE...... 60 5.15.1. General Information...... 60 5.15.1. GENERAL INFORMATION...... 60 5.16. CPL ELEMENT REQUIREMENTS...... 61 5.16. CPL ELEMENT REQUIREMENTS...... 61 5.16.1. Unique Id...... 61 5.16.1. UNIQUE ID...... 61 5.16.2. AnnotationText [optional]...... 61 5.16.2. ANNOTATIONTEXT [OPTIONAL]...... 61 5.16.3. IssueDate...... 61 5.16.3. ISSUEDATE...... 61 5.16.3.1. Issuer [optional]...... 61 5.16.3.1. Issuer [optional]...... 61 5.16.4. Creator [optional]...... 61 5.16.4. CREATOR [OPTIONAL]...... 61 5.16.5. ContentTitleText...... 61 5.16.5. CONTENTTITLETEXT...... 61

13 Private Draft Document. Not for Publication

5.16.6. ContentKind...... 61 5.16.6. CONTENTKIND...... 61 5.16.7. ContentVersion...... 62 5.16.7. CONTENTVERSION...... 62 5.16.7.1. Id...... 62 5.16.7.1. Id...... 62 5.16.7.2. LabelText...... 62 5.16.7.2. LabelText...... 62 5.16.8. ContentDescription...... 62 5.16.8. CONTENTDESCRIPTION...... 62 5.16.8.1. SourceMediaDescription [optional]...... 62 5.16.8.1. SourceMediaDescription [optional]...... 62 5.16.8.2. FrameRate ...... 63 5.16.8.2. FrameRate ...... 63 5.16.8.3. TimecodeType [optional]...... 63 5.16.8.3. TimecodeType [optional]...... 63 5.16.8.4. ImageEncodingStandard [optional]...... 63 5.16.8.4. ImageEncodingStandard [optional]...... 63 5.16.8.5. StandardsBody [optional]...... 63 5.16.8.5. StandardsBody [optional]...... 63 5.16.8.6. Label [optional]...... 63 5.16.8.6. Label [optional]...... 63 5.16.8.7. AudioSamplingRate...... 63 5.16.8.7. AudioSamplingRate...... 63 5.16.8.8. AudioSamplesPerFrame...... 63 5.16.8.8. AudioSamplesPerFrame...... 63 5.16.8.9. SoundfieldConfig...... 63 5.16.8.9. SoundfieldConfig...... 63 5.16.8.10. AudioBitDepth...... 63 5.16.8.10. AudioBitDepth...... 63 5.16.8.11. SubtitleFormat [optional]...... 64 5.16.8.11. SubtitleFormat [optional]...... 64 5.16.8.12. CaptionFormat [optional]...... 64 5.16.8.12. CaptionFormat [optional]...... 64 5.16.9. TotalRunningTime...... 64 5.16.9. TOTALRUNNINGTIME...... 64 5.16.10. Language [optional]...... 64 5.16.10. LANGUAGE [OPTIONAL]...... 64 5.16.11. Country [optional]...... 64 5.16.11. COUNTRY [OPTIONAL]...... 64 5.16.12. RatingList [optional]...... 64 5.16.12. RATINGLIST [OPTIONAL]...... 64 5.16.12.1. Agency...... 64 5.16.12.1. Agency...... 64 5.16.12.2. Label...... 64 5.16.12.2. Label...... 64 5.16.13. Encryption...... 64 5.16.13. ENCRYPTION...... 64 5.16.14. SequenceList...... 64 5.16.14. SEQUENCELIST...... 64 5.16.15. Signer [optional]...... 65 5.16.15. SIGNER [OPTIONAL]...... 65 5.16.16. Signature [optional]...... 65 5.16.16. SIGNATURE [OPTIONAL]...... 65 5.17. SEQUENCE STRUCTURE...... 66 5.17. SEQUENCE STRUCTURE...... 66

5.18. SEQUENCE INFORMATION...... 66

14 Private Draft Document. Not for Publication

5.18. SEQUENCE INFORMATION...... 66 5.18.1. SequenceType...... 67 5.18.1. SEQUENCETYPE...... 67 5.18.2. Unique Id...... 67 5.18.2. UNIQUE ID...... 67 5.18.3. AnnotationText [optional]...... 67 5.18.3. ANNOTATIONTEXT [OPTIONAL]...... 67 5.18.4. SeqIssueDate...... 67 5.18.4. SEQISSUEDATE...... 67 5.18.5. Issuer [optional]...... 67 5.18.5. ISSUER [OPTIONAL]...... 67 5.18.6. AssetList...... 67 5.18.6. ASSETLIST...... 67 5.18.6.1. MainMarkers [optional]...... 67 5.18.6.1. MainMarkers [optional]...... 67 5.18.6.2. MainPicture [optional]...... 67 5.18.6.2. MainPicture [optional]...... 67 5.18.6.3. PictureSpatial [optional]...... 67 5.18.6.3. PictureSpatial [optional]...... 67 5.18.6.4. MainSound [optional]...... 68 5.18.6.4. MainSound [optional]...... 68 5.18.6.5. MainSubtitle [optional]...... 68 5.18.6.5. MainSubtitle [optional]...... 68 5.18.6.6. MainCaption [optional]...... 68 5.18.6.6. MainCaption [optional]...... 68 5.18.6.7. Extensions (New Asset Types)...... 68 5.18.6.7. Extensions (New Asset Types)...... 68 5.19. ASSET STRUCTURES...... 69 5.19. ASSET STRUCTURES...... 69 5.19.1. Asset Information...... 69 5.19.1. ASSET INFORMATION...... 69 5.19.2. GenericAssetType...... 70 5.19.2. GENERICASSETTYPE...... 70 5.19.2.1. Id...... 70 5.19.2.1. Id...... 70 5.19.2.2. AnnotationText [optional]...... 70 5.19.2.2. AnnotationText [optional]...... 70 5.19.2.3. EditRate...... 70 5.19.2.3. EditRate...... 70 5.19.2.4. IntrinsicDuration...... 70 5.19.2.4. IntrinsicDuration...... 70 5.19.2.5. EntryPoint [optional]...... 70 5.19.2.5. EntryPoint [optional]...... 70 5.19.2.6. Duration [optional]...... 71 5.19.2.6. Duration [optional]...... 71 5.19.3. TrackFileAssetType...... 71 5.19.3. TRACKFILEASSETTYPE...... 71 5.19.3.1. KeyId [optional]...... 71 5.19.3.1. KeyId [optional]...... 71 5.19.3.2. Hash [optional]...... 71 5.19.3.2. Hash [optional]...... 71 5.19.4. MarkerAssetType...... 72 5.19.4. MARKERASSETTYPE...... 72 5.19.4.1. MarkerList...... 72 5.19.4.1. MarkerList...... 72 5.19.4.2. Label...... 72 5.19.4.2. Label...... 72

15 Private Draft Document. Not for Publication

5.19.4.3. AnnotationText [optional]...... 73 5.19.4.3. AnnotationText [optional]...... 73 5.19.4.4. Offset...... 73 5.19.4.4. Offset...... 73 5.19.4.5. SegmentUUID...... 73 5.19.4.5. SegmentUUID...... 73 5.19.4.6. SegmentReplaceUUID [optional]...... 73 5.19.4.6. SegmentReplaceUUID [optional]...... 73 5.19.5. PictureTrackFileAssetType...... 73 5.19.5. PICTURETRACKFILEASSETTYPE...... 73 5.19.5.1. PicturePixelMatrix...... 74 5.19.5.1. PicturePixelMatrix...... 74 5.19.5.2. PictureBitDepth...... 74 5.19.5.2. PictureBitDepth...... 74 5.19.5.3. PictureColorEncoding...... 74 5.19.5.3. PictureColorEncoding...... 74 5.19.5.4. PictureFrameRate...... 74 5.19.5.4. PictureFrameRate...... 74 5.19.5.5. ImageAspectRatio...... 74 5.19.5.5. ImageAspectRatio...... 74 5.19.6. PictureSpatialMetadataTrackFileAssetType...... 74 5.19.6. PICTURESPATIALMETADATATRACKFILEASSETTYPE...... 74 5.19.6.1. StandardsBody [optional]...... 74 5.19.6.1. StandardsBody [optional]...... 74 5.19.6.2. Label [optional]...... 74 5.19.6.2. Label [optional]...... 74 5.19.7. SoundTrackFileAssetType...... 75 5.19.7. SOUNDTRACKFILEASSETTYPE...... 75 5.19.7.1. SoundSamplingRate...... 75 5.19.7.1. SoundSamplingRate...... 75 5.19.7.2. SoundBitDepth...... 75 5.19.7.2. SoundBitDepth...... 75 5.19.7.3. SoundChannelNumber...... 75 5.19.7.3. SoundChannelNumber...... 75 5.19.7.4. ChannelConfiguration...... 75 5.19.7.4. ChannelConfiguration...... 75 5.19.7.5. Label...... 75 5.19.7.5. Label...... 75 5.19.7.6. Language [optional]...... 76 5.19.7.6. Language [optional]...... 76 5.19.8. SubtitleTrackFileAssetType...... 77 5.19.8. SUBTITLETRACKFILEASSETTYPE...... 77 5.19.8.1. Language [optional]...... 77 5.19.8.1. Language [optional]...... 77 5.19.9. CaptionTrackFileAssetType...... 77 5.19.9. CAPTIONTRACKFILEASSETTYPE...... 77 5.19.9.1. Language [optional]...... 77 5.19.9.1. Language [optional]...... 77 6. OUTPUT PROFILE LIST...... 78

6.1. INTRODUCTION...... 78 6.1. INTRODUCTION...... 78 6.1.1. Output Profile List Definition...... 78 6.1.1. OUTPUT PROFILE LIST DEFINITION...... 78 6.1.2. OPL Scope...... 78 6.1.2. OPL SCOPE...... 78 6.1.3. Stereoscopic Support...... 78

16 Private Draft Document. Not for Publication

6.1.3. STEREOSCOPIC SUPPORT...... 78 6.1.4. OPL Examples...... 78 6.1.4. OPL EXAMPLES...... 78 6.1.4.1. Image Output Parameters...... 78 6.1.4.1. Image Output Parameters...... 78 6.1.4.2. Audio Output Parameters...... 78 6.1.4.2. Audio Output Parameters...... 78 6.1.4.3. Encoding Format...... 78 6.1.4.3. Encoding Format...... 78 6.1.4.4. Color Transformations...... 79 6.1.4.4. Color Transformations...... 79 6.1.4.5. “Pre-Processing” parameters...... 79 6.1.4.5. “Pre-Processing” parameters...... 79 6.1.5. OPL Relationship Overview...... 79 6.1.5. OPL RELATIONSHIP OVERVIEW...... 79 6.2. OUTPUT PROFILE LIST OVERVIEW...... 79 6.2. OUTPUT PROFILE LIST OVERVIEW...... 79 6.2.1. Functional Framework...... 79 6.2.1. FUNCTIONAL FRAMEWORK...... 79 6.2.2. Output Profile List Fundamental Requirements...... 80 6.2.2. OUTPUT PROFILE LIST FUNDAMENTAL REQUIREMENTS...... 80 6.2.2.1. Introduction...... 80 6.2.2.1. Introduction...... 80 6.2.2.2. Open Standard...... 80 6.2.2.2. Open Standard...... 80 6.2.2.3. Interoperable...... 80 6.2.2.3. Interoperable...... 80 6.2.2.4. Scalable...... 80 6.2.2.4. Scalable...... 80 6.2.2.5. Extensible...... 80 6.2.2.5. Extensible...... 80 6.2.2.6. Human Readable Metadata...... 80 6.2.2.6. Human Readable Metadata...... 80 6.2.3. Output Profile List Concepts...... 80 6.2.3. OUTPUT PROFILE LIST CONCEPTS...... 80 6.3. OUTPUT PROFILE LIST FORMAT...... 80 6.3. OUTPUT PROFILE LIST FORMAT...... 80 6.3.1. Introduction...... 80 6.3.1. INTRODUCTION...... 80 6.3.2. File Format...... 80 6.3.2. FILE FORMAT...... 80 6.3.3. ImageOutputFormat...... 80 6.3.3. IMAGEOUTPUTFORMAT...... 80 6.3.3.1. OutputPixelMatrix...... 80 6.3.3.1. OutputPixelMatrix...... 80 6.3.3.2. OutputColorEncoding...... 80 6.3.3.2. OutputColorEncoding...... 80 6.3.3.3. Stereoscopic Metadata Required Fields...... 80 6.3.3.3. Stereoscopic Metadata Required Fields...... 80 6.3.3.4. StandardsBody [optional]...... 81 6.3.3.4. StandardsBody [optional]...... 81 6.3.3.5. Label [optional]...... 81 6.3.3.5. Label [optional]...... 81 6.3.3.6. DisplayAspectRatio...... 81 6.3.3.6. DisplayAspectRatio...... 81 6.3.3.7. OutputPictureOverlay [optional]...... 81

17 Private Draft Document. Not for Publication

6.3.3.7. OutputPictureOverlay [optional]...... 81 6.3.3.8. TimeCodeWindow [optional]...... 81 6.3.3.8. TimeCodeWindow [optional]...... 81 6.3.3.9. Spoiler [optional]...... 81 6.3.3.9. Spoiler [optional]...... 81 6.3.4. OutputAudioFormat...... 81 6.3.4. OUTPUTAUDIOFORMAT...... 81 6.3.4.1. AudioCompressionStandard [optional]...... 81 6.3.4.1. AudioCompressionStandard [optional]...... 81 6.3.4.2. StandardsBody [optional]...... 81 6.3.4.2. StandardsBody [optional]...... 81 6.3.4.3. Label [optional]...... 81 6.3.4.3. Label [optional]...... 81 6.3.5. OutputSpeedOffset [optional]...... 82 6.3.5. OUTPUTSPEEDOFFSET [OPTIONAL]...... 82 6.3.6. Open/Discussion Items 82 6.3.6. OPEN/DISCUSSION ITEMS 82 7. PACKAGING...... 83

7.1. INTRODUCTION...... 83 7.1. INTRODUCTION...... 83

7.2. PACKAGING SYSTEM OVERVIEW...... 83 7.2. PACKAGING SYSTEM OVERVIEW...... 83 7.2.1. Functional Framework...... 83 7.2.1. FUNCTIONAL FRAMEWORK...... 83 7.2.2. Packaging Fundamental Requirements...... 83 7.2.2. PACKAGING FUNDAMENTAL REQUIREMENTS...... 83 7.2.2.1. Introduction...... 83 7.2.2.1. Introduction...... 83 7.2.2.2. Open Standard...... 83 7.2.2.2. Open Standard...... 83 7.2.2.3. Interoperable...... 83 7.2.2.3. Interoperable...... 83 7.2.2.4. Scalable...... 83 7.2.2.4. Scalable...... 83 7.2.2.5. Extensible...... 83 7.2.2.5. Extensible...... 83 7.2.2.6. Synchronization...... 83 7.2.2.6. Synchronization...... 83 7.2.2.7. Human Readable Metadata...... 84 7.2.2.7. Human Readable Metadata...... 84 7.2.3. Packaging Concepts...... 84 7.2.3. PACKAGING CONCEPTS...... 84 7.3. DISTRIBUTION PACKAGE...... 84 7.3. DISTRIBUTION PACKAGE...... 84 7.3.1. Introduction...... 84 7.3.1. INTRODUCTION...... 84 7.3.2. Distribution Package...... 84 7.3.2. DISTRIBUTION PACKAGE...... 84 7.3.2.1. General...... 84 7.3.2.1. General...... 84 7.3.2.2. Packing for Transport...... 84 7.3.2.2. Packing for Transport...... 84

18 Private Draft Document. Not for Publication

7.3.2.3. Flattened Packages...... 84 7.3.2.3. Flattened Packages...... 84 7.3.2.4. Security...... 84 7.3.2.4. Security...... 84 7.3.3. Packing List...... 84 7.3.3. PACKING LIST...... 84 7.3.3.1. File Format...... 84 7.3.3.1. File Format...... 84 7.3.3.2. Fields...... 84 7.3.3.2. Fields...... 84 7.3.4. Asset Map...... 85 7.3.4. ASSET MAP...... 85 7.3.4.1. File Format...... 85 7.3.4.1. File Format...... 85 7.3.4.2. Fields...... 85 7.3.4.2. Fields...... 85 8. ANNEX...... 86

8.1. EXAMPLE WORKFLOWS...... 86 8.1. EXAMPLE WORKFLOWS...... 86

8.2. EDITORIAL SYSTEMS...... 90 8.2. EDITORIAL SYSTEMS...... 90 8.2.1. Editorial System Fundamental Requirements...... 90 8.2.1. EDITORIAL SYSTEM FUNDAMENTAL REQUIREMENTS...... 90 8.2.1.1. Easy Assembly of Content...... 90 8.2.1.1. Easy Assembly of Content...... 90 8.2.1.2. Movement of Content...... 90 8.2.1.2. Movement of Content...... 90 8.2.1.3. Ease of Operation...... 90 8.2.1.3. Ease of Operation...... 90 8.2.1.4. Multiple Systems...... 90 8.2.1.4. Multiple Systems...... 90 8.3. PLAYOUT SYSTEMS...... 90 8.3. PLAYOUT SYSTEMS...... 90 8.3.1. Playout System Fundamental Requirements...... 90 8.3.1. PLAYOUT SYSTEM FUNDAMENTAL REQUIREMENTS...... 90 8.3.1.1. Easy Assembly of Content...... 91 8.3.1.1. Easy Assembly of Content...... 91 8.3.1.2. Movement of Content...... 91 8.3.1.2. Movement of Content...... 91 8.4. TRANSCODING SYSTEMS...... 91 8.4. TRANSCODING SYSTEMS...... 91 8.4.1. Transcoding System Fundamental Requirements...... 91 8.4.1. TRANSCODING SYSTEM FUNDAMENTAL REQUIREMENTS...... 91 8.4.1.1. Easy Assembly of Content...... 91 8.4.1.1. Easy Assembly of Content...... 91 8.4.1.2. Movement of Content...... 91 8.4.1.2. Movement of Content...... 91 8.5. ASSET MANAGEMENT INTERFACE...... 91 8.5. ASSET MANAGEMENT INTERFACE...... 91 8.5.1. Interface Fundamental Requirements...... 91 8.5.1. INTERFACE FUNDAMENTAL REQUIREMENTS...... 91 8.5.1.1. Introduction...... 91

19 Private Draft Document. Not for Publication

8.5.1.1. Introduction...... 91 8.5.1.2. API...... 91 8.5.1.2. API...... 91 8.5.1.3. Protocols...... 91 8.5.1.3. Protocols...... 91 8.5.2. Asset Management Interface Fundamental Concepts...... 91 8.5.2. ASSET MANAGEMENT INTERFACE FUNDAMENTAL CONCEPTS...... 91 8.5.3. Application Programming Interface...... 91 8.5.3. APPLICATION PROGRAMMING INTERFACE...... 91 8.5.4. Protocols...... 91 8.5.4. PROTOCOLS...... 91 9. GLOSSARY OF TERMS...... 92 10. COMPOSITION PLAY LIST (CPL) EXAMPLE...... 96 11. ANNEX - OUTPUT PROFILE LIST XML EXAMPLES...... 101

11.1. OPL OUTLINE IN PSEUDO-XML...... 101 11.1. OPL OUTLINE IN PSEUDO-XML...... 101

11.2. A SIMPLE OPL...... 102 11.2. A SIMPLE OPL...... 102

11.3. A COMPLEX (LEVEL1) OPL...... 103 11.3. A COMPLEX (LEVEL1) OPL...... 103

11.4. A COMPLEX (LEVEL2) OPL...... 104 11.4. A COMPLEX (LEVEL2) OPL...... 104

11.5. A PRE-PROCESSING SECTION FOR AN OPL...... 107 11.5. A PRE-PROCESSING SECTION FOR AN OPL...... 107 12. ANNEX - IDEAL IMPLEMENTATION...... 110

12.1. REAL-TIME PLAYBACK...... 110 12.1. REAL-TIME PLAYBACK...... 110

12.2. DECODE COMPRESSED IMAGE FASTER THAN REAL TIME...... 110 12.2. DECODE COMPRESSED IMAGE FASTER THAN REAL TIME...... 110

20 Private Draft Document. Not for Publication Table of Figures FIGURE 1 - EXAMPLE THEATRICAL WORKFLOW...... 25 FIGURE 2 - EXAMPLE IMF SYSTEM ARCHITECTURE...... 26 FIGURE 3 - EXAMPLE WRAPPING...... 28 FIGURE 4 - EXAMPLE IMF COMPOSITION...... 28 FIGURE 5 - EXAMPLE IMF HIERARCHICAL STRUCTURE...... 29 FIGURE 6 - EXAMPLE IMP...... 30 FIGURE 7 - EXAMPLE OF AN IMAGE CONTAINER VS. 2.39:1 ACTIVE IMAGE AREA...... 34 FIGURE 8 - EXAMPLE TRACK FILE STRUCTURE...... 51 FIGURE 9 - EXAMPLE OF KLV CODING...... 51 FIGURE 10 - COMPOSITION PLAYLIST SEQUENCE...... 59 FIGURE 11 - TIMING RELATIONSHIPS WITHIN A SEQUENCE...... 59 FIGURE 12 - SEQUENCE SYNC RELATIONSHIP EXAMPLE...... 59 FIGURE 13 - SEQUENCE STRUCTURE...... 66 FIGURE 14 - ASSET TIMING PARAMETERS...... 70 FIGURE 15 - OUTPUT PROFILE LIST RELATIONSHIP OVERVIEW...... 79 FIGURE 16 - FUTURE STATE - MASTERING & DISTRIBUTION SERVICING...... 88 FIGURE 17 - EXAMPLE IMF WORKFLOW...... 90

Table of Tables TABLE 1: IMF BASIC ELEMENTS...... 31 TABLE 2: REQUIRED NON-STANDARD RESOLUTIONS AND FRAME RATES...... 33 TABLE 3: IMAGE METADATA DATA ELEMENTS...... 37 TABLE 4: ALLOWABLE SAMPLES FOR SPECIFIED FRAME RATES*...... 39 TABLE 5 - AUDIO METADATA DATA ELEMENTS...... 42 TABLE 6 - PAN AND SCAN METADATA DATA ELEMENTS...... 46 TABLE 7: XML NAMESPACES...... 56 TABLE 8: TERMS AND DEFINITIONS...... 57 TABLE 9: EXAMPLES OF CONTENT KIND...... 61 TABLE 10: EXAMPLE RATINGS (INFORMATIVE)...... 64 TABLE 11: EXAMPLES OF MARKER LABELS...... 72 TABLE 12: EXAMPLES OF STANDARDS BODY LABELS...... 75 TABLE 13: EXAMPLES CHANNEL LABELS...... 76 TABLE 14: EXAMPLES OF LABEL ELEMENTS...... 81 TABLE 15 - ASSET MAP DATA FIELDS...... 85 TABLE 16: GLOSSARY OF TERMS...... 92

21 Private Draft Document. Not for Publication

1. IMF Introduction

1.1. Introduction With the advance of technology within the motion picture post-production industry, a paradigm shift is upon us as we move from the videotape workflow to the file based workflow. It is with this shift that a need has risen for a standardized set of specifications for this file-based workflow. In light of this shift, an organization was sought out by content creators to provide the facility for detailed discussions surrounding this topic. The Entertainment Technology Center stood out as a leading candidate for such discussions and in November of 2008 the Interoperable Master Format or IMF specification effort was initiated.

1.2. Scope The IMF file-based workflow is designed to replace the existing tape-based Distribution Servicing workflow. It will store one master set of file based elements to be assembled for any downstream distribution using multiple Composition Play Lists (Recipes), similar to what is used in present day Digital Cinema Packaging (DCP). The broad concept of a high quality, uniform IMF should lower costs, improve time-to-market, and increase interoperability of existing production processes and needs. This specification is intended to promote improved interoperability and faster implementation of multiple variants on a common video package, while retaining highest quality and appropriate security of the material. It is envisioned that this Final Specification would then be presented to SMPTE (Society of Motion Picture and Television Engineers) to create standards, recommended practices and engineering guidelines to implement this specification throughout the industry as a common interchange method.

1.3. System Objectives At the onset of writing a specification for an Interoperable Master Format, ETC acknowledged certain fundamental requirements, which are: The Interoperable Master Format (IMF) shall provide for a single set of master files and recipes to allow for easy creation of versions for distribution channels. This should also provide the potential to repurpose existing content. The IMF shall have the capability to present Essence and Metadata that is equal to or better than what one could achieve with current practice (i.e. Videotape). The IMF shall provide mechanisms (i.e. asset management, packing lists) for the exchange of IMF metadata or packages between facilities. This system should be based upon international standards so that content can be interchanged anywhere in the world as can be done today with videotape. These standards should be open published industry standards that are widely accepted and codified by regional and international standards bodies such as: ANSI, SMPTE, ITU, W3C, and ISO/IEC. The system specification and formats should be chosen so that the capital equipment and operational costs are reasonable and exploit, as much as possible, the economies of scale associated with equipment and technology in use in other industries. The hardware and software used in the system should be easily upgraded as advances in technology are made. Upgrades to the format shall be designed in a way so that content may be distributed and compatibly exchanged on the latest IMF-compliant hardware and software, as well as earlier adopted IMF-compliant equipment installations. The Interoperable Master Format shall be based upon a component architecture (e.g., Mastering, Compression, Encryption, Distribution, Storage, Playback) that allows for the components to be replaced or upgraded in the future without the replacement of the complete system. It is the intention of this Digital Video specification to allow for advances in technology and the economics of technology advancement. The document specifies a baseline for the implementation of an Interoperable Master Format. The goal of backwards compatibility in this context is to allow, for example, new content at higher resolution and color space to be understood by the baseline implementation.

22 Private Draft Document. Not for Publication

The Interoperable Master Format shall provide a reliability and availability that is equal to, or better than, current practice. (i.e. videotape) The Interoperable Master Format shall allow for the use of a security method. The system should provide a means to allow the content to use standardized encryption with private/public keys. The IMF should also allow the use of forensic marking of the content for providing traceable forensic evidence in the case of theft.

1.4. Document Language This document consists of normative text and optional informative text. Normative text is text that describes the elements of the design that are indispensable or contains the conformance language keywords: “shall”, “should” or “may”. Informative text is text that is potentially helpful to the user, but not indispensable and can be removed, changed or added editorially without affecting interoperability. Informative text does not contain any conformance keywords. All text in the document is, by default, normative except: any section titled “Introduction”, any section explicitly labeled as “Informative”, or individual paragraphs that start with the word “Note.” Normative references are those external documents referenced in normative text and are indispensable to the user. Informative, or bibliographic, references are those references made from informative text or are otherwise not indispensable to the user. The keywords “shall” and “shall not” indicate requirements that must be strictly followed in order to conform to the document and from which no deviation is permitted. The keywords “should” and “should not” indicate that among several possibilities one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required. In the negative form, a certain possibility or course of action is deprecated but not prohibited. The keywords “may” and “need not” indicate a course of action permissible within the limits of the document. The keyword “reserved” indicates that a condition is not defined and shall have no meaning. However, it may be defined in the future. The keyword “forbidden” is the same as reserved, except that the condition shall never be defined in the future. A compliant implementation is one that includes all mandatory provisions (“shall”) and, if implemented, all recommended provisions (“should”) as described. A compliant implementation need not implement optional provisions (“may”). Requirements are indicated with the key phrases “is required to”, “is encouraged to” and “can” which represent “shall,” “should” and “may” (had the text been in a separate requirements document). This is necessary in order to distinguish requirements from the specification conformance language. Sentences with the following keywords are italics: shall, shall not, should not, is required, is not required, is not encourage and is encouraged. The names of standards publications and protocols are placed in [bracketed text]. International and industry standards contain provisions, which, through reference in this text, constitute provisions of this specification. At the time of publication, the editions indicated were valid. These referenced standards are subject to revision, and parties to agreements based upon this specification are encouraged to investigate the possibility of applying the most recent editions of the referenced standards. Section 9 is a glossary of technical terms and acronyms used throughout this specification. The reader is encouraged to refer to the glossary for any unfamiliar terms and acronyms. Trademarked names are the property of their respective owners.

23 Private Draft Document. Not for Publication

2. SYSTEM OVERVIEW

1.5. Functional Framework The document defines technical specifications and requirements for the creation and distribution of Interoperable Master Format. The details are in the following sections: Essence: This section provides specifications for the image and audio essence and its specific inherent metadata. The image essence specification defines a common set of image structures by specifying parameters such as image container, colorimetry and, if required, a mezzanine compression structure for the Interoperable Master Format (IMF). The Audio Essence specifies audio characteristics such as; bit depth, sample rate, minimum channel count, channel mapping and reference levels. Data Essence: This section provides specifications for the subtitle (Timed Text and sub pictures) and captions data essence. The Subtitles Essence specifies the format of a Digital Video subtitle track file. A subtitle file contains a set of instructions for placing rendered text or graphical overlays at precise locations on distinct groups of motion picture frames. A caption file may provide graphical overlays or provide graphical information to a secondary system for display of text. Outside of providing to a secondary system most other parameters are the same as subtitle data essence. Dynamic Metadata: This section provides the specifications for the Metadata that is outside of the associated metadata of the essence or data essence. This would be metadata such as; time code, pan and scan, and color transforms. Wrapping: This section defines the requirements for wrapping the content and metadata (image, audio and subtitle) files using (where possible) existing Material eXchange Format (MXF) specifications. The output of this process is the Track Files. This section also defines the requirements for encrypting the essence (sound, picture and subtitles) of the IMF if required Composition: This section provides the specifications for the creation of XML using Composition PlayLists (CPL), which are scripts that link the IMF Track Files together into synchronized pieces of content. This section will also touch upon the security requirements for a composition and Composition PlayList. Packaging: Interoperable Master Format Compositions can be Packaged for distribution which is detailed in this section. Provides requirements for all the tools necessary for editorial functions in a typical post-production environment. Annex: This section contains example workflows using the IMF as well as definitions and suggested devices for editorial systems, transcoding and playout systems.

24 Private Draft Document. Not for Publication

1.6. Overview The goal of the IMF project is to establish an integrated hardware and software solution for the processing, storage, and management of digital content intended for downstream distribution to business that will consume such content.

1.6.1. IMF System Architecture Figure 1, below, demonstrates an example of the end-to-end data flow from Production to Distribution. The focus of IMF is within the Distribution Servicing phase.

Distribution Mechanisms

Figure 1 - Example Theatrical Workflow

To further demonstrate the concept of the Interoperable Master Format in a system architecture for distribution servicing refer to Figure 2, below. The IMF aggregates image, audio, and other files to support processing for any downstream delivery requirement. It does this using Composition Play Lists to create the deliverables using transcoding technologies. The IMF package will allow for delivery using the image elements stored in either a compressed or uncompressed file. An example of this would be an HD resolution with 709 color space at 24 FPS converted to SD resolution with 601 color space at 29.97 FPS.

25 Private Draft Document. Not for Publication

Figure 2 - Example IMF System Architecture

1.6.2. Major System Key Concepts The following listed concepts below are the Key Concepts of the IMF as a system. These concepts are derived from work done by content creators to create masters for the downstream distribution servicing of theatrical content. This is not to say that these concepts or the IMF cannot be applied to other forms of content. Instead it is noted here so that the reader may better understand its origin.

1.6.2.1. Business to Business Solution The IMF is intended to facilitate internal or a business to business relationship and is not intended to deliver content to the consumer.

1.6.2.2. Digital Source Master (DSM) The IMF is derived from a wide range of sources with a wide range of technical levels. One example is content in its finished state at the end of the theatrical post-production process which is shown above, in Figure 2, as a Digital Intermediate, or also known as a Digital Source Master (DSM). The DSM can also be used to convert to a film duplication master and/or a master for archival purposes. It is not the intention of this document to, in any way, specify the DSM. This is left to the discretion of the content provider.

1.6.2.3. Archive not in Scope It is also not the intention of this specification to design the IMF as an archiveable master. One could conceive this as another use of the IMF, however this is not part of the scope of this specification.

26 Private Draft Document. Not for Publication

1.6.2.4. File / Frame-Based System The IMF is built upon a data file-based design, i.e., all of the content is made up of data stored in files. These files are organized around the image frames, which means that all of the synchronization references to the image frames and frame rate. The file is the most basic component of the system.

1.6.2.5. Essence and Data Essence The raw image and audio files of the IMF make up what is known as Essence. Also included as part of the IMF there may be subtitle or caption essence. This type of essence is called Data Essence and because of its nature it may require different specifications and hence is described in its own section.

1.6.2.6. Metadata Metadata is essentially data about data. There are many examples of metadata: • Supporting metadata • Descriptive metadata • Dynamic metadata

Supporting and Descriptive metadata are defined by SMPTE. A new concept proposed for the IMF is Dynamic metadata. This metadata is metadata that changes on a frame basis such as; time code, pan and scan, and color transforms . This metadata may or may not be applied to the underlying essence. It must be synchronized to the essence and therefore with these requirements will need to be wrapped and contained as Track Files.

1.6.2.7. Security The IMF should not preclude the use of encryption or the use of forensic marks. This may occur at the wrapping, composition or packaging stage.

1.6.2.8. Wrapping and Track File Creation Wrapping is the process of encapsulating the essence, data essence and dynamic metadata files into well understood temporal units, called Track Files, using a standardized wrapping method, such as MXF. Figure 3, below, shows a basic example of this process of wrapping Essence and Metadata into Track Files. This process includes a way to uniquely identify each Track File along with its associated essence and metadata. It also provides a method to identify synchronization locations within each of the Track Files.

27 Private Draft Document. Not for Publication

Figure 3 - Example Wrapping

1.6.2.9. Composition A Composition represents a complete set of files that may be a feature, an episode, a trailer, an advertisement or any other single piece of content. A composition minimally consists of a Composition PlayList (CPL) and one or more Track Files. Composition PlayLists (CPL’s) are XML structured textual lists that define how elements of an IMF are to be assembled and either transcoded to another format or played out as a presentation.

Figure 4 - Example IMF Composition

1.6.2.10. Versions One of the main reasons for having an IMF is the ability to create many versions of a program without duplicating the common essence used for each version. Instead, one Composition PlayList (CPL) is required for each version of the program, which is much smaller in size as compared to the Track Files themselves.

1.6.2.11. Sequence Within the Composition PlayList (CPL) one may create sequences where the Track Files are nested within these sequences. The CPL is organized in such a way that, finding and possibly

28 Private Draft Document. Not for Publication replacing sequences becomes more efficient depending on the number of Track Files. This configuration was selected to give the content creator more choices for the structure of the Composition PlayList. In the IMF, a sequence represents a conceptual period of time having a specific duration chosen by the content provider. For example, a Sequence maybe the same length as a reel or it may be the running time of a television program between commercials. Again this length is determined by the content provider to fit both the particular workflow and content type. Once created, Sequences can then be electronically spliced together within the CPL to create a complete presentation. An example below (see Figure 5) shows the hierarchical structure of the CPL. A sequence may have one or more Track Files nested within it. The IMF allows cuts only to occur between Track Files as well as allowing a minimum duration of a single image frame/field. Sequences on the other hand only allow a minimum duration of one second.

Figure 5 - Example IMF Hierarchical Structure

1.6.2.12. Output Profile List Once versions of a Composition are created, many different distribution formats may be required to be made from that particular version. In order to assist with the automation of this process, one shall use a method to define the transcoder or play out device’s output. The method of communicating the desired output is called, the Output Profile List or OPL. The OPL is a textual list that contains the specification of the output. It is linked to the CPL by using the UUID of the CPL. This allows one to link many OPL’s to a single CPL and automate the transcoding of the Composition into multiple distribution formats.

1.6.2.13. Packages An Interoperable Master Package (IMP) shall minimally consist of a Composition, an Output Profile List, an Asset Map and Packing List. An Asset Map is a text document that describes how the files are distributed across the physical media that is used for transport. A Packing List is a

29 Private Draft Document. Not for Publication text document that provides a list of all of the files included in that specific IMP. An example of an IMP is shown below, in Figure 6.

Figure 6 - Example IMP

30 Private Draft Document. Not for Publication 1.7. IMF Elements and Processes The following table provides a list of the identified IMF elements and processes. The table also provides the category they belong to and where you can find more detail in the following sections.

Table 1: IMF Basic Elements

Element Category Image Essence Audio Essence Primary Display Subtitles / Captions Data Essence Composition PlayList (CPL) Files Dynamic Metadata Output Profile List (OPL) Supporting Metadata Packing List Supporting Metadata Asset Map Supporting Metadata Timecode Dynamic Metadata Secondary Display Captions/Subtitles Data Essence QC / Picture Reports and Fact Sheets Descriptive Metadata Forensic Marking Data Essence Pan and Scan (Aspect Ratio Conversion) Dynamic Metadata Time Compression/ Expansion Dynamic Metadata Color Transform Dynamic Metadata HI (Hearing Impaired) Essence VI (Visually Impaired) Essence

31 Private Draft Document. Not for Publication

2. Essence

2.1. Overview

2.1.1. Introduction Essence files are the core files that contain the image and audio data. One of the goals of the IMF is to allow for a variety of these files to be included into the IMF. For example the IMF should support multiple image resolutions (ex. 1920x1080 and 1280x720). This is just one of many different variables that may be supported by the IMF for multiple Essence types.

2.1.2. Essence System Overview For the purpose of documenting the specific requirements and specifications for Essence, it is helpful to divide the system into a set of components. The specifications and requirements for each of these components will be described in the following sections: • Image – The image specification and file format o Compression Requirements o Structural Metadata • Audio – The audio specification and file format o Structural Metadata

2.2. Essence Fundamental Requirements

2.2.1. Common Essence File Formats The Essence file types shall use a common standardized file format for each element (image and audio). The image essence file format shall be a standard-conformant file based on existing ISO or SMPTE standards. The audio track file format shall be based on Recommendation ITU-R BR.1352-3 (2007) version of Broadcast Wave. Essence file formats shall also use standardized structure metadata to optimize interchange. The Subtitle essence should be based on PNG and XML file formats. An MXF-conformant file, based on existing SMPTE standards, shall be used for wrapping each type of element (Image, Audio, Time Code, Aux Data, etc.)

2.2.2. Frame Rates and Synchronization The frame rate of any individual IMF source master shall remain constant. Metadata shall be carried in the image and audio data file format to indicate the frame rate.

Essence files shall carry information to provide for frame-based synchronization between essence files from the same individual IMF source master - at a minimum, they shall include a start of file and a continuous frame count. The Image Essence shall be the master reference for synchronization with a frame based precision. Granularity is frame based. Raster Format (Interlaced or Progressive) and Frame Rate are dependent on each other and to some extent, on the resolution of the image. The IMF shall support the same Raster Formats and Frame Rates that are currently used in mastering today including those listed in the following SMPTE specifications: • SMPTE 274M • SMPTE 296M • SMPTE 259M

32 Private Draft Document. Not for Publication

Because the file-based world affords some extra flexibility in regards to the mixing of resolution, frame rate and raster formats, particularly with progressive frame rate material, the IMF shall support the resolutions and frame rates listed in Table 2: Required Non-Standard Resolutions and Frame Rates, in addition to the above SMPTE standards.

Table 2: Required Non-Standard Resolutions and Frame Rates

System Nomenclature Luma or R’G’B’ samples Active lines per frame Frame Rate (Hz) per active line (S/AL) (AL/F) 720x480/23.98/P 720 480 24/1.001 = 23.976 720x486/23.98/P 720 480 24/1.001 = 23.976 720x576/25/P 720 576 25 Active Image Only Any up to and including Any up to and including 23.976p, 24p, 25p, 29.97p, 1920 1080 30p, 50i, 50p, 59.94i, 59.94p, 60p

In addition to the required resolutions, frame rates and raster formats, the IMF should support up to 24 frames per second for horizontal resolutions greater than 1920 pixels and up to 4096 pixels. There may need to be additional metadata defined for Interlaced Raster Formats. Because this is a field rate, additional data elements including field order (bottom or top field first) and possibly repeated fields for content with 3:2 pulldown may be needed to thoroughly describe the Interlaced content. The image and audio formats shall support the following frame rates, measured in frames per second, for non-stereoscopic content. The letter following the rate indicates the type of scan, p = progressive scan, i = interlaced. Frames rates for stereoscopic content shall be these frame rates multiplied by two: The image structure shall support a frame rate of 24p and 23.976p, and the image essence structure may support a frame rate of 59.94i and 59.94p.

33 Private Draft Document. Not for Publication

2.3. Image Specification

2.3.1. Image Structure

2.3.1.1. Image Container, Active Image, Pixel Aspect Ratio The Image Container is defined as the full canvas of the image area, parts of which are not necessarily meant to be seen. Image Container is defined here as a rectangular array of pixels that contains the maximum possible image area in a given format. In contrast, the Active Image area is a subset of the Image Container and is defined as the area that contains any part of the image that is meant to be seen. Active Image contains the active image content only and shall not contain letterboxing mattes or side mattes. The Active Image area shall not exceed the size of the Image Container. Because the Active Image area may be smaller than the Image Container, the Active Image area needs to be defined by a horizontal and vertical position within the Image Container. In order to describe the location and size of the Active Image area with the least amount of information, the Active Image area shall be expressed as two x/y coordinate values: Top Left Coordinate and Bottom Right Coordinate. For example, the Active Image of content that is shown in the 2.39:1 aspect ratio in the highest resolution of HD (Active Image Width of 1920 pixels) would have an Active Vertical size of 803 (1920/2.39). Because the vertical size is an odd number, the placement of the Active Image cannot be centered within the 1920x1080 Image Container; however, by using coordinates to describe the position, the exact location within the Image Container can be made, as shown in Figure 7. The Top Left of the Image Container is always (0,0). Spec shall allow active pixels as well as a full container for the intended resolution, up to a maximum of 4096 horizontal pixels and 3112 vertical pixels, not to exceed the image container.

Figure 7 - Example of an Image Container vs. 2.39:1 Active Image Area

The IMF Image Container is expressed in the metadata (see section 5.5.3). Note: The IMF does not specify any hard constraints for horizontal or vertical resolution.

34 Private Draft Document. Not for Publication

2.3.1.2. Pixel Aspect Ratio

The Pixel Aspect Ratio (PAR) defines the shape of the pixel used by the image in a ratio of width versus height. Square PAR is the same as a ratio of 1:1. Depending on the standard used, the PAR can differ – examples include the non-square PAR for NTSC and PAL. Both the Image Container and the Active Image shall share the same Pixel Aspect Ratio.

2.3.1.3. Bit Depth In regards to Bit Depth, the IMF shall support 8-bit and 10-bit image content and should support 12-bit and 16-bit content. At the time of this writing, 8-bit and 10-bit content are used considerably more than the higher bit-depths, so the IMF must support 8-bit and 10-bit in order to match current mastering standards.

2.3.1.4. Chroma Subsampling The IMF shall support 4:4:4 (no chroma subsampling) and 4:2:2 chroma subsampling types. Current mastering standards use both of these ratios; therefore the IMF must include the both of these in the format.

2.3.1.5. Color Space, Color Space Gamuts The format shall include the Color Space and Color Space Gamuts that are currently used in mastering, which include: • R’G’B’ ITU-R BT.709

• Y’CbCr ITU-R BT.709

• Y’CbCr ITU-R BT.601

The IMF should support: • XYZ standard CIE 15:2004, Colorimetry

2.3.1.6. Stereoscopic Content The IMF shall support stereoscopic content. The image essence parameters for each eye shall match the parameters for monoscopic image essence in the IMF. For stereoscopic content, the left eye content shall be encapsulated in a single track file, and the right eye content shall be encapsulated in a single and separate track file.

2.3.2. Compression Requirements Image Compression for Digital Video may use data reduction techniques to decrease the size of the data for economical delivery and storage. The system uses perceptual coding techniques to achieve an image compression that is visually lossless. It is important to note that image compression is typically used to ensure meeting transmission bandwidth or media storage limitations. This results in image quality being dependent on scene content and delivered bit rate. Digital Video image compression is much less dependent upon bandwidth or storage requirements, thereby making bit rate dependent on desired image quality rather than the reverse. Uncompressed, lossless and lossy compression schemes shall be included in the IMF.

2.3.2.1. Image Compression Requirements The following section defines the requirements for an Image Compression format and structure. The Uncompressed, Lossless and Lossy compression schemes shall meet the following requirements: • Shall be an Industry Standard (i.e. SMPTE, ITU, etc.) • Shall be License-Free • Shall support Intra-Frame

35 Private Draft Document. Not for Publication

In addition to the above requirements, Lossless and Lossy compression schemes shall meet the following requirements: • Shall support Variable and Constant Bit Rates • Shall support Spatial Resolution Layers • Shall support Multiple Quality Layers The compression scheme shall use documented industry standards in order to ensure consistent interoperability between system implementations and to prevent conflicts with intellectual property. In order to allow for the maximum number of participants to adopt the IMF, the compression scheme shall be license-free.

3.3.2.1.1 Intra-Frame Compression Intra-Frame compression is an important part of the image requirements because it allows for each image frame to be self-contained and edits to occur in one frame increments without any additional decoding being necessary. Both variable and constant bit rates may be used in the compression scheme in order to allow the content owner to balance quality of the image versus overall size.

The compression scheme shall allow for different spatial resolution and quality layers within the same frame. Note: This allows one set of files to contain multiple resolutions of the image at various image quality levels. For example, the compressed HD image with a resolution of 1920x1080 shall allow extractions of lesser resolutions such as 960x540 (half of the original width and height) and 480x270 (one-quarter of the original width and height), without having to decode the full image resolution and then scale the image width and height to the smaller resolutions. Similarly, the use of multiple quality layers allows decoding only a portion of the bits representing the full-quality compressed image. For example, three quality-layers may be created by the encoder, corresponding to 25, 100 and 250 megabits/sec respectively. In certain situations, the decoder may choose to only decode one or two of the three quality layers resulting in a 25 megabit/sec or 100 megabit/sec decoded stream. This method of extraction allows for smaller proxy versions of the files to be used without having to generate a separate file for a system that requires a smaller resolution or image quality level. Smaller proxies are used in many situations including editing and as reference files for audio conforming and subtitling creation. Decoding a subset of the quality layers that are available in the encoded file will lower the complexity of the decoding process for proxy use-cases in which a lower image quality is acceptable, for audio conforming or subtitling creation, for example.

3.3.2.1.2 Image Compression Codecs The IMF shall support the JPEG200 Part 1 (ISO.IEC 15444-1) codec. A later version of IMF may support another codec, if it meets the above image and compression requirements.

2.3.2.2. Uncompressed Formats Uncompressed formats are supported in IMF, but must be able to be wrapped into MXF. The IMF shall support the following uncompressed formats: • Generic Container (SMPTE 384M-2005) • DPX (SMPTE 268M-2003) Note: At the time of this writing, the Generic Container and DPX meet the uncompressed image requirements.

36 Private Draft Document. Not for Publication

2.3.3. Structural Metadata

2.3.3.1. Image Metadata Required Fields Specific image metadata shall be required for each image track that describes the native parameters of the image content, in order to allow for proper interchange between different implementation systems. At the time of this writing, however, the actual metadata fields cannot be defined. Instead, the data elements shown in Table 3 shall be the minimum amount of information supported by the IMF for image. These data elements shall be converted into specific metadata fields once the specification is complete.

2.3.3.2. Stereoscopic Metadata Required Fields Specific metadata shall be required for proper identification of stereoscopic content. At the time of this writing, however, the actual metadata fields cannot be defined. Instead, the data elements shown in Table 3, as Stereoscopic, shall be the minimum amount of information supported by the IMF for stereoscopic content. These data elements shall be converted into specific metadata fields once the specification is complete.

Table 3: Image Metadata Data Elements

Data Element Data Element Definition Examples Stereoscopic Active Image End End of Active Image within Image Container Varies – examples Coordinate (x,y) area expressed in an x,y coordinate value that include (1919,1079) is placed in relation to the Image Container and (1679, 1079) Top Left Coordinate of (0,0) Active Image Top Start of Active Image within an Image Varies – examples Left Start Container area expressed in an (x,y) include (0,0) and Coordinate (x,y) coordinate value that is placed in relation to the (0,239) Image Container Top Left Coordinate of (0,0) Code Value Range Range used to represent zero black and 100% Full-Range (0-1023), white Limited-Range (64- 940) CODEC Coder Decoder for a digital stream of data JPEG 2000 Part 1 Color Channel Bit Number of bits used to represent the digital 8, 10, 12, 16 Depth image data

Color Encoding Type of color model used for the image RGB, YCbCr, XYZ expressed in a set of components Color Encoding Number of samples used per color space 4:4:4, 4:2:2 Ratio component expressed in a ratio Color Primaries xy chroma coordinates of the tri-stimulus Rxy,Gxy,Bxy values Coordinate Origin Coordinates that define the upper left corner of Top-left, bottom-right the Active Image within the Image Container Frame Rate Rate that each image is shown; used in Frame Rates are conjunction with Raster Format to express specified in the fields/second or frames/second Essence section, Ref: 2.2.2 HANC Horizontal Ancillary Data - Ancillary packets Embedded Audio located in the horizontal blanking region

37 Private Draft Document. Not for Publication

Data Element Data Element Definition Examples Stereoscopic Image Container Total number of horizontal pixels used for the 4096, 2048, 1920, Horizontal Pixels Image Container 720, could be many Image Container Total number of vertical pixels used for the 3112, 2160, 1556, Vertical Pixels Image Container 1080, 576, 486, 480, could be many Mastering Optional metadata item that provides the 14fL, 35fL Luminance reference luminance value used in the (Optional) mastering environment Mono Playback Enables or disables the ability to playback one Allowed, Not Allowed eye out of a stereoscopic pair - also gives control to the content owner to not allow one eye to be played out Pixel Aspect Ratio Shape of the pixel expressed in a ratio of width 1:1, could be any divided by height of the pixel ratio Raster Format Interlaced (fields/second) or Progressive Interlaced, (frames/second) Progressive Stereoscopic ID Identifies the Track as either a monoscopic or Mono, Stereo_Left, Stereoscopic stereoscopic Track and if stereoscopic, Stereo_Right identifies which Track is which eye Transfer Function Relationship of code value to brightness value Linear, log, power function VANC Vertical Ancillary Data - Ancillary packets Closed Caption data located in the vertical blanking region and VPID Floating Window Stereoscopic

Mastering Screen Stereoscopic Size

Best Eye If stereoscopic tracks may be viewed as a Right, Left Stereoscopic single track, this identifies the best eye track (right or left) to use

2.4. Audio Specification

2.4.1. Audio Structure

2.4.1.1. Audio File Format Audio Track Files represent audio within the IMF. • An audio track file shall be a single, complete audio element, which may be any soundfield configuration. • An audio track file would typically be wrapped or interleaved into a single file, and shall be limited to a single audio element (such as a composite mix, or dialog track) and a single soundfield configuration (such as 5.1 or mono) per audio track file. • Audio track files shall be further constrained per sections 3.3.1.6 through 3.3.1.x • The format of an audio track file shall be Interleaved Broadcast Wave.

38 Private Draft Document. Not for Publication

• Data rate coded audio shall not be used (e.g. AC3, DTS-MA, Dolby E). • Matrix encoded audio (e.g. Lt-Rt, Dolby EX) may be used.

Note: discrete inputs to IMF authoring (informative): Broadcast Wave, Wave, PCM

2.4.1.2. Sampling Rate Sampling rates shall include 48k, 96k, 47.952k, and 95.9k.

2.4.1.3. Frame Rate/Audio Speed • Native 24, 25, 30, 50 and 60 are equivalent audio speeds, as the image being represented is the same. • Pull down 23.976, 29.97 and 59.94 are equivalent audio speeds, as the image being represented is the same.

2.4.1.4. Allowable Samples per Frame Allowable samples per frame shall be based on frame rate, according to the following table. The required sample (word) clocking rate shall sync to the frame speed of the associated image.

Table 4: Allowable Samples for Specified Frame Rates*

Samples/Frame Samples/Frame Samples/Frame Samples/Frame Frame Rate @ 47.952kHz @48kHz @95.9kHz @96kHz 23.976 2000 2002 4000 4004

24 n/a 2000 n/a 4000

25 n/a 1920 n/a 3840

29.97 1600 1601.61 3200 3203.21

30 n/a 1600 n/a 3200

50 n/a 960 n/a 1920

59.94 800 800.81 1600 1601.61

60 n/a 800 n/a 1600

48 n/a 1000 n/a 2000

• Not to be confused with stereoscopic frame rates (see 2.2.2, above). • 1 – If the frame rate is not an integer multiple of the audio sample rate, then the number of samples in each frame length shall vary such that the correct aggregate number of samples are maintained per the coding theory delineated in SMPTE 382M, Section 6.2. In practice this is aggregated over a 5 frame period. Therefore, for these cases, the audio editorial granularity is restricted to once every 5 frames. See CPL requirement Ref: Error: Reference source not found.

2.4.1.5. Audio Bit Depth The audio bit depth shall be 24 bits. 16 and 20 bit are not allowed - they shall be padded in the least significant bits to create 24 bit.

39 Private Draft Document. Not for Publication

Note: Metadata should be included to reflect the original bit depth, e.g. “24 bit padded from the original 16 bit source.”

2.4.1.6. Audio Track File Content

2.4.1.7. Audio Track File Content Constraint There shall be one audio element per track file. Example audio elements are listed in section 2.4.1.14.

2.4.1.8. Audio Track File Language Constraint Only one audio language shall be used in an audio track file.

2.4.1.9. Track File Loudspeaker Channel Content Constraint: A given loudspeaker channel shall only be represented once per track file, i.e. within a track file containing multiple audio channels, there shall be a one-to-one relationship between loudspeakers and audio channels. There shall not be multiple instances of a loudspeaker channel within a track file.

2.4.1.10. Channels Per Track File Constraint An audio track file shall not exceed 16 audio channels.

2.4.1.11. Simultaneous Multiple Audio Track File Availability and Playout The CPL in an IMF shall be capable of pointing to multiple audio track files, which are simultaneously available for use by a transcoder or real-time playout device. IMF shall support simultaneous multiple audio track file playout, as multiple audio elements must be available to play out at once.

2.4.1.12. Multiple Audio Track File routing When accessing multiple track files, the track file individual channels shall be combined (like channel to like channel) for output routing purposes on a unity gain basis. For example, the left channel of audio track file A and the left channel of audio track file B are combined at unity gain as the overall left channel content of the IMF. Note: Track file channels shall not be combined at other than unity gain. Note: Different loudspeaker channels shall not be combined, e.g. Left and Right are never combined to make a mono output.

2.4.1.13. Mixing of audio channels • Audio individual channels within a track file shall not be mixed. • Audio channels between multiple track files shall not be mixed, other than the routing spec stated above. • Different channels shall not be combined (downmixed) to make a narrower soundfield configuration.

40 Private Draft Document. Not for Publication

2.4.1.14. Audio Elements Examples • There shall be only one audio element per track file. • Multiple audio elements shall not be combined in a single audio track file. The following audio elements are typical of what may be carried in the IMF. Other audio elements may be carried in an IMF at the content provider’s discretion. • Printmaster (reels) • Composite Mix (full length or parts) • Music+Effects (one track, two tracks or full multitrack) • M+E optional material (one track, two tracks or full multitrack) • Narration • VI (Visually Impaired) • SAP • Dialog (one track, two tracks, or full multitrack) Often, this is part of a “DME split track” • Music (one track, two tracks, or full multitrack) Often, this is part of a “DME split track” • Effects (one track, two tracks, or full multitrack) Often, this is part of a “DME split track”

41 Private Draft Document. Not for Publication

2.4.1.15. Soundfield Configurations There shall be only one soundfield configuration per track file. Multiple soundfield configurations shall not be combined in a single audio track file. The following soundfield configurations shall be supported in IMF V.1. Soundfield configurations not specified in this section shall not be supported. • 5.1 • 5.0 • L/R (Lo-Ro) “Standard Stereo” • Lt-Rt • Mono • LCR • LCRS • 5.1EX • 6.1 (discrete) • 6.0 (discrete) • 7.1 (L, C, R, Ls, Rs, Rls, Rrs, Sub)

Note: future soundfield configurations must be defined and standardized in order to be supported. The soundfield configuration shall be described in the metadata that is carried with the audio track file. A registry of soundfield configurations is in process in SMPTE 30MR and shall be used for this purpose.

2.4.1.16. Channel layout and mapping Note: SMPTE 428-3 is not applicable to IMF and should not be referenced. 428-3 is specific to digital cinema, is an early document, and will be revised in the future. In addition, SMPTE 31FS- 10 is working on defining channel labels using (future) registered UL’s, and this will play a part in all multichannel audio specs once ratified. IMF shall have a restricted pattern-based layout, such that the audio channel layout is always in the same order for a given soundfield configuration. For example, 5.1 (6 channel) would always be L, R, C, Sub, Ls, Rs). Note: These layouts would need to be defined, possibly in an annex of the IMF spec.

2.4.2. Audio Metadata Data Elements Audio metadata data elements are given in the following table.

Table 5 - Audio Metadata Data Elements

Audio Data Element Data Element Definition Examples Essence Audio File Type Audio file type of the essence BWF, LPCM contained in the track file

Track Audio Type The type of wrapper that is MXF or BWF Interleave conveying the essence

Sample Rate Number of audio samples in a 48K, 96K second

42 Private Draft Document. Not for Publication

Audio Data Element Data Element Definition Examples Frame Rate The audio speed as expressed by the frame rate of the associated picture file

Samples/Frame The number of audio samples in the 2000, 2002, 4000, 4004 duration of one frame of the associated picture file

Bit Depth The number of bits contained in an 24 bit audio word

Language The spoken language of the audio English, Italian essence

Audio Element Type The type of audio contained in the Partial DME split track, printmaster, audio essence music and effects

Audio Content An additional modifier to describe Dialog, Visually Impaired, SAP the content of the audio contained in the audio essence

Associated Audio Track Files Y/N Indicates that this track file is part of Yes or No a group of track files that are to be played out simultaneously

Associated Track File Information Music, Effects

Soundfield Configuration 5.11

Channel Layout L, R, C, Sub, Ls, Rs1

Native Speed Y/N If no, look to speed/processing and Yes or No pitch correction fields

Speed/Processing (If applicable) Varispeed, Time Compression This is “25 fps”, but is sped up from the native speed of 23.976 to 25 fps. The resultant audio is 4.1% faster and higher in pitch than the original captured audio. Metadata should indicate it is “25 fps sped up 4.1%” This is “25 fps”, but is sped up from the native speed of 24 to 25 fps. The resultant audio is 4.0% faster and higher in pitch than the original captured audio. Metadata should indicate it is “25 fps sped up 4.0%”

Pitch Correction Y/N Yes or No

1 – Per work being performed in 31FS-10.

43 Private Draft Document. Not for Publication

3. Data Essence

3.1. Overview

3.1.1. Introduction The Data Essence…

3.1.2. Data Essence System Overview For the purpose of documenting the specific requirements and specifications for Data Essence, it is helpful to divide the system into a set of components. The specifications and requirements for each of these components will be described in the following sections: • Subtitles o Sub Picture – the pre-rendered open text specification and file format o Timed Text – the Timed Text data specification and file format • Captions - the captions specification with similar categories as Subtitles above. o Sub Picture – the pre-rendered open text specification and file format o Timed Text – the Timed Text data specification and file format

3.1.3. Major Data Essence Concepts Fill in the blanks……

3.1.4. Data Essence Fundamental Requirements

3.1.4.1. Common File Formats The Essence and Data Essence shall use a common standardized file format for each element (image, audio, subtitles, etc.). See 2.2.1 Common Essence File Formats.

3.1.4.2. Synchronization Files within the image and/or audio track shall carry information to provide for frame-based synchronization between each file. See 2.2.2 Frame Rates and Synchronization. At a minimum, they shall include a start of file and a continuous frame count.

3.2. Subtitle Specification

3.2.1. Sub picture

3.2.2. Timed Text

3.3. Caption Specification

3.3.1. Sub picture

3.3.2. Timed Text

44 Private Draft Document. Not for Publication

4. Dynamic Metadata

4.1. Overview

4.1.1. Introduction The Metadata…

4.1.2. Dynamic Metadata System Overview For the purpose of documenting the specific requirements and specifications for metadata, it is helpful to divide the system into a set of components. The specifications and requirements for each of these components will be described in the following sections: • Time code – The specification and file format for time code dynamic metadata • Pan and Scan – The specification and file format for Image pan and scan dynamic metadata • Stereoscopic Offset Metadata - Allow for a offset subtitle metadata track?

4.1.3. Major Dynamic Metadata Concepts Fill in the blanks……

4.1.4. Dynamic Metadata Fundamental Requirements

4.1.4.1. Common File Formats The dynamic metadata shall use a common standardized file format for each element (time code, aux data, etc.). See also 2.2.1 Common Essence File Formats.

4.2. Time code Specification 1. It should support continuous time code per sequence. Continuous is one time code track for entire video. 2. Multiple time code tracks shall be supported. 3. Only one time-base is allowed within a sequence.

4.3. Pan and Scan Specification The pan and scan metadata track shall contain the information to allow for a “pan-and-scan” version of the image. Instead of storing multiple versions of the image to accommodate different aspect ratios of the feature (for example, 2.39:1, 1.78:1 and a 1.33:1), this feature allows for the IMP to contain one version of image and along with pan and scan metadata, create multiple aspect ratio versions of the image track.

4.3.1.1. Basic Pan and Scan Requirements The pan and scan metadata track shall contain information derived from pan and scan composition equipment/software in a standardized format. The format will include the basic, common capabilities of pan and scan in an image including: . Displaying only certain areas of an image . Zoom in . Zoom out/Scale/ . Tilt (up and down movement) . Pan (left and right movement) . Horizontal squeeze/stretch . Vertical squeeze/stretch . Rotate . Flip/Flop, horizontally/vertically

45 Private Draft Document. Not for Publication

The pan and scan metadata track will be used in conjunction with the timecode metadata track in order to create the proper movements from one area of the image to another area of the image at specific timecodes. Any movement from one area to another area of the image shall be depicted as events that occur at specific timecodes. The movement could be either a constant, linear movement or a dynamically-changing, non-linear movement; however, movements shall not be defined – only the resulting changes in image size and location per timecode frame.

The Active Image and the Image Container are also important components to the pan and scan metadata. The Active Image start and end coordinates define the area of the image used for the pan and scan area while the Image Container will hold the entire canvas that is to be panned and scanned over. Other parameters in the pan and scan metadata could cause the Active Image to be altered such as changing the overall aspect ratio or the size of the image to be displayed.

The output image settings and/or Output Profile Lists shall determine the overall scaling of the images during the output stage of the IMF. The Active Image area created by the pan and scan metadata shall be scaled to fit the output resolution depending on the Output Profile List parameters. For example, a 1.78 pan and scan could be output as a 4x3 letterboxed 1.78 if the OPL specified the settings to create the 4x3 letterbox from the 1.78 pan and scan metadata.

4.3.1.2. Pan and Scan Metadata Required Fields In order to define the areas of the image that should be shown and to create any pan and scan movements, specific metadata is required to accurately identify the pan and scan information. At the time of this writing, however, the actual metadata fields cannot be defined. Instead, the data elements shown in Table 6 - Pan and Scan Metadata Data Elements, shall be the minimum amount of information supported by the IMF for pan and scan metadata. These data elements shall be converted into specific metadata fields once the specification is complete. Some of these data elements have sub-element fields to help define the overall data element.

Table 6 - Pan and Scan Metadata Data Elements

Data Element Data Element Definition Examples Pan/Scan Track ID Label for the specific pan and scan metadata 2.40, 2.39, 2.35, 2.20, track that distinguishes it from other pan and 1.78, 1.33 scan metadata tracks and sets the fixed aspect ratio for the specific track Fill Color In cases where the Active Image is smaller Format TBD; could than the output size, this field determines the be R-G-B values color that should be used to fill. dependent on bit depth of IMF Pan/Scan Event ID Identifies the start of a new pan and scan Numeric, starting with event change 0, number of digits TBD Timecode Sub-element of Pan/Scan Event ID, timecode of where event is located Active Image Sub-element of Pan/Scan Event ID, start of Varies – examples Start Coordinate Active Image within an Image Container area include (0,0) and (x, y) expressed in an (x,y) coordinate value that is (0,239) placed in relation to the Image Container Top Left Coordinate of (0,0)

46 Private Draft Document. Not for Publication

Data Element Data Element Definition Examples Active Image Sub-element of Pan/Scan Event ID, end of Varies – examples End Coordinate Active Image within Image Container area include (1919,1079) (x,y) expressed in an (x,y) coordinate value that is and (1679, 1079) placed in relation to the Image Container Bottom Right Coordinate Scale x % Sub-element of Pan/Scan Event ID, denotes 100% = No change the percentage in change in the horizontal <100 = Smaller direction for the Active Image for output >100 = Larger Scale y % Sub-element of Pan/Scan Event ID, denotes 100% = No change the percentage in change in the vertical <100 = Smaller direction for the Active Image for output >100 = Larger Rotate Sub-element of Pan/Scan Event ID, denotes Numbers ranging Parameter the angle of rotation for the Active Image in from -180 to +180, degrees; 0 = No rotation, Positive Numbers: 0 = No Rotation Clockwise, Negative Numbers: Counter- Clockwise Flip Parameter Sub-element of Pan/Scan Event ID, denotes None, Horizontal, whether the Active Image area should be Vertical flipped horizontally or vertically

Examples of how the metadata fields work together to create the pan and scan image areas and movements are shown below.

Example 1: Simple cuts from one area to another

In the above example, two people are having a conversation. The Image Container/full aperture image is 2048x1556, and the red boxes show the Active Image area that should be displayed to create a 1.33 version of the image. The image in shot #1 would be displayed starting at a specific timecode, while one person is talking and then cut to the image in shot #2 when the other person starts talking. A possible scenario of the pan and scan metadata is shown below:

Metadata Field Value Comments Pan/Scan Track ID 1.33 Fill Color 64-64-64 Assumes 10-bit RGB content PS Event ID 0000 Pan/Scan Event for Shot #1 Timecode 01:03:00:05 Start x,y 0, 100 Top left of red box End x,y 1126, 933 Bottom right of red box Scale x 100 No change Scale y 100 No change Rotate 0 No change Flip None No change PS Event ID 0001 Pan/Scan Event for Shot #2 Timecode 01:03:05:18

47 Private Draft Document. Not for Publication

Start x,y 819, 100 Top left of red box End x,y 2048, 933 Bottom right of red box Scale x 100 No change Scale y 100 No change Rotate 0 No change Flip None No change

Example 2: Pan from one area to another

In Example 2, above, two people are having a conversation. The Image Container/full aperture image is 2048x1556, and the red boxes show the Active Image area that should be displayed to create a 1.33 version of the image. The image in shot #1 would be displayed starting at a specific timecode, while one person is talking and then pan over to the image in shot #2 when the other person starts talking. The pan could either be linear or non-linear in speed.

Example 2, is similar to Example 1, but instead of only having two Pan/Scan events, there would be a separate Pan/Scan Event for each subsequent frame as the red box area changes along the pan. During each timecode frame, the x,y coordinates of the red box would change according to where the image should be based on whether the pan is a constant, linear speed or a dynamic, non-linear pan.

Example 3: Change in Active Image size

In Example 3, Shot #1 starts out with a smaller Active Image area that slowly zooms out to show the entire area in Shot #2. The zoom out could be linear or non-linear in speed. Similar to example 2, there would be a separate Pan/Scan Event for each subsequent frame as the red box area changes along the zoom out. During each timecode frame, the x,y coordinates of the red box would change according to where the image should be based on whether the zoom out is a constant, linear speed or a dynamic, non-linear zoom out.

Because the image sizes are different for each shot’s Active Area, during the output of the IMP, the images would be scaled to the final output resolution. This could result in Shot #1 being scaled up to the final resolution or Shot #2 being scaled down to the final resolution. An OPL could be created to determine the parameters of how this re-scaling could be done.

48 Private Draft Document. Not for Publication

Example 4: Squeezing or Scaling Shots

1

2

In Example 4, the shot contains a 1.78 main title that needs to be modified in order to display in a 1.33 version. However, after the main titles and credits, the rest of the feature will be full-frame 1.33. This example shows two ways to make credits fit into a 1.33 aspect ratio: a. Squeeze the credit horizontally to fit into the 4x3 area b. Letterbox the credit and keep the same aspect ratio

In either case, feature content could either cut back to a full-screen 4x3 image. In the letterbox case, the letterboxing could slowly “scroll” outwards to reveal a full-screen 4x3 image by zooming in on the credit. A possible scenario of the pan and scan metadata is shown below for the Squeeze method:

Metadata Field Value Comments Pan/Scan Track ID 1.33 Fill Color 64-64-64 Assumes 10-bit RGB content PS Event ID 0000 Pan/Scan Event for Squeeze Timecode 01:03:00:05 Start x,y 0, 0 Top left of red box End x,y 1919, 1079 Bottom right of red box Scale x 100 Use full image Scale y 100 Use full image Squeeze x 75 Squeeze the horizontal to 75% of original Squeeze y 100 No change Rotate 0 No change Flip None No change

A possible scenario of the pan and scan metadata is shown below for the Letterbox method:

Metadata Field Value Comments Pan/Scan Track ID 1.33 Fill Color 64-64-64 Fills blank areas with Black PS Event ID 0000 Pan/Scan Event for Initial Letterbox Timecode 01:03:00:05 Start x,y 0, 0 Use full image End x,y 1919, 1079 Use full image Scale x 75 Reduce the horizontal to 75% of original Scale y 75 Reduce the vertical to 75% of original Rotate 0 No change Flip None No change

49 Private Draft Document. Not for Publication

PS Event ID 0001 Pan/Scan Event for start of zoom in Timecode 01:03:05:18 Start x,y 0, 0 Top left of red box End x,y 1919, 1079 Bottom right of red box Scale x 77 Increases the horizontal to 77% of original Scale y 77 Increases the vertical to 77% of original Rotate 0 No change Flip None No change PS Event ID 0002 Pan/Scan Event for next part of zoom in Timecode 01:03:05:19 Start x,y 0, 0 Top left of red box End x,y 1919, 1079 Bottom right of red box Scale x 79 Increases the horizontal to 79% of original Scale y 79 Increases the vertical to 79% of original Rotate 0 No change Flip None No change PS Event ID 0003 Pan/Scan Event for next part of zoom in Timecode 01:03:05:20 Start x,y 0, 0 Top left of red box End x,y 1919, 1079 Bottom right of red box Scale x 81 Increases the horizontal to 81% of original Scale y 81 Increases the vertical to 81% of original Rotate 0 No change Flip None No change …

Pan/Scan Events are continued until the zoom in is complete.

50 Private Draft Document. Not for Publication

5. Wrapping

5.1. Overview

5.1.1. Introduction The IMF, as stated in the System Overview, is a collection of files, such as picture essence files and audio track files. These files, as they stand by themselves, do not represent a complete presentation. Synchronization tools, asset management tools, metadata, content protection and other information are required for a complete presentation to be understood and played back as it was intended. This is especially important when the files become compressed and/or encrypted and are no longer recognizable as image essence or audio essence in this state. See 2.2.2 Frame Rates and Synchronization. Wrapping is a way to organize and collect this material in such a way as to make it suitable for storage and movement to its destination. In seeking a common interchange standard for digital video between facilities and equipment, it is understood that there may be multiple sources of content. This will require special consideration to achieve IMF interchange. Thus, an interchange wrapping structure is needed that operates across several domains.

5.2. Wrapping Requirements

5.2.1. Introduction The Audio, Image and Text Track Files are the lowest-level components in the IMF system. The Material eXchange Format (MXF, SMPTE ST 377:2009) defines the common structure of the various types of files used in IMF to wrap audio, image and timed-text essence. MXF defines a variety of abstract essence container types for storing Essence, Data Essence or Dynamic Metadata information. An MXF file consists of three logical parts: the File Header, the File Body and the File Footer as shown in Figure 8, below.

Figure 8 - Example Track File Structure

The file structure is further broken down into logical data items as defined in [SMPTE 336M Data Encoding Protocol using Key-Length-Value]. The KLV Coding Protocol is composed of a Universal Label (UL) identification Key (UL Key), followed by a numeric Length (Value Length), and followed by the data Value as shown below in Figure 9, below. One or more of these data items are combined to form the logical parts shown above.

Length

UL Key Length Value

16 bytes BER Variable Length Octets

Figure 9 - Example of KLV Coding

51 Private Draft Document. Not for Publication

5.2.2. Format Information Each Track File shall be a self-contained element, such that its Essence and Metadata can be understood and presented as it was packaged by a compliant decoder (i.e., no information outside the file is needed to fully decode the file). The information shall be located in the file as specified by the wrapping standard for the respective essence type.

5.2.3. Metadata In addition to the metadata items defined by the MXF file specifications referenced below, it will be necessary to develop descriptors (MXF data Sets) that allow version and title information to be embedded in MXF files.

5.2.4. Track File Replacement A Track File is the smallest unit that can be managed or replaced as a discrete file in the field.

5.2.5. Synchronization Each Track File shall contain Edit Rate and Index metadata. Synchronization of two or more track files is defined by the Composition Playlist (CPL). See 2.2.2 Frame Rates and Synchronization

5.2.6. Splicing Track Files, of the same Essence type, are required to allow for seamless or click-free splicing to create a continuous data stream for a presentation. The playback system shall be able to perform sample accurate or click free splicing of audio track files.

5.2.7. Security IMF Track File formats shall support encryption and integrity checking.

5.2.8. Extensibility MXF inherently supports future extensions by allowing the decoder to ignore unrecognized KLV packets..

5.2.9. Simple Essence A track file shall contain essence of a single Essence type (e.g., audio, image, subtitles). While a Track File can, for instance, contain all audio channels for a given soundfield, additional soundfields are required to be stored in separate track file. The Composition PlayList will select the correct Track Files to play a requested version of the composition.

5.3. MXF Track File Encryption

5.3.1. Introduction Track File encryption provides additional security for files transported or stored in an un-secure environment. A symmetric-key cipher is employed for processing efficiency. A Key management scheme is required to provide a way to send symmetric keys securely to an intended recipient. Key management is not addressed in this document.

5.3.2. Standards SMPTE ST 429-6:2006 defines a symmetric key encryption wrapper for KLV packets in an MXF file.

52 Private Draft Document. Not for Publication 5.4. Image Track File

5.4.1. Introduction An Image Track File contains the image Essence data and its associated Metadata. Each Image Track File can contain compressed and encrypted image data. The following are requirements for an Image Track File.

5.4.2. Frame Boundaries The Image Track File shall contain all identifiers and parameters required for the decoder to recover the images from the file.

5.4.3. Compression The Track File shall support Constant Bit Rate (CBR) compression and Variable Bit Rate (VBR) compression, within the constraints of the specified code stream for the reference decoder

5.4.4. Standards SMPTE specifications ST 422:2006 and ST 429-4:2006 define an MXF wrapping for JPEG 2000 images. The MXF wrapper is compatible with Track file encryption defined by SMPTE ST 429-6:2006. Additional standards must be developed to provide for using the color spaces required by IMF.

5.5. Audio Track File

5.5.1. Introduction An Audio Track File contains the audio Essence data and its associated Metadata. Audio Essence is stored in Audio Track Files as sample-interleaved 24-bit PCM. Each audio Text file shall contain a single complete soundfield (i.e., all channels in the file shall be heard simultaneously by the listener).

5.5.2. Standards SMPTE specification ST 382:2007 defines an MXF wrapping for sample-interleaved PCM audio data. The MXF wrapper is compatible with Track file encryption defined by SMPTE ST 429-6:2006. Note: No standard exists at this time to address the channel labeling requirements of IMF. A multi- channel labeling framework is maturing in SMPTE TC 31FS, but additional work will be required to create the labeling structure required for IMF.

5.5.3. Metadata In addition to the Metadata defined by the ST 382:2007 wrapping, the following Metadata items shall be supported by the Audio Track File: • Unique ID of corresponding plaintext track encrypted • Channel Mapping Labels

5.6. Timed Text Track File

5.6.1. Introduction A Timed Text Track File contains text Essence data, such as subtitles or captions, for display on- screen (over the main image) or potentially by an auxiliary display. Text essence is encoded as Unicode character strings with associated font resources, to be rendered by the decoder, or as PNG images for direct display. Each Timed Text Track File may contain any combination of text, font references, and image references.

53 Private Draft Document. Not for Publication

5.6.2. Standards SMPTE specifications ST 428-7:2007 and ST 428-10:2008 define XML file formats for timed text with fonts and PNG files. SMPTE ST 429-5:2009 defines an MXF wrapper for XML-based timed text and associated ancillary resources. The MXF wrapper is compatible with Track file encryption defined by SMPTE ST 429-6:2006. Note: Support for stereoscopic positioning is being developed by SMPTE TC 21DC. That work is expected to result in revisions to ST 428-7 some time in 2010.

5.7. Time Code Track Files (Optional)

5.7.1. Introduction A Time Code Track File contains, for example, the Unicode text data or any other data or Metadata that belongs in a separate track for functional purposes. The following are requirements for a Time Code Track File.

5.7.2. Frame Boundaries The Time Code Track File is encouraged to begin and end with complete frames that are associated with its Image Track File to allow for a clean transition between reels.

5.7.3. Metadata The following Metadata shall be furnished with the Time Code Track Files: • Unique identification • Track Type (i.e., auxiliary) • Frame Count Number • Text Format (If Applicable) • Cue Names (If Applicable) •

54 Private Draft Document. Not for Publication

7. The Composition

5.8. Introduction The Composition represents a complete self-contained digital media program. This Composition may be a feature, episode, trailer, advertisement or any other single piece of content. A composition usually consists of at a minimum a Composition PlayList (CPL) and one or more Track Files, which contain the actual program Essence. A very simplified example of the Hierarchical Structure for a Composition is given below.

5.9. Functional Framework For the purpose of documenting the specific requirements for an Interoperable Master Format Packaging system, it is helpful to divide the system into a set of components. The performance requirements for each of these components will be described in the following sections: • Composition – A self-contained representation of a single complete Digital Video work, such as a motion picture, or a trailer, or an advertisement, etc. • Composition PlayList - The instructions to a play out device or a transcoder on how to assemble the Track Files together to create a version of the program. (Similar to an Edit Decision List) • Composition Security – The specifications and requirements of a Security framework the end users may or may not take advantage of.

5.10. The Composition Fundamental Requirements

5.10.1. Open Standard The Composition standard shall be based upon an open worldwide standard. This format is encouraged to be a license-free technology. It shall be a complete standard that equipment receiving a compliant Composition can process and interpret unambiguously.

5.10.2. Interoperable The Composition format shall have an open framework that accommodates compressed, encrypted files as well as all other files used in Digital Video.

5.10.3. Scalable The Composition format shall accommodate any number of Essence or Metadata components. There is no limit on the number of files included in the package or the size of the files.

5.10.4. Extensible The Composition format shall allow for new Digital Video features (compositions) to be contained within the package.

5.10.5. Synchronization The Composition format shall provide support for synchronization of the Essence and Metadata elements. See 2.2.2 Frame Rates and Synchronization

5.10.6. Human Readable Metadata Human readable Metadata shall be in English (default) but can be provided in other languages as well.

5.10.7. File Format The Composition PlayList shall use the secure (digitally signed) text-based XML file format. More specifically the structures defined in this document are represented using the Extensible Markup

55 Private Draft Document. Not for Publication

Language (XML) [XML 1.0], and specified using XML Schema [XML Schema Part 1: Structures] and [XML Schema Part 2: Datatypes]. This specification shall be associated with a unique XML namespace name [Namespaces in XML]. The namespace name shall be the string value “http://www.tbd.org“. This namespace name conveys both structural and semantic version information, and serves the purpose of a traditional version number field. Table 7: XML Namespaces lists the XML namespace names used in this specification. Namespace names are represented as Uniform Resource Identifier (URI) values [RFC 2396]. These values shall be considered as simple strings, and applications should not attempt to resolve them as URLs.

Table 7: XML Namespaces

Qualifier URI cpl http://www.tbd.org/date xs http://www.w3.org/2001/XMLSchema ds http://www.w3.org/2000/09/xmldsig

The namespace qualifier values (namespace prefixes in XML jargon) used in this document (cpl, xs, ds), are not normative values. Implementations shall perform correctly with any XML compliant namespace prefix value that is associated with a URI from Table 7: XML Namespaces, above. Datatypes from other schemas that are used in this document will be prefixed with the appropriate namespace qualifier (e.g. xs:dateTime). See [XML Schema Part 2: Datatypes] and [XML-Signature Syntax and Processing] for further information about these types. The MIME type [IETF RFC 2046] for a document containing a single Composition PlayList element as its root shall be “text/xml”.

5.11. CPL Constraints The following is a list of items that are intended to be constrained within the Composition and therefore reflected in the CPL.

5.11.1. Constant Frame and Sample Rate The Composition shall only consist of similar frame rate and sample rate content in all of the included track files; all track files shall have the same edit rate, sample rate and frame rate.

5.11.2. Minimum Track File Duration The duration of any asset contained in a Sequence, as indicated by the Duration and IntrinsicDuration elements, shall be no less than one frame.

5.11.3. Minimum Sequence Duration The duration of any Sequence shall be no less than one second.

5.11.4. Fractional Sample Editorial Granularity Fractional samples shall only allow audio editorial granularity of once every 5 frames.

5.11.5. Audio Items

5.11.5.1. Partitions The following partitions shall be supported • Continuous (full length)

56 Private Draft Document. Not for Publication

• Reels • Parts

5.11.5.2. Audio Editing Granularity • The audio editing granularity shall have frame accuracy. • An IMF authoring tool shall have granularity to the sample.

5.11.5.3. Crossfades • There shall be no automatic or automated audio crossfades in the IMF CPL at playout or transcoding. • Rendered audio shall be used to perform difficult crosses between content. Note: these would be created in the audio editing process and rendered in the authoring process. • Audio Fade up or fade outs shall not be supported.

5.11.5.4. Audio Editing and Transitions • Only hard cuts (“butt splices”) shall be supported. • Audio editing and transitions shall support only straightforward cuts and inserts rather than complex editing.

5.11.5.5. Audio Insert Considerations • Audio inserts may be of different length than corresponding picture insert to facilitate proper audio flow across edits.

5.12. Stereoscopic Content

Stereoscopic content shall be stored in the Composition as separate track files for each eye in a Dual Track format. Each Track shall conform to the same parameters as a monoscopic Track.

5.13. Terminology The following terms are used to describe the features of this specification.

Table 8: Terms and Definitions

Term Definition

Composition A complete artistic or informational motion picture work, such as a feature, episode, trailer, or an advertisement, etc.

Edit Rate A number of Editable Units to be reproduced during a temporal interval having a duration of exactly one (1.0) second. Because Edit Rate values are not always integer values and sometimes require many digits of precision, Edit Rate values are expressed as a rational number (the ratio of two integers).

Editable Unit The smallest temporal increment of access to Essence, e.g. a frame or a sample.

Essence The sound, picture and data resources that ultimately are intended for a viewing and/or listening experience.

Frame Rate The number of frames per second. Frame Rate values are expressed as a rational number (the ratio of two integers)

57 Private Draft Document. Not for Publication

Term Definition

Native Duration The total number of Editable Units in a Track File.

Native The last Editable Unit of a Track File. End Point

Native The first Editable Unit of a Track File. All Track Files are viewed by a Composition Start Point PlayList as a sequence of Editable Units numbered from 0 (zero). Consequently, the Editable Unit number of the Native Start Point of a Track File will always be 0 (zero).

Playable The set of Editable Units within a Track File that is intended to be reproduced as part of Region a Composition. A Track File may contain Editable Units before and/or after the Playable Region.

Sample Rate The number of essence samples per second. Sample Rate values are expressed as a rational number (the ratio of two integers)

Sequence A single or contiguous set of Track Files intended to be reproduced sequentially within a Composition

Track File A file containing a single Essence, such as sound, picture or subtitle essence.

5.14. Synchronization The Composition PlayList defines an idealized playback timeline using XML to provide information to the transcoder or playback device describing the order and parameters used to manipulate the files. See 2.2.2 Frame Rates and Synchronization

As depicted in first in Figure 1 previously, and now below in Figure 10 - Composition Playlist Sequence, the timeline consists of a contiguous set of Sequences that are spliced together, which fit within the region of a complete Composition.

58 Private Draft Document. Not for Publication

Figure 10 - Composition Playlist Sequence

Continuing with this hierarchical method, as depicted in Figure 10, the expanded Sequence consists of a set of synchronized Track Files along with their associated Metadata Track Files.

Figure 11 - Timing Relationships within a Sequence

For example, the CPL identifies a segment of a Track File to be reproduced in parallel with the other Track Files in the same Sequence. The Asset’s Entry Point and Duration parameters define the order of Editable Units within the Track File that is to be reproduced (the Playable Region). Track Files are aligned with offsets from their Native Start Points from which they are then synchronized and shall be reproduced simultaneously. Using this same method, Figure 11 illustrates the timing relationship between Track Files within a Sequence.

This is a placeholder for Figure 12

Figure 12 - Sequence Sync Relationship Example

At the start of a given Sequence, playback of all Track Files contained within that Sequence shall start simultaneously at the Entry Point given for each respective Track File. The duration of a Sequence shall be equal to the total duration of the Sequence’s MainPicture Track Files, or the Track Files with the shortest duration if the MainPicture Track File is not present.

59 Private Draft Document. Not for Publication 5.15. Composition PlayList Structure Composition PlayList shall be represented by a unique XML element, the CompositionPlaylist element. The Composition PlayList shall be encoded using the UTF-8 character encoding [XML 1.0]. The Composition PlayList shall include the following fields unless stated as optional in which case they may be included. An example of a Composition PlayList is provided in Chapter 10.

5.15.1. General Information The Composition PlayList should provide the following general information to allow quick access to the information about the composition. This should allow either humans or automated systems to source the general information. Below is the list of the both the informative information and required information to complete the structure of the Composition PlayList. Some items are optional and therefore not required for a compliant Composition PlayList. Other sections are described in more detail in later sections. • Unique ID • Annotation Text [optional] • Issue Date • Issuer [optional] • Creator [optional] • Content Title Text [optional] • Content Kind (e.g., Feature, Trailer, Logo, Advertisement, Episode) • Content Version • ID [optional] • Label Text [optional] • Content Description • Source Media Description [optional] • Frame Rate • Timecode Type [optional] • Image Encoding Type • Audio Sample Rate • Audio Samples/Frame • Soundfield Configuration • Audio Bit Depth • Subtitle Type • Caption Type • Media File Size • Total Running Time • Language • Country • Rating [optional] • Agency • Label • Encryption (yes/no) • Sequence List • Signer [optional]

60 Private Draft Document. Not for Publication

• Signature [optional]

5.16. CPL Element Requirements Listed below are the element requirements and descriptions.

5.16.1. Unique Id The Id element uniquely identifies the Composition PlayList for asset management purposes. It shall not uniquely identify the content represented by the composition. It shall be encoded as a urn:uuid per [RFC 4122].

5.16.2. AnnotationText [optional] The AnnotationText element shall be a free-form, human-readable annotation describing the composition. It is meant strictly as a display hint to the user. The optional language attribute is an xs:language language code and indicates the language used for the text. If the language attribute is not present, the default value en shall be used.

5.16.3. IssueDate The IssueDate element shall be used to define the time and date at which the Composition PlayList was issued. It may be displayed to the user. It shall be encoded as an xs:dateTime.

5.16.3.1. Issuer [optional] The Issuer element shall be a free-form, human-readable annotation that shall identify the entity that created the Composition PlayList. It is meant strictly for display to the user. The Signer element defined in Section 5.16.15 shall be used to identify the entity that digitally signed the Composition PlayList. The optional language attribute is an xs:language language code and indicates the text language of the content of the element. If the language attribute is not present, the default value en shall be used.

5.16.4. Creator [optional] The Creator element shall be a free-form, human-readable annotation that shall identify; the application used to create the Composition PlayList, the Facility that created the CPL and the operator that created the CPL. It is meant strictly for display to the user. The optional language attribute is an xs:language language code and indicates the text language of the content of the element. If the language attribute is not present, the default value en shall be used.

5.16.5. ContentTitleText The ContentTitleText element shall contain a human-readable title for the composition, e.g. The Jazz Singer. It is strictly meant as a display hint to the user. The optional language attribute is an xs:language language code and indicates the language of the content of the element. If the language attribute is not present, the default value en shall be used.

5.16.6. ContentKind The ContentKind element defines the kind of material referred to by the Composition PlayList. It is meant to be both human and machine-readable. Table 9: Examples of Content Kind, below, shows examples of Content Kind.

Table 9: Examples of Content Kind

Kind Description advertisement Content promoting a product or service other than an upcoming feature.

61 Private Draft Document. Not for Publication

feature A theatrical feature. psa Public service announcement. rating Slate/still picture indicating the recommended age group permitted to view the content to follow. This rating is generally unique per country. short Non advertising/promotional content (3 to 15 minutes) typically before a theatrical feature. teaser Very short (typically less than 1 minute) content promoting an upcoming theatrical feature. test Content used to test, calibrate or setup equipment. trailer Short (2 to 3 minutes) content promoting an upcoming theatrical feature. transitional Extremely short content (1 to 15 seconds) separating unrelated compositions.

5.16.7. ContentVersion This defines the version of the content referred to by the composition, as opposed to the Composition PlayList Id element which uniquely identifies an instance of the Composition PlayList. Thus there may be two distinct compositions, with distinct CompositionPlaylist Id values, that refer to the same content and thus have the same ContentVersion Id value. This may occur, for example, if a composition is distributed to supersede a previous version. Similarly, while two compositions may share the same content title, they may refer to two different versions, such as French (dubbed) and French (original), and therefore have two distinct ContentVersion Id values. The ContentVersion element is meant to assist both users and software in scheduling and tracking content.

5.16.7.1. Id The Id element shall identify the content contained in the Composition PlayList. It shall be a valid URN, per [RFC 2141].

5.16.7.2. LabelText The LabelText element shall be a human readable label, e.g. “French (1.85 picture, 16.1 sound, dubbed)”, describing the content. The optional language attribute is an xs:language language code and indicates the text language of the content of the element. If the language attribute is not present, the default value en shall be used.

5.16.8. ContentDescription This describes the content referred to by the composition. Most of this information is also contained in the metadata of the Track Files. It is provided here in human readable form for informative purposes only. Machines should not rely on this information for transcoding or Play-Out and in all cases the Track File metadata takes priority over information provided here. The ContentDescription element is meant to assist both users and software in scheduling and tracking content.

5.16.8.1. SourceMediaDescription [optional] The SourceMediaDescription element shall be human readable text, e.g. “HDCamSR (1.85 picture, 16.1 sound, dubbed)”, describing the content source. The optional language attribute is an xs:language language code and indicates the text language of the content of the element. If the language attribute is not present, the default value en shall be used. Further more an element ID and LabelText can be added to the description and is specified below.

62 Private Draft Document. Not for Publication

5.16.8.2. FrameRate The FrameRate element shall contain the number of frames per second of the intended playback rate of the MainPicture element. (ex. 23.98 or 59.94) This is informative information to allow for humans and machine to read and if necessary manipulate the MainPicture asset of the Composition to conform to the FrameRate.

5.16.8.3. TimecodeType [optional] The TimecodeType element shall contain information of the timecode format of the MainPicture element. This is informative information to allow for humans and machine to read and if necessary manipulate the MainPicture asset of the Composition to conform the picture information to the Compression specification.

5.16.8.4. ImageEncodingStandard [optional] The ImageEncodingStandard element shall contain information of the encoding parameters of the MainPicture element. This is informative information to allow for humans and machine to read and if necessary manipulate the MainPicture asset of the Composition to conform the picture information to the encoding specification.

5.16.8.5. StandardsBody [optional] The StandardsBody element shall contain a URI [RFC 2396] that uniquely identifies the Standards Body issuing the Standard.

5.16.8.6. Label [optional] The Label element shall contain a textual representation of the Standards Body, which may be displayed to the user. For each issuing Standards Body, and hence unique URI, there are a number of permissible Label values. An example of such is shown below in Table 10: Example Ratings (Informative). The specification of this mapping is beyond the scope of this document.

5.16.8.7. AudioSamplingRate The AudioSamplingRate element shall contain the floating-point number of the sampling rate of the MainSound element. (ex. 48 or 96) This is informative information to allow for humans and machine to read and if necessary manipulate the MainSound asset of the Composition to conform to the AudioSamplingRate.

5.16.8.8. AudioSamplesPerFrame The AudioSamplesPerFrame element shall contain the integer of the samples per frame of the MainSound element. (ex. 2002 ) This is informative information to allow for humans and machine to read and if necessary manipulate the MainSound asset of the Composition to conform to the Audio Samples per Frame.

5.16.8.9. SoundfieldConfig The SoundfieldConfig element shall contain the Channel configuration of the MainSound element. (ex. LtRt, 5.1 etc.) This is informative information to allow for humans and machine to read and if necessary manipulate the MainSound asset of the Composition to conform to the Soundfield Configuration.

5.16.8.10. AudioBitDepth The AudioBitDepth element shall contain the integer number of bits per sample word of the MainSound element. (ex. 16 or 24) This is informative information to allow for humans and machine to read and if necessary manipulate the MainSound asset of the Composition to conform to the AudioBitDepth.

63 Private Draft Document. Not for Publication

5.16.8.11. SubtitleFormat [optional] The SubtitleFormat element defines the format of the MainSubtitle of material referred to by the Composition PlayList. It is meant to be both human and machine-readable.

5.16.8.12. CaptionFormat [optional] The CaptionFormat element defines the format of the MainCaption of material referred to by the Composition PlayList. It is meant to be both human and machine-readable.

5.16.9. TotalRunningTime The TotalRunningTime element shall be used to define the complete running time of the Composition at the instance when Composition PlayList was issued. It may be displayed to the user. It shall be encoded as an xs:hours:minutes:seconds:frames.

5.16.10. Language [optional] The Language element shall reflect the primary spoken or textual language of the Composition. The element value is encoded as an xs:language language code and indicates the primary language of the content.

5.16.11. Country [optional] The Country element shall reflect the primary area of intended distribution of the Composition. This shall use the ISO standard three-letter acronym for it designation.

5.16.12. RatingList [optional] The RatingList element shall contain an ordered list of zero or more Rating elements containing ratings associated with the composition. Each Rating element, shown in Table 10: Example Ratings (Informative), contains an Agency and a Label element. Each element is meant to be both human and machine-readable. There shall be only one Rating element per given Agency.

5.16.12.1. Agency The Agency element shall contain a URI [RFC 2396] that uniquely identifies the agency issuing the rating.

5.16.12.2. Label The Label element shall contain a textual representation of the rating, which may be displayed to the user. For each issuing agency, and hence unique URI, there are a number of permissible Label values. The specification of this mapping is beyond the scope of this document

Table 10: Example Ratings (Informative)

Agency Labels http://www.mpaa.org/2003-ratings R, PG, PG-13, G, NC-17 http://rcq.qc.ca/2003-ratings G, 13+, 16+, 18+

5.16.13. Encryption The Encryption element defines the whether or not Encryption used on any portion of the Composition. It is meant to be both human and machine-readable.

5.16.14. SequenceList The SequenceList element shall contain an ordered list of Sequence elements to be reproduced in continuous order. The structure of the Sequence element shall be as defined in Section 5.17.

64 Private Draft Document. Not for Publication

5.16.15. Signer [optional] The Signer element uniquely identifies the entity, and hence the public-private key pair, that digitally signed the Composition PlayList. It shall be an instance of the KeyInfoType type defined in [XML- Signature Syntax and Processing]. If the Signer element is present, then the Signature element shall also be present. If X.509 certificates are used per [XML-Signature Syntax and Processing], then the Signer element shall contain one X509Data element containing one X509IssuerSerial element, which uniquely identifies the certificate used to sign the Composition PlayList.

5.16.16. Signature [optional] The Signature element shall contain a digital signature authenticating the Composition PlayList. If the Signature element is present, then the Signer element (5.16.15 above) shall also be present. The Signature element shall be an instance of the ds:Signature element defined in [XML- Signature Syntax and Processing]. The digital signature shall be enveloped and apply to the entire Composition PlayList. An enveloped signature is one that is attached to the document being signed. The signature is generated by the signer, as identified by the Signer element, using the signer’s private key.

65 Private Draft Document. Not for Publication

5.17. Sequence Structure Each Sequence shall consist of a number of Assets, each corresponding to a particular aspect of the Digital Video presentation. While this specification defines a number of asset types, additional asset types may be added in the future. The structure is shown below in Figure 13.

Figure 13 - Sequence Structure

5.18. Sequence Information The Composition PlayList should provide the following sequence information to allow quick access to the information about the composition. This should allow either humans or automated systems to source the general information. Below is the list of the both the informative information and required information to complete the structure of the Sequence. Some items are optional and therefore not required for a compliant Composition PlayList. Other sections are described in more detail in later sections. . SequenceType o Unique ID o Annotation Text (Sequence Title) [optional] o Sequence Version [optional] o Issue Date [optional] o Issuer [optional] o AssetList . MainMarkers [optional] . MainPicture [optional] . PictureSpatial [optional] . MainSound [optional] . MainSubTitles [optional] . MainCaptions [optional] . Other Asset [optional]

66 Private Draft Document. Not for Publication

5.18.1. SequenceType The SequenceType describes a Sequence asset intended to be reproduced as part of a Composition PlayList. Individual child elements are defined in the following subsections.

5.18.2. Unique Id The Id element uniquely identifies the Sequence for asset management purposes. It shall be encoded as a urn:uuid per [RFC 4122].

5.18.3. AnnotationText [optional] The AnnotationText element shall be a free-form, human-readable, text annotation associated with the Sequence. It meant strictly as a display hint to the user. The optional language attribute is an xs:language language code and indicates the text language of the content of the element. If the language attribute is not present, the default value en shall be used.

5.18.4. SeqIssueDate The SeqIssueDate element shall be used to define the time and date at which the Sequence was issued. It may be displayed to the user. It shall be encoded as an xs:dateTime.

5.18.5. Issuer [optional] The Issuer element shall be a free-form, human-readable annotation that shall identify the entity that created the Sequence. It is meant strictly for display to the user.

5.18.6. AssetList The AssetList element shall contain the ordered list of media assets that will be reproduced in parallel during Sequence playback. Each asset shall be uniquely identified by a UUID per [RFC 4122]. The defining specification for each Track File shall specify the location of the identifying UUID. Each child element of the AssetList element shall be derived from the datatype GenericAssetType defined in Section 5.19.2. A number of standard assets are defined in the subsequent sections. As additional assets are defined, e.g. captions, the AssetList element may be extended by introducing new elements.

5.18.6.1. MainMarkers [optional] The MainMarkers element defines markers, e.g. FFOC, LFOC…, associated with the main portion of the presentation, i.e. MainPicture and MainSound assets. Markers shall be referenced from the start of the Sequence they are associated with. The MainMarkers element shall be an instance of MarkersAssetType and its structure is described in more detail in Section 5.19.4.

5.18.6.2. MainPicture [optional] The MainPicture element defines the picture essence to be delivered to the Transcoder or Playback Display device. The actual picture essence is contained in an external Track File. The MainPicture element shall be an instance of PictureTrackFileAssetType and its structure is defined in Section 5.19.5.

5.18.6.3. PictureSpatial [optional] The PictureSpatial element defines the picture spatial metadata to be delivered to the Transcoder or Playback Display device. This provides metadata such as Pan and Scan, Zoom, Matte and Cropping information. The actual picture spatial metadata is contained in an external Track File. The PictureSpatial element shall be an instance of PictureMetadataTrackFileAssetType and its structure is defined in Section 5.19.6.

67 Private Draft Document. Not for Publication

5.18.6.4. MainSound [optional] The MainSound element defines the sound essence to be reproduced in the Transcoder or Playback system. The actual sound essence is contained in an external Track File. The MainSound element shall be an instance of SoundTrackFileAssetType and its structure is defined in Section 5.19.7.

5.18.6.5. MainSubtitle [optional] The MainSubtitle element defines the Subtitle essence to be reproduced on the main screen in the Transcoder or Playback system. The actual Subtitle essence is contained in an external Track File. The MainSubtitle element shall be an instance of SubtitleTrackFileAssetType and its structure is defined in Section 5.19.8.

5.18.6.6. MainCaption [optional] The MainCaption element defines the Captions data essence to be reproduced on the main screen in the Transcoder or Playback system. The actual Captions data essence is contained in an external Track File. The MainCaption element shall be an instance of CaptionTrackFileAssetType and its structure is defined in Section 5.19.9.

5.18.6.7. Extensions (New Asset Types) Extension elements shall be used to represent asset types not defined in this document. Zero or more extension elements may be present in the AssetList. When present, extension elements shall be located after any elements defined by this document. When present, extension elements shall have names that belong to a namespace different than the namespace declared by this document. Implementations may ignore extension elements belonging to an unknown namespace. Extension elements shall directly or indirectly extend GenericAssetType. Extension elements may extend any type defined in this document, which are derived from GenericAssetType. Informative note: Extension elements should have unique, descriptive names and should appear only once in a given Sequence. Extension specifications that allow multiple instances of an element in a Sequence should provide both a means of differentiating instances within a Sequence and a means of linking related instances in separate Sequences.

68 Private Draft Document. Not for Publication

5.19. Asset Structures Sequence Asset elements share common attributes, such as Duration, and are therefore specified as a set of types derived from a common structure, namely the GenericAssetType structure. A number of Asset elements also reference external files, in which case they are derived from the TrackFileAssetType structure.

5.19.1. Asset Information The Composition PlayList should provide the following Asset information to allow quick access to the information about the composition. This should allow either humans or automated systems to source the general information. Below is the list of the both the informative information and required information to complete the structure of the Asset. Some items are optional and therefore not required for a compliant Composition PlayList. Other sections are described in more detail in later sections. • Generic Asset Type • Unique ID • Asset Title (Annotation Text) [optional] • Edit Rate • Intrinsic Duration • Entry Point [optional] • Duration [optional] • Track File Asset Type • Key ID [optional] • Hash File [optional] • Marker Asset Type • Marker List • Label • Annotation Text [optional] • Offset • Picture Track File Asset Type • Pixel Matrix • Bit Depth • Color Encoding • Frame Rate • Image Aspect Ratio • Picture Spatial Metadata Track File Asset Type • Standards Body [optional] • Label [optional] • Sound Track File Asset Type • Sampling Rate • Bit Depth • Number of Channels • Channel Configuration • Channel Labels

69 Private Draft Document. Not for Publication

• Language [optional] • Subtitle Track File Asset Type • Language [optional] • Caption Track File Asset Type • Language [optional]

5.19.2. GenericAssetType The GenericAssetType describes a generic asset intended to be reproduced as part of a Sequence or Clip. Individual child elements are defined in the following subsections.

5.19.2.1. Id The Id element uniquely identifies the Asset. It shall be encoded as a urn:uuid per [RFC 4122]. If the Asset refers to an external resource, such as a Track File, the UUID value shall identify the resource. Mapping of UUID values to actual resources is beyond the scope of this document.

5.19.2.2. AnnotationText [optional] The AnnotationText element shall be a free-form, human-readable text annotation associated with the asset. It meant strictly as a display hint to the user. The optional language attribute is an xs:language language code and indicates the text language of the content of the element. If the language attribute is not present, the default value en shall be used.

5.19.2.3. EditRate The EditRate element defines the Edit Rate of the Asset. It shall be in units of inverse seconds and represented as a Rational Number. The IntrinsicDuration, EntryPointand Duration parameters shall be expressed in units of 1/EditRate (i.e. as integer values). If the Asset refers to an external resource, EditRate may differ from the actual Edit Rate or Sample Rate of the underlying essence.

5.19.2.4. IntrinsicDuration The IntrinsicDuration element shall define the Native Duration of the Asset, as illustrated in Figure 15. It shall not take into account the values of the EntryPoint and Duration elements. Unless the optional EntryPoint and Duration parameters are specified, playback of the asset shall start at the Native Start Point and terminate at the Native End Point of the Track File. IntrinsicDuration shall be expressed in units of 1/EditRate, i.e. as a count of Editable Units.

native native start point end point intrinsic duration

asset

EntryPoint Duration

Figure 14 - Asset Timing Parameters

5.19.2.5. EntryPoint [optional] This element shall only be present when the Asset refers to an external resource such as a Track File. The EntryPoint element identifies the Edit Unit where playback shall start (the first editable unit of the playable region). It shall be encoded as an integer number and shall be expressed in units

70 Private Draft Document. Not for Publication

of 1/EditRate, i.e. as a count of Editable Units. This element shall be required if the desired Entry Point is greater than 0 (zero). If this element is not present, a value of 0 shall be assumed and Asset playback shall start at the Native Start Point of the resource.

5.19.2.6. Duration [optional] This element shall only be present when the Asset refers to an external resource such as a Track File. The Duration element defines the duration of the Playable Region of the resource. It shall be encoded as an integer number and shall be expressed as an integer number in units of 1/EditRate, i.e. as a count of Editable Units. If present, this value shall be an integer between 0 (zero) and IntrinsicDuration – EntryPoint (the number of edit units between the EntryPoint and the Native End Point the Track File). If this element is not present, Asset playback shall stop after (IntrinsicDuration – EntryPoint)/EditRate seconds, i.e. at the Native End Point of the Asset.

5.19.3. TrackFileAssetType The TrackFileAssetType element shall be derived from GenericAssetType. It describes an asset based on an external file such as a Picture or Sound Track File.

5.19.3.1. KeyId [optional] The KeyId element uniquely identifies the cryptographic key used to encrypt the underlying track file. This element shall contain a key identifier encoded as a urn:uuid value. KeyId shall be present if any portion of the underlying track file is encrypted. The mapping of key identifiers to actual key values is beyond the scope of this document.

5.19.3.2. Hash [optional] The Hash element shall contain the hash (message digest) of the underlying track file computed using the SHA-1 message digest algorithm [RFC 3174]. When authenticated by the digital signature in the Composition PlayList (see 5.16.16), it may be used to verify the integrity and authenticity of the underlying track file. The resulting 160-bit integer shall be encoded using Base64 representation [RFC 2045].

71 Private Draft Document. Not for Publication

5.19.4. MarkerAssetType The MarkerAssetType element shall be derived from the GenericAssetType. It describes the content markers, e.g. FFOC, associated with a Sequence. An instance of the MarkerAssetType is the MainMarkers element. Just as for picture and sound assets, the marker asset has a timeline. The Offset of each Marker (see Figure 6) is the position from the start of the timeline and the Intrinsic Duration of the timeline shall correspond to the Offset of the last Marker.

5.19.4.1. MarkerList The MarkerList element shall contain a list of Marker elements. Marker Labels may be repeated as there will likely be multiple instances of several types of content segments throughout a given program (i.e. production logos, commercial blacks, etc). The members of the Marker element are defined in the following subsections.

5.19.4.2. Label The Label element shall contain a textual representation of the marker. An optional scope attribute with default URI value of “http://www.smpte- ra.org/schemas/TBD/CPL#standard-markers” determines the permissible values of the element.

Table 11: Examples of Marker Labels

Marker Description FFBT First Frame of Bars and Tone FFCB First Frame of Commercial Blacks FFCL First Fame of Company/Production Logo FFDL First Frame of Distribution Logo FFEC First Frame of End Credits. First displayable frame of content that contains any intensity of the End Credits (a non zero alpha value), which appear at the end of a feature. FFHS First Frame of Head Slate FFMC First displayable frame of content that contains any intensity of moving, rolling or scrolling credits (a non-zero alpha value), which appear at the end of the feature. FFOB First Frame of Ratings Band. First displayable frame of content of the Rating Band, which is usually a slate at the beginning of a feature. FFOC First Frame of Composition. The first frame of a composition that is intended for display. FFOI First Frame of Intermission. FFSP First Frame of Digital Sync Pop FFTC First Frame of Title Credits. First displayable frame of content that contains any intensity of the Title Credits (a non zero alpha value), which appear at the beginning of a feature. FFTS First Frame of Tail Slate FTXC First Frame of Textless Title Credits FTXE First Frame of Textless End Credits FTXM First Frame of Textless Material Segment LFBT Last Frame of Bars and Tone

72 Private Draft Document. Not for Publication

LFCB Last Frame of Commercial Blacks LFCL Last Frame of Company/Production Logo LFDL Last Frame of Distribution Logo LFEC Last Frame of End Credits. Last displayable frame of content that contains any intensity of the End Credits (a non zero alpha value), which appear at the end of a feature. LFHS Last Frame of Head Slate LFMC Last displayable frame of content that contains any intensity of moving, rolling or scrolling credits (a non-zero alpha value), which appear at the end of the feature. LFOB Last Frame of Ratings Band. Last displayable frame of content of the Rating Band, which is usually a slate at the beginning of a feature. LFOC Last Frame of Composition. The last frame of a composition that is intended for display. LFOI Last Frame of Intermission. LFSP Last Frame of Digital Sync Pop LFTC Last Frame of Title Credits. Last displayable frame of content that contains any intensity of the Title Credits (a non zero alpha value), which appear at the beginning of a feature. LFTS Last Frame of Tail Slate LTXC Last frame of Textless Title Credits LTXE Last Frame of Textless End Credits LTXM Last frame of Textless Material Segment

5.19.4.3. AnnotationText [optional] The AnnotationText element shall be a free-form, human-readable annotation associated with the marker. It meant strictly as a display hint to the user. The optional language attribute is a standard XML language code and indicates the text language of the content of the element. If the text language attribute is not present, the default value en shall be used. The field does not require a size limit, but for practical purposes, the field must allow at least 2048 characters.

5.19.4.4. Offset The Offset element defines the absolute position of the marker from the start of the marker asset. It shall be represented as integer number of 1/EditRate units, as inherited from the GenericAssetType.

5.19.4.5. SegmentUUID The SegmentUUID element uniquely identifies the segment. This will be needed as you may have multiple instances of a particular marker type.

5.19.4.6. SegmentReplaceUUID [optional] The SegmentReplaceUUID element identifies the SegmentUUID of an element that can be replaced by this SegmentUUID.

5.19.5. PictureTrackFileAssetType The PictureTrackFileAssetType, shall be derived from the TrackFileAssetType. It describes a Track File containing picture essence. An instance of the PictureTrackFileAssetType is the MainPicture element

73 Private Draft Document. Not for Publication

The elements defined below replicate values contained in the underlying track file and shall remain consistent with the content of the underlying track file at all times. They are included in the Composition PlayList to alleviate the need for software to access and parse individual track files in order to display the values to users. In the event an inconsistency exists, the values contained in the underlying track file shall take precedence.

5.19.5.1. PicturePixelMatrix The PicturePixelMatrix element shall contain the horizontal and vertical pixel count of the underlying picture track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as a rational number of frames per second.

5.19.5.2. PictureBitDepth The PictureBitDepth element shall contain the number of bits assigned to each picture component of the underlying picture track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as a rational number of frames per second.

5.19.5.3. PictureColorEncoding The PictureColorEncoding element shall contain the identification of the color encoding method of the underlying picture track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as a rational number of frames per second.

5.19.5.4. PictureFrameRate The PictureFrameRate element shall contain the frame rate of the underlying picture track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as a rational number of frames per second.

5.19.5.5. ImageAspectRatio The ImageAspectRatio element shall define the aspect ratio of the picture information contained in the underlying picture track file. As described above, it is included in the Composition PlayList for convenience. It is represented as a rational number and applications may convert the ratio to a decimal number to match current practice.

5.19.6. PictureSpatialMetadataTrackFileAssetType The PictureSpatialMetadataTrackFileAssetType, shall be derived from the TrackFileAssetType. It describes a Track File containing picture spatial metadata. An instance of the PictureSpatialMetadataTrackFileAssetType is the MainPictureMetadata element. The elements defined below replicate values contained in the underlying track file and shall remain consistent with the content of the underlying track file at all times. They are included in the Composition PlayList to alleviate the need for software to access and parse individual track files in order to display the values to users. In the event an inconsistency exists, the values contained in the underlying track file shall take precedence.

5.19.6.1. StandardsBody [optional] The StandardsBody element shall contain a URI [RFC 2396] that uniquely identifies the Standards Body issuing the Standard.

5.19.6.2. Label [optional] The Label element shall contain a textual representation of the Standards Body, which may be displayed to the user. For each issuing Standards Body, and hence unique URI, there are a number of permissible Label values. An example of such is shown below in Table 12: Examples

74 Private Draft Document. Not for Publication

of Standards Body Labels. The specification of this mapping is beyond the scope of this document.

Table 12: Examples of Standards Body Labels

Label URI SMPTE STD xxx-y http://www.smpte.org/date W3 http://www.w3.org/2001 ISO http://www.ISO.org/2009

5.19.7. SoundTrackFileAssetType The SoundTrackFileAssetType is derived from TrackFileAssetType. It describes a Track File containing sound essence. An instance of the SoundTrackFileAssetType is the MainSound element. The elements defined below replicate values contained in the underlying track file and shall remain consistent with the content of the underlying track file at all times. They are included in the Composition PlayList to alleviate the need for software to access and parse individual track files when scheduling content. In the event an inconsistency exists, the values contained in the underlying track file shall take precedence.

5.19.7.1. SoundSamplingRate The SoundSamplingRate element shall contain the frequency of the sampling rate of the underlying sound track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as a rational number of kHz.

5.19.7.2. SoundBitDepth The SoundBitDepth element shall contain the number of the bits assigned to each sample of audio of the underlying sound track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as a integer number of bits.

5.19.7.3. SoundChannelNumber The SoundChannelNumber element shall contain the number of the audio channels of the underlying sound track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as an integer number of channels.

5.19.7.4. ChannelConfiguration The ChannelConfiguration element shall contain the channel number and label assigned to each channel of the underlying sound track file. As described above, it is included in the Composition PlayList for convenience only. It shall be encoded as a integer channel number along with its corresponding label.

5.19.7.5. Label The Label element shall contain a textual representation of the channel, which may be displayed to the user. For each channel number there are a number of permissible Label values. An example of such is shown in Table 13: Examples Channel Labels. The specification of this mapping is beyond the scope of this document.

75 Private Draft Document. Not for Publication

Table 13: Examples Channel Labels

Channel Label Description No. 1 L Left 2 R Right 3 C Center 4 LFE Low Frequency Effects 5 Ls Left Surround 6 Rs Right Surround 7 Cs Center Surround 8 Lt Left total (matrix encoded) 9 Rt Right total (matrix encoded) 10 Mono Monaural 11 HI Hearing Impaired (Dynamic compressed dialog centric mix) 12 VI-N Visually Impaired Narration (Descriptive Narration)

5.19.7.6. Language [optional] The Language element shall reflect the primary spoken language of the sound material of the underlying sound track file. The element value is encoded as an xs:language language code and indicates the spoken language of the content. The absence of the element shall indicate that no spoken language is associated with the asset.

76 Private Draft Document. Not for Publication

5.19.8. SubtitleTrackFileAssetType The SubtitleTrackFileAssetType element is derived from TrackFileAssetType. It describes the subtitle material associated with the Sequence. An instance of the SubtitleTrackFileAssetType is the MainSubtitle element. The elements defined below replicate values contained in the underlying track file and shall remain consistent with the content of the underlying track file at all times. They are included in the Composition PlayList to alleviate the need for theater management software to access and parse individual track files when scheduling content. In the event an inconsistency exists, the values contained in the underlying track file shall take precedence.

5.19.8.1. Language [optional] The Language element shall reflect the primary text language used by the Subtitle essence. The absence of the element shall indicate that no primary text language is associated with the asset. It is represented as an xs:language value.

5.19.9. CaptionTrackFileAssetType The CaptionTrackFileAssetType element is derived from TrackFileAssetType. It describes the subtitle material associated with the Sequence. An instance of the CaptionTrackFileAssetType is the MainCaption element. The elements defined below replicate values contained in the underlying track file and shall remain consistent with the content of the underlying track file at all times. They are included in the Composition PlayList to alleviate the need for theater management software to access and parse individual track files when scheduling content. In the event an inconsistency exists, the values contained in the underlying track file shall take precedence.

5.19.9.1. Language [optional] The Language element shall reflect the primary text language used by the Subtitle essence. The absence of the element shall indicate that no primary text language is associated with the asset. It is represented as an xs:language value.

77 Private Draft Document. Not for Publication

6. Output Profile List

6.1. Introduction

6.1.1. Output Profile List Definition The Output Profile (hereby called “OPL”) is an optional set of information which may be used in conjunction with an IMP to specify particular content provider output preferences. In a typical workflow, it would be specifically designed to read and point to a particular CPL within a particular IMP. It would then be able to facilitate the passage of the composition’s content to a downstream device. In doing so, it would also pass the content provider’s preferences (if included) through to a downstream device in order to facilitate the execution of these preferences in the downstream device (i.e. automation). An OPL is not required for a CPL to function. In the absence of an OPL, a CPL is used as the default input to the playout engine and the IMP output would be an exact reflection of the content within that composition with no processing applied. (Note that the simple unwrapping and decoding of the IMF track files that occurs in the downstream IMF compliant device is not considered processing). An OPL may be included in an IMP as a convenient way to deliver these preferences along with the other components contained within the IMP, but is not required to be part of an IMP. It may be delivered separately from an IMP via another means, such as email, FTP, etc.

6.1.2. OPL Scope At a minimum, an OPL would function as a pointer to a CPL, which in turn points to the IMF track files needed to create the composition that the CPL has specified. In this minimum case, it would pass the content through with no processing. If desired, the OPL may also contain preferences for how a downstream device is to manipulate the data received from the IMF. These parameters are solution agnostic. These parameters are a specified sequence of callouts which may invoke predefined processes, or pass specific parameters to an external system, the order specified in the sequence. They are not the code required to execute the process itself.

6.1.3. Stereoscopic Support • Might need to specify which eye should be played back first • Allow playback of just one eye, if the Mono Playback tag in the Composition allows it. The OPL would have to identify which eye to playback • Possibly allow interleaved frame sequential playback and frame-compatible stereo playback

6.1.4. OPL Examples OPL examples include the following:

6.1.4.1. Image Output Parameters. These would provide values on the desired Frame rate, color encoding, Bit Size, Frame size, Codec to encode to etc.

6.1.4.2. Audio Output Parameters These would provide values on the desired Sample freq, Audio Config, codec to encode to, etc.

6.1.4.3. Encoding Format These would provide values/instruction on the desired Sample freq, Audio Config, codec to encode to, etc.

78 Private Draft Document. Not for Publication

6.1.4.4. Color Transformations. These would provide values/instructions on the desired colorspace conversions, and how to carry them out.

6.1.4.5. “Pre-Processing” parameters These are parameters such as image translations (e.g. center cut, pan/scan, color space conversion parameters) or audio translations (e.g. word clock settings, speed conversion parameters) and

6.1.5. OPL Relationship Overview The following diagram, Figure 15 - Output Profile List Relationship Overview, illustrates the relationship of the Output Profile List to Encoding Parameters, Output Parameters, Pre-Process Parameters and CPLs.

Figure 15 - Output Profile List Relationship Overview

6.2. Output Profile List Overview

6.2.1. Functional Framework For the purpose of documenting the specific requirements for a Interoperable Master Format Output Profile List • TBD – The specifications for the list of elements included in the Output Profile List

79 Private Draft Document. Not for Publication

6.2.2. Output Profile List Fundamental Requirements

6.2.2.1. Introduction

6.2.2.2. Open Standard

6.2.2.3. Interoperable The blank shall have an open framework that accommodates compressed, encrypted files as well as all other files used in Digital Video.

6.2.2.4. Scalable The blank shall accommodate any number of Essence or Metadata components. There is no limit on the number of files included in the package or the size of the files.

6.2.2.5. Extensible The blank shall allow for blank.

6.2.2.6. Human Readable Metadata Human readable Metadata shall be in English (default) but can be provided in other languages as well.

6.2.3. Output Profile List Concepts

6.3. Output Profile List Format

6.3.1. Introduction

6.3.2. File Format

6.3.3. ImageOutputFormat The ImageOutputFormat element defines the output format of the MainPicture of material referred to by the Composition PlayList. It is meant to be both human and machine-readable.

6.3.3.1. OutputPixelMatrix The OutputPixelMatrix element shall contain the horizontal and vertical pixel counts of the MainPicture element. This is informative information to allow for humans and machine to read and if necessary manipulate the MainPicture asset of the Composition to conform to the OutputPixelMatrix requirements.

6.3.3.2. OutputColorEncoding The OutputColorEncoding element shall contain information of the color encoding parameters of the MainPicture element. This is informative information to allow for humans and machine to read and if necessary manipulate the MainPicture asset of the Composition to conform the color information to the OutputColorEncoding specification.

6.3.3.3. Stereoscopic Metadata Required Fields Specific metadata shall be required for proper identification of stereoscopic content. At the time of this writing, however, the actual metadata fields cannot be defined. Instead, the data elements shown in Table 3 shall be the minimum amount of information supported by the IMF for stereoscopic content. These data elements shall be converted into specific metadata fields once the specification is complete.

80 Private Draft Document. Not for Publication

6.3.3.4. StandardsBody [optional] The StandardsBody element shall contain a URI [RFC 2396] that uniquely identifies the Standards Body issuing the Standard.

6.3.3.5. Label [optional] The Label element shall contain a textual representation of the Standards Body, which may be displayed to the user. For each issuing Standards Body, and hence unique URI, there are a number of permissible Label values. An example of such is shown below in Table 14: Examples of Label Elements. The specification of this mapping is beyond the scope of this document.

Table 14: Examples of Label Elements

Label URI SMPTE STD xxx-y http://www.smpte.org/date W3 http://www.w3.org/2001 ISO http://www.ISO.org/2009

6.3.3.6. DisplayAspectRatio The DisplayAspectRatio element shall contain the ratio intended display screen of the MainPicture element. (ex. 4:3 or 16:9) This is informative information to allow for humans and machine to read and if necessary manipulate the MainPicture asset of the Composition to conform to the DisplayAspectRatio.

6.3.3.7. OutputPictureOverlay [optional] The OutputPictureOverlay element shall contain information about the auxiliary overlay image elements for the MainPicture element. This is informative information to allow for humans and machine to read and if necessary manipulate the MainPicture asset of the Composition to conform the aspects of the required OutputPictureOverlay.

6.3.3.8. TimeCodeWindow [optional] The TimeCodeWindow element shall contain a designation that uniquely identifies the Timecode type and spatial location for the OutputPictureOverlay.

6.3.3.9. Spoiler [optional] The Spoiler element shall contain a textual representation and spatial location for the OutputPictureOverlay.

6.3.4. OutputAudioFormat The OutputAudioFormat element defines the format of the MainSound of material referred to by the Composition PlayList. It is meant to be both human and machine-readable.

6.3.4.1. AudioCompressionStandard [optional] The AudioCompressionStandard element shall contain information of the compression encoding parameters of the MainSound element. This is informative information to allow for humans and machine to read and if necessary manipulate the MainSound asset of the Composition to conform the sound information to the Compression specification.

6.3.4.2. StandardsBody [optional] The StandardsBody element shall contain a URI [RFC 2396] that uniquely identifies the Standards Body issuing the Standard.

6.3.4.3. Label [optional]

81 Private Draft Document. Not for Publication

The Label element shall contain a textual representation of the Standards Body, which may be displayed to the user. For each issuing Standards Body, and hence unique URI, there are a number of permissible Label values. An example of such is shown below in Table 4. The specification of this mapping is beyond the scope of this document.

6.3.5. OutputSpeedOffset [optional] The OutputSpeedOffset element shall contain the rational percentage offset intended to be applied to the Composition. This is informative information to allow for humans and machine to read and if necessary manipulate the Composition to conform to the desired running time of the Composition. • • •

6.3.6. Open/Discussion Items

1. Should we consider “IMF Compatible Client Profiles”, which convey to a downstream device the desired output for a particular client and what the device settings should be to attain this. 2. Should we consider standardizing a standard sequencing language. So that everybody agrees on LancosResize, etc. 3. Should we consider using a standardized frame server’s syntax, e.g. AviSynth, VirtualDub 4. Permit callout to “user definable functions”, i.e. in the encoding format a callout to a specific encoder profile, stored as “NTSC_4x3_CenterCut” 5. Discussions ongoing with the Audio group.

82 Private Draft Document. Not for Publication

7. Packaging

7.1. Introduction Packaging is defined as the process of combing elements to prepare them for shipping. This generally will consist of one or more compositions along with two more elements which will be specified in the section of the document. Those elements are the Packing List and the Asset Map.

7.2. Packaging System Overview

7.2.1. Functional Framework For the purpose of documenting the specific requirements for a Interoperable Master Format Packaging system, it is helpful to divide the system into a set of components. The performance requirements for each of these components will be described in the following sections: • Distribution Package – The physical files and the list describing the files and providing a means for authentication as delivered in a Distribution Package (from Distributor to Exhibitor). • Packing List – The specifications for the list of elements included in the Package. • Asset Map – The specification for the directory structure and physical location of those directories and files. • Security –The specifications for the security requirements for the Package.

7.2.2. Packaging Fundamental Requirements

7.2.2.1. Introduction Digital Video presents a challenge to create a versatile packaging system. Throughout this system, some basic requirements are needed and are stated below.

7.2.2.2. Open Standard The Packaging standard shall be based upon an open worldwide standard. This format is encouraged to be a license-free technology. It shall be a complete standard that equipment receiving a compliant package can process and interpret unambiguously.

7.2.2.3. Interoperable The Packaging format shall have an open framework that accommodates compressed, encrypted files as well as all other files used in Digital Video.

7.2.2.4. Scalable The Packaging format shall accommodate any number of Essence or Metadata components. There is no limit on the number of files included in the package or the size of the files.

7.2.2.5. Extensible The Packaging format shall allow for new Digital Video features (compositions) to be contained within the package.

7.2.2.6. Synchronization The Packaging format shall provide support for synchronization of the Essence and Metadata elements.

83 Private Draft Document. Not for Publication

7.2.2.7. Human Readable Metadata Human readable Metadata shall be in English (default) but can be provided in other languages as well.

7.2.3. Packaging Concepts It is common practice to divide content into reels of between 10 and 20 minutes in length for post- production, and distribution. These reels are then assembled, together with other content, to create the modern play lists that are used in play out devices. This concept of reels shall be supported with Digital Video content.

7.3. Distribution Package

7.3.1. Introduction The Distribution Package has two major components. One is the Package itself, which includes all of the Track Files and the other is the Packing List. These are all of the elements required for a complete delivery to the theater Digital Cinema system. It is technically possible to include engagement-specific licenses and keying information in a Package in the form of opaque Metadata, but this is not recommended for general usage. A Distribution Package can contain a complete feature film composition or a set of compositions. Alternatively, it can carry as little as a single file to update one reel’s subtitle or sound track.

7.3.2. Distribution Package

7.3.2.1. General The Distribution Package shall contain a Packing List and one or more IMF track files. The following requirements apply.

7.3.2.2. Packing for Transport The distribution method shall allow a DCP to be transported via physical media, satellite or network.

7.3.2.3. Flattened Packages TBD

7.3.2.4. Security The distribution method shall provide digital signatures to allow the recipient to verify integrity of the Packing List and the enclosed files. Preparation of Packing Lists is a distribution fulfillment or transport function. Therefore, the digital signatures come from these entities, not the content-owner who mastered the files. Packing List security functions do not verify the authenticity of the content, only the intent of the delivery agent. (Content authenticity is verified through PlayList signatures and digital licenses.)

7.3.3. Packing List

7.3.3.1. File Format The Packing List shall use XML data format with XML Signature (digital signature). It should be in English (default) but can be provided in other languages as well.

7.3.3.2. Fields The following data fields are required to be included in the Packing List for each file in the Package:

84 Private Draft Document. Not for Publication

• Unique identification of each file included in the DCP is encoded as urn:UUID. • Annotation Text parameter (optional), if present, is a free-form, human readable annotation associated with the asset. It is meant strictly as a displayable guidance for the user. • File Integrity check (hash) for each file in the distribution package • Size of the file in bytes • Type (e.g., Packing List, PlayList, Track File, opaque security data) • Original File Name The following fields are required to be included in the digital signature section of the Packing List: • Signer parameter uniquely identifies the entity, and hence public key that digitally signs the packing list. • Signature parameter contains a digital signature authenticating the packing list.

7.3.4. Asset Map

7.3.4.1. File Format The Asset Map shall use XML data format with XML Signature (digital signature). It should be in English (default) but can be provided in other languages as well.

7.3.4.2. Fields The following data fields, given in Table 15, are required to be included in the Asset Map for each file in the Package:

Table 15 - Asset Map Data Fields

Field Name Description The MediaFileSize element shall be used to define the MediaF complete size in bytes of the Composition at the instance ileSiz when Composition PlayList was issued. It may be displayed to e the user. It shall be encoded as an xs:GB.

85 Private Draft Document. Not for Publication

8. Annex

8.1. Example Workflows The following diagrams show examples of future state workflows using the concept of the IMF. This IMF and the file-based workflow will enable Mastering & Distribution Servicing to service both existing and emerging distribution channels. It must be stressed that this is only an example workflow. As one becomes more familiar with the concepts of the IMF one can imagine many different workflows using the IMF. It is not the intention of this document to indentify all of these possibilities.

86 Private Draft Document. Not for Publication

87 Private Draft Document. Not for Publication

Figure 16 - Future State - Mastering & Distribution Servicing

88 Private Draft Document. Not for Publication

89 Private Draft Document. Not for Publication

Figure 17 - Example IMF Workflow

8.2. Editorial Systems

8.2.1. Editorial System Fundamental Requirements Digital Video Editorial Systems have some basic requirements that are stated below.

8.2.1.1. Easy Assembly of Content The system shall provide a graphical user interface (GUI) interface for the assembly of content with relative ease in a timely matter.

8.2.1.2. Movement of Content The system shall provide for movement of content within a facility. TBD

8.2.1.3. Ease of Operation The Editorial System is encouraged to require only a reasonable level of computer operation knowledge or training for the basic operation of the system. The computer-based user interfaces are required to be simple and intuitive.

8.2.1.4. Multiple Systems Editing shall be supported across many different systems that are common in editorial processes today.

8.3. Playout Systems

8.3.1. Playout System Fundamental Requirements IMF Playout Systems have some basic requirements that are stated below.

90 Private Draft Document. Not for Publication

8.3.1.1. Easy Assembly of Content The system shall provide a graphical user interface (GUI) interface for the assembly of content with relative ease in a timely matter.

8.3.1.2. Movement of Content The system shall provide for movement of content within a facility. TBD

8.4. Transcoding Systems

8.4.1. Transcoding System Fundamental Requirements IMF Transcoding Systems have some basic requirements that are stated below.

8.4.1.1. Easy Assembly of Content The system shall provide a graphical user interface (GUI) interface for the assembly of content with relative ease in a timely matter.

8.4.1.2. Movement of Content The system is required to provide for movement of content within a facility. TBD

8.5. Asset Management Interface Asset Management in the context of the IMF refers to the system interfaces required to allow for automation of repetitive tasks and tracking of assets for such things as location and version control. This section will concentrate on the interfaces and not all of the different applications required or envisioned for a Asset Management system. Below you should find the requirements and specifications for the interfaces required.

8.5.1. Interface Fundamental Requirements

8.5.1.1. Introduction Interoperable Master Format presents unique opportunities for the automation of content and asset tracking.

8.5.1.2. API The Application Programming Interface for the IMF……TBD.

8.5.1.3. Protocols The Protocols to be used for the interfaces shall be….TBD.

8.5.2. Asset Management Interface Fundamental Concepts The interfacing to the Interoperable Master Format can be accomplished in many different ways. TBD

8.5.3. Application Programming Interface TBD

8.5.4. Protocols

91 Private Draft Document. Not for Publication

9. GLOSSARY OF TERMS

Table 16: Glossary of Terms

Term Description

AES Acronym for Advanced Encryption Standard

AES Acronym for Audio Engineering Society

ANSI Acronym for American National Standards Institute

API Acronym for Application Programming Interface

Broadcast Digital Audio file format developed and standardized by the EBU; recommendation ITU-R Wave BR.1352-3 (2007)

Burned-In Where visual data that is normally supplemental to a motion picture is irrevocably added to the motion-picture image by compositing the data with the underlying image

Captions Text that is a representation, often in the same language, of dialog and audio events occurring during scenes of a motion picture. (Generally associated with a dialog and audio event translation for the deaf and hard of hearing.)

CBC Acronym for Cipher Block Chaining mode

CBR Acronym for Constant Bit Rate for image compression

CIE Acronym for International Commission on Illumination (Commission Internationale de l’Eclairage)

Closed Referring to visual data that is supplemental to a motion picture being displayed off-screen

Composition A motion picture, or a trailer, or an advertisement, etc. Composition consists of a Metadata Composition PlayList along with the Essence and other Metadata track files that define the work.

CPL Acronym for Composition PlayList, the definitive PlayList for specifying how a Composition is played and what track files are required

DCP Acronym for a Digital Cinema Package, the set of files that are the result of the encoding, encryption and packaging process

Distribution The collection of files delivered by the distributor to the exhibitor. A Distribution Package may Package contain pieces of a Composition or several compositions, a complete Composition, replacement/update files, etc.

DM Acronym for Descriptive Metadata

DRM Acronym for Digital Rights Management

DSM Acronym for Digital Source Master, a digital master created in post-production from which different versions and duplication masters may be created.

DVD Acronym for Digital Versatile Disc

EBU Acronym for European Broadcast Union (a standardization organization) e.g. Abbreviation for the Latin phrase exempli gratia, meaning “for example”

92 Private Draft Document. Not for Publication

Term Description

Essence Image, audio, subtitles, or any content that is presented to a human being in a presentation

ETC Acronym for Entertainment Technology Center

FIPS Acronym for Federal Information Processing Standards

FM Acronym for Forensic Marking

Forensic Data embedded in essence to provide forensic tracking information in the event of content Marking theft. Such marking can be visible or non-visible, audible or non-audible.

FPS Acronym for Frames per Second

HD Acronym for High Definition

HI Acronym for Hearing Impaired

HMAC Acronym for Hashing Message Authentication Codes

Hz Abbreviation for Hertz, a unit of frequency expressed in cycles per second i.e. Abbreviation for the Latin phrase id est, meaning “that is”

IEC Acronym for International Electrotechnical Commission

IP Acronym for Intellectual Property

IMF Acronym for Interoperable Master Format

ISO Acronym for International Organization for Standardization

ITU Acronym for International Telecommunications Union

JPEG Acronym for Joint Photographic Experts Group, the international body that developed the JPEG 2000 standard

Key Electronic data used to allow data encryption and decryption

Key Epoch The period of time during which a given decryption key is valid. The key epoch defines a minimum practical time period for use of encrypted track files. kHz Acronym for kilo Hertz, one thousand cycles per second, a measure of frequency

KLV Acronym for Key Length Value – used by the MXF to parse binary data

Localizations Text on screen representing either non-source language dialog or information pertinent to the story such as time and place. This is specifically the text that is absent in text-less masters. This text is localized or translated for various markets either through subtitles or entire image replacement.

LTC Acronym for Linear Time Code

LUT Acronym for Look Up Table

Main Titles A credit sequence generally shown near the beginning of a motion picture

Metadata Data about data or data describing other data. Information that is considered ancillary to or otherwise directly complementary to essence. Information that is useful or of value when associated with the essence being provided.

93 Private Draft Document. Not for Publication

Term Description

MTBF Acronym for Mean Time Between Failure

MXF Acronym for Material eXchange Format

NDF Acronym for Non Drop Frame (Timecode)

NTSC Acronym for National Television System Committee, which developed the NTSC television broadcasting standard

Open Referring to visual data that is supplemental to a motion picture being displayed on-screen

Operational An MXF construct to define file structures Pattern

Packing List A list describing the files and providing a means for authentication of the files as delivered in a package

PAL Acronym for Phase Alternation by Line, a television broadcasting standard.

PlayList Conceptually, the format and structure of the various lists used to define the playback of content.

PNG Acronym for Portable Network Graphics, an extensible file format for the lossless, portable, well-compressed storage of raster images defined by the PNG Development Group.

QC Acronym for Quality Control

RAND Acronym reasonable and non-discriminatory

Reel A conceptual period of time having a specific duration of generally 10 to 20 minutes. Used primarily in feature film production.

Renewable A software component is renewable if it can be remotely, smoothly and possibly automatically upgraded or replaced without significantly disturbing system operations. A system shutdown and normal restart is acceptable, provided that after the restart, the system can be operated as before.

Replaceable A component is said to be replaceable if it can be upgraded or replaced without significantly disturbing system operations. A system shutdown and restart is acceptable, provided that after the replacement, the system can be operated as before.

SD Acronym for Standard Definition

SHA1 Acronym for Secure Hashing Algorithm 1

SMPTE Acronym for Society of Motion Picture and Television Engineers

Subpicture A multiple-image file format for the transport of visual data supplemental to a motion picture that is intended only for graphic overlay with the main image output of a digital projector

Subtitle Text that is a representation, in a different language, of dialog occurring during scenes of a motion picture. Generally associated with dialog translation for localization of a motion picture in a particular territory.

TCP/IP Acronym for Transmission Control Protocol / Internet Protocol

TDES or 3DES Acronym for Triple Data Encryption Standard. TDES or 3DES was adopted as a federal standard in 1998 [FIPS (46-3) and ANSI standard X9.32]

Track File The smallest element of a package that can be managed or replaced as a distinct asset. A

94 Private Draft Document. Not for Publication

Term Description

track file may contain Essence and/or Metadata, and its duration matches an associated Reel.

UDP Acronym for User Datagram Protocol

UL Acronym for Universal Label used in MXF

Unicode The Universal Multiple-Octet Coded Character set, the [ISO/IEC 10646:2003] standard that defines a single code for representation, interchange, processing, storage, entry and presentation of the written form of the world’s major languages urn Acronym for uniform resource name

USB Acronym for Universal Serial Bus, standardized serial communications connection found on computers

UTC Acronym for Universal Coordinated Time

UUID Acronym for Universal Unique IDentifier

VFX Acronym for Visual Effects

VI Acronym for Visually Impaired

VOD Acronym for Video on Demand

Visually An image compression method is considered visually lossless when the processed image is Lossless indistinguishable from the unprocessed image under normal theatrical viewing conditions.

VPN Acronym for Virtual Private Network.

VBR Acronym for Variable Bit Rate

W3C Acronym for The World Wide Web Consortium, the organization responsible for the development of Internet protocols

XML Acronym for eXtensible Markup Language

95 Private Draft Document. Not for Publication

10. Composition Play List (CPL) Example Composition PlayLists (CPLs) are scripts that link the IMF Track Files together into synchronized pieces of content. CPLs are written in XML.

Composition Play List (CPL) Example urn:uuid:cde69c7b-a055-4373-84f5-e8ffea82f345 Wall-E_FTR_S_EN-XX_US-G_51_2K_DI_20080529_PX 2008-05-30T03:21:28-07:00 Pixar Animation Studios Pixit 0.2.45 Wall-E_FTR_S_EN-XX_US-G_51_2K_DI_20080529_PX feature urn:uuid:cde69c7b-a055-4373-84f5-e8ffea82f345 US English 2.35 BluRay Master HDCamSR(1.85 Picture, 5.1 Sound, Dubbed) French-Directors Cut 23.98 23.98 http://www.ISO.org/2009 48 5.1 24 02:10:15:10 EN US http://www.mpaa.org/2003-ratings urn:uuid:f63fdd78-39b6-413b-a9c5-7231446c6463 PartOne urn:uuid:54e8ef95-26be-854e-bb2e-c95812107c91 PartOne of Show 23.98 urn:uuid:8116c6ef-d870-4038-9f0a-46571d769858 7728 180 2548 urn:uuid:035b894b-b82c-49f8-abbc-87230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= 1920 1080 10 422 1.78 urn:uuid:8116c6ef-d870-4038-9f0a-56571d769858 9728 80 1548

96 Private Draft Document. Not for Publication

Composition Play List (CPL) Example urn:uuid:035b894b-b82c-49f8-abbc-97230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= 1920 1080 10 422 1.78 urn:uuid:8116c6ef-d870-4038-9f0a-66571d769858 7728 180 3504 urn:uuid:035b894b-b82c-49f8-abbc-07230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= 1920 1080 10 422 1.78 urn:uuid:54e8ef95-26be-854e-bb2e-c95812107c91 24 1 urn:uuid:8116c6ef-d870-4038-9f0a-56571d769858 9728 80 1548 urn:uuid:035b894b-b82c-49f8-abbc-97230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= 48000 24 6 urn:uuid:ba79548c-852c-c04b-af1a-9cde6999b846 js/VpADyL4sEQF+YarPI61VJ1pw= urn:uuid:8116c6ef-d870-4038-9f0a-56571d769858 80 2/B+hACcPMSFRl9WPCjEIoJoeRc= urn:uuid:035b894b-b82c-49f8-abbc-97230e526231 1548 9728 http://www.ISO.org/2009 urn:uuid:8116c6ef-d870-4038-9f0a-56571d769858 9728 80 1548 urn:uuid:035b894b-b82c-49f8-abbc-97230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= urn:uuid:8116c6ef-d870-4038-9f0a-56571d769858

97 Private Draft Document. Not for Publication

Composition Play List (CPL) Example 9728 80 1548 urn:uuid:035b894b-b82c-49f8-abbc-97230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= urn:uuid:3c59dc3a-a247-475c-adbd-3ffd64dd5890 urn:uuid:b187df09-0e92-4b0f-9e51-c4313e32c8ee 23.98 urn:uuid:8116c6ef-d870-4038-9f0a-46571d769858 7728 180 2548 urn:uuid:035b894b-b82c-49f8-abbc-87230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= 1920 1080 10 422 1.78 urn:uuid:54e8ef95-26be-854e-bb2e-c95812107c91 24 1 urn:uuid:8116c6ef-d870-4038-9f0a-56571d769858 9728 80 1548 urn:uuid:035b894b-b82c-49f8-abbc-97230e526231 2/B+hACcPMSFRl9WPCjEIoJoeRc= 48000 24 6 urn:uuid:ba79548c-852c-c04b-af1a-9cde6999b846 js/VpADyL4sEQF+YarPI61VJ1pw= dnQualifier=Qf4HGTjMqjGmHQOhA\ +xOqtQbSGY=,CN=.DC.DMS.DC2.INTEROP,OU=DC.DOREMILABS.COM,O=DC2.INTEROP.DOREMILABS.COM 4 dnQualifier=aen9g2oYiD4\+ouix29qpzVOJ7rk=,CN=ME CS SM.DMSJP2K- 70002.DC.DC2.INTEROP,OU=DC.DOREMILABS.COM,O=DC2.INTEROP.DOREMILABS.COM

98 Private Draft Document. Not for Publication

Composition Play List (CPL) Example FC2hBD4lwXDqSGk97QrrAb69FWA= pdvBqADkTjzVqbLkO2cbLSn2NQ0mD+keDVr6l2MbApDpA+e+ RGLkr0vXUpul86XpP7zg+X5q711ju3u+tuJIMnnl5TAQ6VnRb71WQuRhtI5M2dRe2Mm+wSsm 95fNVAaTDAWJFwW6dS1iJ6dR9a4YAGOVJEpA8WJCl1zKTe2FCZAYlSZvgMQdTV+7cMeYB086 XO9VkBZ2lG+2Ph5wOH/oAJp28w8BkXgyELJiSAsiDmY0aA2ua9aHbocDv5C3bRfeslr85LyL KCt8b9B3xx+USiZ26FY1vo40N0eLNcQzjYOIaDZe/tyKMwhKKD24hu8wxwepeIHrZTjyvwgS jY66nw== dnQualifier=Qf4HGTjMqjGmHQOhA\ +xOqtQbSGY=,CN=.DC.DMS.DC2.INTEROP,OU=DC.DOREMILABS.COM,O=DC2.INTEROP.DOREMILABS.COM 4 MIIEjDCCA3SgAwIBAgIBBDANBgkqhkiG9w0BAQUFADCBhjEj MCEGA1UEChMaREMyLklOVEVST1AuRE9SRU1JTEFCUy5DT00xGjAYBgNVBAsTEURDLkRPUkVNSUxB QlMuQ09NMRwwGgYDVQQDExMuREMuRE1TLkRDMi5JTlRFUk9QMSUwIwYDVQQuExxRZjRIR1RqTXFq R21IUU9oQSt4T3F0UWJTR1k9MB4XDTA3MDEwMTAwMDAwMFoXDTI1MTIzMTIzNTk1OVowgZgxIzAh BgNVBAoTGkRDMi5JTlRFUk9QLkRPUkVNSUxBQlMuQ09NMRowGAYDVQQLExFEQy5ET1JFTUlMQUJT LkNPTTEuMCwGA1UEAxMlTUUgQ1MgU00uRE1TSlAySy03MDAwMi5EQy5EQzIuSU5URVJPUDElMCMG A1UELhMcYWVuOWcyb1lpRDQrb3VpeDI5cXB6Vk9KN3JrPTCCASIwDQYJKoZIhvcNAQEBBQADggEP +NZnn+yhj0qvA==

dnQualifier=10IWnvjB9MJSQEipA/m0zVdiUEQ=,CN=.DMS.DC2.INTEROP,OU=DC.DOREMILABS.COM,O=DC2.INTEROP.DOREMIL ABS.COM 2 MIIEgzCCA2ugAwIBAgIBAjANBgkqhkiG9w0BAQUFADCBgzEj MCEGA1UEChMaREMyLklOVEVST1AuRE9SRU1JTEFCUy5DT00xGjAYBgNVBAsTEURDLkRPUkVNSUxB QlMuQ09NMRkwFwYDVQQDExAuRE1TLkRDMqcXgPOS7ONYGY/A==

dnQualifier=k2\+ZCfLtrt7LaaHYAWzWbreV/ME=,CN=.PRODUCTS.DC2.INTEROP,OU=DC.DOREMILABS.COM,O=DC2.INTEROP.D OREMILABS.COM 4 MIIEgTCCA2mgAwIBAgIBBDANBgkqhkiG9w0BAQUFADCBiDEj MCEGA1UEChMaREMyLklOVEVST1AuRE9SRU1JTEFCUy5DT00xGjAYBgNVBAsTEURDLkRPUkVNSUxB QlMuQ09NMR4wHAYDVQQDExUuUF7WaY7Tq4yzl6nL28=

dnQualifier=K9jID4QvfqS1sS6IjiE8zEqSd/g=,CN=.ROOT.DC2.INTEROP,OU=DC.DOREMILABS.COM,O=DC2.INTEROP.DOREMILA BS.COM 2 MIIEgjCCA2qgAwIBAgIBAjANBgkqhkiG9w0BAQUFADCBhDEj MCEGA1UEChMaREMyLklOVEVST1AuRE9SRU1JTEFCUy5DT00xGjAYBgNVBAsTEURDLkRPUkVNSUxB QlMuQ09NMRowGAYDVQQDEx0CTLUW/J9um+eSrj

dnQualifier=K9jID4QvfqS1sS6IjiE8zEqSd/g=,CN=.ROOT.DC2.INTEROP,OU=DC.DOREMILABS.COM,O=DC2.INTEROP.DOREMILA BS.COM 1 MIIEfjCCA2agAwIBAgIBATANBgkqhkiG9w0BAQUFADCBhDEj

99 Private Draft Document. Not for Publication

Composition Play List (CPL) Example MCEGA1UEChMaREMyLklOVEVST1AuRE9SRU1JTEFCUy5DT00xGjAYBgNVBAsTEURDLkRPUkVNSUxB QlMuQ09NMRowGAYDVQf0AsEwAFJR/9crGysLGnJsI=

100 Private Draft Document. Not for Publication

11. Annex - Output Profile List XML Examples The following examples to illustrate various types of OPL

11.1. OPL Outline in pseudo-XML Here’s an outline of the OPL in pseudo-XML to illustrate some of the components:

#------begin pseudo-XML ------# # Below specifies Preamble, Note the reference to the CPL, compositionPlaylistReference urn:uuid:cde69c7b-a055-4373-84f5-e8ffea82f345 Avatar_FTR_S_EN-XX_US_PG13_51_2K_DI_20080529_PX 2008-05-30T03:21:28-07:00 urn:uuid:bbf69c7b-a055-4373-84f5-e8ffea123fa1 US English 2.35 50 Mbps Master EN US

# Below specifies desired Output Format for Video/Image # Below specifies desired Output Format for Audio # Below specifies the desired Encoding/Transcoding

101 Private Draft Document. Not for Publication

#------end of pseudo-XML ------#

11.2. A Simple OPL This is the minimal OPL that simply calls out a reference to a composition play list. It is up to the facility to apply any of the transformations desired to generate any desired output.

- # Below specifies Preamble, Note the reference to the CPL, compositionPlaylistReference urn:uuid:cde69c7b-a055-4373-84f5-e8ffea82f345 Avatar_FTR_S_EN-XX_US_PG13_51_2K_DI_20080529_PX 2008-05-30T03:21:28-07:00 - urn:uuid:bbf69c7b-a055-4373-84f5-e8ffea123fa1

102 Private Draft Document. Not for Publication

11.3. A Complex (level1) OPL This is the first level of complexity for an OPL. It calls out the desired Output format from a Image and Audio format. It is up to the “post” facility to determine the best way to deliver the desired output.

- # Below specifies Preamble, Note the reference to the CPL, compositionPlaylistReference urn:uuid:cde69c7b-a055-4373-84f5-e8ffea82f345 Avatar_FTR_S_EN-XX_US_PG13_51_2K_DI_20080529_PX 2008-05-30T03:21:28-07:00 - urn:uuid:bbf69c7b-a055-4373-84f5-e8ffea123fa1 US English 1.78 50 Mbps Master US - - - 50 8 - Rec-709 4:2:2 - MPEG-LA - I-FRAME ONLY HIGH_PROFILE HIGH_LEVEL - 1.78 - 0 0 1919 1079 + - 10 0 1010 1079 24 ...continued on the next page

103 Private Draft Document. Not for Publication

A Complex (level1) OPL (continued from the previous page)

# Below specifies desired Output Format for Audio - 48000 16 -

11.4. A Complex (level2) OPL This example illustrates another level of complexity for an OPL, creating an NTSC from an HD IMP.

- # Below specifies Preamble, Note the reference to the CPL, compositionPlaylistReference urn:uuid:abc69c7b-a055-4373-84f5-a8ffea82f345 Ratatouille_TH_FEA_DPX-V_1080P23_ENG-ENG_1234567 2007-08-15T03:21:28-07:00 Disney Pixar - urn:uuid:aaf69c7b-a055-4373-84f5-e8ffea123fa1 US English 2.39 DPX Master 01:51:15:10 EN US # Below specifies desired Output Format for Video/Image, but not the steps to get there - - ITU Rec-601 4:2:2 YCBCR 10 Linear Limited #64-940 - Uncompressed - - 0 0 719 485

...continued on the next page

104 Private Draft Document. Not for Publication continuation of a Complex (Level 2) OPL:

+ 1.21 - 59.94 I # Interlaced DF # Drop-Frame # Below specifies desired Output Format for Audio - 48000 24 800.8 - LPCM - - English 2.0 L - English 2.0 R - ME 2.0 L - ME 2.0 R No # Below specifies the desired Pre-Process Parameters - - - Lanczos 3-Lobe 720 486 -

...continued on the next page

105 Private Draft Document. Not for Publication

continuation of a Complex (Level 2) OPL:

- - ITU Rec-601 4:4:4 YCBCR 10 Linear Limited #64-940 - Mean 4:2:2 - - 59.94 I DF - Yes #Use 3:2 Pulldown 01:00:00:00 #Location of A Frame

In this example, the original IMP contains the following content: • 4:4:4 10-bit RGB DPX frames at 24Hz in Rec.709 • Active picture at 1920x803 • 5.1 English, 5.1 M&E, 2.0 English, 2.0 M&E

The Output Profile List will create the following content:

• 525 10-bit YCBCR at 59.94Hz in Rec.601 (uncompressed) • 16x9 720x486 with letterbox mattes • 2.0 English, 2.0 M&E

The process to create this 525 master is as follows: 1. Resize the image from 1920x803 to 720x362 using Lanczos 3-Lobe filter 2. Add black (RGB=000) mattes to above and below the image to fill to 720x486 3. Convert to 4:4:4 YCBCR using equations in SMPTE 293M 4. Sub-sample to 4:2:2 using Mean filtering 5. Change timecode by adding 3:2 pull-down and changing rate to 59.94 fields/second 6. Resample audio to 2002 samples per frame It calls out the desired Output format from a Image and Audio format. It is up to the “post” facility to determine the best way to deliver the desired output.

106 Private Draft Document. Not for Publication

11.5. A Pre-processing section for an OPL This example illustrates how one may include Pre-processing parameters in an OPL

image-alpha urn:uuid:ca209360-1107-11df-8a39- 0800200c9a66 #note: could be single image or an image sequence (must have alpha) rec709 500x300 1.0 1.0

#note: assuming upper-left of image (or image sequence) is origin. 1700 550 50 blend 0 100101 text rec709 Property of Warner Bros. white (or #FFFFFFF) black (or #000000) arial center 20 1.0 1.0

107 Private Draft Document. Not for Publication

#note: assuming upper-left of image (or image sequence) is origin. 960 1050 100 over 0 10000 text graphic Property of Warner Bros. white (or #FFFFFFF) black (or #000000) arial left 20 1.0 #note: assuming upper-left of image (or image sequence) is origin. 160 1000 100 over 20000 30000 fadeup 48 11 0 0 58 100 0 dissolve

108 Private Draft Document. Not for Publication

48 100 0 100 100101 100 0

109 Private Draft Document. Not for Publication

12. Annex - Ideal Implementation

12.1. Real-time Playback Real-time playback of the compressed image shall be supported in order to allow for the playback of the compressed streams in real-time environments such as broadcast systems and screenings.

12.2. Decode Compressed Image Faster than Real Time The ability to decode the compressed image faster than real time shall be required in order to support transcoding of the image to other file formats in a timely manner.

110