White paper

Digital File Recommendation

Version of July 2015

edited by


White Paper RTL Group/Standard Recommendations/Draft January 2007

1 Digital Video File Recommendations
 1.1 Preamble
 1.2 Scope
 1.3 File Formats
 1.4 Codecs
  1.4.1 Browsing
  1.4.2 Acquisition
  1.4.3 Programme Contribution
  1.4.4 Postproduction
  1.4.5 Broadcast
  1.4.6 News & Magazines & Sports
  1.4.7 High Definition
 1.5 General Requirements
 1.6 Production And Editing Requirements
 1.7 Video Format
  1.7.1 Standard Definition
  1.7.2 High Definition
 1.8 Digital Video Signal
  1.8.1 Standard Definition
  1.8.2 High Definition
   1.8.2.1 Video Signal Is Component Digital (4:2:2)
   1.8.2.2 Start of Program
  1.8.3 Ultra High Definition (High Efficiency Video Coding – HEVC)
  1.8.4 Digital Audio Recording Levels
  1.8.5 Additional Specifications
2 MXF – File Recommendation
 2.1 Introduction
 2.2 MXF Metadata
  2.2.1 Structural and Descriptive Metadata
  2.2.2 Structural Metadata
   2.2.2.1 Partitions
   2.2.2.2 Operational Pattern
   2.2.2.3 Header Metadata
   2.2.2.4 Index Tables
   2.2.2.5 Essence Container
  2.2.3 Descriptive Metadata
   2.2.3.1 Metadata Tracks
   2.2.3.2 Production Framework
   2.2.3.3 Clip Framework
   2.2.3.4 Scene Framework
   2.2.3.5 Other Metadata
   2.2.3.6 Data Tracks
 2.3 Operational Patterns
  2.3.1 Op1a And OpAtom
   2.3.1.1 Op1a
   2.3.1.2 Atomic Op1a
   2.3.1.3 OpAtom
   2.3.1.4 OpAtom vs. Op1a
  2.3.2 External References On Atomic MXF Files
   2.3.2.1 Op1a (embedded material)
   2.3.2.2 Op1b (external material)
   2.3.2.3 OpAtom1b
   2.3.2.4 Resolving External References
  2.3.3 Uses Of Op1b-3c Operational Patterns
   2.3.3.1 Op1b
   2.3.3.2 Op1c
   2.3.3.3 Op2b
   2.3.3.4 Op2c
   2.3.3.5 Op3b
   2.3.3.6 Op3c
 2.4 Interoperability And Metadata Exchange In Production Workflow
 2.5 Recommendations On Metadata Usage
3 HEVC (H.265)
4 Audio Standards
 4.1 Format PCM: AES 1/2
  4.1.1 Mono
  4.1.2 Stereo
 4.2 Dolby Surround
  4.2.1 Format Dolby-E: AES 3/4 (and AES 1/2 On Special Request)
  4.2.2 Procedures For Measuring Dolby-E
 4.3 Organization Of The Content
  4.3.1 Audio AES 1/2
   4.3.1.1 From A PCM Stream
   4.3.1.2 For A Dolby E Stream
  4.3.2 Audio AES 3/4
   4.3.2.1 For The Dolby-E Stream


1 DIGITAL VIDEO FILE RECOMMENDATIONS

1.1 Preamble

This paper aims to simplify and integrate workflows across the production and broadcast departments of RTL Group stations, as well as for contribution and distribution with external partners. It is offered as a useful recommendation for any station that might want to use it. A first major cornerstone is to define specifications for digital video file exchange. At the end of this standardization process, we expect to facilitate the following business benefits:

▪ Streamline processes: Desktop tools will simplify planning, research, logging and editing for production staff, and enable more informed decisions about cataloguing and retention of material.

▪ Reduce reliance on tape: Widespread access to audiovisual content on the desktop will eliminate the need for numerous VHS copies to be made. Tape transfers for post-production, technical review, etc. will become unnecessary.

▪ Better control of workflow: It will be easier to enforce and automate business rules, such as not allowing a program to be delivered to playout until its metadata is complete and it has been authorized.

▪ No more re-typing: Currently, staff have to re-enter metadata at each step of the workflow. This will become unnecessary.

▪ Easier exchange of content: Producers will be able to view rushes more quickly, including over the Internet. Delivery of material will be electronic.

▪ Less vendor tie-in: We propose to adopt open standards to avoid the need to rely on e.g. legacy tape formats and database systems.

▪ New ways of working: No-one can be sure how production will change in the future, but this initiative is the first step in putting in place the infrastructure to enable and support this change.

▪ Business continuity: New security requirements within our Group demand a network separate from the existing one; the ability to deliver files is a precondition.

▪ Diversification: Technical developments in the consumer area are challenging our business model. For example, new delivery platforms, hard-disk recording and the rising availability of alternative content on the consumer side need to be addressed. The necessary changes in infrastructure will profit from this initiative.

▪ Investment savings: Harmonized decisions today will help to save investment tomorrow. Existing infrastructure within RTL Group could also be made available to other affiliates of the Group, reducing the need to build infrastructure twice.

1.2 Scope

This document defines the technical file formats applicable to broadcast and viewing material intended for delivery to the TV stations of RTL Group. It should be considered a generic framework intended to replace tape delivery quickly with this new technology and to ease programme exchange between RTL Group TV stations and external partners.

This document does not replace or overrule specifications that may exist between a licensor and the TV stations. Should any format standard be revised, the latest specifications are considered immediately applicable.

1.3 File Formats

The file formats and codecs required depend on the application of the particular content. The division into quality or grade levels is as follows:

▪ Browsing
▪ Acquisition
▪ Programme Contribution
▪ Postproduction
▪ Broadcast
▪ News & Magazines & Sports
▪ High Definition

1.4 Codecs

1.4.1 Browsing

MPEG-4 and VC-1 (WM-9) are the file formats to be used for this application. Where frame accuracy is required (dubbing, subtitling), MPEG-4 including time code encoding is needed. For internal exchange of viewing material without any later treatment, a bandwidth of 500 Kbps is considered sufficient. Lower bitrates are acceptable if the files need to be sent over slow connections or used on mobile screening devices.

▪ Frame rate: 25 F/s
▪ Picture size: 320 x 240 (quarter picture size)
▪ Sound: 64 Kbps
▪ Total bandwidth: approx. 350 Kbps
▪ Four audio channels
▪ Frame-accurate time code identical to the source material
▪ MPEG-4 specs: ISO/IEC 14496

The MPEG-4 encoding/decoding chain is: sources – split into objects – object coding – multiplex – demultiplex – object decoding – compositing – display.

The video information is coded separately as synthetic or natural objects, and the resulting bit streams are multiplexed together. The decoder contains a compositor which puts the decoded objects back together, under the control of instructions which are either decoded from the bit stream or over-ridden by input from the user, providing a measure of interactivity.

In coding natural video material, MPEG-4 provides some enhancements to the MPEG-2 toolkit, such as adaptive DC prediction, AC coefficient prediction, reversible VLC coding, global motion compensation, quarter-pixel motion estimation and shape-adaptive DCT coding, as well as coding of textures and the use of sprites. However, the consensus seems to be that the chief interest of MPEG-4 is in offering increased functionality rather than a huge leap in coding efficiency.

1.4.2 Acquisition

For news acquisition, DV25 (DVCAM & DV) is recommended, as it is a flexible and very efficient format:

• Conventional i.LINK conveys only the 4:2:0 format
• All domestic products employ the 4:2:0 format
• Consistency:
o DV (4:2:0) -> DVCAM (4:2:0) -> Editing -> DVD (4:2:0)
o DV (4:2:0) -> DVCAM (4:2:0) -> Editing -> TX (4:2:0)
o DV (4:2:0) -> DVCAM (4:2:0) -> MPEG-2

As an additional acquisition format for news, XDCAM HD 35/50 Mbit is also accepted (see definition in 1.4.7).

For high-quality drama, series, feature films and co-productions it is recommended to use:
- MPEG-2 MP@ML 4:2:2 in long GOP
- MPEG-2 MP@ML 4:2:2 I-frame only (IMX format)

1.4.3 Programme Contribution

In order to reduce bandwidth on contribution lines, a long GOP format is recommended:
- MPEG-2 4:2:2 and 4:2:0 in long GOP
- File transfer is recommended with the MXF standard

In addition, XDCAM HD 35/50 Mbit is also accepted (see definition in 1.4.7).

1.4.4 Postproduction

High-quality drama, documentary and series must be delivered in MPEG-2 MP@ML 4:2:2 I-frame only (IMX format). The bandwidth can be 30, 40 or 50 Mbps. Again, MXF compliance must be respected.

For the interchange of audio-visual material and associated metadata, especially in the production and postproduction area, the Advanced Authoring Format (AAF) and MXF are used.

In addition, the XDCAM HD 35/50 Mbit format is also accepted (see definition in 1.4.7).

1.4.5 Broadcast

This section covers items that only need to be aired as they are without postproduction.

MPEG-2 MP@ML 4:2:2 in long GOP (12-frame GOP) is the file format for ready-to-air content. The bandwidth can vary between 10 and 18 Mbps. All files must be MXF compliant. The MXF reference is SMPTE 377M.

The Material Exchange Format, MXF, is a file format optimised for the interchange of material in the content creation industries. MXF is a wrapper format intended to encapsulate and accurately describe one or more “clips” of essence. These essence “clips” may be pictures, sound, data or some combination of these.

An MXF file contains enough information to allow two applications to interchange essence without any a-priori information. The MXF metadata allows applications to know the duration of the file, which essence codecs are required, what timeline complexity is involved and other key points that allow interchange. Key to MXF is the accurate description of the essence. This standard is only used in the playout area and is not foreseen to handle any postproduction (i.e. trailer production).

MPEG-2 MP@ML 4:2:2 in long GOP is the preferred archiving format for broadcast material. This format is applicable, for example, to commercials, video clips and all other programs with no postproduction need. Tests are currently under way to check the potential of this picture quality for compositing postproduction work.

HD broadcast files in XDCAM HD 35/50 Mbit are also accepted (see definition in 1.4.7).

1.4.6 News & Magazines & Sports

In these domains, the DV25 format is the preferred file format. The compression is 25 Mbit/s Discrete Cosine Transform (DCT)-based intra-frame encoding. DV25 formats include the consumer DV and DVCAM standards.

Video
▪ PAL sampling: 720 x 576 pixels
▪ PAL 4:2:0

Audio
▪ Four channels at 32 kHz and 12 bits per sample
▪ Alternative: two channels at 32/44.1/48 kHz and 16 bits per sample

Alternatives:
▪ Production: MPEG-2 25 Mbit/s I-frame only
▪ Contribution: MPEG-2 long GOP (4-15 Mbit/s)

In addition, the XDCAM HD 35/50 Mbit format is also accepted (see definition in 1.4.7).

1.4.7 High Definition

The workgroup defines XDCAM HD to be used as video standard. The following parameters are to be applied:

Format: HD422

MPEG-2 bit rate: 50 Mbps, CBR

Frame dimensions: 1920 x 1080

Colour sample ratio: 4:2:2

MPEG-2 standard: MPEG-2 422P@HL

Aspect ratio: 16:9

5.1 audio encoding: Dolby E

MPEG HD422 doubles the chroma resolution compared to previous generations of high-definition XDCAM formats. To accommodate the improved chroma detail, the video bitrate has been increased to 50 Mbit/s. This format is used only in XDCAM HD422 products.

Currently, feature films as well as all co-productions in the drama domain are still delivered on HD-D5, HDCAM or HDCAM SR videotape, compliant with the SMPTE 274M-1998 and SMPTE 295M-1997 standards. The original aspect ratio must be respected.

1.5 General Requirements

▪ Any secondary versions of masters (example: differing audio track configurations) shall be made with identical time codes. All efforts should be made to minimize encoding generations (cascading) where possible.

▪ All files should have VITC recorded on lines 19 and 21 (both fields) at the current standard of 80 IRE units.

▪ Time code crossover of 00:00:00:00 should never be used.

▪ We recommend using a minimum number of files per show or program item in order to minimize shipping costs and risk of confusion.

▪ Only the latest and best equipment must be used for file encoding.

▪ Wide-screen signalling (1.78:1).

▪ Local AD-ins.

1.6 Production And Editing Requirements

▪ Production audio and video will be continuous from the end of each file to the beginning of the next file, with no repetition of program material. The program should be edited in such a way that it could be aired without any commercial breaks. Existing commercial breaks and bumpers should be removed (fades from/to black should have a maximum of one frame of total black).

▪ Every file should be delivered with a full evaluation report (the metadata structure is to be defined in a next step). The evaluation should cover all useful information for the particular grade.

▪ Titles shall not be outside the PAL/SMPTE safe title area (SMPTE RP-27-3, 1983). Primary action should be within the 4:3 safe action area.

▪ The video signal shall be either the 1080/50i (interlaced) or 1080/25p (progressive) standard.

▪ To allow compatibility, unless otherwise agreed with programme operations, High Definition programmes should conform to the same safe area criteria as Standard Definition. (EBU R95-2000).

▪ As soon as it is available, we recommend the 1080/50p standard, recently recommended by the EBU.

1.7 Video Format

1.7.1 Standard Definition

▪ Video signals shall be 625-line / 50-field-per-second PAL format, with sync and blanking in accordance with CCIR Recommendation 472-1 and the specifications given in CCIR Reports 624-1 and 624-3, which apply to colour video signals for use in PAL B, G, H or I television systems.

▪ The maximum level of luminance (video peaks) must not exceed 100%; the unfiltered composite signal must not exceed 133%. The unfiltered composite signal of inserted titles must not exceed 100%. Black levels of the picture content have to be between 0 and 2%. The blacks should not be crushed. If the video peaks during a whole program do not exceed 0.45 V (50%), the program will not be accepted.

1.7.2 High definition

▪ Video signals shall be 1080-line/50i or 1080-line/25p (progressive segmented frame), with sync and blanking in accordance with SMPTE 274M/295M.

▪ The maximum RGB level should not exceed ±700 mV when set at a 350 mV offset. Black levels of the picture content have to be between 0 and 1%. The blacks should not be crushed. If the overall video level during a whole program does not exceed 0.5 V (70%), the program will not be accepted.

▪ No illegal colours are allowed

1.8 Digital Video Signal

1.8.1 Standard Definition

Video signal is component digital (4:2:2)

Frame rate: 25 frames/second
Line rate: 625 lines/frame
Sampling rate: Y = 864 samples/line (13.5 MHz), active 720 samples; Cr, Cb = 432 samples/line (6.75 MHz), active 360 samples
Blanking: the active line should have Y = 720 samples and Cr, Cb = 360 samples. When an analogue video format is converted into a digital format, a horizontal blanking of 12 µs is tolerated; the active line should not have fewer than 700 samples.
Quantisation: 10 bits

ITU-R 601 is designed to be compatible with the resolutions and frame rates used by the existing analogue standards NTSC, PAL and SECAM: it uses 2:1 interlacing with a frame rate of 30 (NTSC) or 25 (PAL, SECAM) frames per second and 525 (NTSC) or 625 (PAL, SECAM) lines per frame. The standard deals with colour by working in the YUV domain, taking samples of the luminance and two chrominance signals to produce a bit-stream representation of the video. However, it deals with the luminance and chrominance components differently, as described below.

A major concern in the design of the standard was that the bit rate of the digital video should remain the same irrespective of the resolution and frame rate of the incoming analogue video. This means that the number of samples taken per line depends on whether the input is NTSC or PAL/SECAM. In the case of the luminance component these are:

▪ NTSC: 525 lines per frame with 858 samples per line at 29.97 frames per second.
▪ PAL/SECAM: 625 lines per frame with 864 samples per line at 25 frames per second.
This gives a data rate of 13.5 × 10^6 luminance samples per second, or one sample every 74 nanoseconds, in each case.

The sampling rate adopted for the chrominance components depends on the version of the standard used. As noted in the previous section, the human eye is more tolerant of error in colour and this is exploited by two of the versions by using a lower sampling rate for the chrominance components, hence reducing the bit rate of the digital video. Details of each version are given below.

ITU-R 601 4:4:4 is the full-resolution version, with the same number of lines per frame and samples per line used for the chrominance as for the luminance components. Data rate: 40.5 × 10^6 samples per second.

With ITU-R 601 4:2:2, the number of chrominance samples per line is half of that used for the luminance component, although the number of lines per frame is the same, giving a reduction of 33% in the overall bit rate. The 4:2:2 indicates that there are 4 luminance samples for every 2 samples of each chrominance component. Data rate: 27 × 10^6 samples per second.
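The sample-rate arithmetic above can be checked in a few lines. This is a sketch; the figures are the total (not active) samples per line, and the function name is illustrative:

```python
# Reproducing the ITU-R 601 sample-rate arithmetic described above.

def luma_samples_per_sec(lines_per_frame, samples_per_line, frames_per_sec):
    """Luminance sampling rate implied by the line/frame structure."""
    return lines_per_frame * samples_per_line * frames_per_sec

pal_y = luma_samples_per_sec(625, 864, 25)   # 13_500_000, i.e. 13.5 MHz

# 4:4:4 samples three components at the full rate; 4:2:2 samples Y at
# the full rate plus two chrominance components at half the rate.
rate_444 = 3 * pal_y                         # 40.5 million samples/s
rate_422 = pal_y + 2 * (pal_y // 2)          # 27.0 million samples/s
```

The 4:2:2 total is one third below the 4:4:4 total, matching the 33% reduction quoted above.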

▪ Time code must be continuous.

1.8.2 High Definition

1.8.2.1 Video Signal Is Component Digital (4:2:2), “ITU-R BT.709-5 compliant”

Frame rate: 50 fields (50i) or 50 segments (25psf) per second
Interlace ratio: 1:1 (25psf); 2:1 (50i)
Line rate: 1125 total lines, 1080 active lines
Sampling rate (full line): R, G, B, Y = 2640 samples/line (74.25 MHz); Cr, Cb = 1320 samples/line (37.125 MHz)
Sampling rate (active line): R, G, B, Y = 1920 samples/line (74.25 MHz); Cr, Cb = 960 samples/line (37.125 MHz)

Quantization: Uniformly quantized PCM for each of the video component signals 8 or 10 bit/sample, preferable 10 bit/sample.


In the 8-bit mode:
Black level (R, G, B, Y) = 16; achromatic (Cb, Cr) = 128
Nominal peak (R, G, B, Y) = 235; (Cb, Cr) = 16 and 240
Quantization level assignment: video data = 1 through 254

In the 10-bit mode:
Black level (R, G, B, Y) = 64; achromatic (Cb, Cr) = 512
Nominal peak (R, G, B, Y) = 960; (Cb, Cr) = 64 and 960
Quantization level assignment: video data = 4 through 1019

1.8.2.2 Start of Program

Files should commence with either “0” or “black”. If “0” is selected, it has to be the first frame.

1.8.3 Ultra High Definition (High Efficiency Video Coding – HEVC)

The first version of the new standard has recently been consented as Recommendation ITU-T H.265 and will shortly be approved by ISO/IEC as ISO/IEC 23008-2 (MPEG-H Part 2). Details will be added once an official approval is out.

Quantization: Uniformly quantized PCM for each of the video component signals in 8 or 10 bit/sample

1.8.4 Digital Audio Recording Levels

Based on the EBU document R 128 (2011), “Loudness normalisation and permitted maximum level of audio signals”, the following is recommended:


• that the descriptors Programme Loudness, Loudness Range and Maximum True Peak Level shall be used to characterise an audio signal

• that the Programme Loudness Level shall be normalised to a Target Level of -23 LUFS. The permitted deviation from the Target Level shall generally not exceed ±1 LU for live programmes and ±0.5 LU for short and long items.

• that the audio signal shall generally be measured in its entirety, without emphasis on specific elements such as voice, music or sound effects

• that the measurement shall be made with a loudness meter compliant with both ITU-R BS.1770 and EBU Tech Doc 3341 [4]

• that this measurement shall include a gating method with a relative threshold of 10 LU below the ungated LUFS loudness level as specified in EBU Tech Doc 3341 - 2011

• that Loudness Range shall be measured with a loudness meter compliant with EBU Tech Doc 3342 - 2011[5]

• that the Maximum Permitted True Peak Level of a programme during production shall be -1 dBTP (dB True Peak), measured with a meter compliant with both ITU-R BS.1770 and EBU Tech Doc 3341 - 2011

• that loudness metadata shall be set to indicate -23 LUFS for each programme that has been loudness normalised to the Target Level of -23 LUFS

• that loudness metadata shall always correctly indicate the actual programme loudness, even if for any reason a programme may not be loudness normalised to -23 LUFS

• that audio processes, systems and operations concerning production and implementation should be made in compliance with EBU Tech Doc 3343 - 2011 [6]

• that audio processes, systems and operations concerning distribution should be made in compliance with EBU Tech Doc 3344 - 2011 [7].
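As a sketch, the descriptor limits above can be expressed as a simple compliance check. The measured values are assumed to come from a loudness meter compliant with ITU-R BS.1770/EBU Tech 3341; the function name and the ±0.5 LU non-live tolerance are this sketch's assumptions:

```python
TARGET_LOUDNESS_LUFS = -23.0      # R 128 Target Level
MAX_TRUE_PEAK_DBTP = -1.0         # maximum permitted true peak in production

def r128_violations(programme_loudness, max_true_peak, live=False):
    """Return the list of R 128 limits a programme breaks.

    programme_loudness -- gated loudness in LUFS (from a BS.1770 meter)
    max_true_peak      -- maximum true peak in dBTP
    """
    tolerance_lu = 1.0 if live else 0.5   # wider tolerance only for live
    violations = []
    if abs(programme_loudness - TARGET_LOUDNESS_LUFS) > tolerance_lu:
        violations.append("loudness outside Target Level tolerance")
    if max_true_peak > MAX_TRUE_PEAK_DBTP:
        violations.append("true peak above -1 dBTP")
    return violations

# A programme normalised to -23.2 LUFS peaking at -1.8 dBTP is compliant:
assert r128_violations(-23.2, -1.8) == []
```

Such a check could act as the automated "business rule" mentioned in the Preamble, blocking delivery to playout of non-compliant files.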

Definitions:

• Programme: An individual, self-contained audio-visual or audio-only item to be presented in radio, television or other electronic media. An advertisement (commercial), trailer, promotional item (“promo”), interstitial or similar item shall be considered a programme in this context.


• Loudness Range (LRA): this describes the distribution of loudness within a programme

• Maximum True Peak Level: the maximum value of the audio signal waveform of a programme in the continuous time domain.

In addition to the average loudness of a programme (‘Programme Loudness’), it is recommended that the descriptors ‘Loudness Range’ and ‘Maximum True Peak Level’ be used for the normalisation of audio signals, and that they comply with the technical limits of the complete signal chain as well as the aesthetic needs of each programme/station, depending on the genre(s) and the target audience.

1.8.5 Additional Specifications

▪ Any other spurious deficiency that produces noise, smear, blocking, tearing, jitter, blurring, uncompensated colour temperature, lack of resolution, unsharpness, flicker, phase errors, synchronization errors or any editing malfunction is not permitted. The same applies to audio noise, flutter, crackles, clicks, pops, distortions, background noises (e.g. air conditioning) and time code bleed-through to an audio track.

▪ All audio elements must be in sync to picture on the final master file.

▪ A program delivered with a 5.1 Dolby Digital track shall be delivered as a Dolby E signal and a separate stereo-mix track on the videotape.

▪ Separate audio files must be in sync to the master file.

▪ As program loudness, only the TV mix is accepted, with 20 LU as the highest level. Average levels are considered to be 10-15 LU (EBU Tech 3341).

▪ Enhancement levels should be set to normally acceptable levels. Over-enhancement should not be used to compensate for poor tube focus or telecine alignment. Noise reduction should only be used where flicker or grain is judged to be out of specification.

▪ Picture formats of 2.35:1 and 1.85:1 are not recommended in the SD domain. These formats must be converted during the telecine process to a picture format of 16:9 (anamorphic).

▪ All film-transfers must be made on the latest and best equipment and shall preferentially be made as HD transfers.

▪ Second screen synchronization support will be implemented according to EBU standards, once available, allowing the implementation of e.g. program discovery apps, check-in apps, second screen advertising apps, social-chat apps, program specific apps, remote control apps.


2 MXF – File Recommendation

2.1 Introduction

The purpose of this document is to provide the keys for understanding the use of MXF in the broadcast industry. When designing an MXF workflow, it is important to ensure the interoperability between the different systems in use as well as an efficient carriage of metadata. In this document we will first go through metadata and operational patterns usage in MXF file format and we will conclude with a presentation of production workflows and recommendations on metadata usage.

2.2 MXF Metadata

2.2.1 Structural and Descriptive Metadata

MXF is a wrapper format that is capable of carrying several video and audio formats in a single file. MXF conveys two types of metadata in order to describe the contents of a file:
• Structural Metadata: provides information on the video and audio essences (format, duration, etc.) and on the way they should be played out. It also includes synchronisation and time code information.
• Descriptive Metadata: the “user” metadata annotating the file’s content. It is perfectly valid to create MXF files without any descriptive metadata; however, it can be useful to indicate, for instance, the title of the video or the audio language.

2.2.2 Structural Metadata

2.2.2.1 Partitions

Partitions are subdivisions of the data stored within an MXF file. There are three types of partition:
o Header Partition: the first partition of the file, where all the information available when starting the recording is usually stored.
o Footer Partition: the last partition of the file, where the information collected during the recording is usually stored.
o Body Partition(s): all the in-between partitions, where the audio and video data are usually stored.

The following figure shows the overall structure of an MXF file:

[Figure: an MXF file as a sequence of partitions — a Header Partition, N Body Partitions and a Footer Partition. A partition may hold header metadata, an index table and an essence container interleaving picture and sound.]

Header, Body and Footer Partitions may contain header metadata and an index table. Header and Body Partitions (not the Footer) may contain audio and video data. Partitioning a file has several benefits, notably in a streaming environment:
o It provides regular entry points in order to start playing from any location in the file.
o It enables the update of the structural and descriptive metadata while recording.
o Index tables and essence data can be organized to enable a play-while-record capability.
o It provides regular entry points to perform partial restores.

A partition can be tagged as open/closed and incomplete/complete to indicate its accuracy:
o Open/closed: an open partition may contain metadata that was valid at the time of recording but is no longer valid at the end of the recording. In a closed partition, all the information is accurate.
o Incomplete/complete: an incomplete partition is one where part of the metadata could not be set at the time of recording; in a complete partition, all the metadata could be set. For instance, the duration of a video might be known only when the recording is completed.

2.2.2.2 Operational Pattern

MXF is a wrapper file format embedding (or referencing) several video and audio streams (essences) that are synchronized in order to produce a play-out. The level of complexity of an MXF file’s play-out is denoted by the Operational Pattern (Op).

[Figure: the Operational Pattern matrix. The columns denote item complexity — OP1 (Single Item), OP2 (Play-list Items), OP3 (Edit Items); the rows denote package complexity — a (a Material Package referencing a single File Package at a time), b (ganged File Packages) and c (alternate Material Packages, MP1 or MP2). OpAtom sits outside this matrix.]

• File Package (FP), or Source Package (SP): describes the source video and/or audio data. It includes a time code track synchronizing a set of audio, video, data and metadata tracks.
• Material Package (MP): describes the play-out of an MXF file and is built by referencing several File Packages. It also includes a time code track as well as several video or audio tracks corresponding to an editing of the source File Packages. It may optionally contain tracks with descriptive metadata.
• Material and File Packages are uniquely identified by their UMIDs.
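To make the package layering concrete, here is an illustrative sketch of a Material Package referencing File Packages by UMID. The class and field names are this sketch's own, not SMPTE 377M terms, and the UMID strings are dummies:

```python
from dataclasses import dataclass, field

@dataclass
class FilePackage:
    """Source material: a UMID plus the tracks it carries."""
    umid: str
    tracks: list = field(default_factory=list)

@dataclass
class MaterialPackage:
    """Play-out description: references File Packages by their UMIDs."""
    umid: str
    source_umids: list = field(default_factory=list)

# An OP1a-style layout: one MP playing a single FP in its entirety.
fp = FilePackage(umid="FP-0001", tracks=["timecode", "picture", "sound"])
mp = MaterialPackage(umid="MP-0001", source_umids=[fp.umid])
```

The indirection through UMIDs is what lets the more complex patterns (1b and above) point at material stored outside the file itself.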

2.2.2.3 Header Metadata

[Figure: the header metadata tree. The Preface links to Identifications, Essence Containers, DM Schemes and the Content Storage; the Content Storage holds the Packages (a Material Package with its Tracks, and Source Packages with their Tracks and Descriptors) and the Essence Container Data.]

The header metadata is a set of metadata organized as a tree, located at the beginning of a partition. It notably contains:
• Identifications: a history of the modifications that were applied to the file (by whom and when).
• Content Storage: describes the Material and Source Packages used by the file.
• Essence Container Data: provides the mechanism linking the Source Package tracks to their corresponding physical data (stored in or outside the file). The Essence Container is the structure containing the physical data (usually video frames and audio samples).
• Tracks:
o Picture and Sound Tracks: define a video or audio stream of a given duration. When linked to a descriptor, they also provide in-depth information on the format (interleaving, size, pixel aspect ratio, sample rate, etc.).
o Data Track: defines a stream that is neither video nor audio (subtitles, for instance).

o Time code Track: used for synchronization; defines a starting time code and a duration. Time code tracks can be discontinuous in order to reflect the TC changes of a digitized tape.
o Metadata Tracks: used to store “user” metadata annotating the file’s content.

2.2.2.4 Index Tables

Index tables provide the mechanism to convert a frame number or time code value into a byte offset in the MXF file. They are optional, but they enable tasks such as random access to the video and audio content of an MXF file.
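The conversion can be sketched as follows. The distinction between a constant edit-unit size and an explicit per-frame offset list mirrors how index tables handle constant and variable bit-rate essence; the function and parameter names are illustrative, not the SMPTE 377M field names:

```python
def frame_to_byte_offset(frame, edit_unit_byte_count=0, entries=None):
    """Map an edit unit (frame) number to a byte offset in the
    essence container.

    edit_unit_byte_count -- fixed bytes per frame (constant-rate essence)
    entries              -- explicit per-frame offsets (variable-rate essence)
    """
    if edit_unit_byte_count:
        return frame * edit_unit_byte_count      # constant-size frames
    return entries[frame]                        # look up the stored offset

# CBR, IMX-style: every frame occupies the same number of bytes.
assert frame_to_byte_offset(3, edit_unit_byte_count=250_000) == 750_000

# VBR, long-GOP MPEG-2: the offsets must be stored per frame.
assert frame_to_byte_offset(2, entries=[0, 120_000, 175_000]) == 175_000
```

This is why partial restore and play-while-record depend on index tables being present and well placed in the partitions.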

2.2.2.5 Essence Container

The essence container is the structure where the video and audio data corresponding to the File Package’s tracks are stored.

[Figure: an OP1a file — one Material Package playing out a single File Package that carries one video track and two audio tracks.]

For instance, if we consider this MXF file, the video and audio tracks can be wrapped in different ways:


▪ Clip Wrapping

[Figure: a Body Partition containing two KLVs — the first holding the entire video, the second the entire audio.]

Note: KLV stands for Key/Length/Value and is the smallest indivisible block of data that can be stored in an MXF file. An MXF file is a succession of KLV elements.
o Key (16 bytes): an SMPTE Universal Label identifying the Value.
o Length (2, 4 or 8 bytes): states the length of the Value.
o Value: the data itself.

The previous figure illustrates the content of the body partition with clip-wrapped essences. The essence container is built with only two KLVs: one contains the entire video and the other the entire audio. This kind of wrapping is convenient when the video or audio does not need to be stored on a per-frame or per-sample basis, but it is impossible to use in a streaming environment, as a linear decoding of the MXF file does not provide simultaneous access to audio and video.

▪ Frame Wrapping

[Figure: a body partition in which video-frame KLVs and audio-sample KLVs alternate: Video Frame, Audio Sample, Video Frame, Audio Sample, …]
In frame wrapping, the video frames are stored in separate KLVs. Audio KLVs are built and multiplexed with the video so that they can be played synchronously. It is best to use frame-wrapped MXF files when working in a streaming environment.

2.2.3 Descriptive Metadata

2.2.3.1 Metadata Tracks

Descriptive metadata is usually stored as part of the header metadata. It is stored as one or several trees linked to metadata tracks:

[Figure: descriptive metadata attached to the Material Package and the Source Package; timeline, event and static tracks each hold a sequence of DM segments linked to DM frameworks]
• Descriptive metadata can be stored as part of the material package or the source package, depending on whether it annotates the “play-out” or the media sources.
• There are three types of metadata tracks:
o Static track: stores metadata that is not related to a time span, i.e. metadata annotating the package globally (name of the sequence, etc.).
o Timeline track: stores metadata that is valid for the entire duration of the package but may change (shot location, etc.).
o Event track: stores metadata related to events with a limited or null time span (end of an interview, etc.).

[Figure: a package (Material, File or Source) with timecode, data, picture and sound tracks plus three DM tracks; the static track's Production Framework links to all essence tracks, the timeline track's Clip Framework links only to the picture track, and the event track's Scene Framework links only to portions of the picture track. The root sets (Preface, Identification and Content Storage) reference the packages]
• Descriptive metadata trees follow a standardized scheme called DMS-1 (Descriptive Metadata Scheme 1) defining three frameworks:
o Production Framework: contains descriptive metadata annotating the MXF file as a whole (identification and ownership details, etc.). Stored on static tracks.
o Clip Framework: contains descriptive metadata annotating a clip (capture and creation information). Stored on timeline tracks.
o Scene Framework: contains descriptive metadata annotating a scene within a clip (actions and events). Stored on event tracks.


2.2.3.2 Production Framework

The previous figure shows a complete “Production Framework” tree. The tree is built as a list of metadata sets (Titles, Identification, etc.) linked to each other. Each metadata set contains a list of properties used to store the metadata “values” (Extended Text Language Code, Main Title, etc.). Finally, most of the metadata sets can be repeated several times (for instance there might be several “Participants”).
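The tree-of-sets structure can be illustrated with a small sketch. The set and property names (Titles, Participant, Main Title, Extended Text Language Code) come from the text above, while the values are invented for the example:

```python
# Illustration of the DMS-1 tree idea: each metadata set holds properties
# ("values") and links to child sets, and most sets may repeat.
from dataclasses import dataclass, field

@dataclass
class MetadataSet:
    name: str
    properties: dict = field(default_factory=dict)   # the set's "values"
    children: list = field(default_factory=list)     # linked child sets

production = MetadataSet("Production Framework")
titles = MetadataSet("Titles", {"Extended Text Language Code": "en",
                                "Main Title": "Evening News"})  # sample value
production.children.append(titles)
# Repeating sets: several "Participant" entries are allowed in one framework.
production.children.append(MetadataSet("Participant", {"Name": "Interviewer"}))
production.children.append(MetadataSet("Participant", {"Name": "Interviewee"}))
```

In the real file, each set is serialized as a KLV in the header metadata and the links are strong references between sets.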

2.2.3.3 Clip Framework

2.2.3.4 Scene Framework

2.2.3.5 Other Metadata

• MXF leaves room for creating one's own metadata schemes (denoted as “Dark Metadata”). Hence it is possible to create metadata trees matching one's needs exactly. However, this raises interoperability issues, as third-party MXF systems will not be able to exploit this metadata (though they should still be able to play the file).
• As shown in the following figure, header metadata can be repeated and updated during the MXF file creation process. Hence, it is not required to have knowledge of all the metadata before launching the recording process.


[Figure: an MXF file laid out as Header Partition, N Body Partitions and a Footer Partition; the header partition carries open, incomplete header metadata (structural and descriptive), the body partitions carry updated header metadata repeats plus multiplexed video and audio data, and the footer partition carries the closed, complete header metadata]

2.2.3.6 Data Tracks

SMPTE 394M states that “metadata is best placed in the header metadata whenever possible since that provides the greatest accessibility for all MXF readers”. However, streams of metadata such as time code streams are best placed directly in the essence container and multiplexed with the video and audio streams. This provides instant access to the time code when decoding the essence containers.

[Figure: an Op1a file with one MP and one 3-track FP carrying picture, sound and data tracks]

[Figure: Header Partition (open, incomplete header metadata), N Body Partitions (index tables and multiplexed picture, sound and data KLVs), Footer Partition (closed, complete header metadata)]

2.3 Operational Patterns

MXF files should be easy to transport, edit and store, from the ingest system to the broadcast system. Which MXF operational pattern best matches these requirements? We will first study the main differences between the two most commonly used operational patterns (Op1a and OpAtom) as a starting point.

2.3.1 Op1a And OpAtom

2.3.1.1 Op1a

[Figure: an Op1a file with one MP and one 3-track FP embedding three video or audio tracks]

▪ Op1a files contain a single Material Package (only one possible play-out).
▪ Op1a files contain a single File Package.
▪ The File Package may contain several tracks.
▪ MP = FP: the source streams are played as stored.

[Figure: an Op1a file laid out as Header Partition (open and incomplete header metadata; partial/optional index table), N Body Partitions (video & audio), Footer Partition (closed and complete header metadata; complete index table)]

The Op1a norm states that:
▪ Op1a files may contain several body partitions → this makes it easier to perform partial restore and play-while-record tasks.
▪ Header metadata can be repeated and updated while recording → in a streaming environment, it is not required to have knowledge of all the structural and descriptive metadata before launching the wrapping process.

▪ No restrictions on the index tables' location → in a streaming environment, sparse index tables can be created to reference essence as it is recorded, and a complete index table can be created at the end of the file.
▪ Frame-wrapped Op1a is perfectly tailored to ensure AV synchronization.

2.3.1.2 Atomic Op1a

An atomic Op1a file is an Op1a file with a single essence track, plus extra constraints to ensure the “play while record” capability in a streaming environment. Its definition is part of the SMPTE Recommended Practice for MXF Master Storage.

[Figure: an atomic Op1a file with one MP and one 1-track FP embedding a single video or audio track]

▪ Atomic Op1a files contain a single Material Package.
▪ Atomic Op1a files contain a single File Package.
▪ Atomic Op1a files contain a single track.

[Figure: an atomic Op1a file laid out as Header Partition (open, incomplete header metadata), N Body Partitions alternating partial index tables and video or audio data, Footer Partition (closed, complete header metadata)]

▪ Atomic Op1a files contain a header partition with possibly open and incomplete header metadata. The footer partition contains closed and complete header metadata → the header metadata can be updated during recording.
▪ The file contains several body partitions, each containing alternately a sparse index table or essence data. Each sparse index table references the data of the following body partition → in a streaming environment, this ensures continuous recording of the index table and the play-while-record capability.
▪ Atomic Op1a files contain a single essence stream, which eases editing of the file's content.

2.3.1.3 OpAtom

OpAtom can be subdivided into four sub-patterns, OpAtom1a, 1b, 2a and 2b, as shown in the following figure:

[Figure: OpAtom1a — a file with a single video or audio track (MP = FP). OpAtom1b — n files with a common Material Package; tracks are played simultaneously (2 1-track FPs). OpAtom2a — n files with a common Material Package; tracks are played one after the other (2 1-track FPs). OpAtom2b — n files with a common Material Package; tracks are played simultaneously and one after the other (4 1-track FPs)]

▪ Each set of OpAtom files shares a common Material Package.
▪ Each OpAtom file has a single Material Package (only one possible play-out).
▪ Each OpAtom file contains a single File Package.

▪ Each File Package contains a single essence track.
▪ OpAtom does not allow defining cuts in the File Packages.

[Figure: an OpAtom file laid out as Header Partition (closed and complete header metadata), a single Body Partition (video or audio data), Footer Partition (complete index table)]

The OpAtom norm states that:
▪ OpAtom files should have a header partition, a single body partition and a footer partition → multi-partitioning is not allowed, which may make tasks such as play while record and partial restore difficult.
▪ There is a single closed and complete header metadata and it should be included in the header partition → in a streaming environment, this requires knowledge of all the structural and descriptive metadata before launching the wrapping process.
▪ The footer partition must contain a complete index table referencing all the video frames or audio samples of the body partition → with VBR essence, the index table needs to be stored separately until the recording of the file ends. Random access to the data is not possible before the end of the recording.

2.3.1.4 OpAtom vs. Op1a

The following table summarizes the pros and cons of OpAtom, Atomic Op1a and Op1a:

                    OpAtom      Atomic Op1a   Op1a
Index Tables        Complete    Sparse        Sparse or Complete
Multi-Partitioning  No          Yes           Yes or No**
Partial Restore     Yes*        Yes           Yes* **
Play while Record   No          Yes           Yes or No**
Metadata Update     No          Yes           Yes or No**

* Partial Restore is possible but is optimized with multi-partitioning.
** It is possible to build Op1a files meeting these requirements, but not all Op1a files will have these features enabled.

From this table, we would recommend the use of the Atomic Op1a pattern. It matches the network environment requirements and ensures an efficient use of all the MXF features in a production environment: partial restore, play while record and metadata update. However, the most common operational pattern in MXF-aware hardware and software today is Op1a (not Atomic Op1a). This implies that operational pattern conversions will probably be required between ingest and broadcast.

2.3.2 External References On Atomic MXF Files

In this chapter, we will consider using atomic MXF files (containing a single essence stream) and the pros and cons of embedding the streams within the MXF file or using external references. For the purpose of the demonstration, we will study three MXF files. Each of them shares the same MP but uses different mechanisms for storing the data:
• Op1a MXF file with embedded audiovisual material.
• Op1b MXF file with external audiovisual material.
• OpAtom1b MXF file.

2.3.2.1 Op1a (embedded material)

[Figure: an Op1a file with one MP and one 3-track FP embedding three video or audio tracks]

In this configuration, there is a single File Package and a single Material Package. All the data is contained in a single file.

Pros:
▪ A single file to transport.
▪ Op1a-3c provides advanced mechanisms for synchronization.
▪ Op1a-3c provides advanced mechanisms to build complex editing.

Cons:
▪ Changing the content of a FP track requires rebuilding the entire file.
▪ A single track cannot be transported individually; the whole data must be transferred.

2.3.2.2 Op1b (external material)

[Figure: an Op1b file with external references — a file without internal data referencing three external single-track MXF files (atomic Op1a or OpAtom1a), i.e. three 1-track FPs]

In this configuration, an MXF file without any internal audio or video data references three external “atomic” MXF files. We define an “atomic” MXF file here as an OpAtom1a file or an Op1a file with a single essence track. The MP references the FPs located in different files.

Pros:
▪ Changing the content of a FP only requires rebuilding a single file.
▪ The MP can be conveniently edited without transporting essence data.
▪ Externally referenced files can be conveniently reused in several MPs.
▪ Each atomic file can be previewed individually.
▪ Op1a-3c provides advanced mechanisms to build complex editing.

Cons:
▪ Several files to carry → need for an external mechanism retrieving the referenced files.

2.3.2.3 OpAtom1b

[Figure: OpAtom1b — three MXF files sharing a common Material Package; each file embeds a single video or audio track (three 1-track FPs)]

With an OpAtom operational pattern, each MXF file can only contain a single video or audio track. Hence, in our example, we need three MXF files. We have a single Material Package referencing three File Packages. Each file contains a copy of the Material Package and a different File Package.

Pros:
▪ Changing the content of a FP only requires rebuilding a single file (though the header metadata still needs to be updated in each file).
▪ Each source track can be transported individually.

Cons:
▪ OpAtom does not allow partitioning (and hence limits the application of play while record, partial restore, etc.).
▪ OpAtom does not provide mechanisms for track synchronization.

We recommend building a workflow using externally referenced “atomic” MXF files as much as possible. This mechanism provides the maximum flexibility as well as the ability to build complex editing (Op1a to 3c). More generally, the ideal workflow will probably require operational pattern conversion depending on the application (for instance, broadcast servers usually accept the Op1a pattern with embedded material). At the same time, the operational pattern may be restricted by the software or hardware being used.


2.3.2.4 Resolving External References

MXF provides different ways to work with external references. We will see here which approach is the most appropriate for a networked broadcast environment.

[Figure: resolving an external reference via an asset management system — (1) the UMID is extracted from the MP, (2) the UMID is sent to the asset management system, (3) the referenced atomic Op1a file is retrieved]

Solving external references with an asset management system:
1. The MP references a FP. The UMID of this FP is extracted from the MXF file.
2. The UMID is sent to the asset management system. The system looks in its database for the location of the corresponding atomic MXF file.
3. The externally referenced MXF file is retrieved and linked to the Material Package.

[Figure: resolving an external reference via locator metadata — (1) the network locator and text locator are extracted from the MP, (2) the atomic Op1a file is retrieved on the network, (3) the UMIDs are compared]

Solving external references with the network locator and text locator metadata from the MXF file:
1. The MP references a FP. The network locator (“//archive/mxf/program10/” for instance) and the text locator (“video_movie15.mxf”) are extracted from the FP's metadata.
2. The atomic MXF file is retrieved on the network.
3. The UMID referenced in the MP is extracted, as is the UMID from the atomic MXF file. The values are compared; if they match, the correct file was retrieved → the atomic MXF file is linked to the MP.

We recommend the use of the first workflow. The second solution requires storing the location of the referenced file within the metadata of the MXF file. If the referenced files are moved, this metadata is no longer valid and the MXF file must be updated accordingly. By contrast, the first solution only requires updating the location within the asset management system. Furthermore, the externally referenced files do not necessarily need to be MXF files (for instance, it is perfectly valid for an MXF file to reference an external DV file). In that case, the media asset management system can also be used to resolve the UMID link.
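The first workflow can be sketched as follows. The AssetManager class, the UMID strings and the file paths are all hypothetical stand-ins for a real asset management system:

```python
# Sketch of workflow 1: resolving an external FP reference through an asset
# management catalogue keyed by UMID. Names and paths are illustrative only.

class AssetManager:
    def __init__(self):
        self.catalogue = {}                       # UMID -> current file location

    def register(self, umid, location):
        self.catalogue[umid] = location

    def resolve(self, umid):
        # Step 2: the system looks up the location of the referenced file.
        return self.catalogue.get(umid)

mam = AssetManager()
mam.register("umid-video-0001", "/online/storage/video_0001.mxf")

# Step 1: the UMID of the referenced FP is extracted from the MP.
referenced_umid = "umid-video-0001"
# Step 3: the externally referenced file is retrieved and linked to the MP.
location = mam.resolve(referenced_umid)

# If the file is later moved, only the catalogue entry changes; the MXF file
# itself is not rewritten - the advantage over locator-based resolution.
mam.register("umid-video-0001", "/archive/video_0001.mxf")
```

This is exactly why moving referenced files is cheap in the first workflow: the MXF file carries only the UMID, never the path.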

2.3.3 Uses Of Op1b-3c Operational Patterns

We have seen in the previous chapters that the desired MXF architecture for a broadcast environment is the use of higher operational pattern files referencing external essence data wrapped in atomic Op1a files. Here we will see the use we can make of operational patterns Op1b to Op3c in this context.

2.3.3.1 Op1b

[Figure: an Op1b file with external references — one file without internal data referencing three external single-track MXF files: Video, Audio A1, Audio A2]

Op1b will be used to play out several tracks synchronously:
• The simplest case is the synchronization of a video file with an audio file for a standard play-out.
• There is no limit to the number of video and audio tracks that can be played together. For instance, the video of an interview can be synchronized with two audio recordings (microphone of the interviewer and microphone of the interviewee).


2.3.3.2 Op1c

[Figure: an Op1c file with external references — one file with two possible play-outs referencing three external single-track MXF files: Video (movie), Audio English, Audio French]

Op1c will be used to define various play-outs of synchronized tracks:

• The simplest case is the synchronization of a video file with audio files in different languages.
• Op1c allows building several play-outs without duplicating the essence data. In our example, the same MXF video file is used twice.
• Op1c can also simplify proxy manipulation:
o One MP references the full-quality video and audio files.
o The other MP references the low-quality proxy video and audio files.
o The proxy and full-quality files can be manipulated, edited and stored simultaneously.

2.3.3.3 Op2b

[Figure: an Op2b file with external references — one file playing the show and then the credits; the TV show and the credits are stored in separate video and audio files]

Op2b is an extension of Op1b where source video and audio tracks are butted to each other.
• This operational pattern enables the creation of a program from essence data originating from different sources.
• Example of use: a TV show may have the same credits every day while the content of the program itself changes daily. Op2b allows conveniently reusing the credits video and audio MXF files.
• Example of use when digitizing a tape: the tape may have a discontinuous time code. Every time a time code discrepancy is detected, a new atomic Op1a video file and a new atomic Op1a audio file are created. These files store the new time code in their File Package. In the end, the Op2b file plays out the video and audio just as they were stored on the tape, and the time code information is preserved.
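The tape-digitizing example can be sketched as follows: a minimal illustration, assuming a 25 fps time code, of how discontinuities split the tape into the segments that would each become a pair of atomic Op1a files:

```python
# Sketch of the tape-digitizing use case: every time code discontinuity
# starts a new segment (in practice, a new pair of atomic Op1a files that
# an Op2b file then butts together). 25 fps is assumed.
FPS = 25

def tc_to_frames(tc):
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * FPS + f

def split_on_discontinuity(timecodes):
    segments, current = [], [timecodes[0]]
    for prev, cur in zip(timecodes, timecodes[1:]):
        if tc_to_frames(cur) != tc_to_frames(prev) + 1:   # TC jump detected
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

tape = ["10:00:00:00", "10:00:00:01", "10:00:00:02",      # continuous run
        "14:30:00:00", "14:30:00:01"]                     # jump -> new segment
print(len(split_on_discontinuity(tape)))  # prints 2
```

Each resulting segment keeps its own starting time code, which is exactly what the per-segment File Packages preserve.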

2.3.3.4 Op2c

[Figure: an Op2c file with external references — one file with two possible play-outs (English, French) referencing the same video files (TV show, Credits) but different audio files]

Op2c is an extension of Op1c and Op2b. It is used to define various play-outs where source video and audio tracks are butted to each other.
• This operational pattern enables the creation of several versions of a program built with essence data originating from different sources.


• Example of use: a daily TV show may have the same credits every day while the content of the program itself changes. Furthermore, if the program must be broadcast in countries with different languages, the same video source file can be reused while a different audio track is used.

2.3.3.5 Op3b

[Figure: an Op3b file with external references — the play-out is built by cutting the source MXF files: Video (TV show), Audio (TV show), Video (Credits), Audio (Credits)]

Op3b is an extension of Op2b where source video and audio tracks are cut and butted to each other.
• This operational pattern enables the creation of a program from cuts of essence data originating from different sources.
• Example of use: producing the teaser of a TV show. It is built by cutting the source video and audio tracks of the complete program. Without duplicating the video and audio source files, it is possible to define the play-out of the program's teaser.

2.3.3.6 Op3c

[Figure: an Op3c file with external references — one file with several play-outs (English, French, English censored, French censored, English teaser, French teaser) built from cuts of the movie and credits video and audio sources]

Op3c is an extension of Op2c and Op3b:

• This operational pattern enables the creation of several versions of a program built with cuts of essence data originating from different sources.
• Example of use: two versions of a movie must be broadcast in different languages, and at the same time a teaser is built for each language.

We have seen the possible uses of operational patterns Op1b to Op3c. Op3c is the pattern providing the most functionality. However, it is rather complex, and some applications do not require all these features. The ideal workflow will probably require different flavours of operational patterns depending on the operation undertaken. We will study these workflows in practice in the following chapter.


2.4 Interoperability And Metadata Exchange In Production Workflow

A file-based TV workflow can be very simple or very tricky. The following picture shows a simple workflow where the key steps using the MXF file format are described. Two sample cases are explained: the first describes a simple workflow using MXF Op1a; the second is based on a compound MXF implementation. The pros and cons of each configuration are described below.

1st case: better interoperability

[Figure: a workflow driven by a Media Asset Management (MAM) system and automation, with metadata (MTD) exchanged alongside — Ingest → Online Storage → Editing → Online Storage → Play-out server, all exchanging Op1a files; Archives also store Op1a]

Pro:

▪ This workflow is currently the best to ensure interoperability, as most systems support MXF Op1a. From a production perspective, it avoids unwrapping/rewrapping MXF files.

Cons:

▪ This MXF workflow might not be very flexible in the future. Working with a separate audio track from an MXF file, or adding a new track to an archive, will require unwrapping the current essences from MXF in order to wrap the new track.
▪ There is no metadata handling inside MXF. The persistence of metadata is not ensured between each process.

2nd case: better flexibility

[Figure: a workflow driven by a MAM system and automation, with metadata (MTD) exchanged alongside — Ingest → Online Storage → Editing → Online Storage → Play-out server based on Op1a, with an Atomic MXF → Op1a conversion before play-out; Archives store Atomic MXF + Op1b (MTD)]

Pro:

▪ This workflow mixes the use of MXF Op1a and MXF OpAtom+Op1b (external link) for better flexibility. The production workflow (ingest to play-out) is still based on Op1a, while the archive process is based on atomic MXF files (OpAtom or Atomic Op1a) plus an Op1b MXF implementation.
▪ This architecture better fits the needs of archive growth. Adding a new track to an existing MXF program only requires generating a new Op1b file.
▪ The MXF Op1b file might be used to transport the descriptive metadata between the two online storages or to store descriptive metadata in the archive.

Cons:

▪ This workflow requires converting MXF operational patterns. This might have negative effects on performance and cost-effectiveness.

2.5 Recommendations On Metadata Usage

• Plan your structural metadata and choose your operational pattern according to what has been discussed in chapters 4 and 5. The choice may differ depending on the application (editing, archiving, broadcasting, etc.) and the hardware/software being used.

• Maximize the use of externally referenced atomic MXF files to ease editing and reduce storage capacity and network bandwidth needs.

• Create partitions to allow header metadata updates during recording, play while record and partial restore. This also determines the choice of the operational pattern (OpAtom does not allow multi-partitioning).

• Consider building proxy MXF files during the ingest process. This can speed up the editing process and enable an efficient partial restore (the operator retrieves the proxy file, sets its TC IN and TC OUT thanks to the MXF time code and then asks for a partial restore).
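The proxy-based partial restore step can be sketched as follows, assuming 25 fps and a simple request structure invented for the illustration:

```python
# Sketch of the proxy-based partial-restore step: the operator marks TC IN
# and TC OUT on the proxy, and the frame range for the restore request is
# computed relative to the file's starting time code. 25 fps is assumed.
FPS = 25

def tc_to_frame(tc, tc_origin="00:00:00:00"):
    def frames(t):
        h, m, s, f = (int(x) for x in t.split(":"))
        return ((h * 60 + m) * 60 + s) * FPS + f
    return frames(tc) - frames(tc_origin)   # offset from the file's first frame

def partial_restore_request(tc_in, tc_out, tc_origin="00:00:00:00"):
    first = tc_to_frame(tc_in, tc_origin)
    last = tc_to_frame(tc_out, tc_origin)
    return {"first_frame": first, "frame_count": last - first + 1}

# Operator marks a 30-second excerpt on a proxy whose TC starts at 10:00:00:00.
req = partial_restore_request("10:01:00:00", "10:01:29:24",
                              tc_origin="10:00:00:00")
print(req)  # prints {'first_frame': 1500, 'frame_count': 750}
```

The resulting frame range is what the index tables of the high-resolution file translate into byte offsets for the actual partial restore.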

• Use Fill Items (KLVs containing no data) to plan for future metadata insertion. After recording, the file will be edited several times and its header metadata size will possibly grow. Fill Items leave room for expansion.

• How do I define time code? For instance, when moving from a tape system to an MXF system, how am I going to keep track of the tape's time code?
o Directly in the header metadata, by defining discontinuities in the time code track of an Op1a file?
o By creating a new Op1a file every time a new time code is found and generating an Op2b file referencing the Op1a files?
o By recording the time code as a data stream interleaved with the video and audio?

• List all the metadata you need and see where it should be stored:
o Directly in the header metadata, for maximum reusability?
o As a data stream, for metadata that is continuously updated?
o Directly within the essence (AES channel status data, VITC, etc.)?

• Try to reuse the standardized descriptive metadata schemes as much as possible to ensure interoperability. If necessary define your own “dark metadata” schemes.

• Try to automate descriptive metadata insertion at an early stage (ingest).


MXF METADATA V1.0 Draft 26/10/2006 (Name / Type / Example / Mandatory)

Identification
• UMID: 32-byte unique identifier (mandatory)
• Checksum of the source file: algorithm to be defined (MD5) (mandatory)
• Exchange ID (UDID): to be defined (mandatory)
• File name: 16-digit ascii.ext, e.g. 145EDT253.mxf, 4588pub.mpg (mandatory)

Title Data
• Title: text
• Secondary Title: text
• Series Title: text
• Episode Title: text

Production Data
• Creation date: date, format YYYY/MM/DD or DD/MM/YYYY (mandatory)
• Production Company: text
• Servicing Company: text (mandatory)
• Advertising Agency: text
• Advertiser: text
• Advertising Body: identification number

Status Information
• File status: text, e.g. rush, PAD, etc. (mandatory)

File Timecodes
• TC IN: HH:MM:SS:FF, e.g. 00:00:00:00 (mandatory)
• TC OUT: HH:MM:SS:FF, e.g. 00:01:29:24 (mandatory)
• File Duration: HH:MM:SS:FF, e.g. 00:01:30:00 (mandatory)

Trimming Data (segmentation 1-n)
• Segment identification: text, nomenclature to be defined
• Mark In: HH:MM:SS:FF
• Mark Out: HH:MM:SS:FF
• Segment duration: HH:MM:SS:FF

Audio Information
• Audio Track 1-8 standard
• Audio Track 1-8 language: code

Video Information
• Aspect ratio: text, 4/3, 16/9, LB; coding to be defined (mandatory)
• Image format: text, 1.33, 1.66, 1.85, etc. or AFD ??? (mandatory)
• Compression format: text, e.g. MPEG-2 422P@ML (mandatory)
• Compression method: text, e.g. VBR
• File type: text, PS or TS
• Frame rate: number, images per second, e.g. 25
• Bitrate (Mbps): number, e.g. 30 (mandatory)
• Chroma format: text, 4:2:2 or 4:2:0
• Horizontal resolution: number, e.g. 720
• Vertical resolution: number, e.g. 576
• GOP structure: text, e.g. IBBP
• GOP length: number, e.g. 12
• GOP type: text, open or closed
• DCT precision: number, e.g. 10
• DCT type: text, e.g. field
• VBV buffer size: number, e.g. 576
• Picture structure: text, e.g. frame
• Field top-first: text, yes or no
• Frame type: text, e.g. interlaced
• Quant scale: text, e.g. nonlinear
• Scan type: text, e.g. alternate
• PES packet size: number, e.g. 2048

Audio Information
• Audio sampling: number, e.g. 48
• Compression method: text, e.g. CBR
• PES packet size: number, e.g. 2048

Encoding Device Data
• Device Type: text
• Device version: text
• Manufacturer: text
• Device Model: text


3 HEVC (Codec for H.265)

This recommendation concerns the technical standard for higher compression of video files in television broadcasting.

This codec is designed to enable the use of the coded video representation in a flexible manner for a wide variety of network environments as well as to enable the use of multi-core parallel encoding and decoding devices.

The use of this standard allows motion video to be manipulated as a form of computer data and to be stored on various storage media, transmitted and received over existing and future networks and distributed on existing and future broadcasting channels.

All necessary details can be found in Recommendation ITU-T H.265.


4 Audio Standards

4.1 Format PCM: AES 1/2

Level control is performed with a quasi-instantaneous peak meter (10 ms integration time, DIN 45406).

Current standard:
- SMPTE 299M: “24-Bit Digital Audio Format for SMPTE 292M Bit-Serial Interface”

Level alignments for the PCM tracks:

[Figure 1: Audio Levels — digital peak meter (dBFS) and quasi-peak PPM (10 ms) scales showing, from top to bottom: the prohibited zone, the max PEAK level, the peak zone, the alignment level, the permanent mode (VU meter) and the limit for voice intelligibility]

Alignment Level:
The alignment level displayed on a digital peak meter (Full Scale): -18 dBFS at 1 kHz.
The alignment level displayed on a quasi-peak PPM (10 ms, DIN 45406): -9 dB (IEC 268-10 type 1; Recommendation n°59 OIRT is at -9 dB).
The alignment level displayed at 0 VU on a VU meter represents a voltage of +4 dBu.

Maximum Peak levels:

The maximum tolerated peak level is 9 dB above the reference level: -9 dBFS on a digital peak meter, or 0 dB on a quasi-peak PPM (10 ms, DIN 45406).

Dynamics:
Audio dynamics processing must preserve the peak levels shown on the level meters. The average modulation should stay in a reasonable range on a VU meter and not sit permanently in the red area.
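The level rules above can be checked numerically. This is a minimal sketch assuming 24-bit samples, with the -18 dBFS alignment level and the -9 dBFS peak limit taken from the text:

```python
# Sketch of a peak-level check against the rules above: alignment tone at
# -18 dBFS, maximum tolerated peak at -9 dBFS (9 dB above alignment).
import math

def dbfs(sample, full_scale):
    # Level of a peak sample relative to digital full scale.
    return 20 * math.log10(abs(sample) / full_scale)

FULL_SCALE = 2 ** 23                   # 24-bit audio words (as in SMPTE 299M)
ALIGNMENT_DBFS = -18.0
MAX_PEAK_DBFS = ALIGNMENT_DBFS + 9.0   # -9 dBFS

def peak_ok(samples):
    peak = max(abs(s) for s in samples)
    return dbfs(peak, FULL_SCALE) <= MAX_PEAK_DBFS

# A signal peaking 12 dB below full scale passes; one at -3 dBFS does not.
quiet = [int(FULL_SCALE * 10 ** (-12 / 20))]
loud = [int(FULL_SCALE * 10 ** (-3 / 20))]
```

A real QC tool would of course use meter ballistics (10 ms quasi-peak integration) rather than raw sample peaks, but the dB arithmetic is the same.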

4.1.1 Mono

Track Assignment: In mono, the national version, for example the VF (French version), is recorded on tracks 1 & 2 in PCM mode without compression, fully identical and in phase.

Audio/Video Synchronisation: The audio and video signals are synchronous within a maximum deviation of +/- 20 ms.

Reference Frequency: In mono, the left & right tracks carry a continuous 1000 Hz tone at reference level.

4.1.2 Stereo

Track Assignment: Tracks 1 & 2 of AES 1/2 are delivered in PCM without compression. In stereo mode, track 1 carries the left channel and track 2 the right channel.

Audio/Video Synchronisation: The audio and video signals are synchronous within a maximum deviation of +/- 20 ms.

Audio Phase: The left & right channels of a stereo signal must match, on average, in intensity and phase.

Reference Frequency: Stereo signals:
- Track 1 of AES 1/2 (left): 1000 Hz with breaks of 0.25 s every 3 seconds
- Track 2 of AES 1/2 (right): 1000 Hz continuous
The tones of the 2 tracks must be coherent and in phase.
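The stereo line-up tones above can be sketched as follows; the 48 kHz sample rate and the placement of the 0.25 s break at the start of each 3 s cycle are assumptions, since the text fixes only the tone frequency, level and break interval.

```python
import math

SAMPLE_RATE = 48000              # assumed broadcast sample rate
TONE_HZ = 1000.0
LEVEL = 10.0 ** (-18.0 / 20.0)   # reference level: -18 dBFS

def lineup_tone(duration_s, interrupted):
    """Generate the 1 kHz line-up tone as a list of float samples.

    If `interrupted`, the tone is silenced for 0.25 s in every 3 s
    cycle (placed at the start of each cycle here, an assumption)."""
    samples = []
    for n in range(int(duration_s * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        if interrupted and (t % 3.0) < 0.25:
            samples.append(0.0)
        else:
            samples.append(LEVEL * math.sin(2.0 * math.pi * TONE_HZ * t))
    return samples

# Track 1 (left) carries the interrupted tone, track 2 (right) the continuous one.
left = lineup_tone(6.0, interrupted=True)
right = lineup_tone(6.0, interrupted=False)
```

Because both channels are generated from the same phase reference, the two tones stay coherent and in phase as the text requires.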


4.2 Dolby Surround

Compatibility: The reductions "Surround => Stereo" and "Surround => Mono" should keep the spatial coherence of the sound image. The result of the reduction from Dolby Surround to mono should not change the intelligibility and tone of the audio message.

The Lt/Rt surround channels will be recorded on tracks 1 & 2 (stereo). Products originated in Dolby Surround must not be decoded.

4.2.1 Format Dolby-E : AES 3/4 (and AES1/2 On Special Request)

Audio/Video Synchronisation:

By default the audio and video signals are recorded synchronously. (If a Dolby-E decoder needs to process the signal, it will delay the audio by 1 frame relative to the video signal.)

Track Assignments: The multi-channel Dolby-E coding for HD-CAM should be recorded on AES 3 & 4 (and on AES 1/2 on special request of the broadcaster).

Dolby-E carries up to 8 audio channels:

➢ Programme in MONO only: Dolby-E coding will be 2.0 stereo, or the mono signal will be duplicated on tracks 1 and 2 of the Dolby-E (audio channel configuration 2+2 with a silent second stereo signal)

➢ Programme in STEREO only: Dolby-E will be 2.0 stereo or 2.0 Lt/Rt (Dolby Surround Pro Logic II), encoding AES 1/2 into Dolby-E with the associated metadata (audio channel configuration 2+2 with a silent second stereo signal)

➢ Programme in 6 channels (referred to as 5.1): Dolby-E will be 5.1+2.0 Lt/Rt following the SMPTE 320M recommendation and with the associated metadata (audio channel configuration 5.1+2):

▪ Track 1: Front Left (L)
▪ Track 2: Front Right (R)
▪ Track 3: Centre (C)
▪ Track 4: Subwoofer (SUB or LFE)
▪ Track 5: Rear Left (RL)
▪ Track 6: Rear Right (RR)
▪ Track 7: Lt (may be defined by the broadcaster)
▪ Track 8: Rt (may be defined by the broadcaster)
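The 5.1+2.0 track assignment above can be captured as a simple lookup table. This is a sketch; the labels follow the list above, and (per the text) the Lt/Rt use of tracks 7 and 8 may be redefined by the broadcaster.

```python
# Dolby-E 5.1+2.0 track assignment, taken from the recommendation's list.
DOLBY_E_51_20_TRACKS = {
    1: "Front Left (L)",
    2: "Front Right (R)",
    3: "Centre (C)",
    4: "Subwoofer (SUB or LFE)",
    5: "Rear Left (RL)",
    6: "Rear Right (RR)",
    7: "Lt",  # may be redefined by the broadcaster
    8: "Rt",  # may be redefined by the broadcaster
}

def channel_for_track(track):
    """Return the channel label for a Dolby-E track number (1-8)."""
    try:
        return DOLBY_E_51_20_TRACKS[track]
    except KeyError:
        raise ValueError(f"Dolby-E 5.1+2.0 carries tracks 1-8, got {track}")
```

For example, `channel_for_track(4)` identifies the LFE channel, and any track number outside 1-8 is rejected.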

Down-Mixing Compatibility:

The reductions "5.1 => Stereo" and "5.1 => Mono" should keep the spatial coherence of the sound image. The result of the reduction "5.1 => Stereo => Mono" should not change the intelligibility and tone of the audio message.

4.2.2 Procedures For Measuring Dolby-E

To closely simulate the listener's reproduction conditions, monitoring should not be done directly on the mixer output, but through a simulation that applies the reproduction parameters carried in the associated metadata.

The monitoring equipment should conform to standard IEC 60268-5.

Validation procedure for the Dialog Level:

The Dialog Level and the loudness measurement must be made with the following equipment:

▪ Dolby™ LM100 without speech filter
▪ Dolby™ DP570, release December 2005
▪ Or equivalent

The Dialog Level carried in the metadata must correspond to the loudness value measured over the entire relevant programme and across all audio channels (2.0 or 5.1).

Depending on the existence of a Dolby-E mix there are two cases:
- If Dolby-E exists in the programme, the Dialog Level metadata will be carried in the Dolby-E stream
- If Dolby-E does not exist in the programme, the Dialog Level metadata will be set to -27 +/- 4 dBLeq(A)
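The two cases above can be sketched as follows; the function and constant names are illustrative, not part of the recommendation.

```python
DEFAULT_DIALOG_LEVEL = -27.0  # dBLeq(A), used when no Dolby-E mix exists
DIALOG_LEVEL_TOLERANCE = 4.0  # +/- dB around the default

def dialog_level_metadata(measured_loudness=None):
    """Return the Dialog Level value to carry in the metadata.

    If a Dolby-E mix exists, pass the loudness measured over the entire
    programme; otherwise the fixed default of -27 dBLeq(A) applies."""
    if measured_loudness is None:
        return DEFAULT_DIALOG_LEVEL
    return measured_loudness

def within_default_tolerance(level):
    """Check a level against the -27 +/- 4 dBLeq(A) window."""
    return abs(level - DEFAULT_DIALOG_LEVEL) <= DIALOG_LEVEL_TOLERANCE
```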

4.3 Organization Of The Content

The following figure shows the expected formatting of a tape for its audio, video and time-code content:

Figure 2: Organisation of the content

[Tape layout diagram removed in this text version. Along the timecode axis it shows: neutral background, test pattern ("mire") with line-up audio, clap, then the HD video programme between TC start and TC end; Audio AES 1, tracks 1 & 2: PCM programme audio (Stereo L / Stereo R, mono / stereo / Dolby-stereo); Audio AES 2, tracks 3 & 4: Dolby-E (5.1+2.0) programme audio (Left, Right, Centre, Sub, Rear Left, Rear Right, Stereo L, Stereo R).]

4.3.1 AUDIO AES 1/2

4.3.1.1 From A PCM Stream

from 09:58:22:00 to 09:59:51:24

Recording of 1 minute 30 seconds of a reference frequency at reference level (-9 dB on a quasi-instantaneous 10 ms peak meter, or -18 dBFS):
➢ 1000 Hz for mono audio on tracks 1 & 2
➢ 1000 Hz for stereo audio, interrupted for 0.25 s every 3 s on track 1 and continuous on track 2

The tones of the 2 tracks must be coherent (same origin) and in phase.

from 09:59:52:00 to 09:59:59:24

Silence for 8 seconds

from 10:00:00:00

Start of programme

from TC End of Programme

Starting at the TC end of programme, for a duration of 30 seconds: black with silence.
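The leader timing above can be checked with simple timecode arithmetic. A minimal sketch; the 25 fps (PAL) frame rate is inferred from the ":24" frame fields and is an assumption.

```python
# Timecode helpers for the tape layout above (illustrative names).
FPS = 25  # PAL frame rate, inferred from the ":24" frame fields

def tc_to_frames(tc):
    """Convert an HH:MM:SS:FF timecode string into a frame count."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * FPS + f

def duration_s(start, end):
    """Inclusive duration in seconds between two timecodes."""
    return (tc_to_frames(end) - tc_to_frames(start) + 1) / FPS

# The line-up tone runs for exactly 1 minute 30 seconds...
assert duration_s("09:58:22:00", "09:59:51:24") == 90.0
# ...followed by 8 seconds of silence before the 10:00:00:00 programme start.
assert duration_s("09:59:52:00", "09:59:59:24") == 8.0
```

The inclusive end timecodes (":24" as the last frame) are why one frame is added before dividing by the frame rate.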

4.3.1.2 For A Dolby-E Stream

from 09:58:22:00 to 09:59:51:24

For a 5.1 + 2.0 configuration: For this duration the following audio outline should be recorded on the Dolby-E track. The 2-second tones are 1000 Hz, and 40 Hz for track 4 (LFE).

Figure 3: Head leader Dolby-E 5.1+2.0

[Timing diagram removed in this text version. Over the 1'30" leader, tracks 1-6 (Left, Right, Centre, Sub, Rear Left, Rear Right) each carry 2 s tone bursts staggered within repeating 10 s cycles; tracks 7 (Lt) and 8 (Rt) carry the stereo line-up tone, interrupted for 0.25 s every 3 s on Lt and continuous on Rt.]

For a 2.0 configuration: The organisation of the PCM stereo audio will be identical.


Figure 4: Head Leader Dolby-E 2.0

[Timing diagram removed in this text version. Over the 1'30" leader, only tracks 1 (Lt) and 2 (Rt) carry the stereo line-up tone (0.25 s breaks every 3 s on track 1, continuous on track 2); tracks 3-8 are silent.]

from 09:59:52:00 to 09:59:59:24

Silence with a duration of 8 seconds

4.3.2 AUDIO AES 3/4

4.3.2.1 For The Dolby-E Stream

Figure 5: Head leader Dolby-E 5.1+2.0

[Timing diagram removed in this text version; identical to Figure 3: tracks 1-6 (Left, Right, Centre, Sub, Rear Left, Rear Right) carry staggered 2 s tone bursts in repeating 10 s cycles over the 1'30" leader, while tracks 7 (Lt) and 8 (Rt) carry the stereo line-up tone.]

from 09:58:22:00 to 09:59:51:24

For a 5.1 configuration: The audio outline for this duration should be recorded on the Dolby-E track. The 2-second tones are 1000 Hz, and 40 Hz for track 4 (LFE).

(The organisation of the Stereo PCM audio is identical for the 2.0 part)

For a 2.0 configuration: The organisation of the stereo PCM audio is identical.


Figure 6: Head leader Dolby-E 2.0

[Timing diagram removed in this text version; identical to Figure 4: tracks 1 (Lt) and 2 (Rt) carry the stereo line-up tone over the 1'30" leader, tracks 3-8 silent.]

from 09:59:52:00 to 09:59:59:24

Silence with a duration of 8 seconds
