SMPTE ST 2110: A Standard for Television Production in an IP Environment

Introduction

The phenomenal success of Internet technology in recent years is leading broadcasters to adopt new ways of working. Almost every part of a broadcaster's production chain has now evolved into an IT- and IP-based infrastructure. Today, only one part of the production chain still relies on dedicated SDI-based networks: live production. Where SDI is tied to dedicated physical hardware, IP is largely virtual and defined in software.

For the first time in the history of live television, we can abstract the video, audio, and metadata streams away from the underlying hardware. This innovation presents unprecedented opportunities, empowering broadcasters to deliver flexible, scalable, and highly efficient workflows.

Standards Matter

The development and deployment of open standards is essential to the transition from traditional SDI workflows to IP-based workflows in television production and distribution.

IP networks, specifically those using ST 2110, allow broadcasters to create frame-accurate metadata and maintain its timing relationship to video and audio throughout the studio, production, and transmission chains.

IP Provides Efficient Workflows

REMI (remote-integration model), "at-home", or backhaul outside broadcasts are easily achieved thanks to the adoption of ST 2110. Rather than sending a complete crew and production team to a stadium for a sports event, broadcasters dispatch only the essential camera and sound operators. Video, audio, and any associated metadata are streamed back to the studio over IP circuits provided by telcos, allowing the whole production to take place at the studio.

This has an obvious advantage in terms of accommodation and subsistence costs for crews. Better still, by switching the IP circuits from successive stadiums to the studio, one studio crew can cover several football matches or events.

Change of Paradigm

We no longer think in terms of video line and frame timing; instead we can think in terms of "events" or "grains". Each video frame or audio sample is a grain with a unique timestamp. Many different grains, arriving via different protocols or networks, can be brought back together using their unique time-of-origin references.

Open Standards

Existing standards: the SMPTE 2022 family. Parts 1/2/3/4 cover MPEG-2 Transport Stream over IP; parts 5/6 cover SDI over IP. Both are multiplex standards, in which the video, audio, and ancillary signals are wrapped up into a single IP stream. But IP is itself a multiplexing technology, so why are we carrying multiplexes inside multiplexes?

SMPTE ST 2110 instead puts each part of the signal into a separate stream.

SMPTE 2110

The SMPTE 2110 suite and its companion specifications:

SMPTE 2110-10: System Timing
SMPTE 2110-20: Uncompressed Active Video
SMPTE 2110-21: Traffic Shaping
SMPTE 2110-30: PCM Audio
SMPTE 2110-31: AES3 Transparent Transport
SMPTE 2110-40: Ancillary Data
AMWA NMOS: IS-04 / IS-05 / IS-06

SMPTE 2110-10 System Timing

This standard specifies the system timing model and the requirements common to all of the essence streams.

It incorporates three protocols:

RTP (Real-time Transport Protocol): a proven technology for transporting time-critical data in UDP packets.

SMPTE ST 2059-2 (PTP): based on the IEEE 1588 standard.

SDP (Session Description Protocol): used to describe each stream (see below).

SMPTE 2059-2 (PTP) is used to distribute time and timebase to every device in the system. PTP uses a master-slave architecture in which every device synchronizes to the master clock. Senders mark each packet of video, audio, and ancillary data with an RTP timestamp that indicates the sampling time. Receivers compare these timestamps in order to properly align the different essence parts to each other. As a result, users can mix and match essence from any source.

There are three types of clocks in the system:

Wall clock: provided by the PTP grandmaster, with a local copy in each node.
Media clock: derived from the local clock (e.g. 48 kHz for audio, 90 kHz for video).
RTP clock: derived from the media clock.

Essence data (audio samples or video frames) is related to the media clock upon intake, essentially receiving a generation timestamp with respect to the media clock.
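To sketch how these clocks relate, the RTP timestamp carried in each packet can be derived from PTP time and the media clock rate. The function below is only illustrative; the names and the epoch handling are simplified assumptions, not text from the standard:

```python
# Illustrative sketch: derive a 32-bit RTP timestamp from PTP time.
# Assumes PTP time is already available as (seconds, nanoseconds)
# since the PTP epoch; real implementations must follow ST 2110-10
# and RFC 3550 precisely.

RTP_WRAP = 2**32  # RTP timestamps are 32-bit and wrap around

def rtp_timestamp(ptp_seconds: int, ptp_nanoseconds: int, media_clock_hz: int) -> int:
    ticks = ptp_seconds * media_clock_hz
    ticks += (ptp_nanoseconds * media_clock_hz) // 1_000_000_000
    return ticks % RTP_WRAP

# Video streams use a 90 kHz RTP clock, PCM audio a 48 kHz clock:
video_ts = rtp_timestamp(1_700_000_000, 250_000_000, 90_000)
audio_ts = rtp_timestamp(1_700_000_000, 250_000_000, 48_000)
print(video_ts, audio_ts)
```

Because both timestamps derive from the same wall clock, a receiver can relate the video and audio sampling instants to each other even though the streams arrive independently.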

This timing model gives us:

Fixed, determinable latency, by configuring a suitable link offset ("playout delay").
Inter-stream alignment, by comparing and relating the timestamps of individual essence data.

PTP timestamps can be thought of as a continuous counter incrementing every nanosecond. The absolute value is referenced to the epoch at midnight on 1 January 1970; in other words, the timestamp is the number of nanoseconds that have elapsed since the epoch. When a video, audio, or metadata packet is created, a timestamp derived from PTP is appended to it, enabling the receiver to know exactly when the packet was created, so that it can reconstruct the frame of video, the samples of audio, or the referenced metadata.

SMPTE ST 2059 describes the behaviour of PTP that we want on a broadcast network: how often messages are transmitted, how communications are established, and so on. The profile also carries a small amount of metadata to handle elements that are not inherently built into PTP. For example, although it seems obvious once stated, the default frame rate of the system had to be included. With a genlock signal, any piece of equipment can discover the TV system in use, but with PTP there is no way of knowing; a master generator has to tell a slave through a PTP message. ST 2059 pre-tunes PTP for our industry, and that same work plugs directly into ST 2110, allowing PTP to become the replacement for the genlock reference in an IP-based system.

Session Description Protocol (SDP)

Each stream has a set of metadata, distributed by the control system, that tells the receiver how to decode what is inside it.

Senders expose an SDP record for every stream they produce. The metadata contains:

• Sender description
• Video and/or audio essence type
• Raster size (in pixels)
• Frame rate (video)
• Channel count (audio)
• Sampling structure (audio/video)
• Bit depth (audio/video)
• Colourimetry
• Source IP address and port
• RTP payload ID (audio/video)
• PTP grandmaster source and domain
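To make this concrete, an SDP record for a hypothetical ST 2110-20 video sender might look like the sketch below; the addresses, ports, clock identity, and session identifiers are invented for illustration:

```
v=0
o=- 1443716955 1443716955 IN IP4 192.168.1.10
s=Example ST 2110-20 video stream
t=0 0
m=video 5004 RTP/AVP 96
c=IN IP4 239.100.1.1/64
a=rtpmap:96 raw/90000
a=fmtp:96 sampling=YCbCr-4:2:2; width=1920; height=1080; exactframerate=30000/1001; depth=10; TCS=SDR; colorimetry=BT709; PM=2110GPM; SSN=ST2110-20:2017
a=ts-refclk:ptp=IEEE1588-2008:AC-DE-48-FF-FE-00-00-01:0
a=mediaclk:direct=0
```

The fmtp line carries the image technical metadata (raster, frame rate, sampling, depth, colourimetry), while the ts-refclk and mediaclk attributes tie the stream to the PTP grandmaster described above.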

SMPTE 2110-20

Specifies the real-time, RTP-based transport of uncompressed active video essence over IP networks. An SDP-based signalling method is defined for the image technical metadata necessary to receive and interpret the stream. ST 2110-20 describes the transport of full-raster video, meaning that we take the entire active picture and none of the ancillary data space: we are transporting just the pixels.

Transport: RTP-based
Sampling: YCbCr-4:4:4, YCbCr-4:2:2, YCbCr-4:2:0, ICtCp-4:4:4, ICtCp-4:2:2, ICtCp-4:2:0
Bit depth: 8, 10, 12, 16, 16f
Colorimetry: BT601, BT709, BT2020, BT2100, ST2065-1, ST2065-3, DCI-D65, DCI-D60
TCS (Transfer Characteristic System): "SDR" (Standard Dynamic Range), "PQ" (HDR using Perceptual Quantization from ITU-R BT.2100), "HLG" (HDR using Hybrid Log-Gamma from ITU-R BT.2100), "LINEAR" (linearly encoded samples), "DENSITY" (e.g. SMPTE ST 2065-3)

Only active video is carried; blanking is not transmitted.
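Because only the active picture is carried, the payload bit rate is easy to estimate. The sketch below is a back-of-the-envelope calculation, not a formula from the standard, and it ignores RTP/UDP/IP header overhead:

```python
# Rough estimate of ST 2110-20 active-video payload bit rate.
# 4:2:2 averages 2 samples per pixel, 4:2:0 averages 1.5, 4:4:4 is 3.

def active_video_bitrate(width, height, frame_rate, bit_depth, sampling="YCbCr-4:2:2"):
    samples_per_pixel = {"YCbCr-4:4:4": 3.0, "YCbCr-4:2:2": 2.0, "YCbCr-4:2:0": 1.5}
    return width * height * samples_per_pixel[sampling] * bit_depth * frame_rate

rate = active_video_bitrate(1920, 1080, 60000 / 1001, 10)
print(f"{rate / 1e9:.2f} Gbit/s")  # ~2.49 Gbit/s for 1080p59.94, 10-bit 4:2:2
```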

SMPTE 2110-30 and SMPTE 2110-31

SMPTE 2110-30 is built on AES67 (PCM audio only). The AES67 technology components are:

Synchronization: IEEE 1588-2008, default profile (media profile suggested), with local media clock generation
Network: IPv4 (IPv6), unicast/multicast and IGMPv2
Transport: RTP/AVP (RFC 3550 & RFC 3551) / UDP / IP
Encoding: 16/24-bit linear, 48 (44.1/96) kHz, channel count 1..8
Packet setup: 48 samples (6/12/16/192), max. payload size 1440 bytes
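The packet-setup numbers above can be sanity-checked in a couple of lines. This sketch assumes linear PCM and ignores RTP header bytes:

```python
# At 48 kHz, 48 samples per packet corresponds to a 1 ms packet time.
def payload_bytes(channels, bit_depth, samples_per_packet=48):
    return channels * (bit_depth // 8) * samples_per_packet

assert payload_bytes(8, 24) == 1152       # fits within the 1440-byte limit
assert payload_bytes(8, 24, 192) > 1440   # a 4 ms packet of 8 channels would not
```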

These streams are tiny compared to the video. A 2-channel stream is:

(2 channels) × (24 bits) × (48,000 samples/s) × (1.08 RTP/UDP/IP overhead) ≈ 2.5 Mbit/s

An 8-channel stream is:

(8 channels) × (24 bits) × (48,000 samples/s) × (1.05 RTP/UDP/IP overhead) ≈ 9.7 Mbit/s
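The same arithmetic as a small helper, with the overhead multiplier as an assumed rough factor (in practice it depends on the packet time chosen):

```python
# Linear PCM bit rate for an ST 2110-30 style stream.
def pcm_bitrate(channels, bit_depth, sample_rate_hz, overhead=1.08):
    return channels * bit_depth * sample_rate_hz * overhead

print(pcm_bitrate(2, 24, 48_000) / 1e6)        # ~2.5 Mbit/s
print(pcm_bitrate(8, 24, 48_000, 1.05) / 1e6)  # ~9.7 Mbit/s
```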

2110-30 deals only with PCM audio. 2110-31 provides bit-transparent AES3 over IP, and can therefore handle:

• Non-PCM audio
• AES3 applications that use the user bits
• AES3 applications that use the C or V bits

2110-31 is always "stereo" (like AES3).

SMPTE 2110-40

Over the years, lots of things have been put into the SDI ancillary data space. Some are tightly related to the video signal, some are really separate essence, and some are just along for the ride.

Examples include AFD, timecode, closed captions, and DVB subtitles.
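For illustration, ancillary packets are identified by DID/SDID pairs, and ST 2110-40 carries them in RTP (see RFC 8331). The mapping below is an informal sketch listing a few well-known identifiers; check the SMPTE registry for the authoritative list:

```python
# Illustrative only: a few well-known ancillary data identifiers.
ANC_TYPES = {
    (0x41, 0x05): "AFD and bar data (SMPTE ST 2016-3)",
    (0x60, 0x60): "Ancillary time code (SMPTE ST 12-2)",
    (0x61, 0x01): "CEA-708 closed captions (SMPTE ST 334-1)",
    (0x61, 0x02): "CEA-608 closed captions (SMPTE ST 334-1)",
}

def describe_anc(did: int, sdid: int) -> str:
    return ANC_TYPES.get((did, sdid), "unknown ancillary packet")

print(describe_anc(0x61, 0x01))  # CEA-708 closed captions (SMPTE ST 334-1)
```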

AMWA-NMOS

NMOS consists of three pieces. The first is IS-04, the registration and discovery protocol: every ST 2110 device on a network, when you plug it in, talks to a server and discloses its capabilities. The second piece, currently underway, is IS-05. It translates a request to connect a source to a destination into the network transactions required to make that connection occur; it is essentially the interface for connection management, what your routing switcher control panels talk to. The third piece, under development, is IS-06. Its purpose is to define a common API for Software-Defined Networking (SDN). Currently, routing inside the switch is done through normal multicast routing techniques, but with SDN you can gain greater switch utilization, manage latencies, and accrue many other benefits by tuning the routing according to what you know about what is going on. IS-06 will therefore be another interface: an API to a software-defined network controller.
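As a minimal sketch of the discovery side, a control system might query an IS-04 registry's Query API for the available senders and receivers. The registry address below is hypothetical and error handling is omitted:

```python
import requests

REGISTRY = "http://registry.example.com"   # hypothetical registry address
QUERY = f"{REGISTRY}/x-nmos/query/v1.3/"   # IS-04 Query API base path

# Fetch every sender and receiver registered on the network.
senders = requests.get(QUERY + "senders").json()
receivers = requests.get(QUERY + "receivers").json()

for s in senders:
    print(s["id"], s["label"], s["transport"])
```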

By employing a control system that uses the IS-04 Query API, all IS-04 devices can be discovered and modelled into a user interface, assuming that the nodes have already registered via the Registration API. The user interface can then list all available Senders and Receivers, much like a physical patch panel of coax connectors. By itself, ST 2110 is just a set of protocols; these three pieces from AMWA will be the intelligence side of the system within the greater fabric. The idea is that, using these NMOS tools, vendor control systems can connect to the network in the same manner, through a method that allows them to collaborate. Right now, to achieve control interoperability between a Brand X router and Brand Y control panels, a protocol gateway is needed. This system provides one single way for vendors to control the routing. That broadens interoperability hugely and enables best-in-class customer equipment choices.

Conclusion

Because ST 2110 takes the video and audio essence apart, we can transport video on one stream, audio on another stream, and metadata on yet other streams. That opens the door to many advantages, such as taking the audio channels and sending them off independently into an audio sub-system without the burden of all the video overhead of SDI; or taking a closed-captioning stream and sending it to a service in the cloud over IP, where that service could translate it into many languages, both text and spoken word, and send the resulting caption and audio streams back for multi-language program integration into the system workflow.

With ST 2110, we can now build highly efficient and flexible media systems that move around and deal with only the essential pieces needed.

Thank You