AoIP and Networking: Network Primer V2

Introduction to professional audio networks - 2017 edition The established policy of Calrec Audio Ltd. is to seek improvements to the design, specifications and manufacture of all products. It is not always possible to provide notice outside the company of the alterations that take place continually. No part of this manual may be reproduced or transmitted in any form or by any means, Despite considerable effort to produce up to electronic or mechanical, including photocopying date information, no literature published by and scanning, for any purpose, without the prior the company nor any other material that may written consent of Calrec Audio Ltd. be provided should be regarded as an infallible Calrec Audio Ltd guide to the specifications available nor does Nutclough Mill Whilst the Company ensures that all details in this it constitute an offer for sale of any particular Hebden Bridge document are correct at the time of publication, product. West Yorkshire we reserve the right to alter specifications and England UK equipment without notice. Any changes we make Apollo, Artemis, Summa, Brio, Hydra Audio HX7 8EZ will be reflected in subsequent issues of this Networking, RP1 and Bluefin High Density Signal document. The latest version will be available Processing are registered trade marks of Calrec Tel: +44 (0)1422 842159 upon request. This publication is for International Audio Ltd. Dolby®E is a registered trade mark Fax: +44 (0)1422 845244 usage. of Dolby Laboratories, Inc. All other trade marks Email: [email protected] are acknowledged. Calrec Audio Ltd reserve the right to change calrec.com specifications without notice. E & O.E. © 2018 Calrec Audio Ltd. All Rights Reserved. NETWORK PRIMER V2 CONTENTS

Forward 5

Introduction 7

Chapter One: The benefits of networking 11

Chapter Two: Some technical background 19

Chapter Three: Routes to interoperability 23

Chapter Four: Control, sync and metadata over IP 27

4 NETWORK PRIMER V2 NETWORK PRIMER V2 FORWARD

This Primer is a new edition that updates field of computer networking technology the original 2013 Calrec Networking might gain a clearer understanding of the Primer to reflect the many changes that subject. have occurred in the world of broadcast networking technology since then — four This Primer aims to explain the increasing years is a long time in this field. use of data and IP-based networking technology in broadcast applications, Nonetheless, our aim is still to explain examines the benefits that this technology the background and technology behind can offer forward-thinking modern data and IP-based networks with broadcasters, and considers where it may specific reference to its application in take the world of broadcast in the future. broadcasting, such that a broadcast engineer with no formal training in the Hebden Bridge, Spring 2017

calrec.com 6 NETWORK PRIMER V2 NETWORK PRIMER V2 INTRODUCTION

calrec.com INTRODUCTION

The word ‘network’ has a long history; it was first recorded in the mid-sixteenth century, but its roots hint that it may well have been in use much earlier than that. Like much of modern English, it has gone through a variety of ever more abstracted meanings, from a description of a physical object to more figurative, intangible concepts. What began as a way of describing an invention for catching fish was later applied to similarly interconnected physical systems, including those of canals, railways, telephone wires – and eventually computers.

By the early 20th century, the word had come to refer to systems of related entities that were no longer physically connected, as with radio and TV transmitters belonging to a single broadcaster, or groups of business colleagues.

In the last few years, we’ve had ‘wireless networks’ (which medieval speakers of English might have regarded as an oxymoron) and ‘social networks’ composed of users that ‘connect’ solely via data transmissions over the Internet; itself a network of physically disconnected, but nonetheless linked computers.

A similar process of abstraction has been taking place in the world of broadcast technology, transforming the hard-wired, localised studios of the past into more flexible, networked systems. Forty years ago, television studios consisted of cameras and microphones hard-wired into discrete hardware vision mixers, patchbays and audio mixing consoles which routed to specific video tape machines in one location. In such systems,

8 NETWORK PRIMER V2 INTRODUCTION

a separate physical connection is required and even to work ‘in’ - virtual broadcast greater flexibility and geographical for each audio channel, whether from a studios, where the control surface is in freedom available to them. microphone, mixing console, or recording one location, the DSP used for the mixing device. in a completely different building, and the In the longer term, the broadcast industry audio inputs and outputs somewhere else will further borrow from the IT industry Modern networked broadcast systems altogether - perhaps even in a different by shifting away from bespoke hardware offer more flexibility; all of the hardware country. towards software processing running on is permanently connected to a data commodity computing platforms. While network, and the precise nature of the As long as they are all connected to the not all broadcast processes will fit this interconnections between the equipment same network, they should be able to model, many will, and in doing so, offer can be redefined and reassigned at any interoperate just as effectively as the days benefits in scalability and economy. time under software control, remotely if when all these components were part required. of a single mixer in a hard-wired studio, As the shift to an IP infrastructure permanently located in one place. continues, we will be encouraged to drop Such ideas are not new, and small- our conventional signal-based approach in scale proprietary networks of this type These more flexible, abstracted favor of a services model, where content, have existed in broadcasting for many implementations of once-tangible, both live and stored, may be discovered years. But the declining complexity hard-wired, and immovable resources, and accessed by anyone in possession and improving cost-to-benefit ratio of offer broadcasters many benefits, as of access rights and an appropriate IP implementing large networked broadcast we shall see. These include the ability to connection, regardless of their location. systems, coupled with the widening move projects swiftly from one studio to capabilities of the technology, has another by reassigning connections, or However, at the time of writing, much tempted more and more of the world’s controlling aspects of the mixing process remains to be done. Standards enabling forward-thinking broadcasters to move to remotely, without having to be in the same practitioners to define and realise all of networked systems. geographical location as the event being these possibilities are still being written mixed. and ratified, and manufacturers are still In the past few years, the spread of IP- developing networking protocols based based IT networking technology across At the same time, we’re getting closer on open standards that will allow their the world has furthered the development to the goal of being able to transmit equipment to interface and work together. of such networks. Increasingly, broadcast broadcast audio, video, sync information, technology manufacturers and standards control data and metadata together over There is no doubt that there are organisations are developing systems that agnostic, scalable IP-based data networks huge opportunities for forward- utilise, at least in part, hardware, standards consisting simply of data cable and IT thinking broadcasters and technology and data transmission protocols originally switches. manufacturers who are prepared to created for the world of IT, rather than embrace and engage with the changes. developing proprietary systems. This promises further advantages which It’s time to look at some of these benefits will transform how audio, video and in more detail. This is bringing costs down further, as control data is encoded, transported and well as adding a more global dimension managed, such as the prospect of being to the capabilities of the networks being able to dispense with separate audio, developed for broadcast. video and data transports, expensive analogue audio and video cabling, and As IP networks make geography and relatively inefficient embedded-audio the physical proximity of resources less video interfaces such as SDI. and less important, the very concept of a studio itself is becoming more abstract. In the medium term, broadcast workflows Already, it is possible to conceive of - will evolve to take advantage of the

CALREC Putting Sound in the Picture 9 10 NETWORK PRIMER V2 NETWORK PRIMER V2 CHAPTER ONE: BENEFITS OF NETWORKING

calrec.com CHAPTER ONE: BENEFITS OF NETWORKING

A networked broadcast studio, of these simultaneously are still under dedicated to a single room again when the editing suite or transmission station development, but progress towards that job is complete. isn’t necessarily more efficient goal is moving fast. than a hard-wired one — indeed Even with patchbays that allow flexible for smaller broadcasters, the You may well ask what difference it makes interconnection between studio and costs associated with setting up a whether audio is being routed around control room, this would be a challenge network can outweigh the benefits. a studio via CAT5 network cable, fibre- in a traditional, hard-wired studio. What’s But introducing networking into a optic or co-axial links, or even analogue more, achieving such ‘super-studios’ via large-scale broadcast environment tie-lines? And while the idea of leveraging traditional audio connections, whether can benefit the whole system; IP-based IT infrastructure promises analogue or digital, requires the running networked equipment is both more lower-cost connectivity in the longer term, and temporary installation of a lot of accessible and more flexible than for broadcasters with existing facilities, extra, expensive cables and looms, and its hard-wired counterpart. there’s still the cost of installing it all these increases the risk of on-air faults. network connections in the first place. To understand this, consider what the But in a new-build studio designed around invention of audio patchbays did for Routing & Cost Benefits a network from the outset, the cost of arrays of hard-wired studio equipment. the network interconnection cabling is Studios function perfectly efficiently when Nonetheless, there are a number of minimal and all the hardware is already equipment is connected directly, but time benefits associated with a networked connected to the network. The routings is always lost whenever new equipment approach, and the larger and more are simply reassigned by control software. is connected and its output needs to complex the facility, the more attractive A few clicks of a mouse, and the work is be made available to other parts of the these are - especially if the studio is done — or undone. system. Introducing patchbays to studios a new-build project that requires no initially costs money and the time to wire investment to replace or adapt existing Such flexible workflows are increasingly everything up to the patchbay, and some infrastructure. apparent in modern broadcast and studios chose to save themselves that the genie is out of the bottle. The next time and expense. Firstly, modern mixing consoles can take logical step once audio is networked is to control of all the audio routing in a studio if connect one console’s router to another. It However, once a patchbay has been networked, allowing broadcasters to save then becomes very simple to transfer all of integrated into a studio, any input can be themselves the expense of a separate the audio being received at one console, routed to any output, producing significant audio router or patchbay, together with all or even just at one I/O box, to a console in long-term savings. Introducing new the connections and wiring to it. The more a completely different studio for mixing. equipment to the system and giving it the studios you have, the lower the outlay on same routing flexibility becomes a simple routers and patchbays. matter of connecting it to the patchbay. The second advantage is flexibility, When MIDI patchbays were invented, the which is again a particular benefit in routing of signals also became remotely broadcast complexes with large numbers controllable, and this greater flexibility and of mixing consoles, control rooms and remote controllability is another benefit of studios. Using a network and an audio networked studios. routing protocol/management system, it’s possible to route microphone sources But as we approach the 2020s, data from a wallbox in one studio into the networks have sufficient bandwidth to control room of another in seconds, or do much more than route audio around; assign the mixer in one control room the today it’s possible to send broadcast task of mixing the combined output of two video, audio, meta and control data over or more studios, and then return it to being a network. Standards to transmit all

12 NETWORK PRIMER V2 CHAPTER ONE: BENEFITS OF NETWORKING

Mixing Console - including Mixing Console - including control surface, built-in CPU & control surface, built-in CPU & DSP, plus... DSP, plus...

Built-in Router Built-in Router

External I/O Box External I/O Box (Analogue, Digital, (Analogue, Digital, SDI etc) SDI etc) External Router

Mixing Console - including Mixing Console - including control surface, built-in CPU & control surface, built-in CPU & DSP, plus... DSP, plus...

Built-in Router Built-in Router

External I/O Box External I/O Box (Analogue, Digital, (Analogue, Digital, SDI etc) SDI etc)

Figure 1 - STAR NETWORK audio being received from one studio and of high-resolution audio down a single mix it in another, or to route that audio inexpensive -style network cable. Given the bandwidth of today’s networks, to another studio to create different which typically allow many hundreds mixes for (say) international versioning or But even a star network is only the of audio channels to be passed down commentary. On a rolling news show, the beginning once multiple consoles are a single connection, there’s no need to production team in one control room can networked. In Figure 1, each console stop at interconnecting a pair of consoles be mixing the live audio from the news still has its own dedicated I/O interface and their associated I/O. Why not link studio and can hand the audio from that (or more usually in this day and age, many consoles together with a stand- studio over to the incoming team in a several interfaces, each handling different alone audio network router, and thus different control room and go off shift. audio output formats). This is still quite a allow several mixers to freely swap audio traditional structure that owes much to channels? This is the basis for a star Or the team in one control room can the days when the I/O was a fixed part network like the one above (figure 1), with switch from mixing the output of one of individual consoles. A network permits several consoles connected to a stand- studio or sound stage to working on the something more like the modified star alone router. output from a different one in seconds. structure shown in Figure 2. Here an Such things can be done with analogue assortment of I/O interfacing boxes in a In a broadcast complex structured like or digital tie-lines, but a vast amount of central location is shared commonly by this, sound stages or studios are (quite expensive wiring is required. This is not a all the consoles in a broadcast complex, literally) no longer tied to a single control concern with networked audio given that and is connected to them via the central room. It’s very simple to take multi-channel you can route several hundred channels stand-alone router.

CALREC Putting Sound in the Picture 13 CHAPTER ONE: BENEFITS OF NETWORKING

Mixing Console - including Mixing Console - including control surface, built-in CPU & control surface, built-in CPU & DSP, plus... DSP, plus...

Built-in Router Built-in Router

External Router

Mixing Console - including Mixing Console - including control surface, built-in CPU & control surface, built-in CPU & DSP, plus... DSP, plus...

Built-in Router Built-in Router

External I/O Box External I/O Box External I/O Box External I/O Box (Analogue, Digital, (Analogue, Digital, (Analogue, Digital, (Analogue, Digital, SDI etc) SDI etc) SDI etc) SDI etc)

Figure 2 - MODIFIED STAR Networks and Resilience thousands of channels of audio - can NETWORK be duplicated on a few Ethernet-style IT For the same reason, it’s considerably less cables. Packing hundreds of channels of audio costly to design failsafe systems if your into a single high-bandwidth networked broadcast audio is part of a network. In Many broadcasters choose to mirror data connection, where connections broadcast, the failure of mission-critical their resources across a network in this can be easily made and reassigned, connections live on-air is unacceptable. way, assembling identical equipment in encourages the construction of complex Modern broadcast control centres like to physically or geographically separate workflows and network topologies that factor redundancy into their designs, so buildings which may be networked would be difficult to achieve with standard that every connection has a backup which together, but may also continue to audio connections. can easily be pressed into service in the function independently in the event of one event of a failure. half of the network becoming unusable, as When one considers the costs of wiring in figure 3, which shows one such ‘linked a national broadcaster’s transmission Doing this with traditional analogue or star’ network. control centre, it becomes clear that audio digital connections requires twice the networks can offer significant savings. amount of cabling and a lot of complicated In this way broadcasters can be said cable splits. With networked audio, the to achieve ‘resilience’ in their systems entire output of a master control room or more easily and affordably than those transmission centre - which may include

14 NETWORK PRIMER V2 CHAPTER ONE: BENEFITS OF NETWORKING

employing traditional designs. Several Many audio consoles, including those what do studio owners do if routine leading broadcast audio mixing console from Calrec, offer this way of working as maintenance needs to be performed manufacturers also take advantage of part of a standard setup; the fundamental on one studio or control room, and the this technology to offer highly resilient hardware components that drive the latest program that usually transmits from that network structures with redundant generation of consoles (such as the main studio is due to be broadcast? Or, in a hardware as well as routing. processor, router, and DSP cards) may commercial broadcast complex, what be supplied in pairs as standard. Placing happens when a last-minute booking is In such systems, the fundamental one set of the pairs in a different location received from an important client and hardware components of the audio and linking them via the audio network needs to be accommodated without console can be duplicated, and the is a simple matter. Once again, it stands disrupting the usual assignment of studios duplicates placed in separate physical repeating that offering such resilient and control rooms to other clients? locations (figure 3). In the event that one workflows without networked audio would of the locations is rendered inoperable be incredibly complex and expensive. In a networked broadcast complex, by power failure, fire, flooding or some the output of any studio can easily be other unforeseeable natural or man-made Networks & Contingency Planning assigned to a different control room, catastrophe, the components in the other or conversely a familiar control room location can take over and be switched Networks also make contingency planning can be used to mix the output from a into operation seamlessly over the easier and more flexible. For example, different studio. network, ensuring continuity of operation.

Mixing Console - including Building 1 Building 2 Mixing Console - including control surface, built-in CPU & control surface, built-in CPU & DSP, plus... DSP, plus...

Built-in Router Built-in Router

External I/O Box External I/O Box (Analogue, Digital, (Analogue, Digital, SDI etc) SDI etc) External Router External Router

Mixing Console - including Mixing Console - including Mixing Console - including Mixing Console - including control surface, built-in CPU & control surface, built-in CPU & control surface, built-in CPU & control surface, built-in CPU & DSP, plus... DSP, plus... DSP, plus... DSP, plus...

Built-in Router Built-in Router Built-in Router Built-in Router

External I/O Box External I/O Box External I/O Box External I/O Box (Analogue, Digital, (Analogue, Digital, (Analogue, Digital, (Analogue, Digital, SDI etc) SDI etc) SDI etc) SDI etc)

Figure 3 - SPLIT ROUTER CORE

CALREC Putting Sound in the Picture 15 CHAPTER ONE: BENEFITS OF NETWORKING

Remote Production Again, it’s possible to read this and think one broadcast truck, together with its ‘But why introduce network technology? associated staff: camera operators, vision A more recent application for networked Remote Production has been possible and audio mix engineers, and support studios has only become possible now for decades already — that’s what engineers. Such a team usually comprises that IP technology is making data- Outside Broadcast is.’ This is true, but several highly skilled and experienced intensive global intercommunication a IP technology is making it possible to (and therefore equally expensive) reality, such that high-resolution audio undertake Outside Broadcast at a fraction personnel. and video can be moved around the world of what it has hitherto cost, opening up reliably and affordably. new possibilities for broadcasters. With IP technology, it is possible to scale back the equipment and personnel This is the step that promises to render Until recently, remote production — that is, required at the site of the event to some geography irrelevant, and allow the covering a sports, news or entertainment audio and video capture devices (ie. distribution of the components formerly event a significant distance from a cameras and microphones), some digital held in a single studio across several traditional broadcast studio (figure 4) — signal processing hardware of the kind locations, internationally if required. has required the deployment of at least found at the heart of modern broadcast

Figure 4 - REMOTE PRODUCTION

16 NETWORK PRIMER V2 CHAPTER ONE: BENEFITS OF NETWORKING

consoles, and some interface units such the broadcast trucks and staff needed allow audio and video to be transmitted that the captured audio and video can be to make broadcast coverage a reality. In together across an IP network, other than uploaded to a high-resolution broadcast today’s diversifying industry, broadcasters by inefficiently embedding the audio with network. Far fewer staff are required to have an ever-growing need for more the video, de-embedding it for processing set up this equipment on site and keep it content, but have fewer resources or mixing, and re-embedding it afterwards, operational, reducing the financial outlay with which to capture it. Networked which adds processing time at every of a dedicated on-site team. infrastructures can deliver a possible stage. solution to this difficult problem, and Once this equipment is at the event venue broadcast audio manufacturers including Throughout this time of transition, the and connected to a secure IP network, the Calrec are now delivering the technology Holy Grail has been a set of networking clever part is that the audio can be readied to make it happen. standards that allow the use of a single for broadcast centrally, at a traditional high-capacity IP network for all of a broadcast facility. To a networked mixing Interoperability — The Holy Grail broadcaster’s infrastructure: IT, phones, control surface, the audio interface intercoms, broadcast audio and video. boxes and mixing DSP at the venue are The advantage conferred by the network- connected just as dependably as if they based infrastructure described so far is The goal has been that these standards, were in the same room as the control undeniable, but the development of the properly written, will allow overall system surface. technology that allows broadcasters to description and management, equipment implement these new workflows has been monitoring, and describe control protocols The audio captured by the on-site slow and piecemeal. to allow one set of devices to control microphones and uploaded to the others on the network, irrespective of their network by the audio interface boxes at Although data networks have been original manufacturers. the venue can be mixed from afar on the interconnected across the world for control surfaces at the broadcast facility, several years and it has been possible This shining goal is known as by skilled operators who are spared the to adapt IT networking technology to ‘interoperability.’ need to travel to the venue. In doing so, carry audio and video data, it doesn’t they can make use of the monitoring necessarily follow that you can simply The good news is that much progress facilities at the broadcast centre, which feed a live broadcast stream into an has been made towards this end, and the are usually superior to anything outside Ethernet router in Salford, UK and expect realisation of all the benefits it will confer; broadcast vehicles can offer. Importantly, it to emerge unscathed in an edit suite in we will look at the cross-manufacturer because the mixed audio is output locally Shenzhen, China. audio standards that exist so far in chapter by the DSP at the venue, IFB mixes for three of this primer, and consider how on-site commentators or presenters on IP networking technology originally video and audio should work together via site are created without latency, and designed to handle office-based the medium of IP networking in the not- the mixed audio is also returned to the data transport initially proved far from too-distant future in chapter four. central facility over the IP network virtually optimal for routing, mixing, processing simultaneously for broadcast. and controlling real-time multi-channel Before we do that, however, it’s time to audio and video, and although means of move to chapter two and look at a few The cost savings broadcasters can make reliably networking audio were eventually networked audio basics in more detail. using such network-based workflows are established over the past few years, These will enable us to better understand: considerable, and make it theoretically they were all proprietary standards at possible to broadcast many new kinds of first, which made it impossible to use a) the proprietary audio networking specialised events, the coverage of which a mixture of equipment from different standards developed over the past few would not be economically viable using manufacturers. years, and; traditional OB infrastructure. For example, the interest in regional or college sports, Furthermore, only very recently has b) the more recent cross-manufacturer or smaller entertainment events has not there been any progress made towards standards that are the focus of chapters hitherto justified the expense of sending establishing agnostic standards that three and four.

CALREC Putting Sound in the Picture 17 18 NETWORK PRIMER V2 NETWORK PRIMER V2 CHAPTER TWO: SOME TECHNICAL BACKGROUND

calrec.com CHAPTER TWO: SOME TECHNICAL BACKGROUND

As we’ve seen, proprietary standards for transmitting audio Layer 3 Protocols over IT networks have been Encapsulate audio data in standard IP packets. independently developed by Layer 3 audio networking products include: individual manufacturers over the past few years. Audinate’s QSC’s Q-LAN Some do a more complete job Wheatstone’s WheatNet-IP than others, but all were designed to deal with the basic fact that the requirements for delivering Layer 2 Protocols broadcast audio over a network are Encapsulate audio data in standard Ethernet frames. considerably more stringent than Layer 2 audio networking products include: those for IT-related data. AES51 IT networking protocols for data transfer CobraNet (such as the ubiquitous Ethernet) are Digigram’s EtherSound asynchronous, meaning that the order in which data arrives is not held to be so important as long it arrives eventually. Layer 1 Protocols Ethernet wiring and signalling componants .but do not use Ethernet frame structure. However, a broadcast audio feed usually Layer 1 audio networking products include: consists of many channels of high- resolution audio, all of which must be kept Aviom’s A-Net in sync with respect to each other and Riedel’s RockNet which have to be delivered in real time to Calrec’s Hydra2 avoid dropouts.

Looking at the technicalities of data transport over a network, we can Layer 2 describes the most basic unit of in the correct order, without losses or categorise audio networking protocols data used on the network; in an Ethernet duplication. in terms of how closely (or not) they network, this is the ‘frame’ containing the resemble IT networking data standards. electronic data. Ethernet Frames include In practical terms, all of the audio source and destination MAC addresses to networking technologies currently on the Modern electronic networks are often identify the source and destination device market are either Layer 1, 2 or 3 protocols described in terms of a notional model for data being transmitted. (and all of the Layer 3 protocols contain of up to seven layers of increasing Layer 4-style data verification capabilities). complexity that can be used to integrate Layer 3 adds the IP subnet structure used However, it would be misleading to communications protocols into real-world by all Ethernet networks (and Internet suggest that Layer 1 protocols are the applications. For audio networking, the servers) to uniquely identify network most basic and Layer 3 the most feature- most important of these are the first devices across the globe, and packages rich. four layers (which are also the most the data being transferred in IP packets. fundamental). These are all numbered to ensure that all Certainly, Layer 3 protocols conform more of the data arrives in the right order and closely to the defined standards of Gigabit Layer 1 describes the basic electrical can be accounted for. Ethernet (the most common network standards and voltages used to transmit standard) than others, including Layer data over a wired or wireless network, Layer 4 adds the ability to check the 4-style packet checking spliced into Layer such as an Ethernet network. arrival of these packets has occurred 3-style IP packets.

20 NETWORK PRIMER V2 CHAPTER TWO: SOME TECHNICAL BACKGROUND

These packets sit in turn within an overall and also have to use proprietary routing can be lower, and therefore less Layer 2 Ethernet Frame structure and hardware. However, many Layer 1 audio attractive to broadcasters who need their adhere to the basic Layer 1 electrical protocols offer similar routing and real- infrastructure to be as robust as possible. definitions of Gigabit Ethernet. time verification capabilities as Layer 3 protocols — but they do it by means of The ‘spectrum’ diagram below is a Layer 3 Protocols self-developed, proprietary means. reasonable summary in graphical form, with Layer 1 protocols at one end Thus the structure of the data in Layer Although compatibility with off-the-shelf (more expensive to implement, more 3 protocols, which include Audinate’s networking hardware is lost in a Layer 1 application- specific and geographically Dante, from ALC Networx, protocol, dispensing with the ‘higher-layer’ limited) and Layer 3 protocols at the other Axia’s and QSC’s Q-LAN, most data structures allows the development of (cheaper, with the potential for use over closely resemble that passing over a very efficient, robust, high-performance, a wider geographical area, and more standard Gigabit Ethernet network. As a low-latency protocols. When coupled with interoperable, but with a performance that result, they can multicast data to multiple the hardware required to use them, they is entirely dependent on the quality of the IP addresses simultaneously, as on an are arguably better suited to professional hardware and infrastructure being used). office network, and they can pass data via broadcast applications (albeit usually more connected Ethernet bridges and routers. expensive to implement). Most Audio over IP protocols or Internet This potentially allows the data to be Audio streaming standards would fall passed over a wide geographical area To summarise: ‘higher level’ protocols offer on the right side of this diagram, being and not to remain locked within one Local far greater compatibility with standard cheap to implement, but may fall below Area Network (or LAN). networking formats, and allow the use of the standard of reliability required by standard, affordable networking hardware. professional broadcasters. Layer 2 Protocols This can make installation more cost- effective and usable over a wider area, but Perhaps unsurprisingly, it has been The data structure of Layer 2 protocols, it can also mean that these protocols are attempts to marry the wider compatibility which include the IEEE’s Audio Video less efficient and higher in latency. and greater interoperability of Layer 2 Bridging standard (AVB), Calrec’s original and 3 protocols with further standards Hydra protocol, Peak Audio’s Cobranet Moreover, because the data in ‘higher- designed to improve reliability that and Digigram’s EtherSound, less closely layer’ networks is usually passed via have formed the basis for the cross- resemble standard Ethernet data. These non-proprietary hardware which is not manufacturer protocols that are now protocols dispense with the IP packet specifically designed to carry audio emerging. It is to these that we now turn in structure and thereby lose the ability data, the reliability of these networks chapter three. to be routed to other standard LANs. In this way, Layer 2 protocols cannot traverse the internet, although they still use the Ethernet Frame structure and Proprietary Off The Shelf can therefore still be routed within their network via off-the-shelf Ethernet hubs and switches.

Layer 1 Protocols

• Application Specific • Multi-Purpose Layer 1 protocols, which include Riedel’s • More Expensive • Cheaper RockNet, Aviom’s A-Net, Gibson’s • Local Area • Wide Area • Manufacturer-specific • Interoperable MaGIC, and Calrec’s Hydra2, have the • Performance tuned to requirements • Performance defined by choice of least in common with IT-style network infrastructure and protocols data. They are geographically limited

CALREC Putting Sound in the Picture 21 22 NETWORK PRIMER V2 NETWORK PRIMER V2 CHAPTER THREE: ROUTES TO INTEROPERABILLTY

calrec.com CHAPTER THREE: ROUTES TO INTEROPERABILLTY

At the end of Chapter One, Ravenna is a Layer 3 protocol. Although However, the positive trade-off is the we introduced the idea of intended for an Ethernet infrastructure, guarantee that what you put in is exactly interoperability - the concept of its use of IP packets abstracts it from the what you get out, and this reliability and data being shared freely between underlying network fabric, extending its predictability is attractive to broadcasters. video and audio equipment over an reach beyond LANs to public networks. IP-based network. In other words, there are no geographical • AES67 does not describe a full protocol; limits to this technology — and the use rather it defines a set of ‘ground rules’ that Central to the use of off-the-shelf IT of standard protocols makes it possible make interoperation between equipment components is conformance to a set to make use of existing IT-style IP from third parties possible. It is based on of standards which, together, define IP infrastructure, a clear and obvious benefit. the common ground between a number networking. This includes protocols such of established IP-based audio networking as RTP (the Real-time Transport Protocol), A diverse range of companies is signed systems including Ravenna, Livewire, IGMP (the Internet Group Management up to the Ravenna ecosystem; vendors Q-LAN and Wheatnet. Development Protocol) and PTP (the Precision Time currently producing Ravenna-compatible of the Layer 3, Ethernet-compatible Protocol), all of which are used in audio equipment include Calrec, Merging, standard began in 2010 under the name and video over IP streaming, but would Sonifex, AEQ and Digigram. AES-X192, and it was published in 2013. also be familiar to IT specialists outside The collaboration led to the formation of the broadcast industry. • Dante (Digital Audio Network Through the Media Networking Alliance (MNA) Ethernet), also a Layer 3 protocol, was in 2014, which consists of various like- Over the next few years we can expect to developed in 2006 by the Australian minded technology companies who all see broadcast equipment manufacturers company Audinate, and is the most wish to promote AES67 as the common producing equipment which will interface established of the audio protocols interchange of digital media between with common transports, although there is covered here. The biggest difference is different IP networking platforms. The still some work to do in this respect (more that Dante is proprietary, rather than an standard continues to evolve, but at the on this in Chapter four). open standard. Nonetheless, hundreds of time of writing is the closest thing the Dante-enabled products are available, and broadcast industry has to a common The past few years have seen some many technology companies, particularly networking solution. of the leading vendors in the audio for in the broadcast, installation, live and pro broadcast market working together audio industries work with Audinate to Management towards interoperability, but as is often provide compatible equipment, including the way when an industry first tries to Calrec and Shure to name but two. While there has been some industry establish standards, various approaches success in agreeing a common transport are currently in existence, having • AVB (), also known mechanism for IP audio, there has been been developed by different company as TNS (Time Sensitive Networking), is less success in agreeing how IP streams groupings. another open standard promoted by the can be managed. AVNU Alliance, a group of companies • Ravenna was proposed by ALC Networx including Avid, Cisco, Intel, Dolby, Meyer To fully realise the benefits of IP, it must at IBC 2010 as “a technology for real-time Sound, Sennheiser and Yamaha. AVB is be possible for a device (or end point, transport of audio and other media data in a Layer 2 protocol that uses etherframes as IP jargon terms it) to join a network, IP-based network environments”, Ravenna rather than IP packets to transport data. and discover for itself all the streams (or was, from the first, an open technology Consequently, AVB networks cannot services) available on the network. It must standard without a proprietary licensing extend across routers or bridges, and are do this in order to allow a human operator policy, which encouraged partners to geographically limited to LAN segments. to see the available streams and to make participate in development. As such, it A further limitation is that AVB networks choices about which to connect to. adapted standard network protocols like require specially manufactured, AVB- RTP for use primarily in the professional enabled switches Firstly, this process requires that broadcast market. and hubs. all devices participate in an agreed ‘discovery’ scheme. Secondly, as the list of

24 NETWORK PRIMER V2 CHAPTER THREE: ROUTES TO INTEROPERABILLTY

connected devices changes, the discovery other’s presence on the network. By required that allows one device to control mechanism must allow for dynamic contrast, for a network implementation to aspects of another device over an IP tracking of available streams. be successful, network designers must connection. be careful to select devices that have Thirdly, once a stream has been compatible discovery mechanisms. There are currently several control discovered, detailed information must mechanisms in development that may be provided by the transmitting device In the face of such continuing be utilised, including EMBER+ and the giving details of exactly how to listen incompatibilities, broadcast equipment AES70 open standard, which in addition to, and decode, that particular stream. vendors have their work cut out to ensure to control and monitoring of equipment, These details are known as a session that all products will be capable of mutual also handle connection management and description, and include sample rate, communication. Calrec’s approach is to self-discovery. These are sophisticated encoding mechanism, sample depth, design end points that support multiple protocols that use a server-client model number of channels and multicast IP. protocols simultaneously via networkable and do not generally require vendor- interfaces, making them network-agnostic, specific extensions to allow control of A popular service discovery mechanism, and capable of working with a wide range functions. originally developed by Apple Corp, is of devices from different vendors. Bonjour, with other mechanisms including A device may advertise its AES70 SAP (Session Announcement Protocol) Other Control-related Matters server using Bonjour, allowing potential and SIP (Session Initiation Protocol). More controllers to discover it. The protocol recently, development has begun on more Connecting devices on an IP network allows user-defined controls to be industry-specific methods, in AMWA’s is not done merely so that audio can be discovered, and contains pre-defined NMOS IS-04 protocol, and in discovery routed from one to another; it also offers objects for complex controls like extensions to the AES70 OCA framework the prospect of more sophisticated control routing, clock management and stream standard. For a network of devices to integration. management. work together, it is necessary for them all to support a common discovery For some years, proprietary manufacturer At the time of writing, it is early days for mechanism. protocols, including Calrec’s Hydra2, have AES70, and it remains to be seen how offered differing levels of control over the widely it is adopted. EMBER+, a fully Unfortunately, the AES67 standard hardware connected to their networks, featured, open source protocol, is also in does not mandate the use of a particular depending on the level of support in constant development, aided by a long discovery mechanism. This is because the connected device and its relative list of broadcast vendors whose hardware each has its own strengths and sophistication. As broadcast hardware is compatible (including Calrec, Evertz, weaknesses, and the authors of AES67 moves to its IP-based future, control GrassValley, Riedel and SSL). felt that manufacturers ought to have the protocols are also being developed to freedom to choose a mechanism that best make networked hardware controllable Following the development of proprietary, matches a particular application. While remotely, across the network. mutually non-interoperable networked this position is supportable, its unfortunate audio systems in the 2000s, and the consequence is that products with non- An example might be where a mixing fragmentary development of partially matching discovery mechanisms will not console is connected to a stream from a interoperable audio standards in the first be able to interoperate. third-party vendor’s mic amp unit. In this half of the 2010s, in recent years the case, the mixing console operator might global broadcast industry has committed This already raises issues; Ravenna want to be able to control the analogue itself to ensuring proper standards were uses RTP streaming and Bonjour for gain of the mic amplifier in the third-party created for the transmission of audio service discovery, Audinate’s AES67- unit. Another example might be where and video over IP, together with sync, compliant profile uses SAP, and AES67 someone wishes to take control of the control and metadata. This is the subject itself mandates only the use of SIP. In AES67 connection management of a of our final chapter, which also looks other words, devices running different device from a remote location. In both at where all these new developments protocols will remain unaware of each these cases, a control mechanism is might be taking us.

CALREC Putting Sound in the Picture 25 26 NETWORK PRIMER V2 NETWORK PRIMER V2 CHAPTER FOUR: CONTROL, SYNC & METADATA OVER IP

calrec.com CHAPTER FOUR: CONTROL, SYNC & METADATA OVER IP

The prospect of fully networked, They invited other standards organisations VSF’s TR-03 describes the broadcast interoperable broadcast technology to draw on existing protocols and data being sent in completely segregated seemed distant in 2012. Back technologies to create new standards so-called ‘elemental streams’: HD or then, work on interoperable audio that would help realise the ideal. uncompressed video in RFC 4175 format, standards was already developing in Following publication of the JT-NM’s audio in AES67 format, and separate sync multiple different directions, Reference Architecture in 2015, various and metadata streams. and no progress had been made broadcast industry bodies have come on any standards for transporting forward with proposed standards that From an audio broadcast point of view, audio and video over IP. meet the requirements in the Reference the TR-03 standard is more interesting as Architecture. the audio is readily available as a discrete To avoid a repeat of the chaos of the element; there is no requirement to previous decade, the Joint Task Force • AMWA created the Networked Media de-embed audio from a video signal on Networked Media (JT-NM) was Open Specifications (NMOS), developing before it can be processed or mixed, or formed by three key players in the global specifications that provide for the re-embed it afterwards, as is the case with broadcast industry: the US’s Society of discovery and registration of audio and SDI signals. Motion Picture & Television Engineers video hardware over IP networks. NMOS (SMPTE), the European Broadcasting has been met with great acclaim and • AMWA has also supported regular Union (EBU) and the international Video enthusiastically adopted by manufacturers meetings of broadcast technology Services Forum (VSF). They were later in the broadcast industry. providers at international plugfests. These joined by the Advanced Media Workflow events have allowed broadcasters, tech Association (AMWA). • SMPTE continued to add to its existing developers, design engineers and vendors SMPTE 2022 standard, which it had to exchange information, collaborate and All feared a costly future in which begun publishing in 2007 to define test their new prototype technologies, broadcasters would feel forced into protocols for transport of (initially heavily interfaces and protocols under condition upgrading to mutually incompatible IP- data-compressed) video signals over of anonymity - so no-one manufacturer based systems, many of which would need IP. In terms of realising an IP-based benefits too greatly from successful to be retrofitted or completely replaced as future, the most significant parts of the demonstrations, or is penalised by standards gradually emerged. specification are published as 2022-6, unsuccessful public demonstrations of which allows for the transport of high their technologies. Better, they reasoned, to work together bit-rate and even uncompressed video so that systems could be interoperable to over IP with embedded audio, and 2022-7, • Following the creation of the VSF’s widely accepted standards from the very which provides standards for redundant TR-03 and TR-04 standards, another beginning. transmission of the same video data broadcast industry organisation, AIMS over IP. This helps to safeguard against (the Alliance for IP Media Solutions) The taskforce conducted detailed network-related signal interruptions. was formed to promote their adoption research amongst broadcast companies, throughout the broadcast industry, manufacturers and practitioners of best • Following the publication of SMPTE together with the existing standards practice to find out exactly what they 2022-6, the Video Services Forum encapsulated in the VSF’s Technical might want from future IP-based video released two further standards for Recommendations: AES67 for audio and and audio standards, what their concerns high-resolution video and audio over IP SMPTE 2022-6 for high-resolution video. were, and how they saw such systems which fit the requirements of the JT- As a result, it has been proposed that both working in practice. NM’s Reference Architecture. TR-04 VSF TR-03 and TR-04 should be adopted recommends using the SMPTE 2022-6 as a new SMPTE standard under the They pooled the results and used them standard for high-resolution video either name SMPTE 2110. to derive what they called the Reference with embedded audio, as in SDI signals, or Architecture: a detailed model of how with a separate stream of audio in AES67 This is expected to be ratified before the an as-yet-undefined IP-based broadcast format, along with sync and metadata end of 2017, and the signs are that it may future might operate. streams. provide the basis for the high-resolution,

28 NETWORK PRIMER V2 CHAPTER FOUR: CONTROL, SYNC & METADATA OVER IP

12

NO

YES

NO

NO

YES

interoperable IP-based broadcast industry NO of the future. However, at the time of writing, SMPTE 2110 does not incorporate any of AMWA’s NMOS proposals.

The late 2010s are a time of great NO change in broadcast technology, involving a potentially bewildering series of new standards, industry bodies, research, acronyms and upheaval. It’s no simple task to clarify this process.

However, this flowchart seeks to do just that, asking you to consider what you might need from your broadcast facility over the next few years and leading you to a decision based on what is currently known about the standards currently under development.

CALREC Putting Sound in the Picture 29 CHAPTER FOUR: CONTROL, SYNC & METADATA OVER IP

With some of the standards required Two hours before the match, skilled At the end of the first edition of the Calrec to take these changes forward still cameramen, vision and sound mixing Audio Networking Primer, we wrote: incomplete, forecasting the immediate engineers drawn from personnel situated future of IP-based broadcast systems all over the globe, some at home, some “If this Primer has a message, it’s to is difficult — but it is possible to discern at small broadcast facilities and some emphasise that networking technology what might be possible in around a at large scale traditional studios, receive is on the brink of delivering unparalleled decade from now, once all the upheaval emails with access codes and privileges audio and video integration to broadcasters is past and these new technologies have for the streams they need to access for in many areas — even if some aspects still bedded in, and to compare it to standard video and audio monitoring, foldback remain just out of reach.” practices today. mixes, and the raw audio and video from the event site. Shortly before the Today, with internationally accepted Today, the broadcast of a major sporting specified event time, they connect their technical standards close to event requires a large and extremely respective camera, vision or audio mix ratification, and manufacturers expensive truck (and sometimes multiple control surfaces to their local network nearly all pulling in the same trucks) to arrive on-site at least 24 hours access point, and produce the broadcast direction, the future described before the event, and for skilled staff to of the event as though from an on-site OB above is closer than ever. remain on site until the infrastructure is vehicle today. packed up afterwards. This workflow has unavoidably high attendant costs (travel, Afterwards, IP technicians disconnect the hotels and so on). DSP hardware, pack it and any cameras or microphones back in the van, and leave. By contrast, here is how the workflow The virtual production ‘team’ disbands could be re-cast more affordably using immediately; if required, personnel can IP-based infrastructure. move straight on to producing another event in a completely different part of the A few hours before the fixture starts, a world, because they don’t need to travel single, small van could arrives on site, anywhere first. and one or two IP specialist engineers with some audio knowledge place some The broadcaster fills its schedules with remotely controlled DSP on site and many more types of varied programming connect it to a robust IP connection. and live events produced in this way, because the associated costs are no At some venues, they’ll also need to longer only viable for the biggest, most distribute cameras, microphones and high-profile events. interfacing for audio capture, but at many sites the mics and cameras may be pre- installed. There may be no cameramen or dedicated audio staff to place the mics as it will be possible to remotely control, steer and reposition them.

30 NETWORK PRIMER V2 CHAPTER FOUR: CONTROL, SYNC & METADATA OVER IP

CALREC Putting Sound in the Picture 31 Calrec Audio Ltd Nutclough Mill Hebden Bridge West Yorkshire England UK HX7 8EZ

Tel +44 (0)1422 842159 Fax +44 (0)1422 845244 Email [email protected] calrec.com (926-185 Iss.4)