AES WHITE PAPER Best Practices in Network Audio AESTD1003V1 2009-06-04
Nicolas Bouillot, McGill University
Elizabeth Cohen, Cohen Group
Jeremy R. Cooperstock, McGill University
Andreas Floros, Ionian University
Nuno Fonseca, Polytechnic Institute of Leiria
Richard Foss, Rhodes University
Michael Goodman, CEntrance, Inc.
John Grant, Nine Tiles Networks Ltd.
Kevin Gross, AVA Networks
Steven Harris, BridgeCo Inc.
Brent Harshbarger, CNN
Joffrey Heyraud, Ethersound
Lars Jonsson, Swedish Radio
John Narus, VidiPax LLC
Michael Page, Peavey Digital Research
Tom Snook, New World Symphony
Atau Tanaka, Newcastle University
Justin Trieger, New World Symphony
Umberto Zanghieri, ZP Engineering srl

EXECUTIVE SUMMARY

Analog audio needs a separate physical circuit for each channel. Each microphone in a studio or on a stage, for example, must have its own circuit back to the mixer. Routing of the signals is inflexible.

Digital audio is frequently wired in a similar way to analog. Although several channels can share a single physical circuit (e.g., up to 64 with AES10), thus reducing the number of cores needed in a cable, routing of signals is still inflexible, and any change to the equipment in a location is liable to require new cabling.

Networks allow much more flexibility. Any piece of equipment plugged into the network is able to communicate with any other. However, installers of audio networks need to be aware of a number of issues that affect audio signals but are not important for data networks and are not addressed by current IT networking technologies such as IP. This white paper examines these issues and provides guidance to installers and users that can help them build successful networked systems.

1 BACKGROUND

Network audio technologies find application across a wide number of domains, including studio recording and production activities, archiving and storage, musical performance in theater or concert, and broadcasting. These range in scale from desktop audio production systems to 500-acre theme-park installations.

Each domain imposes its own, sometimes conflicting, technical requirements. For example, the low-latency demands of live sound can be at odds with the flexibility and interoperability that is attractive to commercial installations. Similarly, recording studios demand impeccable audio performance and clock delivery that is not required in other domains.

The architectures that support network audio are, at present, largely confined to Ethernet-based solutions, with a few alternatives including IEEE 1394. Differences between the various implementations include issues of audio format, number of channels supported, and end-to-end transport latency. In addition to the large number of low-fidelity consumer applications for network audio streaming, various software architectures have been developed to support professional-grade audio transport over the Internet, albeit without the higher performance that characterizes local area network solutions.

1.1 Marketplace

Outside the telecommunications industry, audio networking was first conceived as a means of transferring work units within large studios and postproduction facilities. In the 1990s, when the digital audio workstation became the dominant audio production platform, standards-based Token Ring (IEEE 802.5) and Ethernet (IEEE 802.3) networks were employed to transfer audio files from one workstation to another. Initially these were non-real-time transfers, but facility operators, technology providers, and networking standards evolved to allow real-time playback over the network for the purpose of media-storage consolidation, final mix-down, and production.

True real-time audio networking was first introduced in installed sound reinforcement applications. By the mid-1990s, digital signal processing was in widespread use in this market segment [12][19]. Digital signal processing improved the flexibility and scalability of sound reinforcement systems. It became possible to infuse facilities such as theme parks and stadiums with hundreds of individually processed audio signals, and thus the need arose for a flexible, optimized distribution system for these signals. Since audio was now processed in digital format, logically the distribution system should be digital as well.

Live sound applications were the last to adopt digital technology. However, with the introduction of digital consoles and distribution systems targeted to live sound applications, this has changed.

The present marketplace for digital audio networking might be described as fragmented, with multiple technology players. This is the result of disparate performance requirements, market forces, and the nature of technology development.

1.2 Challenges of audio networks

Despite the maturity of data networks and their capability of transporting a wide variety of media, transport of professional audio often proves quite challenging. This is due to several interrelated characteristics distinct to audio: the constraint of low latency, the demand for synchronization, and the desire for high fidelity, especially in the production environment, which discourages the use of lossy encoding and decoding (codec) processes.

A stereo CD-quality audio stream (16-bit resolution, 44.1-kHz sampling) requires 1.4 Mbps of data throughput, a quantity easily supported by existing wired LAN technologies although often not by commercial WAN environments. Available bandwidth may be exceeded when the number of channels, resolution, or sampling rate is increased, or when the network capacity must be shared with other applications.
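The throughput figure quoted above follows directly from the stream parameters. The short sketch below (Python, added purely as an illustration; the white paper itself specifies no code) reproduces the arithmetic and shows how channel count, word length, and sampling rate scale the requirement.

```python
def pcm_bitrate(channels: int, bits_per_sample: int, sample_rate_hz: int) -> int:
    """Raw (uncompressed) PCM payload rate in bits per second."""
    return channels * bits_per_sample * sample_rate_hz

# Stereo CD quality: 2 ch x 16 bit x 44,100 Hz = 1,411,200 bit/s, i.e. about 1.4 Mbps.
print(f"CD stereo:             {pcm_bitrate(2, 16, 44_100) / 1e6:6.2f} Mbps")

# The figure scales linearly with channel count, resolution, and sampling rate,
# e.g. 64 channels of 24-bit, 96-kHz audio:
print(f"64 ch, 24-bit, 96 kHz: {pcm_bitrate(64, 24, 96_000) / 1e6:6.1f} Mbps")
```

These are payload rates only; framing and packet headers add further overhead on the wire, a point taken up again below.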
Low latency is an important requirement in audio. Sources of latency include A/D and D/A converters, each of which adds approximately 1 ms, digital processing equipment, and network transport. In a typical chain, audio may need to be transported from a source device to a processing unit, then to a mixer, and from there to an amplifier. In such a chain each stage adds its own delay, and the total must be assessed relative to a common reference point. If necessary, latency can always be added with a delay element, but once introduced, latency can never be removed. This requirement severely restricts the amount of computation and look-ahead that can be performed to achieve data reduction.

Audio networks, like any other digital audio system, need signals to maintain synchronization over the entire system, ensuring that all parts are operating with the same number of samples at any one time. Synchronization signals may also be used to detect the exact moment when A/D and D/A converters should read or write their values. In most digital systems, synchronization signals are transmitted at the sampling rate, i.e., one sync signal for each sample. In audio networking, such an arrangement is often impossible to support, resulting in a more complex clock recovery task.

Flexibility and robustness are further considerations, in particular when dealing with a large number of nodes. For this reason, recent networks typically adopt a star topology. This provides flexibility in adding, removing, and troubleshooting individual connections, and offers the benefit that, at least for a simple star, traffic between any two devices does not interfere with others.

Each of these requirements may impact the others. For example, latency may be reduced by transmitting fewer audio samples per packet, which increases overhead and bandwidth. Alternatively, switches may be avoided by changing the network topology from a star to a bus, which in turn decreases flexibility. The quality of synchronization may be increased with the addition of sync signals over the same medium as used for audio transport, thus rendering the network incompatible with generic data network standards.
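The interaction between packet size, latency, and overhead can be quantified. The sketch below is illustrative only: it assumes a single 24-bit, 48-kHz channel carried over plain Ethernet/IPv4/UDP/RTP, with nominal header sizes that are an assumption of this example rather than a figure taken from the paper, and it ignores converter, switch, and propagation delays.

```python
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 3             # one channel of 24-bit linear PCM
HEADER_BYTES = 14 + 20 + 8 + 12  # assumed Ethernet + IPv4 + UDP + RTP headers
# Preamble, frame check sequence, interframe gap, and minimum-frame padding are
# ignored, so the small-packet figures below are still somewhat optimistic.

for samples_per_packet in (6, 12, 48, 240):
    payload_bytes = samples_per_packet * BYTES_PER_SAMPLE
    packets_per_second = SAMPLE_RATE_HZ / samples_per_packet
    packetization_delay_ms = 1_000 * samples_per_packet / SAMPLE_RATE_HZ
    wire_mbps = packets_per_second * (payload_bytes + HEADER_BYTES) * 8 / 1e6
    overhead_pct = 100 * HEADER_BYTES / (payload_bytes + HEADER_BYTES)
    print(f"{samples_per_packet:3d} samples/packet: "
          f"{packetization_delay_ms:5.2f} ms packetization delay, "
          f"{wire_mbps:4.2f} Mbps on the wire, "
          f"{overhead_pct:4.1f}% header overhead")
```

Under these assumptions, small packets bring the packetization delay well under a millisecond but spend most of the bandwidth on headers, while large packets approach the raw payload rate at the cost of several milliseconds of delay: exactly the trade-off described above.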
2 NETWORK TECHNOLOGIES

A key benefit of a network is that resources and information can be shared. Networks are often characterized by a combination of their size (e.g., personal area, local area, campus area, metropolitan area, wide area), transmission technology, and their topology (e.g., star, bus, daisy-chain).

Network audio technologies have been based wholly or partially on telecommunications and data communications standards. For example, AES3 is based on an RS-422 electrical interface, and AES10 (MADI) is based on the electrical interface developed for the Fiber Distributed Data Interface (FDDI) communications standard.

The remainder of this section provides further discussion of several important aspects of network technologies, both theoretical and practical.

2.1 OSI model

The International Organization for Standardization (ISO) created a seven-layer model called the Open Systems Interconnection (OSI) Reference Model to provide abstraction in network functionality. Many popular professional Ethernet-based audio networks use only the Physical and Data Link Layers (layers one and two) of the model, as described in further detail below:

1 The Physical Layer provides the rules for the electrical connection, including the type of connectors, cable, and electrical timing.

2 The Data Link Layer defines the rules for sending and receiving information across the physical connection of devices on a network. Its primary function is to convert a stream of raw data into electrical signals that are meaningful to the network. This layer performs the functions of encoding and framing data for transmission, physical addressing, error detection, and control. Addressing at this layer uses a physical Media Access Control (MAC) address, which uniquely identifies each device's network interface.
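To make the Data Link Layer description concrete, the sketch below assembles a minimal Ethernet II header, the 14-byte structure carrying the destination and source MAC addresses followed by an EtherType. The addresses and the EtherType value are placeholders chosen for illustration; real audio-over-Ethernet protocols define their own frame contents above this layer.

```python
import struct

def mac_to_bytes(mac: str) -> bytes:
    """Convert a colon-separated MAC address string to its 6-byte form."""
    return bytes(int(octet, 16) for octet in mac.split(":"))

def ethernet_header(dst_mac: str, src_mac: str, ethertype: int) -> bytes:
    """Pack the 14-byte Ethernet II header: destination MAC, source MAC, EtherType."""
    return mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)

# Placeholder, locally administered addresses and the local-experimental
# EtherType 0x88B5; a real protocol would define its own values and payload.
header = ethernet_header("02:00:00:00:00:02", "02:00:00:00:00:01", 0x88B5)
payload = bytes(46)              # minimum payload length; audio samples would go here
frame = header + payload         # the interface hardware appends the frame check sequence
print(len(frame), header.hex(":"))
```

Because such frames carry only layer-2 addressing, they can be delivered within a single switched LAN but cannot be routed across IP networks, which is one reason many network audio systems confine themselves to layers one and two.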