AES WHITE PAPER
Best Practices in Network Audio
AESTD1003V1
2009-06-04

Nicolas Bouillot, McGill University; Elizabeth Cohen, Cohen Group; Jeremy R. Cooperstock, McGill University; Andreas Floros, Ionian University; Nuno Fonseca, Polytechnic Institute of Leiria; Richard Foss, Rhodes University; Michael Goodman, CEntrance, Inc.; John Grant, Nine Tiles Networks Ltd.; Kevin Gross, AVA Networks; Steven Harris, BridgeCo Inc.; Brent Harshbarger, CNN; Joffrey Heyraud, Ethersound; Lars Jonsson, Swedish Radio; John Narus, VidiPax LLC; Michael Page, Peavey Digital Research; Tom Snook, New World Symphony; Atau Tanaka, Newcastle University; Justin Trieger, New World Symphony; Umberto Zanghieri, ZP Engineering srl

EXECUTIVE SUMMARY

Analog audio needs a separate physical circuit for each channel. Each microphone in a studio or on a stage, for example, must have its own circuit back to the mixer. Routing of the signals is inflexible.

Digital audio is frequently wired in a similar way to analog. Several channels can share a single physical circuit (e.g., up to 64 with AES10), reducing the number of cores needed in a cable, but routing of signals is still inflexible, and any change to the equipment in a location is liable to require new cabling.

Networks allow much more flexibility: any piece of equipment plugged into the network is able to communicate with any other. However, installers of audio networks need to be aware of a number of issues that affect audio signals but are not important for data networks and are not addressed by current IT networking technologies such as IP. This white paper examines these issues and provides guidance to installers and users that can help them build successful networked systems.

1 BACKGROUND

Network audio technologies find application across a wide number of domains, including studio recording and production activities, archiving and storage, musical performance in theater or concert, and broadcasting. These range in scale from desktop audio production systems to 500-acre theme-park installations.

Each domain imposes its own, sometimes conflicting, technical requirements. For example, the low-latency demands of live sound can be at odds with the flexibility and interoperability that is attractive to commercial installations. Similarly, recording studios demand impeccable audio performance and clock delivery that is not required in other domains.

The architectures that support network audio are, at present, largely confined to Ethernet-based solutions, with a few alternatives including IEEE 1394. Differences between the various implementations include issues of audio format, number of channels supported, and end-to-end transport latency. In addition to the large number of low-fidelity consumer applications for network audio streaming, various software architectures have been developed to support professional-grade audio transport over the Internet, albeit without the higher performance that characterizes local area network solutions.

1.1 Marketplace

Outside the telecommunications industry, audio networking was first conceived as a means of transferring work units within large studios and postproduction facilities. In the 1990s, when the digital audio workstation became the dominant audio production platform, standards-based Token Ring (IEEE 802.5) and Ethernet (IEEE 802.3) networks were employed to transfer audio files from one workstation to another. Initially these were non-real-time transfers, but facility operators, technology providers, and networking standards grew to allow real-time playback over the network for the purpose of media-storage consolidation, final mix-down, and production.

True real-time audio networking was first introduced in installed sound reinforcement applications. By the mid-1990s, digital signal processing was in widespread use in this market segment [12][19]. Digital signal processing improved the flexibility and scalability of sound reinforcement systems. It became possible to infuse facilities such as theme parks and stadiums with hundreds of individually processed audio signals, and thus the need arose for a flexible, optimized distribution system for these signals. Since audio was now processed in digital format, logically the distribution system should be digital as well.

J. Audio Eng. Soc., Vol. 57, No. 9, 2009 September

Live sound applications were the last to adopt digital technology. However, with the introduction of digital consoles and distribution systems targeted to live sound applications, this has changed.

The present marketplace for digital audio networking might be described as fragmented, with multiple technology players. This is the result of disparate performance requirements, market forces, and the nature of technology development.

1.2 Challenges of audio networks

Despite the maturity of data networks and their capability of transporting a wide variety of media, transport of professional audio often proves quite challenging. This is due to several interrelated characteristics distinct to audio: the constraint of low latency, the demand for synchronization, and the desire for high fidelity, especially in the production environment, which discourages the use of lossy encoding and decoding (codec) processes.

A stereo CD-quality audio stream (16-bit resolution, 44.1-kHz sampling) requires 1.4 Mbps¹ of data throughput, a quantity easily supported by existing wired LAN technologies although often not by commercial WAN environments. Available bandwidth may be exceeded when the number of channels, resolution, or sampling rate is increased, or when the network capacity must be shared with other applications.

Low latency is an important requirement in audio. Sources of latency include A/D and D/A converters, each of which adds approximately 1 ms, digital processing equipment, and network transport. In a typical chain, audio may need to be transported from a source device to a processing unit, then to a mixer, and from there to an amplifier. In a digital audio network, latency is increased by the number of such segments that the data must traverse. Many system designers subscribe to a less-is-more philosophy when specifying or evaluating system latency, using zero-latency analog systems as a reference point. If necessary, latency can always be added with a delay element, but once introduced, latency can never be removed. This requirement severely restricts the amount of computation and look-ahead that can be performed to achieve data reduction.

Audio networks, as any other digital audio system, need signals to maintain synchronization over the entire system, ensuring that all parts are operating with the same number of samples at any one time. Synchronization signals may also be used to detect the exact moment when A/D and D/A converters should read or write their values. In most digital systems, synchronization signals are transmitted at the sampling rate, i.e., one sync signal for each sample. In audio networking, such an arrangement is often impossible to support, resulting in a more complex clock recovery task.

Flexibility and robustness are further considerations, in particular when dealing with a large number of nodes. For this reason, recent networks typically adopt a star topology. This provides flexibility in adding, removing, and troubleshooting individual connections, and offers the benefit that, at least for a simple star, traffic between any two devices does not interfere with others.

Each of these requirements may impact the others. For example, latency may be reduced by transmitting fewer audio samples per packet, which thus increases overhead and bandwidth. Alternatively, switches may be avoided by changing the topology from a star to a bus, which in turn decreases flexibility. The quality of synchronization may be increased with the addition of sync signals over the same medium as used for audio transport, thus rendering the network incompatible with generic data network standards.

2 NETWORK TECHNOLOGIES

For the purpose of this white paper, a network is defined as a collection of more than two devices, dispersed in space and connected together via hardware and software, such that any device can communicate with any other connected device. The principal benefit of a network is that resources and information can be shared. Networks are often characterized by a combination of their size (e.g., personal area, local area, campus area, metropolitan area, wide area), transmission technology, and topology (e.g., star, bus, daisy-chain).

Network audio technologies have been based wholly or partially on telecommunications and data communications standards. For example, AES3 is based on an RS-422 electrical interface, and AES10 (MADI) is based on the electrical interface developed for the Fiber Distributed Data Interface (FDDI) communications standard.

The remainder of this section provides further discussion of several important aspects of network technologies, both theoretical and practical.

2.1 OSI model

The International Organization for Standardization (ISO) created a seven-layer model called the Open Systems Interconnection (OSI) Reference Model to provide abstraction in network functionality. Many popular professional Ethernet-based audio networks only use the Physical and Data Link Layers (layers one and two) of the model, as described in further detail below:

1 The Physical Layer provides the rules for the electrical connection, including the type of connectors, cable, and electrical timing.

2 The Data Link Layer defines the rules for sending and receiving information across the physical connection of devices on a network. Its primary function is to convert a stream of raw data into electrical signals that are meaningful to the network. This layer performs the functions of encoding and framing data for transmission, physical addressing, error detection, and control. Addressing at this layer uses a physical Media Access Control (MAC) address, which works only within the LAN environment.

¹ Note that this figure ignores packet headers and tails, control signals, and possible retransmissions.
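The bandwidth figures discussed in Section 1.2 are easy to verify. The following sketch recomputes the 1.4-Mbps stereo CD figure and illustrates the trade-off noted above: carrying fewer samples per packet lowers latency but raises per-packet header overhead. The 28-byte IPv4 + UDP header size is an illustrative assumption, not a figure from this paper.

```python
# Recompute the stereo CD bit rate and show how header overhead grows
# as packets carry fewer samples (i.e., as latency is pushed down).

def stream_bitrate(bits: int, rate_hz: int, channels: int) -> int:
    """Raw PCM payload bit rate; ignores packet headers (cf. footnote 1)."""
    return bits * rate_hz * channels

def bitrate_with_headers(bits: int, rate_hz: int, channels: int,
                         samples_per_packet: int,
                         header_bytes: int = 28) -> float:
    """Bit rate including per-packet header overhead (assumed IPv4 + UDP)."""
    payload = stream_bitrate(bits, rate_hz, channels)
    packets_per_second = rate_hz / samples_per_packet
    return payload + packets_per_second * header_bytes * 8

cd = stream_bitrate(16, 44100, 2)
print(f"CD payload: {cd / 1e6:.2f} Mbps")                                             # 1.41 Mbps
print(f"4 samples/packet:  {bitrate_with_headers(16, 44100, 2, 4) / 1e6:.2f} Mbps")   # 3.88 Mbps
print(f"64 samples/packet: {bitrate_with_headers(16, 44100, 2, 64) / 1e6:.2f} Mbps")  # 1.57 Mbps
```

Under these assumptions, at four samples per packet the headers more than double the required bandwidth, which is one reason very-low-latency transports tend to favor dedicated links or lean layer-2 framing.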

3 The Network Layer defines protocols such as Internet Protocol (IP) and X.25 for opening and maintaining a path to the network between systems. At this layer, endpoints are specified by internetwork addressing. This layer is responsible for the connectionless transfer of data from one system to another across a single hop, but not for reliable delivery. It is also responsible for fragmenting data into suitably sized packets as necessary for transport.

4 The Transport Layer provides additional services that include control for moving information between systems, additional error handling, and security. The Transmission Control Protocol (TCP) is a well-known connection-oriented Transport Layer protocol that ensures transmitted data arrives correctly at the destination. The User Datagram Protocol (UDP) is a connectionless protocol, which does not provide any acknowledgements from the destination. Although UDP is considered less reliable than TCP because it provides no guarantees of packet delivery, it is often preferred for streaming media applications because of its superior real-time performance. The popular Real-time Transport Protocol (RTP), used in many streaming media applications, defines a packet format for network delivery of audio or video and is itself typically carried inside UDP. RTP is often used in conjunction with the RTP Control Protocol (RTCP) to provide feedback to the sender regarding quality of service as required for session management.

5 The Session Layer coordinates the exchange of information between systems and is responsible for the initiation and termination of communication dialogues or sessions. The Session Initiation Protocol (SIP), used commonly in Voice over Internet Protocol (VoIP), streaming multimedia, and videoconferencing systems, resides at this layer.

6 The Presentation Layer protocols are part of the operating system (OS). These negotiate the use and syntax that allow different types of systems to communicate, for example, by dealing with the conversion between application-level representations and network data representations.

7 The Application Layer includes a range of network applications such as those that handle file transfers, terminal sessions, and message exchange, including email.

2.2 Transmission schemes

Data transmission may take place over wired media, such as copper wire and fiber optic connections, or wireless media, such as terrestrial radio and satellite. In either case, the transmission scheme used can be classified as asynchronous, isochronous, or synchronous.

Asynchronous communications are non-real-time, such as web browsing and email transport. The general-purpose nature of asynchronous systems gives these transport technologies a wide market, high volumes, and low costs. Examples of asynchronous communications include Ethernet, the Internet, and serial protocols such as RS-232 and RS-485.

Synchronous communications systems are specialized, purpose-built systems. They can be the best solution for a well-focused data-transport application, but are ill-suited for supporting a mixture of services. Carrying asynchronous data on a synchronous transport is inefficient and does not readily accommodate bursty traffic patterns. Examples of synchronous communications systems include AES3, AES10, ISDN, T1, and the conventional telephone network.

Isochronous communication is required to deliver performance that is quantified in a service agreement on the connection between communicating nodes. The service agreement specifies parameters such as bandwidth, delivery delay, and delay variation. Isochronous transports are capable of carrying a wide variety of traffic and are thus the most versatile networking systems. Examples of systems supporting isochronous communications include ATM, IEEE 1394, and USB.

2.3 Topology

Common network topologies include:

Bus: A single common wire connects all devices. Although this topology is rarely used now, early Ethernet used a shared cable that was connected to each device with a T-connector or "vampire" tap. This should not be confused with modern virtual bus systems, such as IEEE 1394 or USB.

Star: Devices are interconnected through centrally located network distribution hubs, such as repeaters, switches, or routers. Hubs may also be interconnected to create a star-of-stars topology. Most Ethernet connections today are designed in some form of a star configuration.

Daisy-chain: Devices are connected end-to-end. Setup and wiring is simple, as no separate network equipment is required. However, failure of a single device or connection in a daisy chain can cause network failure or split the system into two separate networks.

Ring: Similar to a daisy-chain, but with the last device connected to the first to form a ring. On failure of a single device or connection, the ring reverts to a still-functional daisy-chain.

Tree: Similar to a daisy-chain, but devices are allowed to make connections to multiple other devices, provided that no loops are formed.

Spanning-tree: Any network topology is permitted. The network features distributed intelligence to deactivate individual links to produce a working spanning-tree star-of-stars configuration. Upon failure of network links or components, the network can automatically reconfigure, restoring deactivated links to route around failures.

Mesh: Any network topology is permitted. Routing algorithms ensure that traffic moves forward to its destination, efficiently utilizing all links, and does not get caught in any of the loops in the topology.

2.4 Routing

The routing options supported by a network are an important measure of its usefulness and flexibility.

Point-to-point or unicast routing allows direct communications between one sender and one receiver. Point-to-point connections may be defined statically by the network configuration or dynamically, as is the case when placing a telephone call or retrieving a page from a web server.
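As a concrete illustration of the connectionless Transport Layer delivery described above, the sketch below sends one packet's worth of audio bytes over UDP on the loopback interface. The port number and frame size are arbitrary choices for the example; on a real network, unlike loopback, such datagrams may be dropped or reordered without notice.

```python
import socket

RECEIVER = ("127.0.0.1", 50007)   # illustrative address and port

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(RECEIVER)

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
frame = bytes(64 * 3)             # 64 samples of 24-bit PCM silence
tx.sendto(frame, RECEIVER)        # fire and forget: no acknowledgement

data, sender = rx.recvfrom(2048)  # UDP offers no delivery guarantee;
print(len(data))                  # on loopback the 192 bytes do arrive
tx.close()
rx.close()
```

This is the pattern on which RTP builds: RTP adds sequence numbers and timestamps inside the UDP payload so the receiver can detect loss and reorder packets, and RTCP reports reception quality back to the sender.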

Point-to-multipoint or broadcast routing supports asymmetric communication from a sender to one or more receivers. This scenario is exemplified by radio broadcast or a conventional analog cable television distribution system. A variation known as multicast allows specified devices to join a virtual group, such that a message sent by one member of the group goes to all other members of the group, but not to any other devices. This allows some of the benefits of broadcasting without swamping other network devices with irrelevant messages.

Multipoint-to-multipoint routing allows every node to broadcast or multicast to all others.

2.5 Data transport management

Devices that are connected to audio networks can be sources and/or destinations of digital audio. These devices may generate audio themselves, as in the case of studio synthesizers, or they could have a number of plugs from which audio is sourced. As we have seen in previous sections, various technologies provide the means to transmit and receive the audio samples over a physical network. Data transport management deals with the management of addressing, encapsulation, and subsequent extraction of the networked audio.

Data transport management technologies allow the user to select a particular audio channel from a cluster of channels transmitted over the network. Management protocols such as SNMP (IP-based) or AV/C (specific to IEEE 1394) allow differing degrees of flexibility, for example, limiting the number of audio channels that may be transmitted, the size of packets containing the audio samples, or the association between channel numbers and output plugs at the destination.

The routing of audio streams can be presented to users in the form of graphic workstation displays, in a manner appropriate to the particular application domain.

2.6 Network service quality

Quality of Service (QoS) is a measure of how reliably the data transmitted on the network arrives at its destination. The parameters by which QoS is measured include the data rate, expressed in bits per second or packets per second; the latency, which is the time interval between a data item being transmitted by the source and received by the destination; and the proportion of data that is lost (never arrives at the destination). Both the long-term value and short-term variations are significant in each case.

For activities such as web surfing, QoS is relatively unimportant. Delays of a second or two will be masked by random delays in other parts of the chain, such as waiting for access to a server, or for the retransmission of lost or corrupted packets [11]. However, for live audio, all aspects of QoS are important. The available data rate must be sufficient to convey the bits as fast as they are produced by the source, without interruption. In any system, a sudden, unexpected increase in latency can cause a dropout in the signal at the destination. Latency is most critical where the output is related to live sound (see Sections 4.4 and 4.7), as there is no opportunity to retransmit lost data.

If a defined QoS is required for a particular flow such as an audio stream, network resources must be reserved before the destination begins receiving it. This is straightforward with connection-oriented technologies such as ISDN and ATM. Less comprehensive QoS capability is available on Ethernet and IP via 802.1p, DSCP, RSVP, and the like. A more comprehensive QoS solution for Ethernet is currently under development as part of the IEEE 802.1 working group's AVB initiative.

All communication over ISDN is via fixed-rate channels with fixed latency and high reliability. If the required data rate is more than the channel capacity, several channels are aggregated together. On an ATM network, there is a call set-up procedure during which the source equipment specifies what QoS is required. Both these circuit-switched technologies are perceived as obsolescent and are beginning to be withdrawn in favor of IP [24][5]. In this direction, Multiprotocol Label Switching (MPLS) over Ethernet may be considered a viable replacement.

A service that has no defined QoS is described as best effort: the network tries its best to deliver the data but can make no guarantees. Private networks may be overspecified to ensure an adequate bandwidth margin and may prevent congestion by segregating other traffic. In this case, throughput and reliability will be as good as on a network with QoS, although latency may remain higher on a store-and-forward packet routing network than on a circuit-switched network.

However, under best-effort conditions, audio streaming must allow for sudden changes in latency by providing adequate buffering at the receiving end. Redundant information (e.g., Reed-Solomon forward error correction) can be introduced into the stream to mitigate the problems of packet loss, but this increases the transmitted data rate, which may make packet loss more likely and increase jitter.²

Strategies for achieving stream-specific QoS, including differentiated service and MPLS, rely on resource reservations. This requires that networked audio applications be deployed in a centrally configured and fully controlled network, hence restricting such solutions to relatively small networks.

At a larger scale, or when the network configuration remains uncontrolled, QoS can be optimized at the Transport Layer only, i.e., by end-to-end protocols. As an example, TCP includes a congestion-control algorithm that evaluates network resources and adjusts the sending rate in order to share available bandwidth with coexisting streams. This is fine for file transfer, where data are received ahead of the time they are required and played out locally. However, for purposes of time synchronization, networked audio over the Internet is often carried over UDP or RTP/UDP, which do not include a native congestion-control algorithm.³

² The EBU committee N/ACIP (http://wiki.ebu.ch/acip/Main_Page) is studying this issue.
³ The TCP-Friendly Rate Control protocol (TFRC), currently under specification by the IETF Audio/Video Transport (avt) working group, may offer a solution to this problem.
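The receiver-side buffering described above is commonly realized as a jitter buffer: packets are reordered by sequence number and released on a fixed playout schedule, trading added latency for tolerance of delay variation. The sketch below is a minimal illustration of that idea, not any particular product's implementation; a late or lost packet surfaces as None, which a real player would conceal (e.g., by repeating the previous frame).

```python
import heapq

class JitterBuffer:
    """Reorders received frames by sequence number for paced playout."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, str]] = []

    def push(self, seq: int, payload: str) -> None:
        # Arrival order is irrelevant; the heap keeps frames sorted by seq.
        heapq.heappush(self._heap, (seq, payload))

    def pop_due(self, seq_expected: int):
        """Release the next frame if it has arrived, else None (late/lost)."""
        if self._heap and self._heap[0][0] == seq_expected:
            return heapq.heappop(self._heap)[1]
        return None

buf = JitterBuffer()
for seq in (0, 2, 1):                        # frame 1 arrives out of order
    buf.push(seq, f"frame{seq}")

played = [buf.pop_due(s) for s in (0, 1, 2)]
print(played)                                # ['frame0', 'frame1', 'frame2']
```

The depth of such a buffer is exactly the latency penalty paid for robustness, which is why the latency figures in Table 1 are so closely tied to each technology's transmission scheme.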

2.7 Wireless

Digital audio streaming over packet-based networks may benefit from the adoption of wireless networking technologies, most obviously by the elimination of interconnection cables.

The first popular wireless standard for large-scale digital audio communication was that of cellular phone networks. However, these employ heavy lossy compression, geared toward the transport of speech data, typically operating in the range of 5.3 or 6.3 kbps for G.723.1 encoding and 8 kbps for G.729A encoding.

On a smaller physical scale, a Bluetooth network, or piconet, of up to eight active members is characterized by random hopping between 79 frequencies, allowing for simultaneous operation of multiple such networks in the same location. Piconets carry both asynchronous data and audio, the latter transmitted over synchronous connection-oriented (SCO) channels [1]. The specification supports a maximum of three full-duplex 64-kbps SCO links. These employ either logPCM (A-law or u-law) speech coding or Continuous Variable Slope Delta (CVSD) modulation coding. Despite enhanced quality and latency compared to the G.723.1 and G.729A codecs, limited audio quality and Bluetooth range make this technology currently unsuitable for professional audio applications.

For uncompressed audio transmission, the IEEE 802.11 WLAN specification [13] represents one of the most promising networking formats, due to its wide adoption in many digital consumer electronic and computer products, as well as the continuous ratification of enhancements in many state-of-the-art networking aspects, such as high-rate transmission, security, and adaptive topology control. Typical theoretical bitrate values currently supported by the 802.11 families of protocols include 54 Mbps (802.11a/g) and 100–210 Mbps for 802.11n, rendering them applicable for uncompressed-quality audio.

Audio delivery in WLAN environments introduces a number of implementation issues that must be considered for the realization of high-fidelity wireless audio applications. For example, wireless transmission range is currently limited to distances up to 100 m, while non-line-of-sight transmission is further limited. Electromagnetic interference is also a significant factor that affects both throughput and delay performance. QoS enhancements are provided by the recently ratified 802.11e specification [8].

The IEEE 802.16 (WiMAX) standard [14] provides native QoS capabilities and connection-oriented forwarding. Typical bitrate values currently supported vary between 32 and 134 Mbps. Transmission range and power requirements are approximately triple and 10 times those of 802.11, respectively.

3 CURRENT AUDIO NETWORKING SYSTEMS

Table 1 presents an overview of various audio networking systems available today. It should be stressed that obtaining an objective comparison of latency measurements is problematic, given that the various technologies operate with different constraints in terms of the underlying network medium and the limits imposed by hardware outside of their control. Where relevant information was available to elaborate on the measurements, it is indicated in notes. Unless otherwise specified, all measurements are intended to be exclusive of delays resulting from analog/digital conversion and DSP, and all channel counts are assumed to be bidirectional (input and output).

Certain point-to-point technologies, such as AES3 and AES50, can appear as networks through the use of intelligent routers. Taking advantage of such hardware, these technologies can be and are being used for effective large-scale multipoint-to-multipoint audio interconnection. In this case, the distinction between networked and point-to-point communication is somewhat blurred. However, a dedicated router implies higher cost, less flexibility, and a potential single point of failure, as well as a need to communicate with one or more routers to effect routing changes.

3.1 Compatibility

With few exceptions, the audio networking systems described in Table 1 are characterized by the transport of uncompressed audio in PCM format, which in principle could be reformatted as requested. Most standards support 24-bit PCM at the usual sample rates (44.1, 48, 88.2, 96 kHz), with several of the standards extending to 192 kHz.

In practice, there are several issues for compatibility between formats that should be addressed and solved with specific implementations, most notably:

• configuration and control protocols are incompatible between formats, and bridging among them is non-trivial; some protocols are sophisticated while others are fairly simple

• auxiliary data communication channels have varying available bitrates, ranging from less than 100 kbps to several Mbps; as an example, AES50 provides a 5-Mbps LAN control channel embedded in its transmission, while other protocols provide asynchronous serial port tunnelling

• propagation of the global sync reference is done with various techniques, which cannot be mixed easily; most solutions recover the audio clock from packet time of arrival, while AES50 uses a dedicated twisted pair for synchronization.

For these reasons, audio network nodes are typically interfaced with other nodes using a single format; change of format is accomplished with back-to-back specific implementations, with AES3 being the most common interface protocol between different solutions.

The main issue is currently related to licensing. A node that is made compatible with more than one standard would need to apply to different licensing schemes. For these reasons, ➥

TABLE 1: AUDIO NETWORK TECHNOLOGIES MATRIX

Technology Transp ort Trans- Mixed Use Control Top ology1 Fault Tol- Distance2 Diam. Network Latency Max mis- Network- Com- erance capac- avail- sion ing munica- ity able scheme tions sam- pling rate

AES47: ATM transport (isochronous; coexists with ATM); control via any IP or ATM protocol; mesh topology, fault tolerance provided by ATM; distance Cat5 = 100 m, MM = 2 km, SM = 70 km; network diameter unlimited; unlimited channels; latency 125 µs per hop; up to 192 kHz; www.aes.org, www.ninetiles.com.

AES50: Ethernet physical layer (isochronous or synchronous; dedicated Cat5 link); control via 5 Mbps Ethernet; point-to-point topology, fault tolerance by FEC and redundant link; distance Cat5 = 100 m; network diameter unlimited; 48 channels; latency 63 µs; up to 384 kHz and DSD; www.aes50.com.
Note: Ethernet transport is combined with a proprietary audio clock transport. AES50 and HyperMAC are point-to-point audio connections, but they bridge a limited bandwidth of regular Ethernet for the purpose of control communications. An AES50/HyperMAC router contains a crosspoint matrix (or similar) for audio routing, and an Ethernet switch for control routing. The system topology may therefore follow any valid Ethernet topology, but the audio routers need a priori knowledge of the topology. While there are no limits to the number of AES50 routing devices that can be interconnected, each hop adds another link's worth of latency, and each router device needs to be controlled individually.

AudioRail: Ethernet physical layer (synchronous; Cat5 or fiber); proprietary control; daisy-chain topology, no fault tolerance; distance Cat5 = 100 m, MM = 2 km, SM = 70 km; network diameter unlimited; 32 channels; latency 4.5 µs + 0.25 µs per hop; up to 48 kHz (32 ch) or 96 kHz (16 ch); www.audiorail.com.

Aviom Pro64: Ethernet physical layer (synchronous; dedicated Cat5 and fiber); proprietary control; daisy-chain topology (bidirectional), fault tolerance by redundant links; distance Cat5e = 120 m, MM = 2 km, SM = 70 km; network diameter 9520 km; 64 channels; latency 322 µs + 1.34 µs per hop; up to 208 kHz; www.aviom.com.
Note: The network diameter figure is the largest conceivable network using fiber and 138 Pro64 merger units, derived from the maximum allowed response time between the control master and the furthest slave device. Pro64 supports a wide variation range from the nominal sample-rate values (e.g., 158.8 kHz – 208 kHz).

CobraNet: Ethernet data-link layer (isochronous; coexists with Ethernet); control via Ethernet, SNMP, MIDI; spanning-tree topology, fault tolerance provided by 802.1; distance Cat5 = 100 m, MM = 2 km, SM = 70 km; network diameter 7 hops, 10 km; unlimited channels; latency 1.33 and 5.33 ms; up to 96 kHz; www.cobranet.info.
Note: Indicated diameter is for the 5.33 ms latency mode. CobraNet has more stringent design rules for its lower latency modes. Requirements are documented in terms of maximum delay and delay variation. A downloadable CAD tool can be used to validate a network design for a given operating mode. Network redundancy is provided by 802.1 Ethernet (STP, link aggregation); redundant network connections (DualLink) and redundant devices (BuddyLink) are supported.

Dante: any IP medium (isochronous; coexists with other IP traffic); control via IP; any L2 or IP topology, fault tolerance provided by 802.1 plus redundant link; distance Cat5 = 100 m, MM = 2 km, SM = 70 km; network diameter unlimited; 700 channels; latency 84 µs; up to 192 kHz; www.audinate.com.
Note: Channel capacity is based on 48 kHz/24-bit sampling and operation on a 1 Gbps network. The latency value is based on 4 audio samples with this configuration. Note that latency is dependent on topology and bandwidth constraints of the underlying hardware, for example, 800 µs on a 100 Mbps Dolby Lake Processor.

EtherSound ES-100: Ethernet data-link layer (isochronous; dedicated Ethernet); proprietary control; star, daisy-chain, or ring topology, fault-tolerant ring; distance Cat5 = 140 m, MM = 2 km, SM = 70 km; network diameter unlimited; 64 channels; latency 84–125 µs + 1.4 µs per node; up to 96 kHz; www.ethersound.com.
Note: EtherSound allows channels to be dropped and added at each node along the daisy-chain or ring. Although the number of channels between any two locations is limited to 64, depending on routing requirements, the total number of channels on the network may be significantly higher.

EtherSound ES-Giga: Ethernet data-link layer (isochronous; coexists with Ethernet); proprietary control; star, daisy-chain, or ring topology, fault-tolerant ring; distance Cat5 = 140 m, MM = 600 m, SM = 70 km; network diameter unlimited; 512 channels; latency 84–125 µs + 0.5 µs per node; up to 96 kHz; www.ethersound.com.
Note: As with ES-100, channels can be dropped and added at each node along the daisy-chain or ring. Although the number of channels between any two locations is limited to 512, depending on routing requirements, the total number of channels on the network may be significantly higher.

HyperMAC: Gigabit Ethernet (isochronous; dedicated Cat5, Cat6, or fiber); control via 100 Mbps+ Ethernet; point-to-point topology, fault tolerance by redundant link; distance Cat6 = 100 m, MM = 500 m, SM = 10 km; network diameter unlimited; 384+ channels; latency 63 µs; up to 384 kHz and DSD.

Livewire: any IP medium (isochronous; coexists with Ethernet); control via Ethernet, HTTP, XML; any L2 or IP topology, fault tolerance provided by 802.1; distance Cat5 = 100 m, MM = 2 km, SM = 70 km; network diameter unlimited; 32760 channels; latency 0.75 ms; up to 48 kHz; www.axiaaudio.com.
Note: Network redundancy is provided by 802.1 Ethernet (STP, link aggregation).

mLAN: IEEE 1394 (isochronous; coexists with IEEE 1394); control via IEEE 1394, MIDI; tree topology, fault tolerance provided by 1394b; distance 4.5 m per 1394 cable (2 power, 4 signal); network diameter 100 m; 63 devices (at 800 Mbps); latency 354.17 µs; up to 192 kHz; www.yamaha.co.jp.
Note: Many mLAN devices have a maximum sampling rate of 96 kHz, but this is a constraint of the stream-extraction chips used rather than of the core mLAN technology.

Nexus: dedicated fiber (synchronous); proprietary control; ring topology, fault tolerance provided by FDDI; distance MM = 2 km; network diameter 10 km; 256 channels; latency 6 samples; up to 96 kHz; www.stagetec.com.
Note: Fault tolerance is provided by the counter-rotating ring structure of FDDI, which tolerates a single device or link failure.

Optocore: dedicated fiber (synchronous; dedicated Cat5/fiber); proprietary control; ring topology, fault tolerance by redundant ring; distance MM = 700 m, SM = 110 km; network diameter unlimited; 512 channels at 48 kHz; latency 41.6 µs; up to 96 kHz; www.optocore.com.
Note: These entries refer to the classic fiber-based Optocore system; no information has yet been obtained regarding the Cat5e version. Confirmation is being sought for the figure of 110 km maximum distance.

Rocknet: Ethernet physical layer (isochronous; dedicated Cat5/fiber); proprietary control; ring topology, fault tolerance by redundant ring; distance Cat5e = 150 m, MM = 2 km, SM = 20 km; network diameter 10 km maximum, 99 devices; 160 channels (48 kHz/24-bit); latency 400 µs at 48 kHz; up to 96 kHz; www.medianumerics.com.

UMAN: IEEE 1394 and Ethernet AVB (isochronous and asynchronous; coexists with Ethernet); control via IP-based XFN; daisy-chain, ring, tree, or star topology (with hubs), fault-tolerant ring and device redundancy; distance Cat5e = 50 m, Cat6 = 75 m, MM = 1 km, SM > 2 km; network diameter unlimited; 400 channels (48 kHz/24-bit); latency 354 µs + 125 µs per hop; up to 192 kHz.
Note: Transport is listed for media streaming and control; Ethernet is also used for control. The base latency measurement is provided for up to 16 daisy-chained devices. UMAN also supports up to 25 channels of H.264 video.
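For daisy-chained formats, the latency figures above combine a fixed base latency with a per-hop forwarding delay. The end-to-end latency of a chain can therefore be estimated as follows; this is a minimal sketch (the function name is ours), using the Aviom Pro64 figures as an example:

```python
def chain_latency_us(base_us: float, per_hop_us: float, hops: int) -> float:
    """One-way latency of a daisy-chained audio network:
    a fixed base latency plus a per-hop forwarding delay."""
    return base_us + per_hop_us * hops

# Aviom Pro64 entry: 322 us base plus 1.34 us per hop.
print(chain_latency_us(322, 1.34, 10))  # 10 hops: ~335.4 us
```

The same formula applies to AudioRail (4.5 µs + 0.25 µs per hop) and UMAN (354 µs + 125 µs per hop), which is why chain length, not cable length, often dominates the latency budget of these topologies.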

1 The reader is referred to Section 2.3 for formal definitions of the various topologies. 2 MM = multi-mode fibre, SM = single-mode fibre.

Interoperability at the node level is currently neither implemented nor foreseen. However, single-box format conversion is available in certain cases (e.g., the CobraNet/EtherSound/Aviom conversion box by Whirlwind).

3.2 Implementation notes
Most audio networking formats are based on 100-Mbps or 1-Gbps full-duplex links, as the available bandwidth is considered sufficient for streaming a significant number of channels. Practical implementations of transport formats are usually realized with specialized logic. Commercially available physical layer (PHY) devices are used for the critical interface to the physical medium.
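To see why 100-Mbps and 1-Gbps full-duplex links are considered sufficient, the channel capacity of a link can be bounded by dividing its bandwidth by the per-channel bit rate. The sketch below assumes uncompressed 48 kHz/24-bit audio and ignores all framing and protocol overhead, so practical formats carry fewer channels than this raw bound:

```python
def max_channels(link_bps: float, rate_hz: int = 48_000, bits: int = 24) -> int:
    """Upper bound on uncompressed audio channels per link,
    ignoring framing and protocol overhead."""
    payload_bps = rate_hz * bits  # one channel at 48 kHz/24-bit: 1,152,000 bit/s
    return int(link_bps // payload_bps)

print(max_channels(100e6))  # 100 Mbps link: at most 86 channels
print(max_channels(1e9))    # 1 Gbps link: at most 868 channels
```

Comparing these raw bounds with the channel counts quoted by actual formats gives a feel for how much capacity each protocol spends on framing, clocking, and control traffic.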

J. Audio Eng. Soc., Vol. 57, No. 9, 2009 September, p. 735

Specialized implementations on FPGAs or ASICs are used to implement the Media Access Control interface to the PHY and conversion to synchronous interfaces, in order to guarantee the necessary continuous data rate. Some systems are also available on dedicated integrated circuits, allowing for tight integration in custom form factors. Licensing is an important aspect of audio networking formats; most are patented at least partially by the proposing companies. Current systems include:

• module-level solutions, which generally provide audio channels in digital form and the recovered master clock reference when applicable

• intellectual property (IP) block solutions, in which a functional logic block can be purchased and integrated into an FPGA; this approach leads to tight integration but can cause high non-recurring costs, due to both licensing of the IP and integration of same into the custom project

• ASIC-level solutions, which simplify licensing management and allow integration into space-constrained designs; some solutions, such as CobraNet devices, also include internal signal processing capabilities. Only a few audio networking formats have ASIC solutions available.

In the near future, DSP-based solutions are likely to become more common, given their performance, support for peripherals, and cost-effective integrability with signal-processing functions.

3.3 Computer interfaces
Of particular relevance for network audio hardware is the interface between the audio equipment and the computers to which it is connected. Common interfaces include Peripheral Component Interconnect (PCI), IEEE 1394 (FireWire), and USB. PCI, while slightly older and less convenient than the others, due to the need to install special cards inside the computer chassis, traditionally offered the advantage of low latency. However, more recent versions of the various interfaces (PCI Express, IEEE 1394b, and USB 2.0) offer significant improvements in both throughput and latency.

Compliance with standards remains a concern. In the case of PCI audio, no interoperability standards exist and every manufacturer is left to develop a proprietary solution, including applicable OS drivers. USB 2.0 audio standards are gradually being adopted on certain platforms. Similarly, FireWire audio relies on standard protocols, although not all manufacturers fully support them. An added advantage of FireWire is the option for direct communication between peripheral devices, without needing the host computer.4

USB uses a centralized protocol, which reduces the cost of peripherals. USB moves the network processing tasks to the host computer, placing fewer demands on peripheral devices and making the device-side technology less expensive. The multichannel USB 2.0 audio specification was released in the fall of 2006, finally establishing an interoperability standard for device drivers and audio communications.

Table 2 compares the data throughput and resulting maximum channel count for different technologies. 100BaseT and Gigabit Ethernet are included here for reference.

TABLE 2: AUDIO INTERFACE TECHNOLOGIES
Format | Bandwidth | Max. audio channels (@ 24 bits/sample, 96 kHz)
USB 1.1 | 12 Mbps | 3 (theoretical limit)
USB 2.0 | 480 Mbps | 80 (per end point)
FireWire 400 | 400 Mbps | 85 (theoretical limit)
FireWire 800 | 800 Mbps | 170 (theoretical limit)
Legacy PCI | 1 Gbps | 100 (typical)
PCI Express (PCI-E) x1 | 2.5 Gbps | 800 (theoretical limit)
100BaseT Ethernet | 100 Mbps | 32 (typical)
Gigabit Ethernet | 1 Gbps | 320 (typical)

4 However, this functionality has not yet been fully utilized for audio devices. This topic is being addressed by the AES Standards Subcommittee on Digital Audio (Working Group on Digital Input-Output Interfaces SC-02-12).

4 CASE STUDIES AND BEST PRACTICES

4.1 Commercial audio
Commercial audio systems span a large range of applications in terms of scale and complexity. On the low end are sports bars and music systems in retail establishments. More complex examples include the installed sound reinforcement systems in stadiums and arenas, various forms of audio distribution in theme parks, and background music and/or paging systems for hospitals, cruise ships, convention centers, and transportation facilities. In some cases, these systems may be linked together over great distances, for example, an interlinked network of train stations, or sport campuses such as those built for the Olympics. For commercial applications involving large distances or channel counts, audio networking is a well-suited technology, offering reliability and the ability to diagnose problems remotely. Further discussion of networked audio in commercial applications can be found in other AES publications [10][11].

4.2 Recording studios
The audio transmission needs of a recording studio are different from those of theme parks or stadium installations. While in the latter case distributed audio is sent long distances, studio operations are usually confined to smaller spaces, with limited needs for live multizone distribution. Recording studios range in size from a single room in the case of a home studio, to larger multiroom facilities dedicated to recording orchestral works or television and film soundtrack production. Postproduction studios can be as simple as a cubicle with a computer, and all-in-one portable multitrack recorders occupy the low end of this spectrum.

A recording studio involves shorter cable runs (under 100 m) and point-to-point network communication topologies; communication is typically bidirectional. Microphone signals are carried from each instrument in the studio to the mixing console and/or audio recorder located in the control room. Individual headphone mixes for each musician are sent from the control room back into the studio for monitoring purposes. Channel count varies by job complexity. A setup feeding 32 channels into the control room and returning a dozen stereo mixes (24 mono channels) covers a majority of cases in professional recording, aside from large symphonic works, which may require more than 100 channels. Smaller project and home studios, based around a computer, a control surface, and an audio interface, have also gained prominence for music recording. In these setups, recording one instrument at a time and overdubbing are typical techniques. This reduces the channel count to 4x4 or less.

When recording to a digital audio workstation (DAW), latency may be introduced due to computer processing. Since this can affect musicians' ability to hear themselves during the recording process, zero-latency monitoring, using a local analog feed, is often employed. This is not subject to digital delays and therefore offers instantaneous feedback.

4.3 Archives
The most apparent benefit of a shared network archive is the efficiencies provided to workflow and access. The archivist no longer has to deal with cataloging and moving physical media. With increased ease, audio data can be located and accessed quickly, therefore allowing for increased productivity and workflow.

Data networks used for storage and access to audio archives must be centralized, scalable, fault-tolerant, and easily accessible to workstations throughout the network. Centralized storage allows ease of administration, providing one point of control. Significant price decreases in networking technology and storage have allowed for affordable network audio storage for all sizes of archives, particularly in the NAS (Network Attached Storage), SAN (Storage Area Network), or DMSS (Digital Mass Storage System) environments. SAN or DMSS allow for centralized storage and commonly employ a combination of redundant disk arrays, load balancing, fault tolerance, integrated network security permissions, and ideally a tape library backup system. From the perspective of the end-user, a DMSS functions as if it were a storage medium local to the workstation. The most common host-bus connection uses Fibre Channel for increased throughput and performance. Workstations access the DMSS via gigabit-Ethernet or gigabit-fiber based switches.

There are several critical decisions that determine the network design and operation of the audio archive, subject to cost constraints. First, disk space must be available to house the current collection of digital audio data, the amount of analog audio to be digitized, and long-term growth as more content is acquired. Storage may also be desired for high-resolution preservation copies along with lower-resolution preview or access copies. Second, the number of concurrent users needing access to the digital audio content, and whether such access must be real-time, should also be considered. Bandwidth requirements may be determined by the need for streaming of audio content for previewing or research purposes, access to content for editing purposes, and the creation of derivative files. Calculations based on resolution can then be made to estimate both the discrete and aggregate bandwidth needed for the shared storage solution.

Network archives can readily be made available to the public. A common implementation of this can be seen in kiosks found in libraries and museums, as well as private industry. Kiosks can be integrated so that content is accessible across the network from the centralized mass-storage system. A second method for public access is an intranet or Web environment where, typically, encoded content is available for streaming or uncompressed content is available for download. Once networked audio content is available locally on a LAN, an archivist or systems administrator can easily use a combination of FTP, IIS, and/or port forwarding to make that content available via the Internet. This access method is critical to researchers, students, and scholars. The end-user in this case can be located outside of the DMSS, yet still gain access to the content from any location.

With network audio archives, physical security of media is not a primary concern, given that the physical infrastructure is only accessed by a systems administrator, but there is a host of new security issues within this environment. These include unauthorized access and distribution, file integrity, and malicious code. The network architect must provide security permissions that will allow access to network audio objects strictly on a per-user or group basis. Furthermore, a security system must be employed to prevent users from copying and distributing content in an unauthorized manner.

The first line of defense against unauthorized access to an audio archive is at the entry point of a network, the router or gateway. The use of a firewall in front of the router or gateway provides security measures such as stateful packet inspection, tarpits, blacklisting, port-scan detection, MAC address filtering, SYN flood protection, network address translation (NAT), and typically a secure channel via VPN for remote users.

Critical to maintaining the integrity of the audio data are antivirus and antispyware solutions, available in both hardware and software forms. These protect against Trojan horses, worms, spyware, malware, phishing, and other malicious code. Similarly, for archive material distributed to the public through the Internet, there is a need for technology to provide security through such means as watermarking, access control, and copy protection to prevent unauthorized use of the content.
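The bandwidth estimate described in this section (per-stream rate times concurrent users) can be sketched as follows; the 96 kHz/24-bit stereo resolution and the user count are illustrative assumptions, and a real deployment must add network and filesystem overhead on top of the raw payload:

```python
def stream_bps(rate_hz: int, bits: int, channels: int) -> int:
    """Uncompressed PCM payload rate of a single audio stream, in bit/s."""
    return rate_hz * bits * channels

def aggregate_mbps(users: int, rate_hz: int = 96_000, bits: int = 24,
                   channels: int = 2) -> float:
    """Aggregate payload bandwidth for concurrent archive users, in Mbps."""
    return users * stream_bps(rate_hz, bits, channels) / 1e6

# e.g., 20 users concurrently previewing 96 kHz/24-bit stereo copies:
print(aggregate_mbps(20))  # 92.16 Mbps of raw payload
```

A calculation like this, repeated for preservation, preview, and editing resolutions, yields the discrete and aggregate figures used to size the shared storage network.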

4.4 Live performance
Networked audio systems have found important application in the context of live performance. In the case of mobile sound reinforcement systems, simplicity of setup is critical, as the process must be repeated frequently. Networked audio technologies considerably relieve the problems of wiring, allowing the transport of numerous audio channels and supplementary control data over optical fibers, coaxial, or Cat5 cables. These are significantly thinner and lighter than the traditional multi-conductor audio snakes, facilitating both their transport and handling. Moreover, networked audio systems offer channel-routing possibilities and the remote control of the equipment that enable fast reconfiguration without any physical user intervention. These technologies enable the creation of an audio bus between key points of the installation, allowing exchange of signals between the monitor mixer, the front of house, and a production control room, which offers a substantial improvement over traditional analog systems.

Latency, also described in Section 4.7, is a critical problem in live events. The generally agreed upon latency limit for the entire system, between source and restored signal, is approximately 15–20 ms [16]. Assuming the incompressible latency resulting from A/D converters, D/A converters, console, and front-of-house DSP can be as high as 20 ms, this leaves very little (if any) tolerance for additional delays in network audio transport. To guarantee optimal alignment of the loudspeaker systems and avoid phasing, this latency must be constant in time at each point of the network.

Optimum reliability of the sound reinforcement system is a constant concern. Multiple point-to-point connections of a traditional analog system may provide a greater level of overall reliability, even though the probability of a single failure is higher. Indeed, a connection breakdown or the malfunction of a network audio device may cause a system failure on a larger scale. This explains the variety of solutions proposed to ensure redundancy in data transmission. These include individual device redundancy for each connection, as is the case for star topologies, and global redundancy on the scale of the entire system, as in ring topologies. It is worth noting that the manufacturers of network infrastructure devices such as cables, optical fibers, and connectors have further developed and significantly enhanced their products in order to meet the demands for robustness of mobile systems and applications. Examples include the EtherCon connector, armored fiber, and Cat5 cables.

4.5 Radio
Networked audio over IP has become common in radio operations for streaming of radio programs from remote sites or local offices into main studio centers. The infrastructure can be well-managed private networks with controlled quality of service. The Internet is increasingly also used for various cases of radio contributions, especially over longer distances. Radio correspondents may choose to use either ISDN or the Internet via ADSL to deliver their reports.

As of this writing, a number of manufacturers provide equipment for such applications. These include AEQ (Spain), AETA (France), APT (Ireland), AVT (Germany), Digigram (France), LUCI (NL), Mandozzi (CH), Mayah (Germany), Musicam (US), Orban (US), Prodys (Spain), Telos (US), Tieline (Australia), and Youcom (NL).

Until recently, end units from one manufacturer were not compatible with those from other companies. Based on an initiative of German vendors and broadcasters in early 2006, the European Broadcasting Union (EBU) started a project group, N/ACIP, Audio Contribution over IP5, to suggest a method to create interoperability for audio over IP, leading to the endorsement of a standard [6].

5 Further information can be obtained from Mathias Coinchon, [email protected].

The requirements for interoperability are based on the use of RTP over UDP for the audio session and the Session Initiation Protocol (SIP) for signalling. IETF RFC documents for commonly used audio formats in radio contribution, such as G.722 speech encoding, MPEG Layer 2, and PCM, define the packet payload audio structure. An additional tutorial document covers basic network technologies and suitable protocols for streaming audio over IP [7].

4.6 Distance education
The use of live networked audio systems is enabling communication across great distances and bringing artists and audiences closer together. Musicians spend much of their lives practicing in isolation as they perfect their art. With the advent of high-speed networking, they are now able to interact with each other in practice rooms, classrooms, and on the stage, while physically located across the country or even on different continents. This enables the exploration of pedagogy, styles, and techniques not previously available. Teachers and students have moved beyond the traditional exchange of raw audio files of musical performance for comment and critique to live interaction over networks, and with it, witnessed the potential for greater pedagogical accomplishment in a specified time period. This has spawned the use of network technology, often involving videoconferencing, in several music programs to involve living composers in rehearsals and concerts, provide conducting seminars, and offer interactive coaching sessions [15].

These distance music-education applications raise various technical considerations regarding protocol choice (e.g., encoding format, bandwidth, inherent latency), production elements (microphones, lighting, far-site sound reproduction), and, possibly most important, how to make the technology as transparent as possible for the participants. The choices are often dictated by the respective objectives. For example, fidelity of the audio and the latency perceived by the participants become paramount when dealing with collegiate and post-collegiate musicians. Protocols that are capable of transmitting (perceptually) uncompressed and minimally delayed audiovisual signals are often favored, although this choice comes at the cost of high bandwidth requirements. Moreover, many systems lack support for echo cancellation, thus requiring additional production elements to improve the quality of interaction. Proper microphones, cameras, lights, sound reproduction elements, and a working understanding of live sound reinforcement principles are essential. Network infrastructure can prove troublesome as well for uncompressed data transmission, as many networks have security measures in place, in particular in the "last mile," that block ports and throttle or filter the network traffic associated with these protocols.

In projects with the precollegiate community, institutions often lack the technical personnel, the financial resources, or the desire to learn how to use the more advanced software protocols for higher fidelity communication. As such, existing standards such as H.323 [23] and its associated codecs are generally used for musical interactions. Fortunately, these have improved in recent years, now incorporating higher bit-rate audio formats such as Polycom's SirenStereo. These codecs offer superior fidelity to the previous G.722 [22] standard, and tend to suffice for discussion of musical elements such as pitch, intonation, and tempo. Additional benefits of H.323 are its widespread adoption, standardized format, and support for remote manipulation. However, cost versus quality versus ease-of-use remain trade-offs. Software-based systems are free or nearly free, but rarely deliver quality comparable to dedicated hardware, and are often manipulated by complex GUIs, which may be imposing on the average user. In either case, the latency inherent in such protocols generally precludes distributed performance, discussed in the next section. Finally, while some codecs support multiple audio channels for surround sound reproduction, it is often hard to find a partnering institution capable of reproducing the surround signal faithfully. As network technology continues to improve, the richness and fidelity of the distance education experience will make these interactions even more valuable. The future holds the potential for musicians to collaborate within a virtual acoustical environment that provides the realism of an in-person experience and, as discussed in the following section, possibly even to perform before "remote" live audiences around the world.

4.7 Networked musical performance
Artists have often pushed the envelope of possibility of technologies for making music. The use of networks in musical activity is no exception. Telecommunications infrastructures have been used by artists in creative musical works of great diversity, and this continues to this day. The uses of networks in creative musical works include:

• the conduit through which a performance takes place

• a virtual space that is an assembly point for participants from remote sites

• a musical instrument of sorts with its own sonic characteristics

• the musical object or canvas onto which sound is projected.

The move to wireless and mobile systems has not been lost on artists, with applications including a participative soundwork for ringtones [18] and musical composition using GPS tracking with bidirectional streaming over UMTS mobile phone networks [21].

The vision of networked musical performance (NMP) has been a holy grail of telecommunications and videoconferencing technologies with respect to its demands on both signal quality and minimized latency. Actual distributed performance, making use of network audio systems, has a history going back at least to 1966, with the work of Max Neuhaus in his pieces for public telephone networks. Experiments continued over radio, telephone, and satellite networks, progressing in the 1990s to ISDN and ATM-based systems, with some efforts incorporating video as well.6 Although early demonstrations were limited to one-way events, effective two-way interaction was seen in 1996 with the Distributed Rehearsal Studio, which succeeded in achieving a one-way delay of approximately 85 ms over an ATM circuit [17].

6 See http://www.cim.mcgill.ca/sre/projects/rtnm/history for a history of milestones in spatially distributed performance.

The fundamental challenge preventing effective distributed musical performance is typically one of latency, as musical synchronization becomes increasingly difficult to attain under increased delay. In co-present situations, visual cues such as the movement of a drummer's hand or a conductor's wand allow the musicians to preserve timing. However, in a computer network situation, visual and auditory cues may be equally delayed. Although the International Telecommunication Union defines 150 ms as the threshold for acceptable quality telephony, musicians perceive the effects at much lower values, often less than 25 ms, depending on the style of music they are playing. Experiments conducted at CCRMA and USC [2][3] have investigated these limits. Chew found that latency tolerance in piano duet performance varied depending on the tempo and types of onset synchronization required; for a particularly fast movement, this figure was as low as 10 ms.

Research systems such as those from Stanford7 and McGill University8 have demonstrated high-quality, low-latency audio distribution over the Internet. Until recently, commercially available systems, as described in Section 3, were limited to LAN distribution and thus unsuitable for the demands of low-latency spatially distributed performance over larger distances. In the last few years, several commercial products with network-layer addressing capability (see Section 2.1) have become available. The latency achievable by such systems is, of course, affected both by the compression algorithm employed (if any) and the network distance the signal must travel.

7 http://ccrma.stanford.edu/groups/soundwire/software/
8 http://ultravideo.mcgill.edu
9 Due both to switching and routing delays, as well as the signal propagation time, whose speed through copper or fiber is approximately 2/3 c.

Current latencies achievable over the research Internet (e.g., Internet2 Abilene, National LambdaRail, and Canarie CA*net4) impose a one-way delay of approximately 50 ms9 across

J. Audio Eng. Soc., Vol. 57, No. 9, 2009 September 739 AES White Paper: Best Practices in Network Audio North America, which is borderline typical Internet distribution architec- living room electronics opportunity. tolerable for most small group perfor- tures. In general, the choice of whether Meanwhile, CE manufacturers are mances. Awareness of these limitations to employ audio compression, and if so, adding networking capability to their has motivated investigation of the alter- what kind of compression, is usually a products. One can now buy a home native approach of adding delay to trade-off between quality, data rate, theatre amplifier that performs the networked communication in order to latency, error robustness, and complex- usual audio/video switching, decoding ensure that the musicians are synchro- ity. Additionally, there is the choice and amplifier functions, and can also nized on the beats of a common tempo. between standardized algorithms, which stream and play internet radio stations, As an often important addition to provide a long-term guarantee to be as well as audio stored on any PC in network audio, simultaneous video decodable, or a proprietary scheme, the home. transmission may introduce additional which in many cases can be hand- latency, either due to characteristics of tailored to a specific application. ACKNOWLEDGEMENT the acquisition and display hardware, Elizabeth Cohen, past president of the or the use of signal compression to 4.8 Consumer AES, provided the initial inspiration reduce the bandwidth requirements. At present, the most popular method and impetus for the preparation of this Under these conditions, improved inter- for audio distribution in the home paper. action may be obtained by decoupling involves expensive, proprietary, analog the audio and video channels, transmit- or digital systems. 
These systems use custom wiring to carry the audio and control signals. However, two technologies are causing a revolutionary change. The first is the continuing growth and popularity of broadband Internet connections into the home, analogous to broadcast television, except that the information enters the home at only one point and is typically connected to a single PC.* This leads to the second technology, that of inexpensive computer networking, either wired or wireless, which allows the broadband connection to be distributed to many PCs and network appliances in the home. This same home network can now also be used to distribute audio and video in an inexpensive, non-proprietary manner.

Underlying technology standards, such as Universal Plug and Play (UPnP), are being used for connection management, while HTTP GET functions can be used to transfer the data itself. Interoperability standards are being created to ensure that devices from different manufacturers will connect and work together. Another related trend is the convergence of PC and Consumer Electronics (CE) technologies. PC makers, driven by Intel and Microsoft, are adding more audio and video capability to the PC platform in an effort to win the […]

[…]ting the former at minimal latency without concern for audiovisual synchronization [4]. The decoupling of visual cues from audio, however, has musical consequences. When given the choice in the tradeoff between image quality and smoothness, musicians preferred a pixelated image with smooth motion over a high-resolution image at a lower frame rate decoupled from the audio channel [20].

In the context of NMP, sources of delay may include audio coding, acoustic propagation in air, packetization, network transport, and jitter buffering. Traditionally, in particular when bandwidth is a limiting factor, audio coding has been an obvious target for improvement. An analysis of algorithmic delay in popular consumer audio codecs such as MP3 and AAC [9] found that, as of 2004, no MPEG-standardized audio coding system introduced less than 20 ms of delay. Recent developments, namely AAC Enhanced Low Delay (ELD), reduce the inherent encoding delay to 5 ms, although an additional 10 ms is required for packetization. Applying a completely different coding paradigm from most other audio codecs, Fraunhofer's Ultra Low Delay (ULD) audio codec typically reaches an inherent encoding delay of 2.6 ms, with only another 2.6 ms required for packetization at a 48 kHz sampling rate. At this setting, quality and data rates are comparable to those of MP3. Compression is often necessary to satisfy the bandwidth constraints of […]

* See http://news.bbc.co.uk/1/hi/technology/4736526.stm, http://www.websiteoptimization.com/bw/0607, and http://www.platinax.co.uk/news/29-03-2007/uk-leads-western-europe-in-broadband-penetration/
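The arithmetic behind the packetization figures quoted above is simple: a codec must buffer one full frame of samples before a packet can be emitted, so packetization delay is frame length divided by sampling rate. The Python sketch below totals a hypothetical one-way NMP delay budget; apart from the 48 kHz rate and the 2.6 ms ULD-class coding delay echoed from the text, every component value (frame size, network latency, jitter buffer) is an illustrative assumption, not a figure from this paper.

```python
# Illustrative one-way delay budget for networked music performance (NMP).
# Component values are assumptions for demonstration only.

SAMPLE_RATE_HZ = 48_000

def packetization_delay_ms(frame_samples: int,
                           sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time to buffer one codec frame before it can be sent."""
    return 1000.0 * frame_samples / sample_rate_hz

def propagation_delay_ms(distance_m: float,
                         speed_of_sound_m_s: float = 343.0) -> float:
    """Acoustic propagation in air, e.g., instrument to microphone."""
    return 1000.0 * distance_m / speed_of_sound_m_s

# Hypothetical budget: low-delay codec, LAN transport, small jitter buffer.
budget_ms = {
    "audio coding (inherent)":           2.6,   # a ULD-class codec
    "packetization (128-sample frame)":  packetization_delay_ms(128),
    "acoustic propagation (2 m to mic)": propagation_delay_ms(2.0),
    "network transport":                 5.0,   # assumed one-way latency
    "jitter buffer":                     5.0,   # assumed receiver buffering
}

total = sum(budget_ms.values())
for stage, ms in budget_ms.items():
    print(f"{stage:36s} {ms:6.2f} ms")
print(f"{'total one-way delay':36s} {total:6.2f} ms")
```

Run with different frame sizes to see why low-delay codecs use short frames: at 48 kHz, a 1024-sample MP3-style frame alone already costs over 21 ms of packetization delay.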

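As a minimal illustration of the HTTP GET transfer mechanism mentioned earlier for home media distribution, the sketch below serves a placeholder media file from a local directory and fetches it with a plain GET. The server, file name, and byte contents are all stand-ins invented for this example; a real consumer device would first discover the media server via UPnP rather than hard-coding its address.

```python
# Sketch: moving media data with a plain HTTP GET, as consumer devices do.
# UPnP discovery/connection management is out of scope; the local server
# below stands in for a media server that has already been discovered.
import http.server
import os
import tempfile
import threading
import urllib.request
from functools import partial

# Create a stand-in "media file" to serve (placeholder bytes, not real audio).
media_dir = tempfile.mkdtemp()
with open(os.path.join(media_dir, "track.wav"), "wb") as f:
    f.write(b"RIFF" + bytes(44))

# Serve the directory on an OS-assigned port, in a background thread.
handler = partial(http.server.SimpleHTTPRequestHandler, directory=media_dir)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "player" side: one HTTP GET retrieves the media data.
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/track.wav") as resp:
    data = resp.read()
server.shutdown()

print(len(data))  # -> 48
```

Bulk file transfer like this has no real-time constraint, which is exactly why HTTP suffices in the home; the low-latency streaming discussed elsewhere in this paper needs very different transport behavior.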
REFERENCES

[1] Bluetooth Audio Video Working Group, "Bluetooth Specification: Advanced Audio Distribution Profile," Bluetooth SIG Inc. (2002).
[2] C. Chafe, M. Gurevich, G. Leslie, and S. Tyan, "Effect of Time Delay on Ensemble Accuracy," in Int. Symposium on Musical Acoustics (2004 Mar.-Apr.).
[3] E. Chew, A. A. Sawchuk, R. Zimmerman, V. Stoyanova, I. Toshe, C. Kyriakakis, C. Papadopoulos, A. R. J. François, and A. Volk, "Distributed Immersive Performance," in Annual Nat. Assoc. of the Schools of Music (NASM), San Diego (2004 Nov.).
[4] J. R. Cooperstock, J. Roston, and W. Woszczyk, "Broadband Networked Audio: Entering the Era of Multisensory Data Distribution," in 18th Int. Congress on Acoustics (2004 Apr.).
[5] European Broadcasting Union, "Audio Contribution over IP," www.ebu.ch/en/technical/ibc2006/ibc_2006_acip.php.
[6] European Broadcasting Union, EBU TECH 3326, "Audio Contribution over IP: Requirements for Interoperability," www.ebu-acip.org/Main_Page.
[7] European Broadcasting Union, EBU TECH 3329, "A Tutorial on Audio Contribution over IP" (2006), www.ebu-acip.org/Main_Page.
[8] A. Floros, T. Karoubalis, and S. Koutroubinas, "Bringing Quality in the 802.11 Wireless Arena," in Broadband Wireless and WiMax, Int. Eng. Consortium (2005 Jan.).
[9] M. Gayer, M. Lutzky, G. Schuller, U. Krämer, and S. Wabnik, "A Guideline to Audio Codec Delay," presented at the 116th Convention of the Audio Engineering Society, paper 6062, www.aes.org/e-lib (2004 May).
[10] K. Gross, "Audio Networking: Applications and Requirements," J. Audio Eng. Soc., vol. 54, pp. 62–66 (2006 Feb.).
[11] K. Gross and D. J. Britton, "Deploying Real-Time Ethernet Networks," in Proc. AES 15th UK Conference on Moving Audio: Pro-Audio Networking and Transfer (2000 May).
[12] "BSS Audio's Soundweb Offers Flexible System for Distributing and Routing Sound," www.webcitation.org/5emZkXC1U.
[13] IEEE Standards for Information Technology, ISO/IEC 8802-11:1999, Telecommunications and Information Exchange Between Systems—Local and Metropolitan Area Networks—Specific Requirements—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher-Speed Physical Layer (PHY) Extension in the 2.4 GHz Band, IEEE Standards Office, New York, NY, USA (1999 Sep.).
[14] Institute of Electrical and Electronics Engineers, IEEE Standard 802.16-2004 for Local and Metropolitan Area Networks—Part 16: Air Interface for Fixed Broadband Wireless Access Systems, IEEE Standards Office, New York, NY, USA (2004).
[15] Internet2, "Member Community Education Initiatives," www.internet2.edu/arts/member-education.html.
[16] A. Keltz, "Opening Pandora's Box? The 'L' Word—Latency and Digital Audio Systems," Live Sound Int., 14(8) (2005 Aug.), www.whirlwindusa.com/tech.html.
[17] D. Konstantas, Y. Orlarey, O. Carbonel, and S. Gibbs, "The Distributed Musical Rehearsal Environment," IEEE MultiMedia, 6(3):54–64 (1999).
[18] G. Levin, "A Personal Chronology of Audiovisual Systems Research," in New Interfaces for Musical Expression (NIME), pp. 2–3, Singapore (2004).
[19] R. Stamberg, "Back to Basics," AV Video Multimedia Producer (2001 Jul.), www.highbeam.com/doc/1G1-77874253.html.
[20] A. Tanaka, "Von Telepräsenz zu Co-Erfahrung: ein Jahrzehnt Netzwerkmusik (From Telepresence to Co-experience: A Decade of Network Music)," Neue Zeitschrift für Musik, 5:27–28 (2004 Sep./Oct.).
[21] A. Tanaka, P. Gemeinboeck, and A. Momeni, "Net_dérive, a Participative Artwork for Mobile Media," in Mobile Media (2007).
[22] Telecommunication Standardization Sector of ITU, "Recommendation G.722: 7 kHz Audio-Coding Within 64 kbit/s," International Telecommunication Union (1988 Nov.), www.itu.int/rec/T-REC-G.722/e.
[23] Telecommunication Standardization Sector of ITU, "Recommendation H.323: Packet-Based Multimedia Communications Systems," International Telecommunication Union (2006 Jun.), www.itu.int/rec/T-REC-H.323/e.
[24] M. Ward, "Broadband Kills Off Consumer ISDN," BBC News, http://news.bbc.co.uk/1/hi/technology/6519681.stm.

CONTACT INFORMATION
The authors of this white paper can be reached through the Audio Engineering Society's Technical Committee on Network Audio Systems. See www.aes.org/technical/nas for up-to-date contact information.

J. Audio Eng. Soc., Vol. 57, No. 9, 2009 September
