The RATS Radio Traffic Collection System
Kevin Walker, Stephanie Strassel
Linguistic Data Consortium
University of Pennsylvania, Philadelphia, PA USA
{walker|strassel}@ldc.upenn.edu

Abstract

The DARPA RATS Program focuses on the development of new technologies for identifying and processing speaker-to-speaker communications over degraded radio channels. In order to build a corpus to address this research question, we developed a system that takes a clean source signal and transmits it over eight different radio channels, where the variation from channel to channel results in a range of degradation modes. Each channel included in the collection system has unique characteristics targeting different modulation types, different carrier channel bandwidths, and different operating bands.

1. Introduction

The goal of the DARPA Robust Automatic Transcription of Speech (RATS) Program is to develop technologies capable of performing speech activity detection (SAD), language identification (LID), speaker identification (SID) and keyword spotting (KWS) in potentially speech-containing signals received over communication channels that are extremely noisy and/or highly distorted. The first phase of the program evaluates these technologies in five languages (Levantine Arabic, Farsi, Urdu, Pashto and Dari) on open source data that nonetheless represents real-world operational scenarios. Later phases of RATS incorporate secondary evaluations using classified “real world” data.

As with other large human language technology evaluation programs, RATS requires a multi-pronged, integrated and creative data collection approach to satisfy the combined demands of relevant data, language diversity, rich annotations and high volume. The unique challenge for RATS corpus development is the additional need to produce training, development and evaluation data with extremely challenging acoustic properties while simultaneously maintaining efficient, cost-effective and scalable collection and annotation methods. To address this challenge we have developed a “collect clean, broadcast dirty” methodology. In this model we employ well-tested collection methods [1, 2] to produce clean-signal source data for each language and task. Manual annotation (speaker and language auditing, segmentation and time stamping, transcription) is performed on this non-degraded signal to assure efficiency and accuracy. We then introduce the desired noise properties into live recordings and content rebroadcasts by means of a Multi Radio-Link Channel Collection System. The system includes:
• A set of target signal transmitters/transceivers
• A set of interference signal transmitters
• A set of listening station receivers
• Signal collection and digitization apparatus

In this paper we discuss the design and implementation of the Multi Radio-Link Channel System, with special attention paid to the motivation behind the radio channels included in the system, implementation challenges and solutions, and the resulting data.

2. System Design

Many modern two-way communications systems used in urban environments by law enforcement and emergency services have transitioned to digital modes like APCO P25, MOTOTRBO, or TETRA. These established systems rely on extensive repeater installations to maximize service area, utilize trunking or packet switching techniques in order to maximize the number of concurrent users, and in some cases employ encryption in order to control information exposure. It is important to note that these systems are thoroughly engineered, may be cross-connected with the public telephone network and computer networks, and represent an established “installation”. They aren’t designed for ad hoc communications, and their operational parameters are clearly defined; their designers hope that the systems deliver a predictable level of service. As such, internally these networks may be of relatively less interest when looking at questions of system performance in the face of unpredictable variation.

As a contrastive example, consider the Air Traffic Control (ATC) system used currently in the US and around the world. This system is also well connected, has considerable redundancy, and must provide at least some level of quality of service. Where ATC differs from the communications radio installations used by law enforcement (for example) is in the fact that its users are not predictable in the same way. All pilots, whether operating large commercial airplanes, corporate jets, or private airplanes, depend on instructions, assistance, and direction from air traffic controllers. It may be the case that large airports provide separate channels depending on vehicle type, but even within a given category, we see variations in radio equipment performance. ATC traffic is typically not encrypted, uses amplitude modulation to avoid FM capture effect, relies heavily on the VHF band, and has a constantly changing set of users. The system has to be open and flexible in order to accomplish its mission, and it will still need flexibility and robustness to variation across users even as it modernizes.

When developing the data collection system for RATS, we envisioned a set of radio communications scenarios with significantly greater variation and degradation than one would find in the highly controlled communications networks, and even greater than what one would find in the Air Traffic Control model. We imagined a scenario in which interlocutors were located in an indeterminate terrain, utilizing equipment of unknown origin and unknown operation parameters, and whose communication was being monitored by a third party whose situation vis-à-vis the interlocutors was also not known a priori. In this scenario, the communications path between interlocutors might be of varying quality, and, furthermore, the communications path between the interlocutors and the monitor would almost certainly be of varying quality, and would be highly dependent on environmental, atmospheric, and structural factors. Rather than taking communications among a police force or between a control tower and a Cessna as our target, we were more interested in looking to something like the communications among a group of amateur radio operators, or the handheld radio communications of a team playing Capture-The-Flag in a local state forest. Consequently, we targeted a channel model of a single Shared Channel used in half-duplex mode. This model implies that operators may either transmit or receive, but may not do both simultaneously; furthermore, attempts by more than one operator to utilize the channel at the same time result in partial or total communication failure.

One type of variation encountered in radio communications is in speech intelligibility. Our collection system is interesting because it produces samples whose intelligibility varies both in terms of degree and in terms of type. The effects of signal degradation present differently on a single sideband HF radio channel than they do on an FM UHF channel, and both fail in different ways than a spread spectrum system operating in the SHF band. In other words, while a phenomenon like multipath fading can be seen across all bands, its effect on the contents of the channel varies greatly depending on the operational parameters of the radio equipment in question.

A critical first step in developing our collection platform was a review of the extensive work that went into the development of the Tactical Speaker Identification Speech Corpus (TSID) [3]. This corpus, produced at MIT Lincoln Labs by Doug Reynolds and Gerald C. O’Leary, was used as a model for our collection protocol. In particular, the selection and configuration of radio transceivers, the use of Map Task scenarios to elicit transactional, task oriented speech, and the

3. System Implementation

3.1. Overview

The collection system includes a set of target signal transmitters/transceivers, a set of listening station receivers, and signal collection and digitization apparatus. It is used to process both live sessions and rebroadcast sessions. Recordings selected for rebroadcast consist of dialogues and conversational content. In order to simulate Push-To-Talk, the transceivers are keyed under computer control. The control computer is connected to the transceivers via two routes: the audio signal route and the control route. The audio signal route originates with a Lynx Studio Technologies AES16e digital audio interface connected to a Lucid 88192 digital-to-analog converter (DAC). The DAC unit is connected to a matrix mixer, which provides analog audio to the input of each transceiver.

The matrix mixer audio output is 600-ohm line level, and must be impedance matched to the audio input for each individual transceiver. Additionally, independent transformers are used to electrically isolate the transceivers from the control computer and audio processing hardware. In order to control the push-to-talk functions of the transceivers, the control computer is connected to an SPDT relay bank via TCP/IP. The relay bank includes eight relays, each of which is used for one transceiver (each with a customized wiring harness). The control computer monitors the signal strength of the input source recording and keys the transceivers based on a custom signal activity detector.

3.2. Estimation Of System Induced Signal Degradation

In order to estimate the signal-to-noise ratio of each retransmission, we performed a modified signal-to-noise+distortion measurement for each channel. We transmitted a 400 ms, 1 kHz sine test tone through