A device for human ultrasonic echolocation

Jascha Sohl-Dickstein, Santani Teng, Benjamin M. Gaub, Chris C. Rodgers, Crystal Li, Michael R. DeWeese and Nicol S. Harper

Jascha Sohl-Dickstein is with Stanford University and Khan Academy. e-mail: [email protected]
Santani Teng is with the Massachusetts Institute of Technology. e-mail: [email protected]
Benjamin M. Gaub is with the University of California, Berkeley. e-mail: [email protected]
Chris C. Rodgers is with Columbia University. e-mail: [email protected]
Crystal Li is with the University of California, Berkeley. e-mail: [email protected]
Michael R. DeWeese is with the University of California, Berkeley. e-mail: [email protected]
Nicol S. Harper is with the University of Oxford. e-mail: [email protected]

Abstract—Representing and interacting with the external world constitutes a major challenge for persons who are blind or visually impaired. Nonvisual sensory modalities such as audition and somatosensation assume primary importance. Some non-human species, such as bats and dolphins, perceive their surroundings by active echolocation, in which emitted acoustic pulses and their reflections are used to sample the environment. Active echolocation is also used by some blind humans, who use signals such as tongue clicks or cane taps as mobility aids. The ultrasonic pulses employed by echolocating animals have several advantages over those used by humans, including higher spatial resolution and fewer diffraction artifacts. However, they are unavailable to unaided humans. Here we present a device that combines principles of ultrasonic echolocation and spatial hearing to present human users with environmental cues that are i) not otherwise available to the human auditory system and ii) richer in object and spatial information than the more heavily processed sonar cues of other assistive devices. The device consists of a wearable headset with an ultrasonic emitter and stereo microphones affixed with artificial pinnae. The echoes of ultrasonic pulses are recorded and time-stretched to lower their frequencies into the human auditory range before being played back to the user. We tested performance among naive and experienced sighted volunteers using a set of localization experiments. Naive subjects were able to make laterality and distance judgments, suggesting that the echoes provide innately useful information without prior training. Naive subjects were unable to make elevation judgments. However, in localization testing one trained subject demonstrated an ability to judge elevation as well. This suggests that the device can be used to effectively examine the environment and that the human auditory system can rapidly adapt to these artificial echolocation cues. We contend that this device has the potential to provide significant aid to blind people in interacting with their environment.

Index Terms—echolocation, ultrasonic, blind, assistive device

I. INTRODUCTION

In environments where vision is ineffective, some animals have evolved echolocation — perception using reflections of self-made sounds. Remarkably, some blind humans are also able to echolocate to an extent, frequently with vocal clicks. However, animals specialized for echolocation typically use much higher sound frequencies for their echolocation, and have specialized capabilities to detect time delays in sounds. The most sophisticated echolocation abilities are found in microchiropteran bats (microbats) and odontocetes (dolphins and toothed whales). For example, microbats catch insects on the wing in total darkness, and dolphins hunt fish in opaque water. Arguably simpler echolocation is also found in oilbirds, swiftlets, Rousettus megabats, some shrews and tenrecs, and even rats [1]. Microbats use sound frequencies ranging from 25-150 kHz in echolocation, and use several different kinds of echolocation calls [2]. One call, the broadband call or chirp, a brief tone (< 5 ms) sweeping downward over a wide frequency range, is used for localization at close range. A longer-duration call, the narrowband call, with a narrower frequency range, is used for detection and classification of objects, typically at longer range. In contrast to microbats, odontocetes use clicks, shorter in duration than bat calls and with sound frequencies up to 200 kHz [3]. Odontocetes may use shorter calls because sound travels ∼ 4 times faster in water, whereas bat calls may be longer to have sufficient energy for echolocation in air. Dolphins can even use echolocation to detect features that are unavailable via vision: for example, dolphins can tell visually identical hollow objects apart based on differences in thickness [4].

Humans are not typically considered among the echolocating species. Remarkably, however, some blind persons have demonstrated the use of active echolocation in their daily lives, interpreting reflections from self-generated tongue clicks for such tasks as obstacle detection [5], distance discrimination [6], and object localization [7], [8]. The underpinnings of human echolocation in blind (and sighted) people remain poorly characterized, though some informative cues [9], neural correlates [10], and models [11] have been proposed. While the practice of active echolocation via tongue clicks is not commonly taught, it is recognized as an orientation and mobility method [12], [13]. However, most evidence in the existing literature suggests that human echolocation ability, even in blind, trained experts, does not approach the precision and versatility found in organisms with highly specialized echolocation mechanisms.

An understanding of the cues underpinning human auditory spatial perception is crucial to the design of an artificial echolocation device. Lateralization of sound sources depends heavily on binaural cues in the form of timing and intensity differences between sounds arriving at the two ears. For vertical and front/back localization, the major cues are direction-dependent spectral transformations of the incoming sound induced by the convoluted shape of the pinna, the visible outer portion of the ear [14]. Auditory distance perception is less well characterized than the other dimensions, though evidence suggests that intensity and the ratio of direct-to-reverberant energy play major roles in distance judgments [15]. Notably, the ability of humans to gauge distance using pulse-echo delays has not been well characterized, though these delays serve as the primary distance cues for actively echolocating animals [16].

Here we present a device, referred to as the Sonic Eye, that uses a forehead-mounted speaker to emit ultrasonic "chirps" (FM sweeps) modeled after bat echolocation calls. The echoes are recorded by bilaterally mounted ultrasonic microphones, each mounted inside an artificial pinna, also modeled after bat pinnae to produce direction-dependent spectral cues.
After each chirp, the recorded chirp and reflections are played back to the user at 1/m of normal speed, where m is an adjustable magnification factor. This magnifies all temporally based cues linearly by a factor of m and lowers frequencies into the human audible range. For the empirical results reported here, m is 20 or 25 as indicated. That is, cues that are normally too high or too fast for the listener to use are brought into the usable range simply by replaying them more slowly.

Although a number of electronic travel aids that utilize sonar have been developed (e.g., [17], [18], [19], [20]), very few provide information other than range-finding or a processed localization cue, and none appears to be in common use. Studies of human hearing suggest that it is very adaptable to altered auditory cues, such as those provided by different binaural spacing or altered pinna shapes. Additionally, in blind subjects the visual cortex can be recruited to also represent auditory cues [10]. Our device is designed to utilize the intrinsic cue processing and learning capacity of the human auditory system [21], [22]. Rather than provide a heavily processed input to the user, as other devices have, which ultimately will be relatively impoverished in terms of information about the world, we provide a minimally processed input that, while initially challenging to use, has the capacity to be much more informative about the world and to integrate better with the innate human spatial hearing system. The relatively raw echoes contain not just distance information but vertical and horizontal location, as well as texture, geometric, and material cues. Behavioral testing suggests that novice users can quickly judge the laterality and distance of objects, and with experience can also judge elevation; the Sonic Eye thus demonstrates potential as an assistive mobility device. A preliminary version of this work was presented in poster form at the ICME 2013 MAP4VIP workshop [23].

II. SPECIFICATIONS AND SIGNAL PROCESSING

The flow of information through the Sonic Eye is illustrated in Figure 1a, and the device is pictured in Figure 1b. Recordings of a sound waveform moving through the system are presented in Figure 2. A video including helmet-cam footage of the device experience is included in the Supplemental Material.

Fig. 1. (a) Diagram of components and information flow: the computer 1) sends the chirp train to the speaker, and 2) slows the sound from each microphone (here 25-fold) after each chirp and sends it to the corresponding earpiece. (b) Photograph of the current hardware. (c) Photograph of one of the artificial pinnae used, modeled after a bat ear.

Fig. 2. Schematic of waveforms at several processing stages, from ultrasonic speaker output to the stretched pulse-echo signal presented to the user over headphones. Red traces correspond to the left-ear signal, and blue traces to the right-ear signal.

The signal processing steps performed by the Sonic Eye, and the hardware used in each step, are as follows:

Step 1: The computer generates a chirp waveform, consisting of a 3 ms sweep from 25 kHz to 50 kHz with a constant sweep rate in log frequency. The initial and final 0.3 ms are tapered using a cosine ramp function. The computer, housed in a small mini-ITX case, runs Windows 7 and performs all signal processing using a custom Matlab program.
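For illustration, a chirp with these parameters can be synthesized in a few lines of Matlab; the variable names and the exact taper implementation below are illustrative, not the device's production code:

  fs = 192e3;                     % soundcard sample rate (Hz)
  T  = 3e-3;                      % chirp duration (s)
  f0 = 25e3;  f1 = 50e3;          % sweep start and end frequencies (Hz)
  t  = (0:1/fs:T-1/fs)';
  % Constant sweep rate in log frequency: f(t) = f0*(f1/f0)^(t/T).
  % Integrating 2*pi*f(t) over time gives the phase used below.
  r = f1/f0;
  y = sin(2*pi*f0*T/log(r) * (r.^(t/T) - 1));
  % Taper the initial and final 0.3 ms with a cosine ramp.
  nR   = round(0.3e-3*fs);
  ramp = 0.5*(1 - cos(pi*(0:nR-1)'/nR));
  y(1:nR)         = y(1:nR).*ramp;
  y(end-nR+1:end) = y(end-nR+1:end).*flipud(ramp);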
Step 2: The chirp is played through the head-mounted tweeter speaker. To play the chirp, it is output through an ESI Juli@ soundcard with stereo 192 kHz input and output, amplified using a Lepai TRIPATH TA2020 12 V stereo amplifier, and finally emitted by a Fostex FT17H Realistic SuperTweeter speaker.

Step 3: The computer records audio through the helmet-mounted B&K Type 4939 microphones. For all experiments, the recording duration was 30 ms, capturing the initial chirp and echoes from objects up to ∼ 5 m away. The signal from the microphones passes through a B&K 2670 preamplifier followed by a B&K Nexus conditioning amplifier before being digitized by the ESI Juli@ soundcard.
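The ∼ 5 m figure follows from the two-way travel time of sound over the T = 30 ms recording window. Assuming a speed of sound c ≈ 343 m/s in air,

  r_max = cT/2 = (343 m/s × 0.030 s)/2 ≈ 5.1 m.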

Step 4: The recorded signal is bandpass-filtered between 25 and 50 kHz using Butterworth filters, and time-dilated by a factor of m. For m = 25, the recorded ultrasonic chirp and echoes now lie between 1 and 2 kHz.

Step 5: The processed signal is played to the user through AirDrives open-ear headphones, driven by a Gigaport HD USB soundcard. Critically, the open-ear design leaves the ear canal unobstructed, ensuring safety in applied situations. (Note that in Experiments 1 and 2 described below, conventional headphones were used for stimulus delivery.)
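For illustration, Steps 4 and 5 reduce to a few Matlab operations; the filter order and the zero-phase filtfilt call are illustrative choices rather than a specification of the device. Because the same samples are replayed at 1/m of the recording rate, every delay is stretched by m and every frequency is divided by m:

  fs = 192e3;                                 % recording sample rate (Hz)
  m  = 25;                                    % temporal magnification factor
  recording = randn(round(0.030*fs), 2);      % stand-in for a captured 30 ms stereo buffer
  [b, a] = butter(4, [25e3 50e3]/(fs/2), 'bandpass');   % filter order assumed
  echoes = filtfilt(b, a, recording);         % zero-phase bandpass, 25-50 kHz
  soundsc(echoes, fs/m);                      % replay m times slower: 25-50 kHz -> 1-2 kHz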

In the current version of the device, the speaker and the two ultrasonic microphones housed in artificial pinnae are mounted on a bicycle helmet. The pinnae are hand-molded from clay to resemble bat ears. The rest of the components are mounted within a baby-carrier backpack, which provides portability, ventilation, and a sturdy frame. A lithium-ion wheelchair battery powers the equipment. We note that in its current form, the Sonic Eye prototype is a proof-of-principle device whose weight and size make it unsuited to everyday use by blind subjects and to extensive open-field navigation testing. To overcome these limitations we are developing a low-cost miniaturized version that retains all of the functionality, with a user interface designed specifically for blind users. However, user testing with the current version has provided a proof of principle of the device's capabilities, as we describe below.

A. Measurement of Transfer Functions

We measured angular transfer functions for the ultrasonic speaker and microphone in an anechoic chamber (Figure 3). The full-width at half-maximum (FWHM) angle for speaker power was ∼ 50°, and for the microphone ∼ 160°. Power was measured using bandpass Gaussian noise between 25 kHz and 50 kHz.

Fig. 3. Measurement of transfer functions for the ultrasonic microphones and ultrasonic speaker as a function of angle. (a) Angular transfer function setup. (b) Angular transfer function data: ultrasonic power (25-50 kHz) vs. angle, in dB relative to 0°, over 0-90°. For the microphone, the sensitivity relative to the sensitivity at zero degrees is plotted; for the speaker, the emission power relative to the emission power at zero degrees is plotted.

III. EXPERIMENTAL METHODS

To explore the perceptual acuity afforded by the artificial echoes, we conducted three behavioral experiments: two in which we presented pulse-echo recordings (from the Sonic Eye) via headphones to naive sighted participants, and a practical localization test with one trained user wearing the device. In both Experiments 1 and 2, we tested spatial discrimination performance in separate two-alternative forced-choice (2AFC) tasks along three dimensions: i) laterality (left-right), ii) depth (near-far), and iii) elevation (high-low). The difference between Experiments 1 and 2 is that we provided trial-by-trial feedback in Experiment 2 but not in Experiment 1. This allowed us to assess both the intuitive discriminability of the stimuli (Experiment 1) and the benefit provided by feedback (Experiment 2).

In Experiment 3 we tested horizontal and vertical localization performance in a separate task with a single user (NH), who had approximately 6 hours of training experience over several days.

A. Methods, Experiment 1

1) Stimuli: For each of the three spatial discrimination tasks (laterality, depth, and elevation), echoes were recorded from an 18-cm-diameter plastic disc placed in positions appropriate to the stimulus condition, as illustrated in Figure 4a. For laterality judgments, the disc was suspended from the testing-room ceiling via a thin (< 1 cm thick) wooden rod, 148 cm in front of the emitter and 23.5 cm to the left or right of the midline. The "left" and "right" conditions were thus each ∼ 9° from the midline relative to the emitter, with a center-to-center separation of ∼ 18°. For depth judgments, the disc was suspended on the midline directly facing the emitter at a distance of 117 or 164 cm, separating the "near" and "far" conditions by 47 cm. Finally, for elevation judgments, the disc was suspended 142 cm in front and 20 cm above or below the midline, such that the "high" and "low" conditions were ∼ 8° above and below the horizontal median plane, respectively, separated by ∼ 16°. In all cases, the helmet with microphones and speakers was mounted on a styrofoam dummy head.
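As a check, the reported angles follow directly from the stimulus geometry: arctan(23.5/148) ≈ 9.0° for the lateral offsets and arctan(20/142) ≈ 8.0° for the vertical offsets.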
To reduce the likelihood of artifactual cues from any single echo recording, we recorded five "chirp" pulses (3-ms rising frequency sweeps, time dilation factor m = 25, truncated to 1 s length) and the corresponding echoes from the disc for each stimulus position. Additionally, echoes from each stimulus position were recorded both with and without the artificial pinnae attached to the microphones. Thus, for each of the six stimulus positions, we had 10 recorded pulse-echo exemplars, for a total of 60 stimuli.
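Explicitly, the stimulus set decomposes as 6 positions × 2 pinna conditions × 5 pulses = 60 recorded exemplars, i.e., 10 per position.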

2) Procedure: Sighted participants (N = 13, 4 female, mean age 25.5 y) underwent 20 trials for each of the three spatial discrimination tasks, for a total of 60 trials per session. The trials were shuffled such that the tasks were randomly interleaved. Sound stimuli were presented on a desktop or laptop PC using circum-aural headphones (Sennheiser HD202) at a comfortable volume, ∼ 70 dB. No visual stimuli were presented; the screen remained a neutral gray during auditory stimulus presentation. On each trial, the participant listened to a set of three randomly selected 1-s exemplars (pulse-echo recordings) for each of two stimulus conditions. Depending on the spatial task, the participant then followed on-screen instructions to select between two options: whether the second exemplar represented an object to the left or right, nearer or farther, or above or below, relative to the echoic object from the first exemplar. Upon the participant's response, a new trial began immediately, without feedback.

B. Methods, Experiment 2

1) Stimuli: Stimuli in Experiment 2 were nearly identical to those in Experiment 1. To prevent participants from improving their performance based on artifactual noise that might be present in our specific stimulus set, we filtered background noise from the original recordings using the spectral noise-gating function in the program Audacity (Audacity Team, http://audacity.sourceforge.net/). All other stimulus characteristics remained as in Experiment 1.

2) Procedure: Sighted volunteers (N = 12, 5 female, mean age 23.3 y) were tested on the same spatial discrimination tasks as in Experiment 1, except that we now provided trial-by-trial feedback: after each response, participants were informed whether they had answered correctly or incorrectly. All other attributes of the testing remained the same as in Experiment 1.

Fig. 4. Two-alternative forced-choice spatial localization testing. (a) A diagram of the configurations used to generate stimuli for each of the depth, elevation, and laterality tasks. (b) The fraction of stimuli correctly classified with no feedback provided to subjects (N = 13). Light gray bars indicate results for stimuli recorded with artificial pinnae, while dark gray indicates that pinnae were absent. The dotted line indicates chance performance. Error bars represent 95% confidence intervals, computed using Matlab's binofit function. Asterisks indicate significant differences from 50% according to a two-tailed binomial test, with Bonferroni-Holm correction for multiple comparisons. (c) The same data as in (b), but with each circle representing the performance of a single subject, and significance on a two-tailed binomial test determined after Bonferroni-Holm correction over 13 subjects. (d) and (e) The same as in (b) and (c), except that after each trial feedback was provided on whether the correct answer was given (N = 12).

C. Methods, Experiment 3

We conducted a psychophysical localization experiment with one blindfolded sighted user (NH), who had approximately 6 hours of self-guided practice in using the device, largely to navigate the corridors near the laboratory. The task was to localize a plate, ∼ 30 cm (17°) in diameter, held at one of 9 positions relative to the user (see Figure 5). In each of 100 trials, the plate was held at a randomly selected position at a distance of 1 m, or removed for a 10th "absent" condition. Each of the 10 conditions was selected with equal probability. The grid of positions spanned 1 m on a side, such that the horizontal and vertical offsets from the center position subtended ∼ 18°. The participant kept their head motionless during the task. Responses consisted of a verbal report of grid position. After each response the participant was given feedback on the true position. The experiment took place in a furnished seminar room, a cluttered echoic space. Before doing the task, the subject had ∼ 20 minutes of informal experience of plate localization on previous days, though not with the same plate locations or device settings as used in the experiment.

Experiment 3 used an earlier hardware configuration than Experiments 1 and 2. The output of the B&K Nexus conditioning amplifier was fed into an NIDAQ USB-9201 acquisition device for digitization. Ultrasonic audio was output using an ESI GIGAPORT HD soundcard. The temporal magnification factor m was set to 20. The backpack used was different, and power was provided by an extension cord. NH did not participate in Experiments 1 or 2.
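As a check, the quoted angular size of the plate follows from its diameter and distance: a 30 cm plate viewed at 1 m subtends 2 arctan(0.15/1) ≈ 17°.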
IV. EXPERIMENTAL RESULTS

A. Experiment 1

Laterality judgments were robustly above chance for both the pinnae (mean 75.4% correct, p ≪ 0.001, n = 260, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 6 tests) and no-pinnae (mean 86.2% correct, p ≪ 0.001, n = 260, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 6 tests) conditions, indicating that the binaural echo input produced reliable, intuitive cues for left-right judgments. Depth and elevation judgments, however, proved more difficult; performance on both tasks did not differ from chance for the group. The presence or absence of the artificial pinnae did not significantly affect performance in any of the three tasks. Population and single-subject results are shown in Figure 4b-c.
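For illustration, significance tests of this form can be computed in Matlab as follows; the counts shown are illustrative, and the snippet is a sketch of the analysis rather than the original script:

  % Two-tailed exact binomial test against chance (p0 = 0.5); counts are
  % illustrative (75.4% correct of n = 260 trials gives k ~ 196).
  k = 196; n = 260; p0 = 0.5;
  pLow  = binocdf(k, n, p0);                  % P(X <= k)
  pHigh = 1 - binocdf(k - 1, n, p0);          % P(X >= k)
  p = min(1, 2*min(pLow, pHigh));             % two-tailed p-value
  [phat, pci] = binofit(k, n);                % 95% CI, as in the Fig. 4 error bars
  % Bonferroni-Holm over the 6 task-by-pinna tests: sort the 6 p-values in
  % ascending order; the i-th smallest is compared against 0.05/(6 - i + 1).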

B. Experiment 2

Results for laterality and elevation judgments replicated those from Experiment 1: strong above-chance performance for laterality in both the pinnae (76.7% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 6 tests) and no-pinnae (83.3% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 6 tests) conditions. Because there appeared to be little benefit from feedback for these judgments, we conclude that it may be unnecessary for laterality judgments. Performance was still at chance for elevation, indicating that feedback over the course of a single experimental session was insufficient for this task. However, performance on depth judgments improved markedly over Experiment 1, with group performance above chance for both the pinnae (70% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 6 tests) and no-pinnae (68.3% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 6 tests) conditions. Performance ranges were also narrower (smaller variance) for depth judgments compared to Experiment 1, suggesting that feedback aided a more consistent interpretation of depth cues. As in Experiment 1, the presence or absence of the artificial pinnae did not significantly affect performance in any of the three tasks. Population and single-subject results are shown in Figure 4d-e.

C. Experiment 3

As illustrated in the spatially arranged confusion matrix of Figure 5b, the correct position was indicated with high probability. Overall performance was 48% correct, significantly greater than the chance performance of 10% (p ≪ 0.001, n = 100, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 4 tests). For all non-absent trials, 72% of localization judgments were within one horizontal or one vertical position of the true target position. Figure 5d shows the confusion matrix collapsed over spatial position, showing only the absence or presence of the plate. The present/absent state was reported with 98% accuracy, significantly better than chance (p < 0.001, n = 100, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 4 tests). Figure 5e shows the confusion matrix collapsed over the vertical dimension (for the 93 cases where the plate was present), thus showing how well the subject estimated position in the horizontal dimension. The horizontal position of the plate was correctly reported 56% of the time, significantly above chance (p ≪ 0.001, n = 93, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 4 tests). Figure 5f shows the confusion matrix collapsed over the horizontal dimension, thus showing how well the subject estimated position in the vertical dimension. The vertical position of the plate was correctly reported 68% of the time, significantly above chance (p ≪ 0.001, n = 93, two-tailed binomial test, Bonferroni-Holm multiple-comparison correction over 4 tests).

V. DISCUSSION

In Experiments 1 and 2, we found that relatively precise spatial discrimination based on echolocation is possible with little or no practice in at least two of three spatial dimensions. Echoic laterality cues were clear and intuitive regardless of feedback, while echoic distance cues were readily discriminable with feedback. Depth judgments without feedback were characterized by very large variability compared to the other tasks: performance ranged from 0-100% (pinnae) and 10-100% (no-pinnae) across subjects. This suggests the presence of a cue that was discriminable but nonintuitive without trial-by-trial feedback.

While we did not vary distances parametrically, as would be necessary to estimate psychophysical thresholds, our results permit some tentative observations about the three-dimensional spatial resolution achieved with artificial echolocation. Dufour et al. [24] tested echolocation in unaided humans. Direct comparison is difficult, as [24] tested absolute rather than relative laterality and included a third "center" condition. However, the reflecting object in that experiment was a large rectangular board subtending 29° × 30°, such that lateral center-to-center separation was 29°, whereas disc positions in the present study were separated by a smaller amount, 19.9°, and presented less than 10% of the reflecting surface. [24] reported ∼ 60-65% correct laterality judgments for sighted subjects, which is somewhat lower than our measures (they reported ∼ 75% for blind subjects). Hence, this suggests an increase in effective sensitivity when using the artificial echo cues provided by the Sonic Eye.

Depth judgments were reliably made at a depth difference of 47 cm in Experiment 2, corresponding to an unadjusted echo-delay difference of ∼ 2.8 ms, or ∼ 69 ms with a dilation factor of 25. A 69-ms time delay is easily discriminable by human listeners but was only interpreted correctly with feedback, suggesting that the distance information in the echo recordings, although initially nonintuitive, became readily interpretable with practice. Our signal recordings included complex reverberations inherent in an ecological, naturalistic environment. Thus the discrimination task was more complex than a simple delay between two isolated sounds. The cues indexing auditory depth include not only variation in pulse-echo timing delays, but also differences in overall reflected energy and reverberance, which are strongly distance-dependent. In fact, as cues produced by active echoes, discrete pulse-echo delays are not typically encountered by the human auditory system. Single-subject echoic distance discrimination thresholds as low as ∼ 11 cm (∼ 0.6 ms) have been reported for natural human echolocation [6]. Thus, it is likely that training would improve depth discrimination considerably, especially with time-dilated echo information, in theory down to ∼ 0.5 cm with 25-fold dilation.
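These delay figures follow from the two-way acoustics (c ≈ 343 m/s): Δt = 2Δd/c = 2 × 0.47 m / 343 m/s ≈ 2.7 ms, which the device stretches to mΔt = 25 × 2.7 ms ≈ 69 ms. Conversely, a natural ∼ 0.6 ms delay threshold [6], applied to 25-fold-dilated echoes, corresponds to a physical depth difference of Δd = cΔt/(2m) = (343 × 0.0006)/(2 × 25) m ≈ 0.4 cm, consistent with the ∼ 0.5 cm theoretical limit quoted above.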


Fig. 5. Ten-position localization by one trained participant. The subject was asked to identify the position of a ∼ 30 cm plastic plate held at 1 m distance. (a) Schematic illustration of the 10 possible configurations of the plate, including nine spatial locations and a tenth 'absent' condition. (b) Spatially arranged confusion matrix of behavioral results. Each sub-figure corresponds to a location of the plate, and the intensity map within each sub-figure indicates the fraction of trials on which the subject reported each position for that plate location. Black corresponds to a location never being indicated, and white to a location always being indicated. Each sub-figure sums to 100%. (c) The same confusion matrix, plotted without regard to spatial position. The x-axis shows the true position and the y-axis the reported position. The position labels are A: absent, UL: up-left, UC: up-center, UR: up-right, CL: center-left, CC: center-center, CR: center-right, DL: down-left, DC: down-center, DR: down-right. Grayscale intensity, as in (b), indicates the percentage of times each position was reported for a particular true position; each column sums to 100%. (d) Confusion matrix grouped into plate-absent and plate-present conditions. (e) Confusion matrix grouped by horizontal position of the plate. (f) Confusion matrix grouped by vertical position of the plate.

Performance was low on the elevation task in both the pinnae and no-pinnae conditions. It is possible that the echo recordings do not contain the elevation information necessary for judgments of 16° precision. However, our tasks were expressly designed to assess rapid, intuitive use of the echo cues provided, while the spectral cues from new pinnae take time to learn; judgment of elevation depends on pinna shape [14], [25], and weeks of adaptation can be required to recover performance using a modified pinna [22]. Finally, the design and construction of the artificial pinnae used in the present experiment may benefit from refinement to optimize their spectral filtering properties.

In line with these observations, Experiment 3 suggests that both vertical and horizontal localization cues were available to a user with a moderate amount of training. This is qualitatively consistent with previous measures of spatial resolution in blind and sighted subjects performing unaided spatial echolocation tasks [7], [26]. While further research is needed to validate such comparisons and, more generally, to characterize the behavioral envelope of Sonic Eye-aided echolocation, we consider the results presented here encouraging. Specifically, they suggest that performance on behaviorally relevant tasks is amenable to training. Informal observations with two further participants suggest an ability to navigate through hallways, detecting walls and stairs, while using the Sonic Eye blindfolded. A sample of audio and video from the user's perspective is provided in the supplemental video to this manuscript, also available at http://youtu.be/md-VkLDwYzc.

Any practical configuration such as that tested in Experiment 3 should minimize interference between echolocation signals and environmental sounds (e.g., speech or approaching vehicles). To this end, open-ear headphones ensure that the ear remains unobstructed, as described in Section II. However, future testing should include evaluations of auditory performance with and without the device, and training designed to assess and improve artificial echolocation in a naturalistic, acoustically noisy environment.
We note that performance on the experiments reported here likely underestimates the sensitivity achievable with the Sonic Eye, for several reasons. For example, all experiments featured a constraint under which the head was fixed, either virtually or in reality, relative to the target object; this would not apply to a user in a more naturalistic context. Additionally, the participants tested were sighted and relatively untrained; blind and visually impaired users may benefit from superior auditory capabilities (e.g., [27], [28], [29]). Finally, ongoing development of the prototype continues to improve the quality of the emitted, received, and processed signal.

VI. SUMMARY AND CONCLUSION

Here we present a prototype assistive device to aid in navigation and object perception via ultrasonic echolocation. The device exploits the advantages of high-frequency sonar signals and time-stretches them into human-audible frequencies. Depth information is encoded in pulse-echo time delays, made available through the time-stretching process. Azimuthal location information is encoded as interaural time and intensity differences between the echoes recorded by the stereo microphones. Finally, elevation information is captured by the artificial pinnae mounted on the microphones, which act as direction-dependent spectral filters. Thus, the device presents a three-dimensional auditory scene to the user with high theoretical spatial resolution, in a form consistent with natural spatial hearing.

Behavioral results from two experiments with naive sighted volunteers demonstrated that two of three spatial dimensions (depth and laterality) were readily available with no more than one session of feedback/training. Elevation information proved more difficult to judge, but a third experiment with a single moderately trained user indicated successful use of elevation information as well. Taken together, we interpret these results to suggest that while some echoic cues provided by the device are immediately and intuitively available to users, perceptual acuity is potentially highly amenable to training. Thus, the Sonic Eye may prove to be a useful assistive device for persons who are blind or visually impaired.

ACKNOWLEDGMENTS

The authors would like to thank Jeff Hawkins for support and feedback in construction and design, Aude Oliva for comments on an earlier version of this manuscript, David Whitney for providing space, equipment and oversight for behavioral tests, Ervin Hafter for allowing use of his anechoic chamber, Alex Maki-Jokela for help with hardware debugging and software design, and Jonathan Toomim for work developing and building the next-generation prototype.

REFERENCES

[1] G. Jones, "Echolocation," Current Biology, vol. 15, no. 13, pp. R484–R488, Jul. 2005.
[2] A. Grinnell, "Hearing in bats: an overview," in Hearing by Bats, 1995.
[3] A. Popper, "Sound emission and detection by delphinids," in Cetacean Behavior: Mechanisms and Functions, 1980.
[4] W. Au and D. Pawloski, "Cylinder wall thickness difference discrimination by an echolocating Atlantic bottlenose dolphin," Journal of Comparative Physiology A, 1992.
[5] M. Supa, M. Cotzin, and K. Dallenbach, "'Facial Vision': The perception of obstacles by the blind," The American Journal of Psychology, vol. 57, no. 2, pp. 133–183, 1944.
[6] W. N. Kellogg, "Sonar system of the blind," Science, vol. 137, no. 3528, pp. 399–404, Aug. 1962.
[7] C. Rice, "Human echo perception," Science, vol. 155, no. 3763, pp. 656–664, 1967.
[8] S. Teng, A. Puri, and D. Whitney, "Ultrafine spatial acuity of blind expert human echolocators," Experimental Brain Research, vol. 216, no. 4, pp. 483–488, Feb. 2012.
[9] T. Papadopoulos and D. Edwards, "Identification of auditory cues utilized in human echolocation – objective measurement results," Biomedical Signal Processing and Control, 2011.
[10] L. Thaler, S. R. Arnott, and M. A. Goodale, "Neural correlates of natural human echolocation in early and late blind echolocation experts," PLoS One, vol. 6, no. 5, p. e20162, 2011.
[11] O. Hoshino and K. Kuroiwa, "A neural network model of the inferior colliculus with modifiable lateral inhibitory synapses for human echolocation," Biological Cybernetics, 2002.
[12] B. Schenkman and G. Jansson, "The detection and localization of objects by the blind with the aid of long-cane tapping sounds," Human Factors: The Journal of the Human Factors and Ergonomics Society, 1986.
[13] J. Brazier, "The benefits of using echolocation to safely navigate through the environment," International Journal of Orientation and Mobility, vol. 1, 2008.
[14] J. Middlebrooks and D. Green, "Sound localization by human listeners," Annual Review of Psychology, 1991.
[15] P. Zahorik, "Auditory distance perception in humans: A summary of past and present research," Acta Acustica united with Acustica, vol. 91, pp. 409–420, 2005.
[16] H. Riquimaroux, S. Gaioni, and N. Suga, "Cortical computational maps control auditory perception," Science, 1991.
[17] T. Ifukube, T. Sasaki, and C. Peng, "A blind mobility aid modeled after echolocation of bats," IEEE Transactions on Biomedical Engineering, 1991.
[18] L. Kay, "Auditory perception of objects by blind persons, using a bioacoustic high resolution air sonar," The Journal of the Acoustical Society of America, 2000.
[19] P. Mihajlik and M. Guttermuth, "DSP-based ultrasonic navigation aid for the blind," Proceedings of the 18th IEEE Instrumentation and Measurement Technology Conference, 2001.
[20] D. Waters and H. Abulula, "Using bat-modelled sonar as a navigational tool in virtual environments," International Journal of Human-Computer Studies, 2007.
[21] A. King and O. Kacelnik, "How plastic is spatial hearing?" Audiology and Neurotology, 2001.
[22] P. Hofman, J. V. Riswick, and A. V. Opstal, "Relearning sound localization with new ears," Nature Neuroscience, 1998.
[23] S. Teng, J. Sohl-Dickstein, C. Rodgers, M. DeWeese, and N. Harper, "A device for human ultrasonic echolocation," IEEE Workshop on Multimodal and Alternative Perception for Visually Impaired People (MAP4VIP), ICME, 2013.
[24] A. Dufour, O. Després, and V. Candas, "Enhanced sensitivity to echo cues in blind subjects," Experimental Brain Research, 2005.
[25] G. Recanzone and M. Sutter, "The biological basis of audition," Annual Review of Psychology, 2008.
[26] S. Teng and D. Whitney, "The acuity of echolocation: Spatial resolution in sighted persons compared to the performance of an expert who is blind," Journal of Visual Impairment & Blindness, 2011.
[27] N. Lessard, M. Paré, F. Lepore, and M. Lassonde, "Early-blind human subjects localize sound sources better than sighted subjects," Nature, vol. 395, no. 6699, pp. 278–280, Sep. 1998.
[28] B. Röder, W. Teder-Sälejärvi, and A. Sterr, "Improved auditory spatial tuning in blind humans," Nature, vol. 400, pp. 162–166, Jul. 1999.
[29] P. Voss, F. Lepore, F. Gougoux, and R. Zatorre, "Relevance of spectral cues for auditory spatial processing in the occipital cortex of the blind," Frontiers in Psychology, 2011.