Naturally occurring auditory-visual synesthetic experiences under dark adaptation
Anupama Nair1,2
Advisor: David Brang, PhD1
Co-assessor: Romke Rouw, PhD2
1University of Michigan, 2University of Amsterdam

Abstract
Synesthesia is a perceptual phenomenon in which stimulation of one sensory modality evokes additional experiences in an unrelated modality (e.g., sounds evoking colors). This condition is thought to arise from increased connectivity between associated sensory areas. However, non-synesthetes can experience these sensations via hallucinogens or as a result of brain damage, raising the possibility that synesthesia exists as a latent feature in all individuals, manifesting only when the balance of activity across the senses has been altered. Indeed, all individuals possess multisensory connections that support the processing of dynamic auditory, visual, and tactile information in the environment, but it is thought that inhibition of these pathways and the dominance of bottom-up sensory input prevent normal multisensory interactions from evoking the subjective experience of synesthesia. The present research explores the conditions necessary to evoke auditory-visual synesthetic experiences in non-synesthetes. First, subjects performed a visual-imagery task in a visually deprived environment while startling sounds were presented from two spatial locations at random, infrequent intervals. The visual imagery task served to increase top-down feedback to early visual areas, and previously conducted pilot studies had shown startling sounds to be particularly effective in over-stimulating the multisensory network present in all individuals. Visual synesthetic percepts, evoked by startling sounds, were observed in ~60% of our non-synesthetic subjects across several behavioural experiments. To identify the neural correlates of this phenomenon, we conducted an EEG study examining differences in early visual areas between trials in which participants experienced hallucinatory percepts and trials in which they reported no such experiences.
The EEG signals showed a difference in average ERP activity between the two conditions within 100 ms of sound onset, implying differential activation in the presence versus absence of hallucinatory experiences. Across all experiments, subjects reported seeing visual images (vivid colors and Klüver's form constants) localized to the position of the speaker. These results indicate a higher prevalence of synesthetic experiences in the general population and a link to normal multisensory processes.

INDEX

Introduction

EXPERIMENT I
1. Methods
2. Phase I
3. Phase II
4. Analysis procedure
5. Results
EXPERIMENT II
1. Methods
2. Results and analysis
EXPERIMENT III
1. Methods
2. Results and analysis
EEG STUDY
1. Methods
2. Behavioural data results
3. EEG results and analysis
   a. Event Related Potentials (ERPs)
   b. Spectral-power analysis
   c. Phase analysis
Discussion
Conclusion
References

Appendix


Introduction
Interactions between different sensory modalities have been a topic of avid interest in psychology and other sciences, with a substantial amount of research devoted to discovering possible connections between the auditory and visual systems in humans and animals. Auditory and visual stimuli presented concurrently or in close proximity interact in unique ways to produce a unified sensory experience. For example, the 'ventriloquism effect' demonstrates the dominance of visual cues over auditory ones in localization tasks (Choe, Welch, Gilford, & Juola, 1975), and sheds light on how two temporally or spatially discrepant senses interact with or supersede each other to produce a coherent, unified experience. Similar studies using pairs of discrepant visual, auditory, and proprioceptive information were conducted by Pick, Warren, and Hay (1969), who were interested in determining the biasing influence of one modality on another; they found visual information to bias the localization of auditory stimuli. Similarly, Bertelson and Aschersleben (1998) report that a sound can be mislocalized (or dragged across space) to coincide with a visual target in a dark room, even when participants are instructed to disregard the visual stimulus. In contrast, the beep-flash or "fission" illusion shows how auditory cues can take precedence over visual cues and alter visual perception in a temporally close context (Innes-Brown & Crewther, 2009). That is, multiple beeps presented in close succession can induce the detection of multiple 'illusory' flashes, even if only a single flash is presented (Shams, Kamitani, & Shimojo, 2000). These studies highlight the conditions under which sensory information from one modality can interact with and influence information presented through a purportedly dominant modality.
Neurophysiological studies conducted on animals have indicated that auditory stimuli can modulate visual cortex activity alongside active visual stimulation (Brang, Towle, Suzuki, et al., 2015). For example, Allman and Meredith (2007) found auditory stimuli to exert multimodal influences on otherwise visually-responsive neurons of the cat posterolateral lateral suprasylvian (PLLS) visual area, but only in the presence of other visual stimuli. Recent research has also suggested a mirror effect in humans, with auditory stimuli capable of modulating visually evoked activity in early visual cortex (Mercier et al., 2013).

In line with this finding, McDonald and colleagues (2013) found that peripherally presented salient sounds can activate contralateral occipital neurons; the associated ERP component was termed the auditory-evoked contralateral occipital positivity (ACOP). They found the ACOP to arise from ventral visual areas in the occipital cortex, which are also the source of the visually evoked P1 component. Moreover, the lateral-sound-induced ACOP resembled the response generated by visual stimuli, strengthening the argument for multimodal influences on so-called unimodal neurons. Their findings thus point to enhanced activation of the contralateral visual cortex in response to peripherally presented sounds; because each occipital hemisphere represents the opposite visual hemifield, this predicts that the resulting visual effects are experienced on the side ipsilateral to the sound.

Research has also found a facilitative effect of sounds on visual perception, under certain conditions. For example, simultaneously presented auditory stimuli can enhance visual sensitivity to targets, specifically low-intensity targets (Noesselt et al., 2010). Some research has also reported faster perceptual and motor responses to visual targets in the presence of concurrent auditory stimulation (Cappe et al., 2010; Brang et al., 2013). For example, Miller (1982) describes the "redundant signals effect", whereby reaction times are quicker to bimodal signals ("redundant signal trials") than to unimodal signals ("single signal trials"), which could explain enhanced visual processing under concurrent auditory stimulation. The reverse effect is also reported in the literature, i.e., the enhancement of auditory stimulus processing by attending to a visual stimulus (Bulkin & Groh, 2006). Specifically, ERP and fMRI studies have shown that visual stimuli presented concurrently with auditory tones can enhance processing of the auditory stimulus, despite a discrepancy in signal locations, provided that the visual stimuli are attended to (Busse, Roberts, Crist, Weissman, & Woldorff, 2005).

These multisensory interactions are governed by three important rules. The spatial rule suggests more effective integration of multisensory stimuli originating from approximately the same location (Meredith & Stein, 1985). The temporal rule states that multisensory stimuli presented at, or approximately at, the same time tend to be integrated (Meredith & Stein, 1983). Studies have also shown that bimodal activation of certain neurons by stimuli presented in close temporal and spatial proximity can exceed the sum of the neuronal responses to each of the unimodal stimuli. In other words, stimuli coincident in space and falling within overlapping receptive fields can lead to enhancement of the neuronal response (Meredith & Stein, 1986; Frassinetti, Bolognini, & Làdavas, 2002). This enhancement is amplified when the unimodal stimuli would each have elicited relatively weak responses, a principle called the law of inverse effectiveness (Meredith & Stein, 1983; Stein & Meredith, 1996).
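Inverse effectiveness is often quantified as the percent increase of the bimodal response over the best unimodal response. The short sketch below illustrates this with entirely made-up spike counts (the function and values are illustrative, not data from any study cited here):

```python
# Multisensory enhancement index: percent increase of the bimodal
# response (CM) over the strongest unimodal response (SMmax).
def enhancement_index(bimodal, best_unimodal):
    """Return (CM - SMmax) / SMmax * 100, in percent."""
    return (bimodal - best_unimodal) / best_unimodal * 100.0

# Hypothetical spike counts for a weakly vs. a strongly driven neuron.
weak = enhancement_index(bimodal=12.0, best_unimodal=4.0)     # weak unimodal drive
strong = enhancement_index(bimodal=52.0, best_unimodal=40.0)  # strong unimodal drive

# Inverse effectiveness: the proportional gain is larger when the
# unimodal stimuli alone would have evoked weak responses.
assert weak > strong
```

With these toy numbers the weakly driven neuron shows a 200% enhancement versus 30% for the strongly driven one, capturing the qualitative pattern the rule describes.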

While multisensory connections exist in all individuals to facilitate sensory processing, certain individuals experience a hyperactive state of connectivity between particular sensory regions that often results in multimodal processing of unimodal stimuli. This condition, labeled "synesthesia", results in consistent multisensory perceptual experiences with otherwise unisensory stimuli (Aleman et al., 2001; Sagiv & Ward, 2006). According to Cytowic (1995), "synesthesia is the involuntary physical experience of a cross-modal association." While previously thought to be relatively rare, some studies have found synesthesia to occur more frequently than previously imagined. For example, Simner et al. (2006) conducted two surveys testing for the prevalence of grapheme-color synesthetic traits in normal individuals; their results indicated the prevalence of synesthesia in normal populations to be 88 times higher than originally expected, suggestive of its widespread occurrence.

To understand what leads to the emergence of synesthetic experiences, different accounts have been proposed. Ward and Simner (2005) describe one such account, put forth by Baron-Cohen et al. (1996), which regards synesthesia as a consequence of a single dominant gene inherited via the X chromosome. They further elaborate on Baron-Cohen's (1996) notion that neonates carry synesthetic pathways that eventually suffer extinction through programmed cell death (apoptosis); synesthetes, however, continue to possess these pathways, giving rise to their unique perceptual experiences (Ward & Simner, 2005). Hubbard (2007) reports the occurrence of this type of synesthesia (which he classifies as congenital synesthesia) to run strongly in families. An alternative explanation offered by Ward and Simner (2005) involves anomalous connections between unisensory regions, or between multisensory and unisensory regions, that could result in altered perceptual experiences. In contrast, Marks (1975) describes a "learned" form of synesthesia, based on his analysis of research in this field, which could arise from repeated associated experiences between stimuli early in life.


Synesthesia can also be classified on the basis of its occurrence: Ward and Meijer (2010) report developmental synesthesia to be genetically determined and to persist throughout the lifespan, predisposing the individual to perceive the world differently, akin to the congenital synesthesia mentioned above. Synesthesia can also result from brain injury, trauma, or an underlying neurological disturbance (acquired synesthesia) (Afra, Funke, & Matsuo, 2009). For example, auditory-evoked synesthetic visual experiences were reported by a man with intact visual pathways and no apparent visual dysfunction. Yet sounds of certain intensity levels produced distinct visual experiences for him, but only when the sounds were lateralized to the left. He also experienced mild left-eye visual sensations with left-ear stimulation. Further examinations revealed a cancerous growth extending to his midbrain that appeared to cause these striking synesthetic experiences; excising the tumor put an end to them. Similar reports have described auditory-visual synesthetic experiences in people with dysfunctional visual pathways, optic disorders, and migraine-like conditions (Lessell & Cohen, 1979; Podoll & Robinson, 2002; Jacobs, Karpik, Bozian, & Svend, 2016). Moreover, these visual effects were found to be ipsilateral to the perceived source of the sound (Afra et al., 2009). Synesthetic experiences have also been reported with the use of certain drugs; for example, in a study by Hartman and Hollister (1963), subjects administered doses of LSD or psilocybin reported heightened color experiences even in reaction to irrelevant, mildly provoking stimuli such as pure tones. According to Afra et al. (2009), LSD-induced synesthesia is the indirect consequence of deprived visual input in the occipital cortex. Such drug-induced synesthesia has been classified as induced synesthesia (Afra et al., 2009).
Marks (1975) has also linked the occurrence of synesthetic experiences to the use of drugs such as mescaline and hashish in non-synesthetic subjects, claiming such experiences arise from an enhancement of cross-modal representations under the influence of these drugs. Moreover, being relaxed, drowsy, or sedated (or in similar hypnagogic states) can aid synesthetic experiences, as briefly noted in a review by Sinke et al. (2012). Such hypnagogic visions were documented by Carol Steen (Steen, 2001), a synesthetic artist who more recently reported kaleidoscopic visions while on the verge of sleep (Steen, 2017). An excerpt from her article follows:

" Two weeks later I noticed that when I shut my eyes I saw visions that Peter and Marie-Hélène would later call ‘Mandalas’ (Brook and Estienne, 2014). They just began. But I could only see these quickly moving images with my eyes shut. They could appear on different occasions with no warning. At first, I saw them just before I fell asleep. Many times, when trying to go to sleep I’d be flooded by them. Some nights I had to open my eyes to make them stop. I also noticed I could shut my eyes and see these kaleidoscoping visions when I was in the shower." (Steen, 2017)

She also experienced these visions in reaction to loud and startling sounds:

"Extremely loud, unexpected and startling sounds or sensations can produce visions that I see externally and that I may also feel as compression waves through my body." (Steen, 2001)

The impact of loud sounds in eliciting visual sensations (also called "phosphenes" or "photisms") has also been demonstrated by Page, Bolger, and Sanders (1982) in their patient reports. Five of their patients with some form of optic neuropathy complained of phosphene occurrences in the affected eyes when exposed to loud, startling sounds. All patients reported the phosphene experiences to be particularly enhanced when the sounds were startling and least expected, and when they were sufficiently drowsy. The phosphenes resembled bright flashes of light, sometimes colored or appearing as vivid patterns (Page et al., 1982).

These photisms often assume certain shapes that are more commonly experienced than others. Klüver identified these images and organized them into four categories or 'form constants' (Bressloff, Cowan, Golubitsky, et al., 2002), including "gratings and honeycombs, cobwebs, tunnels, and cones and spirals", based on his investigations with mescaline (Cytowic, 1996).

(1) Small circles, clusters, amorphous blobs

(2) Central radiation, radial symmetry, kaleidoscope

(3) Grids, fretwork

(4) Geometric lines: straight, angular, circular

(5) (a) Scintillation, extrusion (b) iteration (c) movement (d) rotation, spiraling

Fig. (1) Examples of Klüver form constants (figures as presented in Cytowic, 1996)

Moreover, in line with McDonald and colleagues' (2013) findings mentioned previously in this paper, these photisms are reported to be experienced in the eye ipsilateral to the ear where the sounds were heard (Vike et al., 1984; Afra et al., 2009; Jacobs et al., 2016).


In summary, most research aimed at understanding hyperactive cross-modal interactions has been carried out on synesthetic populations, on those with underlying neural disorders, or on individuals under the influence of certain substances. Far fewer studies have looked into the existence of possible synesthetic traits in neurotypical populations with no such underlying conditions or history of drug use. Our interest in this study lies in exploring and evoking possible synesthetic experiences in non-synesthetic populations, without the use of hallucinogenic drugs, to better understand the conditions that allow otherwise latent synesthetic traits to manifest in non-clinical populations. Specifically, if multisensory connections exist in all individuals, why does only a fraction of the population have synesthetic experiences while others do not? We are interested in understanding how certain auditory stimuli can evoke conscious visual percepts in non-synesthetic populations, as opposed to the mere modulation of visual cortex activity suggested in previous literature. This is important because it brings us closer to understanding whether synesthesia is an inherent condition that affects just a few, or merely the product of conducive environmental conditions. It could also help explain ubiquitous cross-modal experiences, such as sudden bursts of color in reaction to startling sounds while falling asleep.

In line with the review above, the present research aimed at testing the effectiveness of the following conditions in evoking visual hallucinatory experiences: (1) a dark environment, (2) the absence of external visual input (achieved by keeping the eyes closed during the study), (3) a visual imagery task, and (4) the random presentation of startling sounds. We designed our first experiment in keeping with these requirements.

EXPERIMENT I
Our aim in experiment I was to explore how sounds can evoke conscious, hallucinatory sensory experiences in non-synesthetic subjects. In other words, we wanted to test whether startling sounds presented under carefully chosen conditions can lead to vivid, hallucinatory visual percepts in non-synesthetes. To this end, we engaged participants in an imagination task (to achieve mild visual cortex activation) in a dark environment (to inhibit activation by external cues) while presenting them with startling sounds periodically. We ensured that the imagination task was engaging enough to recruit their visual cortex and capture their attention, and hypothesized that the sounds would add to this activation, resulting in visual sensations such as patterns, shapes, and colors. We did not expect sounds from one direction to be more effective in evoking percepts than sounds from the other.

Methods
Participants
Twenty-five participants (twelve males, thirteen females; mean age 18.72 years), all undergraduate students at the University of Michigan, took part in the experiment. Two were left-handed and twenty-three right-handed. An additional four participants were tested but their data were not included in the analysis: two were excluded as likely synesthetes (as assessed through their debriefing questionnaire responses), and data from two were lost due to technical failures. All participants were unaware of the hypotheses of the study prior to testing, gave informed consent, and received course credit for their participation. The present study, as well as all subsequent ones, was approved by the ethics committee (IRB) of the University of Michigan, and participants were tested in compliance with all applicable rules and regulations regarding human participant research.

Phase I: Training
Materials and set-up
For the training phase of the experiment, a sheet was prepared containing the abstract figures along with the four probe names ("curve", "close", "diag", "sym") used in Thompson, Kosslyn, Hoffman, and Van der Kooij's (2008) experiment (henceforth referred to as the Kosslyn task). A second sheet with columns of uppercase, bold, black letters of the Roman alphabet was also created, akin to Thompson et al.'s (2008) study.

Fig. (2) Examples of the abstract figures used in Thompson, Kosslyn, Hoffman, and Van der Kooij (2008) and in the present experiment.

The training phase began by exposing participants to the abstract figures along with the probe names. Participants were given verbal descriptions of each of the four probe names and were then asked to match all fourteen figures with the corresponding probes, based on whether each figure contained the property described by the probe. Once the participants' responses had been assessed for accuracy, they were presented with the sheet containing the Roman alphabet letters and asked to study and imagine the typeface of the presented letters as well as they could, for a later part of the experiment.

Stimuli and procedure
The participants were once again tested on their understanding of the probes using the same figure stimuli in a computerized session. They were seated ~30" from the 24" LCD monitor (resolution: 1920 x 1080 px). This testing session was programmed in Python 2.5.2 using OpenSesame 3.0 on Windows 7. The session began with a set of instructions that remained on screen until a keypress, followed by a white fixation dot (3 px) at the center of the screen (512 px x 384 px) for 1000 ms against a black background. The figure stimulus was then displayed as a jpg image (242 px x 101 px) at the screen center for 3000 ms, followed by a new canvas containing the probe name (e.g., "curve?"; font: mono, 100 px) at the screen center, which stayed on screen until keypress. The figure stimulus and the probe name were both selected at random, and all fourteen figure-probe pairs were shown consecutively in the same manner. Participants pressed "Y" if they believed the presented probe was a property of the displayed figure, or "N" if they believed there was a mismatch. The program gave feedback after each trial in the form of a 1000 ms colored fixation dot (white for correct responses, red for incorrect responses).

Phase II: Experimental phase
Materials and set-up
The main experiment was programmed in MATLAB using Psychtoolbox-3 (Brainard, 1997) on Windows 7 for stimulus presentation. Participants responded via button press on a Cedrus Response Box (model RB-834), which was also used to record their responses and reaction times. Speakers were placed 16" from the computer monitor on either side, at 45° angles. The participants were seated ~30" from the computer monitor, so as to be equidistant from the two speakers, and were handed the Cedrus Response Box to make their responses. They were given a detailed set of verbal instructions by the experimenter prior to the start of the experiment. To attain a visually deprived environment, all light sources were either blocked or attenuated, with the participant's consent, to ensure a relatively light-free environment. At the start of the experiment, the computer monitor was turned off to prevent exposure to light emanating from it.

Stimuli and procedure
The end of the testing session marked the start of the main experiment. The stimuli used in the experiment were auditory recordings of probe names and Roman alphabet letters; each trial thus consisted of a randomly selected spoken probe name paired with a random letter (e.g., "Diag F"). There were 104 trials in the experiment, each consisting of a probe-letter pair, with an approximate 5000 ms gap between trials. On 26 randomly chosen trials of the 104, a short, sharp beep (a 10,000 Hz sine wave lasting 10 ms, sampled at 44,100 Hz) was presented through one of the speakers between 250 and 750 ms after the start of the trial, following the presentation of the probe-letter pair. The intensity of the beep was set so as to achieve an optimal startle response. The stimulus volume was not measured in experiment I, but the sound stimuli used in subsequent experiments were roughly equated in volume to those in experiment I. The onset of the beep was temporally jittered across this range so as to sample the effect of the beeps at different stages of the imagery process (before the start of visual imagery, during it, or after). To ensure uniformity across beep locations, each speaker presented 50% of the beep trials.
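The stimulus parameters above (a 10 ms, 10 kHz sine-wave beep sampled at 44.1 kHz; 26 beep trials among 104, onsets jittered 250-750 ms, split evenly across speakers) can be sketched as follows. The experiment itself was run in MATLAB/Psychtoolbox, so this NumPy version is only an illustrative reconstruction, and the seed is arbitrary:

```python
import numpy as np

SR = 44100        # sampling rate (Hz), as stated in the text
FREQ = 10_000     # beep frequency (Hz)
DUR = 0.010       # beep duration (s)

# Synthesize the 10 ms, 10 kHz sine-wave beep (441 samples).
t = np.arange(int(SR * DUR)) / SR
beep = np.sin(2 * np.pi * FREQ * t)

# Trial schedule: 26 beep trials among 104, onsets jittered between
# 250 and 750 ms after trial start, half the beeps from each speaker.
rng = np.random.default_rng(0)                       # arbitrary seed
beep_trials = rng.choice(104, size=26, replace=False)
onsets_ms = rng.uniform(250, 750, size=26)
sides = rng.permutation(["left"] * 13 + ["right"] * 13)
```

A 10 kHz tone sits comfortably below the 22.05 kHz Nyquist limit of the 44.1 kHz sampling rate, so the sine wave is represented without aliasing.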


(a) Screen which preceded the start of the experiment

(b) letter-property stimulus presented through both speakers

(c) beep sound presented either from the left or right speaker (On a beep trial)

(d) ITI = 5000 ms (participants' response)

Fig. (3) Task set-up. (a) The first screen preceded the start of the trial, informing the participant to anticipate an auditory stimulus. (b) A randomly chosen probe-letter pair was presented to the participant through the speakers. (c) On a beep trial, a sharp, brief beep (10,000 Hz sine wave) was presented through either the left or right speaker, lasting 10 ms. The beep was presented between 250 and 750 ms after the start of the trial; on a no-beep trial, this step was skipped. (d) An inter-trial interval of 5000 ms followed, during which participants responded to the probe-letter question and, if applicable, indicated the occurrence of a visual sensation by pressing one of the Cedrus response box keys. All responses were recorded along with their respective latencies.

Prior to the start of the experiment, participants were instructed to maintain a steady posture and keep their eyes closed throughout. On being presented with a probe-letter pair, they were to imagine the letter in the manner in which they had studied it during the training phase and evaluate whether the probe property was present in the letter. They were to hit designated keys on the Cedrus response box to indicate the presence or absence of the probe property. They were also informed about the randomly occurring beeps and were asked to press a specific button on the response box if, with their eyes closed, they sensed any visual experience in reaction to the beep sound. The participants were alerted to the possibility of experiencing vivid visual percepts, but were not given details as to why these might occur. They were asked to hit the button only if they were confident of having experienced these sensations. The experimenter stayed with the participant for the first four trials to ensure comfort with the experimental set-up, and then left the room. At this point, the participant was exposed to a long set of recorded auditory instructions (~5 minutes), eyes still closed, intended to achieve maximal dark adaptation, after which the experimental trials continued as usual.

At the end of the 104 trials, the participants filled out two debriefing questionnaires: the Vividness of Visual Imagery Questionnaire (VVIQ), to assess everyday imagery abilities, and a customized questionnaire relating to experiences during the experiment (see appendix).

Analysis procedure
The subjects' responses were assessed in terms of accuracy on the probe-letter questions and reaction times (from button-press events) to ascertain that they were focused on the task. Our reasoning was that a high accuracy rate would imply meaningful engagement with the task, since it would indicate a high level of mental imagery activity by the participants.

In line with our hypothesis, we expected participants to experience visual sensations, or photisms, in reaction to the startling sounds and not otherwise. In other words, we expected the frequency of 'photism button press' events to be higher for beep than for no-beep trials. The photism data were found to be non-normal across conditions according to the Shapiro-Wilk test of normality. Therefore, the Wilcoxon signed-rank test was used (only on data from those participants who reported experiencing some visual sensations) to compare the mean proportion of photism presses in beep vs. no-beep trials.
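This analysis pipeline (normality check, then a paired non-parametric test) can be sketched with SciPy. The per-participant proportions below are made-up numbers for illustration only, not the study's data:

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant proportions of photism button presses
# (n = 10 participants who reported at least one photism).
beep = np.array([0.31, 0.19, 0.27, 0.12, 0.08, 0.23, 0.15, 0.38, 0.04, 0.10])
no_beep = np.array([0.04, 0.00, 0.05, 0.01, 0.00, 0.03, 0.01, 0.06, 0.00, 0.01])

# Shapiro-Wilk on the paired differences; non-normal data motivate
# the non-parametric Wilcoxon signed-rank test.
w, p_norm = stats.shapiro(beep - no_beep)

# Paired comparison of beep vs. no-beep photism proportions.
res = stats.wilcoxon(beep, no_beep)
```

With a small n like this, `scipy.stats.wilcoxon` computes an exact p-value when there are no zero or tied differences; the same pairing of `stats.wilcoxon` on left-beep vs. right-beep proportions covers the sound-source comparison as well.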

To determine whether the sound's origin had any influence on photism experiences, the data were subjected to another Wilcoxon signed-rank test comparing the proportion of photism presses for left-beep vs. right-beep trials. Again, this analysis was run only on data from those participants who made photism key presses on beep trials.

Results
The mean accuracy for no-beep, left-beep, and right-beep trials across participants was 93%, 90%, and 90% respectively, confirming that participants remained engaged during the task and imagined the probe-letter stimuli with a high degree of accuracy.

Of the participants tested, 10 experienced some kind of visual sensation, as indicated by the corresponding button-press records and debriefing interview responses. The open-ended debriefing questionnaire asked participants to report incidences of photisms, their verbal and illustrated descriptions, their confidence in having experienced them, and the frequency of synesthetic occurrences in daily life, among other items (e.g., "Did you experience any visual sensation(s) during the course of the experiment (colors, shapes, textures, visual patterns, etc.)?", "How confident are you that you did or did not detect any visual flashes?", "When trying to fall asleep at night, do loud or startling sounds cause you to see flashes of light, even though your eyes are closed?"). Subjective responses included comments such as "Colors, like a flash of light", "I saw the color blue in a wavy pattern", and "... light bubbles floating while eyes were closed", further validating their photism experiences. Moreover, participants who "saw" visual images also reported being confident of having seen them. A mechanistic account would describe this effect as a byproduct of altered thresholds of visual cortical activity, likely due to a combination of increased visual-cortical noise and strong feedback from auditory threat-detection networks.

Beep vs. no-beep trials
The frequency of photism presses differed significantly between beep and no-beep trials (z = -2.501, p = 0.012, n = 10), suggesting that photisms were experienced more commonly on beep trials, in reaction to a sharp sound stimulus, rather than arbitrarily without any trigger.

Beep sound source
As expected, left beeps were no more effective than right beeps in eliciting photisms (z = -1.667, p = 0.096, n = 10, n.s.), as observed in Fig. (4).

(a)


(b)

Fig. (4) Proportion of photism responses as a function of trial type and beep source location. Boxplots illustrate differences between conditions. The horizontal line across each boxplot represents the median response, and the points alongside the plots represent individual data points, i.e., the total proportion of photism responses for each participant in each condition. (a) The x-axis represents the type of trial (no beep vs. beep) and the y-axis the range of photism responses for that condition. Beep trials elicited a greater number of photism responses; a Wilcoxon signed-rank test showed a significant difference between the two conditions (z = -2.501, p = 0.012, n = 10). (b) The x-axis represents the beep source location (left vs. right) and the y-axis the range of photisms reported. No significant difference was found between left-beep and right-beep trials (z = -1.667, p = 0.096, n = 10, n.s.).

Distribution of photisms across the experiment
We also examined the frequency of photisms across trials, to determine whether photisms occurred more frequently with increasing fatigue or boredom. Previous studies have found synesthetic experiences to occur with unexpected sounds under sedated or drowsy states, and we expected fatigue levels to increase over the course of our study. The nature of the task, in conjunction with a darkened room, creates an atmosphere conducive to sleep, and some participants indeed reported feeling drowsy or lethargic as the experiment progressed.

Results showed a general trend toward an increased number of photisms in the latter half of the experiment. However, a Wilcoxon signed-rank test comparing the frequency of photism button presses in the first 52 vs. last 52 trials found the difference to be non-significant (z = -0.479, p = 0.632, n = 10, N.S.).


Fig.(5): Photism button press frequency across trials. The trendline represents the average number of photism button presses per 10-trial bin across all 10 participants; error bars represent the standard error. A trend toward increased photism frequency was observed with increasing trial number, but the large variance among participants makes it difficult to draw firm conclusions from these data.

The frequency of photism button presses (for participants who pressed the button at all) varied widely, with some participants experiencing as many as 12 photisms over the course of the experiment and others only 1. The debriefing reports suggested that participants were startled by the occurrence of the beep. Their descriptions and drawings of the visual experiences matched Kluver form constants, as shown below:


Fig.(6): Examples of drawings made by participants in reaction to the beep sounds. Fig.(a) is an example of the 'movement' form constant described by Kluver, while figs.(b), (d) and (h) are consistent with the 'small circles, clusters and amorphous blobs' category. Figs.(c), (e) and (f) resemble the 'scintillation, extrusion' Kluver forms, and fig.(g) resembles a geometric Kluver form constant.

Based on the drawings of our participants from experiment I, we classified the drawings into Kluver form categories, including only those drawings that clearly belonged to a single form category. The results of this classification were as follows: small circles (2), amorphous blobs (4), scintillation/extrusion (4), movement (2), rotation/spiraling (1). The remaining drawings did not fit exclusively into any one category and were excluded from the classification.

EXPERIMENT II In light of the evidence obtained from experiment I, we inferred that trials with startling sounds were more effective than sound-absent trials in evoking conscious visual experiences. Based on this conclusion, we then attempted to determine whether louder sounds are necessarily more startling than sounds of lower intensity ("softer" sounds). We conducted a follow-up experiment to address this question, as well as to determine whether visual percepts are more likely to be experienced in the hemifield in which the sound is perceived.

Method Participants Thirty-one participants (twenty males, eleven females; mean age 18.5 years), all undergraduate students at the University of Michigan, participated in the experiment. Of these, three subjects were left-handed and the remaining twenty-eight were right-handed. An additional four participants were tested but excluded from analysis because they were likely synesthetes (as assessed through debriefing questionnaire responses). Moreover, a further four participants were inconsistent in their responses to the debriefing interview, i.e., they indicated seeing visual percepts during the task but contradicted this claim during the debriefing interview; their data were therefore excluded from analysis. The participants were unaware of the purpose of the experiment and gave informed consent to participate in the study in exchange for course credits.

Procedure The materials, stimuli, set-up and procedure in experiment II resembled those of experiment I, with two important differences. The first major difference pertained to the intensity of the sound levels used as beep stimuli. Instead of setting the beep sound to a constant level (close to 70 dB) for all beep trials, as in the previous experiment, two kinds of beep sounds were introduced, referred to as "soft" and "loud" beeps. The soft and loud beeps were set to sound pressure levels (SPL) of 60 dB and 70 dB respectively, measured from the estimated position of the center of the subject's ears. The beep sound levels were manipulated to determine whether sound intensity modulates the frequency at which subjects experience synesthetic percepts. Although the total number of trials remained the same as in experiment I (104 trials), the number of beep trials increased from 26 to 28. Therefore, 28 of the 104 trials were beep trials, 14 of them soft-beep trials and the remaining 14 loud-beep trials. For each beep trial type, half of the beeps originated from each speaker.
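The trial schedule just described can be sketched as follows; the condition labels are hypothetical names introduced purely for illustration.

```python
import random

# Hypothetical sketch of the experiment II trial schedule: 104 trials,
# 28 of them beep trials (14 soft, 14 loud), with half of each beep
# type assigned to each speaker.
trials = [("no_beep", None)] * 76
for intensity in ("soft", "loud"):
    for side in ("left", "right"):
        trials += [(intensity, side)] * 7  # 7 trials per intensity/side cell
random.shuffle(trials)                     # randomize presentation order
```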

The second major difference from experiment I was that participants were now required to report the visual hemifield in which they experienced photisms. Using different buttons on the Cedrus response box, they could indicate the side corresponding to a photism; they also had the option to indicate a spatially distributed photism with a specific button press. This was done to determine whether a photism occurs on the same side as the beep location (ipsilateral) or on the opposite side (contralateral). Here, we count ipsilateral photism trials as those beep trials on which participants responded with a button press corresponding to the location of the sound, and contralateral photisms as beep trials on which the participant responded with the opposite-hemifield button press.

Results and analysis Twenty-one of the thirty-one participants experienced photisms during the experiment. Of these, five participants experienced photisms both on beep and on no-beep trials, one participant exclusively on a no-beep trial and fifteen exclusively on beep trials.

Beep vs. No-Beep trials Most participants reacted to beep trials (either soft or loud), but some participants indicated experiencing photisms even on no-beep trials. Testing the difference between the mean proportions of photism frequency on the two types of trials with a Wilcoxon signed-rank test, we found the difference to be significant (z = -3.981, p < 0.001, n = 21). This finding lends further support to the conclusion that visual sensations were most often experienced in reaction to startling sounds.


Soft vs. Loud beeps We also wanted to test for an effect of sound intensity on participants' experience of photisms. For this analysis, we selected the participants who experienced visual percepts in reaction to the sounds (twenty participants). We compared the frequency of photism button presses for soft vs. loud beep trials and found them to be significantly different using a Wilcoxon signed-rank test (z = -2.711, p = 0.007, n = 20). Louder beeps therefore elicited visual sensations more successfully, possibly because they proved more startling in nature, as confirmed in debriefing sessions.


Fig.(7): Proportion of photism responses as a function of trial type and sound intensity. Boxplots illustrate the differences between conditions. The Y-axis represents the range of photism responses for each condition, with the total proportion of photism responses per subject indicated by the individual points. (a) The X-axis represents the type of trial (beep vs. no beep). As in the previous experiment, beep trials elicited photisms more frequently than no-beep trials; this difference was significant (z = -3.981, p < 0.001, n = 21). (b) The X-axis represents beep sound intensity (loud vs. soft). The loud beeps elicited significantly more flashes than the soft beeps (z = -2.711, p = 0.007, n = 20).

Beep sound source The source locations of the beep sounds were also manipulated in this study. We therefore ran a two-factor (beep sound intensity × beep sound location) repeated-measures analysis, followed by pairwise comparisons between all condition pairs (soft-left vs. soft-right, soft-left vs. loud-left, soft-left vs. loud-right, soft-right vs. loud-left, soft-right vs. loud-right, loud-left vs. loud-right); refer to the table below.


COMPARISON                   SIGNIFICANCE   STATISTICS
soft-left vs. soft-right     N.S.           t(19) = 1.28, p = 0.2170
soft-left vs. loud-left      significant    t(19) = 4.08, p = 0.0006
soft-left vs. loud-right     significant    t(19) = 2.93, p = 0.0086
soft-right vs. loud-left     significant    t(19) = 3.80, p = 0.0012
soft-right vs. loud-right    significant    t(19) = 2.54, p = 0.0200
loud-left vs. loud-right     N.S.           t(19) = 0.00, p = 1.0000

Table (1): Pairwise comparisons and significance levels across the different conditions in experiment II.

As seen from the table above, the number of photisms perceived did not differ as a function of where the sound was perceived: only sound intensity determined the likelihood of photisms, while the location of the sound source (left vs. right) made no significant difference.
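The pairwise comparisons in the table can be sketched as paired t-tests over all condition pairs; the per-subject values below are random placeholders, not the experimental data.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
# Placeholder per-subject photism proportions (n = 20) for each of the
# four intensity/location conditions.
conds = {name: rng.random(20) for name in
         ("soft-left", "soft-right", "loud-left", "loud-right")}

results = {}
for a, b in combinations(conds, 2):       # all 6 pairwise comparisons
    t, p = ttest_rel(conds[a], conds[b])  # paired t-test, df = n - 1 = 19
    results[(a, b)] = (round(t, 2), round(p, 4))
```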

However, we were also interested in determining whether the location of the sound source made a difference to the hemifield in which the visual sensation was experienced, i.e., whether left-speaker sounds elicited photisms in the left visual field and vice versa. Previous research has suggested that visual effects arise in the hemifield associated with the perceived sound (Afra et al., 2009; McDonald, Stormer, Martinez, Feng, & Hillyard, 2013).

There were a few instances in which participants responded with a button press corresponding to a spatially distributed photism, or were unsure of the photism side; those button presses were excluded from this part of the analysis. Ipsilateral photisms were found to be significantly more frequent than contralateral photisms (photisms arising on the side opposite the sound location) (z = -3.848, p < 0.001, n = 20), in line with previous findings. These results suggest that the startling sounds activate areas of the contralateral visual cortex, resulting in conscious ipsilateral visual experiences.
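The ipsilateral/contralateral tally described here can be sketched as follows; the response list is illustrative, not real data.

```python
# Each entry pairs the beep location with the reported photism side;
# "both"/unsure responses are excluded, as in the analysis above.
responses = [("left", "left"), ("left", "right"), ("right", "right"),
             ("right", "both"), ("left", "left"), ("right", "right")]

ipsi = sum(1 for beep, seen in responses if seen == beep)
contra = sum(1 for beep, seen in responses
             if seen in ("left", "right") and seen != beep)
```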

We also examined photism frequency over time to determine whether photisms were more frequent in the first vs. the second half of the experiment; this difference was non-significant (z = -0.558, p = 0.577, n = 21, N.S.).

The results from experiment II suggest that louder sounds were judged to be more startling and were therefore linked with a greater number of photisms than softer sounds. They also suggest that startling sounds activate contralateral visual areas, leading to photisms on the same side as the perceived sound. However, to ensure that participants' reports of photisms were not marked by biases or confabulations, we decided to conduct a follow-up experiment using control stimuli to test the robustness of this phenomenon.

EXPERIMENT III Experiment III was an attempt to counter the likelihood of biases or "fake" visual experiences on the part of the subject. Therefore, in addition to the sound stimuli presented in the previous experiments, we introduced real flashes of light, either in conjunction with the sound stimuli or in isolation, to determine whether participants could distinguish real flashes from those they perceived as a result of the sounds. We reasoned that if participants consistently confused the sound-evoked photisms with the real flashes, this would serve as further evidence for our model and for the power of sounds, under specific circumstances, to evoke conscious sensory experiences.

Method Participants Twenty-one participants (ten males, eleven females; mean age 18.81 years), all undergraduate students at the University of Michigan, participated in the experiment. Of these, three subjects were left-handed and the remaining eighteen were right-handed. An additional participant, a likely synesthete, was tested but excluded from analysis. Participants were unaware of the purpose of the experiment and gave informed consent to participate in the study in exchange for course credits.

Prior to this study, an experiment with a similar paradigm was run on sixteen participants (eleven males, five females; mean age 18.56 years) at the University of Michigan to determine the optimal design and stimulus characteristics for this study. Only the data for the final study are presented in this section.

Procedure The materials, stimuli, set-up and procedure in experiment III were once again similar to those of experiments I and II, but with some marked differences. Participants were still seated ~30" from the computer monitor, but the speakers were brought closer to the computer screen (one speaker placed directly beneath the monitor on each side) in an attempt to merge the spatial locations of the flashes and beeps as far as possible. Most importantly, some trials now contained real visual flashes. These appeared for 10 ms (the same duration as the beeps) either to the left or right side of the screen (262 px from the center on either side) in the form of opaque white oval flashes (300 × 350 px). Flashes and beeps could be presented either in isolation or in conjunction with each other on the same trial. As in the last experiment, the source location of the beep sounds varied, i.e., they came from either the left or the right speaker. The location of the visual flashes likewise varied between left and right, making some flashes ipsilateral to the beep location (valid beep-flash trials) and others contralateral (invalid beep-flash trials). The trials of interest therefore contained isolated flashes, isolated beeps, co-localized beeps and flashes (beeps and flashes presented on the same side of the screen) or mismatched beeps and flashes (beeps and flashes presented on opposite sides of the screen). These measures were introduced as additional controls to guard against confabulations or biases in the study.


Fig. (8): Schematic representation of the trial structure in experiment III. Trials contained one of (a) isolated beeps, (b) isolated flashes, (c) beep-flash combinations or (d) no stimuli.

Results and analysis Participants performed the Kosslyn task with near-ceiling accuracy (~93% across all conditions), denoting task engagement.

Beep-only vs. No-Beep/No-Flash trials As in the preceding experiments, we first compared the proportion of flashes reported on beep trials to that on blank/empty trials (containing no stimuli). A Wilcoxon signed-rank test indicated that reactions differed significantly between beep and blank trials (z = -2.121, p = 0.034, n = 21). While the proportion of beep-induced flashes detected in this experimental variant was much smaller than in the previous variants, the beeps still elicited hallucinatory visual responses, whereas no visual experiences were reported on blank trials by any participant in the study.

Flash vs. Beep vs. Empty trials We next tried to isolate the effect of each individual stimulus in eliciting flash responses. To do so, we compared flash-only, beep-only and empty trials to determine whether the presence or absence of stimuli, or the nature of the stimulus, played a role in flash detection rates.

Fig. (9): Proportion of photism responses as a function of trial type. The boxplots represent the difference in the proportion of photism responses for each trial type (flash-only, beep-only and blank/empty trials). The X-axis represents the type of trial and the Y-axis the proportion of photism responses per participant and the range of responses across subjects. As seen in the figure, flash-only trials were the most effective in evoking responses.

As seen from the plot above, real flashes were most effective in eliciting flash detection responses from participants. A Wilcoxon signed-rank test showed the flash stimuli to be significantly more salient than the beep stimuli, easily crossing participants' detection thresholds (z = -4.008, p < 0.001, n = 21). A possible explanation for this finding is that, given the strong salience of the flash stimuli in this experiment, the internal criterion for detection may have shifted, leading to a higher, better-defined detection threshold than in the previous experiments. The beep stimuli may still have elicited visual sensations that were too weak to cross internal detection thresholds, resulting in sparse responses.

To ascertain whether participants could actually detect real flashes, we compared the proportions of correct and incorrect flash-detection responses on flash-only trials. We hypothesized that if participants could indeed detect flashes on the side on which they appeared, the proportion of correct responses should exceed that of incorrect responses. As expected, there was a significant difference between correct and incorrect responses on flash-only trials (Wilcoxon signed-rank test, z = -3.136, p = 0.002, n = 21).

1. On invalid trials: visual flash detection on the side of the flash (out of 8) vs. visual flash detection on the side opposite the flash (the beep side) (out of 8) The proportion of real flash detection was significantly lower than the proportion of illusory flash detection on invalid trials (signed-rank test, z = -4.030, p < 0.001, n = 21). This finding stands in direct contradiction to the previous finding that flashes were more effective than beeps in eliciting visual reactions. While the differing salience of the beep and flash stimuli could be the main factor behind these contradictory responses, this sound-biased mislocalization could also be the result of a reversed ventriloquism effect. As described in the introduction of this paper, the ventriloquism effect posits that localization in the audio-visual multisensory domain is strongly dominated and biased by visual cues, such that the auditory cues seem to shift across space and coincide with the visual cues (Choe, Welch, Gilford & Juola, 1975). In our study, however, participants tended to be biased in favour of the sound when localizing flash stimuli on multisensory trials, which stands in contrast to the ventriloquism effect. One reason may be strong spatial linking of the auditory and visual stimuli by the participants, leading them to force the vaguer visual stimulus to coincide with the more defined auditory stimulus; this reasoning may hold true only for multisensory trials and not for single-stimulus trials. Other research points to the modulation of feedforward visual processes within the visual cortex by auditory stimuli, owing to faster cortical responses to auditory vs. visual stimuli (Liegeois-Chauvel et al., 1994; Martinez et al., 1999; Schroeder et al., 2004). Another possible factor is the relatively small number of trials per condition, which lowers the power of the study.

The debriefing reports suggested high confidence in the flashes detected as real and no awareness of any illusory flashes (in response to beep-only trials), which is still consistent with our model. When asked how confident they were about the veridicality of the flashes, some participants responded with statements like: "I am very confident that I detected most of the flashes", "100% [confident]" and "pretty confident". Some participants even provided colorful, extravagant illustrations when asked to draw the flashes they had detected, in contrast to our white, oval-shaped flashes, hinting at the perception of "illusory" percepts in response to the beep.

With respect to spatially distributed flashes, there was a trend toward detecting spatially distributed flashes more often on invalid than on valid trials, as would be expected. However, this difference was non-significant (z = -1.115, p = 0.265, n = 21, N.S.).

2. Hits: proportion of flash detection on the correct side on valid trials (out of 8), invalid trials (out of 8) and flash-only trials (out of 8). False alarms: proportion of flash detection on the incorrect side of valid trials (out of 8), invalid trials (out of 8), flash-only trials (out of 8) and beep-only trials (out of 8). In this part of the analysis, we compared all hits (flash detection on the correct side on valid, invalid and flash-only trials) to all false alarms (illusory flash experiences on the incorrect side of valid, invalid, beep-only and flash-only trials). We expected no significant difference between hits and false alarms, given the anticipated high occurrence of illusory flash detection on invalid and beep-only trials. As hypothesized, the difference between hits and false alarms in experiment III was non-significant (z = -0.571, p = 0.568, n = 21).
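The hits vs. false-alarms aggregation can be sketched as follows, with illustrative counts out of 8 trials per condition (the numbers are placeholders, not real data).

```python
# Per-subject tally of hits (correct-side detections) and false alarms
# (wrong-side or illusory detections), each condition having 8 trials.
n_per_condition = 8

hits = {"valid": 5, "invalid": 2, "flash_only": 6}
false_alarms = {"valid": 1, "invalid": 4, "flash_only": 1, "beep_only": 2}

hit_rate = sum(hits.values()) / (len(hits) * n_per_condition)
fa_rate = sum(false_alarms.values()) / (len(false_alarms) * n_per_condition)
```

In the full analysis these per-subject rates would then be compared across participants with a Wilcoxon signed-rank test, as reported above.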


EEG STUDY Method Participants: Twenty participants (eight males and twelve females; mean age = 20.76 years) participated in the experiment after giving informed consent. Of these, nineteen were right-handed and one was left-handed. Additional participants were tested but excluded from analysis for the following reasons: two deviated substantially from the mean age, another two performed at chance-level accuracy, and data from five were excluded due to excessive noise or technical errors. Additionally, the first ten participants served as pilots for the study and their data were likewise excluded from analysis. All participants had normal or corrected-to-normal vision and normal hearing.

Stimuli and procedure: The stimuli, set-up and materials used in the experiment were similar to behavioural experiment I. The number of trials increased from 104 to 208 (the first four being practice trials), with a sound stimulus on every trial. The sound stimuli ranged from 70 dB (startling/loud sounds) to ~30 dB SPL (threshold-level/soft sounds). As in the previous experiments, external speakers were placed at a distance of ~35" to the left and right of the participants. The four kinds of sound stimuli therefore defined the events in this study: left-loud, right-loud, left-soft and right-soft, depending on sound intensity and location. Unlike the previous behavioural experiments, participants in the EEG study did not have the option to indicate the spatial source of the flash (if any) but were instructed to indicate on every trial whether or not they experienced a flash. All EEG subjects were instructed to keep their eyes closed for the entire duration of the experiment. The purpose of the experiment was disclosed to the participants to alert them to the possibility of visual experiences.

The electroencephalogram (EEG) was recorded continuously from 62 channels adhering to the international 10-20 system montage (Jasper, 1958), sampled at a rate of 5000 Hz with no online filtering (actiChamp, BrainVision, Munich, Germany). Eye movements were monitored using additional electrodes, with the horizontal EOG recorded from electrodes placed at the sides of the eyes and the vertical EOG monitored with an electrode placed above the left eye. Electrode impedances were kept below 20 kΩ.

Behavioural data results The mean accuracy in the imagination task across all conditions was ~94%, suggesting high engagement with the task. Participants took an average of 3.79 s to respond to the imagination task.

A total of 15 of the 20 participants reported seeing flashes during some part of the experiment. For these 15 participants, the loud beeps were significantly more effective than the soft beeps in evoking photisms (Wilcoxon signed-rank test, z = -3.202, p = 0.001, n = 15). Of these, 10 participants reported flashes that were roughly equally distributed across loud- and soft-beep trials.


Fig.(10): Range of photism responses as a function of trial type and sound intensity. The boxplots represent the difference in photism responses between the two conditions. Similar to the results obtained in experiment II, the loud beeps elicited significantly more flashes than the soft beeps (z = -3.202, p = 0.001, n = 15).

Preprocessing: The majority of the preprocessing steps were performed in EEGLAB (Delorme & Makeig, 2004) along with custom MATLAB scripts. The data were re-referenced to the average of the two mastoids and band-pass filtered between 0.01 and 100 Hz; an additional 60-Hz notch filter was used to remove line noise. The data were downsampled to 1 kHz to reduce computation time. For individual-level analysis, the EEG was epoched in 1000-ms windows around auditory stimulus onset and baselined to a 500-ms pre-stimulus interval. Epochs containing artifacts (muscle activity, ocular artifacts and drifts) were detected and rejected offline using the moving peak-to-peak amplitude method in EEGLAB along with visual inspection (<15% of trials rejected per subject), and heavily artifact-contaminated channels were interpolated in EEGLAB. Slow drifts were removed from the continuous EEG data. The first four trials were considered practice trials, included to ensure that the task was clear to the subject, and were excluded from analysis.
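The moving peak-to-peak rejection step can be sketched in numpy as below; this is a simplified stand-in for the EEGLAB routine, and the threshold and window parameters are assumptions, not the thesis settings.

```python
import numpy as np

def reject_peak_to_peak(epochs, threshold=100.0, win=200, step=100):
    """Flag epochs whose moving-window peak-to-peak amplitude exceeds
    `threshold` on any channel.
    epochs: array of shape (n_epochs, n_channels, n_samples)."""
    bad = np.zeros(len(epochs), dtype=bool)
    for i, epoch in enumerate(epochs):
        for start in range(0, epoch.shape[-1] - win + 1, step):
            seg = epoch[:, start:start + win]
            p2p = seg.max(axis=-1) - seg.min(axis=-1)  # per-channel p2p
            if p2p.max() > threshold:
                bad[i] = True
                break
    return bad
```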

For individual-level analysis, signals from the occipital channels of interest ('O1', 'O2', 'PO7', 'PO8', 'Oz', 'Iz') were analyzed for each condition (4 "see" conditions and 4 "no-see" conditions, one for each type of auditory stimulus: left-soft, left-loud, right-soft and right-loud) over the designated epoch duration. Wavelet analyses were performed at 39 frequencies between 2 and 40 Hz, with the number of cycles increasing linearly from 3 to 10 across this frequency range. The data were subjected to parametric tests of significance (2-tailed paired t-tests) with a threshold of p < 0.05, corrected for multiple comparisons using False Discovery Rate (FDR) correction at each time point for the event-related potential (ERP) analysis and in the time-frequency domain for the 2-dimensional spectral plots.
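The wavelet decomposition just described can be sketched in numpy as follows; this is a simplified Morlet implementation mirroring the stated parameters, not the EEGLAB routine.

```python
import numpy as np

def morlet_power(signal, sfreq=1000):
    """Morlet time-frequency power: 39 frequencies from 2-40 Hz with
    cycles rising linearly from 3 to 10. The signal must be longer
    than the longest (lowest-frequency) wavelet."""
    freqs = np.linspace(2, 40, 39)
    cycles = np.linspace(3, 10, 39)
    power = np.empty((len(freqs), len(signal)))
    for i, (f, c) in enumerate(zip(freqs, cycles)):
        sigma = c / (2 * np.pi * f)                 # Gaussian width (s)
        t = np.arange(-3 * sigma, 3 * sigma, 1 / sfreq)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * sigma ** 2))
        wavelet /= np.abs(wavelet).sum()            # L1-normalize the envelope
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return freqs, power
```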


Group-level analyses Data were averaged across subjects to compute ERPs for the occipital channels of interest (average of 'O1', 'O2', 'PO7', 'PO8', 'Oz', 'Iz') for the "see" and "no-see" conditions. The analyses performed on the group-level data were the same as for the individual-level data. The ERPs for the two conditions were subjected to a 2-tailed paired t-test at each time point to determine the significance of differences.
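The pointwise t-test with FDR correction can be sketched as follows, assuming subjects-by-time arrays for the two conditions; the Benjamini-Hochberg step-up procedure is implemented directly for transparency.

```python
import numpy as np
from scipy.stats import ttest_rel

def fdr_significant_timepoints(see, no_see, alpha=0.05):
    """Paired t-test at every time point with Benjamini-Hochberg FDR
    correction. see, no_see: arrays of shape (n_subjects, n_times)."""
    _, p = ttest_rel(see, no_see, axis=0)     # one p-value per time point
    m = len(p)
    order = np.argsort(p)
    bh = alpha * np.arange(1, m + 1) / m      # BH step-up thresholds
    below = p[order] <= bh
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    sig = np.zeros(m, dtype=bool)
    sig[order[:k]] = True                     # FDR-significant time points
    return p, sig
```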

Spectral power distributions for the two conditions were plotted to determine whether a difference in frequency-specific power could drive the experience of visual flashes and thereby differentiate the two conditions. The spectral power analyses were subjected to the same set of parameters as the group-level ERP analyses, with an additional whitening transformation of the data. Previous multisensory studies demonstrate greater suppression of alpha-band activity in the visual cortex for detection of visual stimuli presented at threshold levels, and strong alpha-band activity for active suppression of visual stimuli (Hanslmayr et al., 2005a; Dijk, Schoffelen, Oostenveld, & Jensen, 2008). Other studies have linked strong alpha-band power with inhibitory processes (Ergenoglu, Demiralp, & Bayraktaroglu, 2004; Thut et al., 2006; Dijk et al., 2008; Jensen, Bonnefond, & Vanrullen, 2012). It therefore follows that the "see" trials could be linked with greater suppression of alpha activity, while the "no-see" trials could be dominated by stronger alpha activity, even in the absence of a real visual stimulus in our study. Some research has even demonstrated that alpha phase could play a role in the detection of threshold-level visual stimuli (Busch, Dubois, & Vanrullen, 2009; Dugué & VanRullen, 2011; Milton & Pleydell-Pearce, 2016). Thus, phase analyses were conducted to determine the extent of phase-locking to auditory stimulus onset for the different frequency bands across trials.

EEG results and analysis For the EEG study, only those subjects who experienced flashes on a threshold number of trials (at least 15 trials) and whose flashes were more or less equally distributed for the loud beep and soft beep conditions were included in this analysis. This was done so as to have a fair distribution of trials (containing flashes) across the two conditions, especially since the dependent variable measure varied largely between participants. Therefore, a total of ten subjects' data were analyzed. Moreover, only those participants who performed at high accuracy levels in the Kosslyn task were included, so as to ensure high engagement in the imagination task.

Event-Related Potentials (ERPs) ERPs were calculated at the occipital channels of interest as outlined above. Fig.(11) represents the difference in ERP amplitude between the "see" and "no-see" conditions over occipital scalp regions, collapsed across ipsilateral and contralateral hemispheres. Because the reference was set to the mastoids, a strong post-auricular muscle response (PAMR) emerges within the first 25 ms from stimulus onset, regardless of the experience of visual flashes (McDonald et al., 2013). Significant differences in the ERP waveforms for the two conditions emerge within the first 100 ms of auditory stimulus onset, with the "no-see" waveform being more positive than the "see" waveform.


Fig.(11): The ERP waveforms for the "see" (blue) and "no-see" (red) response conditions. The X-axis represents time in seconds and the Y-axis the ERP amplitude in microvolts. Time 0 represents the onset of the auditory stimulus, and the black bars along the X-axis mark time points at which the ERPs differ significantly between the two conditions. The first spike, at ~25 ms from stimulus onset, represents the post-auricular muscle response (PAMR), since the data were referenced to the average of the mastoids. As observed in the plot, the "no-see" waveform is significantly more positive than the "see" waveform.

Spectral-power analysis Some significant differences in the alpha and theta frequency bands between the "see" and "no-see" conditions were observed for individual subjects. However, no significant patterns across frequency bands emerged at the group level, as seen in fig.(12).

This could indicate the absence of spectral-power-driven mechanisms underlying the hallucinatory visual experiences in our study, possibly due to the relatively small number of trials per condition.


Fig.(12): Spectral-power plots for the "see" and "no-see" conditions across all subjects. Time (in seconds) is on the x-axis, ranging from 500 ms before to 500 ms after stimulus onset (at 0), and frequency on the y-axis (0 to 40 Hz). Both conditions are marked by a strong post-auricular muscle response (PAMR) within 25 ms of stimulus onset. No significant differences in spectral-power distributions were observed between conditions.

Phase analysis Since the spectral analysis did not reveal significant power differences between conditions, we turned to phase analysis to determine whether phase-locking activity could influence the experience of illusory flashes or photisms. Previous studies hint at the role of an optimal pre-stimulus alpha phase in the detection of threshold-level visual stimuli (Busch, Dubois, & Vanrullen, 2009). Moreover, these studies have also found higher pre-stimulus ITPC (inter-trial phase coherence) in certain frequency bands for successful vs. unsuccessful detection trials (Hanslmayr et al., 2007).

In our analysis, we explored phase coherence across frequencies to determine whether phase consistency was higher in one condition than the other, which might explain the differences in behavioural outcomes and ERP waveforms between the two conditions. Specifically, is phase-locking to stimulus onset across trials higher for the "see" than for the "no-see" condition? If so, could an optimal phase angle be a necessary condition for the experience of photisms? Or is the ongoing oscillation in primary visual areas reset or entrained by the incoming auditory stimulus on some trials, facilitating the photism experience?

To answer these questions, we computed the ITPC for a broad range of frequencies over the entire epoch duration, with the same specifications applied as for the ERSP analyses in our study. High inter-trial phase coherence could reflect phase resetting driven by an external stimulus, leading to task-evoked entrainment of oscillations; however, it could also reflect natural synchronization of ongoing oscillations. Here, we wanted to determine whether phase-locking activity differed between the two conditions.
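The ITPC at each time point is the length of the mean resultant vector of single-trial phases: ITPC(t) = |(1/N) Σₙ exp(iφₙ(t))|, which ranges from 0 (random phases across trials) to 1 (perfect phase locking). The following sketch computes it on synthetic phase data; trial counts and phase distributions are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples = 40, 500

# Phase-locked trials: single-trial phases cluster near 0 at every sample,
# as would happen if stimulus onset reset the ongoing oscillation.
locked = rng.normal(0.0, 0.3, (n_trials, n_samples))

# Non-locked trials: phases drawn uniformly on the circle.
random_phase = rng.uniform(-np.pi, np.pi, (n_trials, n_samples))

def itpc(phases):
    # Length of the mean resultant vector across trials, per time point
    return np.abs(np.exp(1j * phases).mean(axis=0))

itpc_locked = itpc(locked)        # near 1: strong phase locking
itpc_random = itpc(random_phase)  # near 1/sqrt(n_trials): chance level
```

In a real analysis the phases come from the same time-frequency decomposition used for power (here they are simulated directly), and the chance-level floor of roughly 1/√N is why low trial counts can bias the ITPC measure, as noted below for our data.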


As seen in fig.(13), no significant differences in phase-locking activity were observed between the two conditions. One possible reason is low power in some frequency bands, which could bias the overall ITPC measure and introduce noise into the data.

Fig.(13): ITPC for the two conditions: "see" vs. "no-see". Time (in seconds) is plotted along the x-axis (500 ms before and after stimulus onset) and frequency (in Hz) along the y-axis (0 to 30 Hz). (a) ITPC for the "see" trials, reflecting average phase-locking to auditory stimulus onset on trials in which subjects reported seeing flashes. (b) ITPC for the "no-see" trials, reflecting average phase-locking to auditory stimulus onset on trials in which subjects did not report seeing flashes. (c) Significant differences in ITPC between the two conditions. No significant clusters of differences in phase coherence were found between the two conditions in our study, arguing against a role for pre-stimulus phase in the experience of photisms.


Discussion
Hypnagogic synesthetic experiences have commonly been reported in the literature, most often in individuals with classic forms of synesthesia, underlying neurological conditions, or under the influence of certain drugs. Here, we report the first empirical study of naturally occurring synesthetic experiences, examining the state and stimuli necessary to evoke non-synesthetic auditory-visual hallucinations, the prevalence of typically developing individuals in whom these experiences could be evoked, and the mechanisms underlying them.

Through extensive piloting, we identified four conditions that aid the experience of hallucinations: (1) a dark environment, (2) performance of a visual imagery task, (3) presentation of startling sounds, and (4) lack of exposure to external visual inputs (achieved by keeping the eyes shut during the experiment).

We kept the first three conditions constant across the first two behavioural experiments and found the startling beep sounds to be significantly more effective in evoking hallucinatory visual percepts than trials without beep sounds. Moreover, when we manipulated the intensity of the incoming sounds, the effect diminished for softer sounds compared to louder ones, demonstrating the importance of a startling sound (usually louder and more intense) in evoking such experiences.

In experiment III, we introduced real flashes along with the auditory beeps to determine whether participants could distinguish the real flashes from the sound-induced ones. Our results indicated that participants detected the veridical flashes with ease while responding erratically on the beep-only trials, possibly due to altered internal sensory thresholds for stimulus detection. On beep-flash combination trials, however, they favoured the beep side when localizing, hinting at the differing salience of the two kinds of stimuli on multisensory trials. Since there were relatively few trials per condition, our data may be subject to confounds.

We finally ran an EEG study to explore the neural basis of this phenomenon and to identify activation patterns in the occipital areas of interest on trials where participants reported hallucinatory experiences. The ERP waveforms for the "see" and "no-see" conditions differ, with the "no-see" waveform being more positive than the "see" waveform. Since strong alpha activity in the occipital cortex has generally been linked with inhibitory visual processes, we ran a spectral-power analysis across a wide range of frequencies to identify differences in power between the two conditions. The results were inconclusive, leading us to examine phase-locking to stimulus onset across a wide range of frequencies in the occipital regions of interest. We wanted to explore whether the phase of ongoing oscillations in the visual cortex could be reset by auditory stimulus onset, which would be reflected in a difference in ITPC between the two conditions. We measured the ITPC across a broad frequency range for the two conditions of interest, but found no significant differences.

Participants' debriefing reports provide evidence for the startling nature of the beeps and the strong belief that the flashes were real. Moreover, the drawings made by the participants closely resemble Klüver form constants and tend to be co-localized in the space shared by the beep source. We also find that subjects need not be in a hypnagogic state to experience these sensations, as demonstrated by near-ceiling performance on the visual imagery task and through confirmation of wakeful neural activity using EEG.

Conclusion
The present research draws attention to the high incidence of latent synesthetic tendencies in non-synesthetic individuals that are manifested under conducive conditions. Here, we present evidence for the power of startling sounds not only to modulate or influence activity in the visual areas, but also to evoke conscious, qualitative sensory experiences. In audio-visual synesthetes, the startling nature of the sounds may not be necessary to elicit visual sensations; these may arise even otherwise, due to the strong associations between the two systems. It is not entirely clear why startling sounds have such a powerful impact on the visual system, but the effect may reflect a link between our sensory and threat-detection systems; that is, the additional noise introduced into the visual system by the startling sound could significantly increase activity in that area, causing it to exceed the threshold levels needed for perception. Further research investigating this link may be necessary to attain deeper insights into the cross-modal connections in non-synesthetes. Our experiment could serve as a useful model to study latent multisensory processes that may exist in all individuals, bridging the gap between normal cross-modal experiences and atypical synesthetic sensations.


References Afra, P., Funke, M., & Matsuo, F. (2009). Acquired auditory-visual synesthesia: A window to early cross-modal sensory interactions. Psychology research and behavior management, 2, 31.

Aleman, A., Rutten, G. J. M., Sitskoorn, M. M., Dautzenberg, G., & Ramsey, N. F. (2001). Activation of striate cortex in the absence of visual stimulation: an fMRI study of synesthesia. Neuroreport, 12(13), 2827-2830.

Bertelson, P., & Aschersleben, G. (1998). Automatic visual bias of perceived auditory location. Psychonomic bulletin & review, 5(3), 482-489.

Brainard, D. H., & Vision, S. (1997). The psychophysics toolbox. Spatial vision, 10, 433-436.

Brang, D., Towle, V. L., Suzuki, S., Hillyard, S. A., Di Tusa, S., Dai, Z., ... & Grabowecky, M. (2015). Peripheral sounds rapidly activate visual cortex: evidence from electrocorticography. Journal of neurophysiology, 114(5), 3023-3028.

Bressloff, P. C., Cowan, J. D., Golubitsky, M., Thomas, P. J., & Wiener, M. C. (2002). What geometric visual hallucinations tell us about the visual cortex. Neural Computation, 14(3), 473-491.

Bulkin, D. A., & Groh, J. M. (2006). Seeing sounds: visual and auditory interactions in the brain. Current opinion in neurobiology, 16(4), 415-419.

Busch, N. A., Dubois, J., & VanRullen, R. (2009). The phase of ongoing EEG oscillations predicts visual perception. Journal of Neuroscience, 29(24), 7869-7876.

Busse, L., Roberts, K. C., Crist, R. E., Weissman, D. H., & Woldorff, M. G. (2005). The spread of attention across modalities and space in a multisensory object. Proceedings of the National Academy of Sciences of the United States of America, 102(51), 18751-18756.

Choe, C. S., Welch, R. B., Gilford, R. M., & Juola, J. F. (1975). The “ventriloquist effect”: Visual dominance or response bias?. Attention, Perception, & Psychophysics, 18(1), 55-60.

Cytowic, R. E. (1995). Synesthesia: Phenomenology and neuropsychology. Psyche, 2(10), 2-10.

Cytowic, R. E. (1996). The neurological side of neuropsychology. MIT Press.

Delorme, A., & Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of neuroscience methods, 134(1), 9-21.

Dugué, L., Marque, P., & VanRullen, R. (2011). The phase of ongoing oscillations mediates the causal relation between brain excitation and visual perception. Journal of Neuroscience, 31(33), 11889-11893.


Ergenoglu, T., Demiralp, T., Bayraktaroglu, Z., Ergen, M., Beydagi, H., & Uresin, Y. (2004). Alpha rhythm of the EEG modulates visual detection performance in humans. Cognitive Brain Research, 20(3), 376-383.

Frassinetti, F., Bolognini, N., & Làdavas, E. (2002). Enhancement of visual perception by crossmodal visuo-auditory interaction. Experimental Brain Research, 147(3), 332-343.

Hanslmayr, S., Aslan, A., Staudigl, T., Klimesch, W., Herrmann, C. S., & Bäuml, K. H. (2007). Prestimulus oscillations predict visual perception performance between and within subjects. Neuroimage, 37(4), 1465-1473.

Hanslmayr, S., Klimesch, W., Sauseng, P., Gruber, W., Doppelmayr, M., Freunberger, R., & Pecherstorfer, T. (2005). Visual discrimination performance is related to decreased alpha amplitude but increased phase locking. Neuroscience letters, 375(1), 64-68.

Hartman, A. M., & Hollister, L. E. (1963). Effect of mescaline, lysergic acid diethylamide and psilocybin on color perception. Psychopharmacology, 4(6), 441-451.

Hubbard, E. (2007). Neurophysiology of synesthesia. Current psychiatry reports, 9(3), 193-199.

Innes-Brown, H., & Crewther, D. (2009). The impact of spatial incongruence on an auditory-visual illusion. PLoS One, 4(7), e6450.

Jacobs, L., Karpik, A., Bozian, D., & Gøthgen, S. (1981). Auditory-visual synesthesia: sound-induced photisms. Archives of Neurology, 38(4), 211-216.

Jasper, H. H. (1958). The ten twenty electrode system of the international federation. Electroencephalography and Clinical Neurophysiology, 10, 371-375.

Jensen, O., Bonnefond, M., & VanRullen, R. (2012). An oscillatory mechanism for prioritizing salient unattended stimuli. Trends in cognitive sciences, 16(4), 200-206.

Lessell, S., & Cohen, M. M. (1979). Phosphenes induced by sound. Neurology, 29(11), 1524-1524.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 92(3), 204-214.

Marks, L. E. (1975). On colored-hearing synesthesia: cross-modal translations of sensory dimensions. Psychological bulletin, 82(3), 303.


Martinez, A., Anllo-Vento, L., Sereno, M. I., Frank, L. R., Buxton, R. B., Dubowitz, D. J., ... & Hillyard, S. A. (1999). Involvement of striate and extrastriate visual cortical areas in spatial attention. Nature neuroscience, 2(4), 364-369.

McDonald, J. J., Störmer, V. S., Martinez, A., Feng, W., & Hillyard, S. A. (2013). Salient sounds activate human visual cortex automatically. Journal of Neuroscience, 33(21), 9194-9201.

Mercier, M. R., Foxe, J. J., Fiebelkorn, I. C., Butler, J. S., Schwartz, T. H., & Molholm, S. (2013). Auditory-driven phase reset in visual cortex: human electrocorticography reveals mechanisms of early multisensory integration. Neuroimage, 79, 19-29.

Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science, 221(4608), 389-391.

Meredith, M. A., & Stein, B. E. (1985). Descending efferents from the superior colliculus relay integrated multisensory information. Science, 227, 657-660.

Meredith, M. A., & Stein, B. E. (1986). Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of neurophysiology, 56(3), 640-662.

Meredith, M. A., & Stein, B. E. (1996). Spatial determinants of multisensory integration in cat superior colliculus neurons. Journal of Neurophysiology, 75(5), 1843-1857.

Miller, J. (1982). Divided attention: Evidence for coactivation with redundant signals. Cognitive psychology, 14(2), 247-279.

Milton, A., & Pleydell-Pearce, C. W. (2016). The phase of pre-stimulus alpha oscillations influences the visual perception of stimulus timing. Neuroimage, 133, 53-61.

Noesselt, T., Tyll, S., Boehler, C. N., Budinger, E., Heinze, H. J., & Driver, J. (2010). Sound-induced enhancement of low-intensity vision: multisensory influences on human sensory-specific cortices and thalamic bodies relate to perceptual enhancement of visual detection sensitivity. Journal of Neuroscience, 30(41), 13609-13623.

Page, N. G. R., Bolger, J. P., & Sanders, M. D. (1982). Auditory evoked phosphenes in optic nerve disease. Journal of Neurology, Neurosurgery & Psychiatry, 45(1), 7-12.

Pick, H. L., Warren, D. H., & Hay, J. C. (1969). Sensory conflict in judgments of spatial direction. Attention, Perception, & Psychophysics, 6(4), 203-205.

Sagiv, N., Simner, J., Collins, J., Butterworth, B., & Ward, J. (2006). What is the relationship between synaesthesia and visuo-spatial number forms?. Cognition, 101(1), 114-128.


Schroeder, C. E., Molholm, S., Lakatos, P., Ritter, W., & Foxe, J. J. (2004). Human–simian correspondence in the early cortical processing of multisensory cues. Cognitive Processing, 5(3), 140-151.

Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions: What you see is what you hear. Nature, 408(6814), 788.

Simner, J., Mulvenna, C., Sagiv, N., Tsakanikos, E., Witherby, S. A., Fraser, C., ... & Ward, J. (2006). Synaesthesia: the prevalence of atypical cross-modal experiences. Perception, 35(8), 1024-1033.

Sinke, C., Halpern, J. H., Zedler, M., Neufeld, J., Emrich, H. M., & Passie, T. (2012). Genuine and drug-induced synesthesia: a comparison. Consciousness and cognition, 21(3), 1419-1434.

Steen, C. (2001). Visions shared: A firsthand look into synesthesia and art. Leonardo, 34(3), 203-208.

Steen, C. (2017). Synesthetic Photisms and Hypnagogic Visions: a Comparison.

Thut, G., Nietzel, A., Brandt, S. A., & Pascual-Leone, A. (2006). α-Band electroencephalographic activity over occipital cortex indexes visuospatial attention bias and predicts visual target detection. Journal of Neuroscience, 26(37), 9494-9502.

Van Dijk, H., Schoffelen, J. M., Oostenveld, R., & Jensen, O. (2008). Prestimulus oscillatory activity in the alpha band predicts visual discrimination ability. Journal of Neuroscience, 28(8), 1816-1823.

Vike, J., Jabbari, B., & Maitland, C. G. (1984). Auditory-visual synesthesia: report of a case with intact visual pathways. Archives of Neurology, 41(6), 680-681.

Ward, J., & Meijer, P. (2010). Visual experiences in the blind induced by an auditory sensory substitution device. Consciousness and cognition, 19(1), 492-500.

Ward, J., & Simner, J. (2005). Is synaesthesia an X-linked dominant trait with lethality in males?. Perception, 34(5), 611-623.


Appendix

The questions in the debriefing questionnaire were as follows:

1. How startled were you by the high-pitched tones?

Not startled at all / Mildly startled / Somewhat startled / Very startled

2. Did your experience (or lack of experience) of being startled change throughout the experiment?

More startled at the beginning / No difference / More startled at the end

3. Did you experience any visual sensation(s) during the course of the experiment (colors, shapes, textures, visual patterns, etc.)?

4. If yes, how frequently during the experiment did you experience these visual sensations?

5. How confident are you that you did or did not experience any visual sensations?

6. Were these experiences more common at the beginning or the end of the experiment, or equally common?

7. Was the occurrence of the visual sensation(s) linked to the presentation of any of the sounds? That is, did they immediately follow the words (e.g., Curve A) or the high-pitched sounds (beeps), or did they seem to occur randomly?

8. Did the visual sensation(s) appear more towards the left-hand side (for example, near your left eye), or the right-hand side (for example, near your right eye) or were they randomly distributed across space?

9. Do you think the location of the short, high-pitched sounds (coming from the left or the right) influenced the spatial appearance of the visual sensation(s)?

10. Please describe and draw the visual sensation(s) that you experienced as best as you can.

11. What types of sounds elicit these colors?

12. Do you experience colors in response to letters or numbers in your everyday life (e.g., you think of or see a color such as blue when you look at the number 2 written in black ink)?

Don’t Know / Never / Rarely / Frequently / Every Day

13. What colors do you see for the numbers 2, 7, 9, and the letters A, C, M, N?

14. When trying to fall asleep at night, do loud or startling sounds cause you to see flashes of light, even though your eyes are closed?

Don’t Know / Never / Rarely / Frequently / Every Day
