Journal of Vision (2019) 19(1):3, 1–12

Predictive coding of visual motion in both monocular and binocular human visual processing

Elle van Heusden
Melbourne School of Psychological Sciences, The University of Melbourne, Melbourne, Australia
Helmholtz Institute, Department of Experimental Psychology, Utrecht University, Utrecht, the Netherlands

Anthony M. Harris
Institute of Cognitive Neuroscience, University College London, London, UK
Queensland Brain Institute, The University of Queensland, Brisbane, Australia

Marta I. Garrido
Queensland Brain Institute, The University of Queensland, Brisbane, Australia
School of Mathematics and Physics, The University of Queensland, Brisbane, Australia
Centre for Advanced Imaging, The University of Queensland, Brisbane, Australia
Australian Research Council Centre of Excellence for Integrative Brain Function, The University of Queensland, Brisbane, Australia

Hinze Hogendoorn
Melbourne School of Psychological Sciences, The University of Melbourne, Melbourne, Australia
Helmholtz Institute, Department of Experimental Psychology, Utrecht University, Utrecht, the Netherlands

Abstract

Neural processing of sensory input in the brain takes time, and for that reason our awareness of visual events lags behind their actual occurrence. One way the brain might compensate to minimize the impact of the resulting delays is through extrapolation. Extrapolation mechanisms have been argued to underlie perceptual illusions in which moving and static stimuli are mislocalised relative to one another (such as the flash-lag and related effects). However, where in the visual hierarchy such extrapolation processes take place remains unknown. Here, we address this question by identifying monocular and binocular contributions to the flash-grab illusion. In this illusion, a brief target is flashed on a moving background that reverses direction. As a result, the perceived position of the target is shifted in the direction of the reversal. We show that the illusion is attenuated, but not eliminated, when the motion reversal and the target are presented dichoptically to separate eyes. This reveals that extrapolation mechanisms at both monocular and binocular processing stages contribute to the illusion. We interpret the results in a hierarchical predictive coding framework, and argue that prediction errors in this framework manifest directly as perceptual illusions.

Introduction

Neural processing of sensory input in the brain takes time, and for that reason our awareness of visual events lags behind their actual occurrence.

Citation: van Heusden, E., Harris, A. M., Garrido, M. I., & Hogendoorn, H. (2019). Predictive coding of visual motion in both monocular and binocular human visual processing. Journal of Vision, 19(1):3, 1–12, https://doi.org/10.1167/19.1.3.

Received July 27, 2018; published January 10, 2019. ISSN 1534-7362. Copyright 2019 The Authors.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

If the brain did not somehow compensate for neural transmission delays, we would consistently mislocalize moving objects behind their actual position. Nevertheless, human observers are typically very accurate at localizing moving objects, achieving near-zero lag when object trajectories are predictable (Brenner, Smeets, & de Lussanet, 1998). One explanation for how the brain might overcome its internal delay is through extrapolation: By exploiting knowledge about an object's past trajectory, the brain predicts its present position. Although accurate interaction with moving objects could also be achieved by extrapolation in the motor system (e.g., Kerzel & Gegenfurtner, 2003), extrapolation mechanisms in the visual system have been hypothesized to underlie a class of visual illusions in which visual motion signals affect the perceived location of stationary objects. This includes the much-studied flash-lag effect, in which a moving object that is physically aligned with a stationary flash is perceived ahead of that flash (Nijhawan, 1994). In this interpretation, the brain extrapolates the position of the moving object along its expected trajectory to compensate for lag that would otherwise arise due to processing time. When the flash is presented aligned with the moving object, it is compared to the extrapolated position of the moving object, and hence appears to lag behind it.

In the years following Nijhawan's initial demonstrations of the flash-lag effect, numerous other motion-induced position shifts have been reported, including the flash-drag (Whitney & Cavanagh, 2000), flash-jump (Cai & Schlag, 2001), and flash-grab (Cavanagh & Anstis, 2013) effects. Although the underlying mechanisms have been hotly debated (e.g., Eagleman & Sejnowski, 2000; Krekelberg, 2000; Patel, Ogmen, Bedell, & Sampath, 2000; Whitney & Murakami, 1998), convergent evidence points to an important role for predictive extrapolation mechanisms in causing these effects (Nijhawan, 2008). For instance, animal neurophysiology studies have demonstrated the existence of predictive extrapolation mechanisms in the retinae of salamanders, mice, and rabbits (Berry, Brivanlou, Jordan, & Meister, 1999; Schwartz, Taylor, Fisher, & Harris, 2007), as well as in cat primary visual cortex (Jancke, Erlhagen, Schöner, & Dinse, 2004). In humans, it has been demonstrated that moving objects are extrapolated into regions of visual space where they could physically not be detected, such as the blind spot—ruling out explanations in terms of differential latencies (Maus & Nijhawan, 2008). Modeling studies have shown that a Bayesian model of perceived position that incorporates neural delays generates predictive position shifts such as seen in the flash-lag effect (Khoei, Masson, & Perrinet, 2017), and most recently an unsupervised predictive neural network exposed to natural video sequences (including motion) was found to have developed a pattern of response consistent with the flash-lag effect (Lotter, Kreiman, & Cox, 2018).

An important conceptual challenge to interpreting motion-induced position shifts as resulting from predictive mechanisms arises from the observation that the perceived position of a static event is biased primarily by motion presented after that event, rather than before it. This led Eagleman and Sejnowski (2000) to coin the term post-diction, as a temporal counterpart to prediction. In this original post-diction account, events were essentially back-dated in perception, rewriting recent perceptual history. Several years later, the same authors presented a refined version of this model, in which local velocity signals integrated over a brief period after an event interact with local position signals to bias its perceived position (Eagleman & Sejnowski, 2007). Much has been made of what seems like reverse causality in the postdiction account, and the apparent contrast with predictive extrapolation mechanisms. However, the predictive model and the postdictive motion-biasing model are mechanistically the same, differing only in the time-window during which motion signals are integrated. Eagleman and Sejnowski (2007) note that "Motion biasing will normally push objects closer to their true location in the world [...] by a clever method of updating signals that have become stale due to processing time" (p. 9), which is precisely what motion extrapolation also does (Nijhawan, 2008). Eagleman has more recently argued that prediction and postdiction cooperate to compensate for neural delays (Eagleman, 2008), simply because predictions by their nature sometimes do not come true, necessitating posthoc revisions to the timeline of experience. Viewed more broadly, prediction and postdiction are simply two halves of the same mechanism, split along the line separating past from future. However, in the context of sensory processing, this line is artificial: Due to neural delays, all cortical areas process information collected in the objective past. Although they seem polar opposites, predictive and postdictive accounts both push the representation of an object closer to its true location in the world at a given instant. Given the reality of neural processing delays, this means anticipating (i.e., predicting) the present.

Interestingly, this predictive rationale fits neatly with a more recent computational model aiming to unify position and motion perception (Kwon, Tadin, & Knill, 2015), which applied a Bayesian approach to another motion-position illusion (motion-induced position shifts; De Valois & De Valois, 1991; Ramachandran & Anstis, 1990). Kwon et al. advocate a model in which motion and position judgments mutually interact to make optimal inferences about the generative causes underlying sensory signals.


The model is implemented as a Kalman filter, and therefore the represented position at a given time explicitly depends only on velocity signals integrated before that time. However, the precise time-window over which signals are integrated relative to objective external time was not the focus of the model, nor does it invalidate the mechanistic similarities it shares with the account proposed earlier by Eagleman and Sejnowski (2007). Indeed, Eagleman and Sejnowski (2000) note that postdiction is commonplace in engineering, where it is simply known as smoothing. Most importantly, the model's core feature—that it causes a position signal to be shifted in the direction of a motion signal—is the same: It is a predictive mechanism that causes anticipatory activation at the object's future position.

A more recent illusion, the flash-grab effect, has provided the opportunity to study how predictive motion extrapolation mechanisms behave when motion vectors abruptly change, such that the anticipated future does not come true (Cavanagh & Anstis, 2013). In this illusion, a target is briefly flashed on a moving background as the motion unexpectedly reverses direction, which results in the perceived position of the flash being shifted in the direction of the second motion sequence. Although neither Eagleman and Sejnowski (2007) nor Kwon et al. (2015) made reference to the flash-grab effect, the flash-grab effect can also be readily explained by the same mechanism. Figure 1 shows schematically how this would work. A moving object is represented at a given level of processing with a certain delay. It is possible to compensate for that delay by using information about the object's velocity to extrapolate the true position of the object at that instant (e.g., Krekelberg & Lappe, 2001; Nijhawan, 1994; see Figure 1A). When the object reverses direction, it again takes time to detect the reversal, during which the object's position continues to be extrapolated beyond the reversal point (Figure 1B). When sensory information about the object's actual trajectory then becomes available, the represented position must rapidly shift from the predicted trajectory to the new trajectory. This rapid shift in represented position equates to a brief spike in velocity (Figure 1C, upper plot). Importantly, the key features in Figure 1 are not hypothetical: The overshoot in represented position and subsequent acceleration to intercept the new trajectory exactly mirror population codes reported in the mouse and salamander retina for such reversing stimuli (Schwartz et al., 2007). Although such mechanisms have not yet been directly demonstrated in the human brain itself, because prediction error signals arise already in the retina, they are passed on to the rest of the visual processing hierarchy even if they would not be calculated there. The mechanism proposed by Eagleman and Sejnowski (2007) then predicts that any stationary object flashed at the reversal point would interact with this motion signal and be mislocalized. This is the effect we know as the flash-grab effect (Cavanagh & Anstis, 2013). Importantly, the magnitude of this mislocalization would be a direct reflection of how far into the future the neural representation of the moving object has been extrapolated. As is evident in Figure 1A through C, the longer the processing delay, the further into the (local) future a representation must be extrapolated in order to compensate, and the longer it would take before a violation of that extrapolation is detected in that brain area. This would yield a stronger velocity spike as the area "catches up," and a bigger flash-grab effect.

Consistent with this interpretation of the flash-grab effect as resulting from failed prediction, we recently demonstrated that the same neural mechanisms that cause the location of the target in the flash-grab effect to be misperceived also influence saccades aimed at that target (Van Heusden, Rolfs, Cavanagh, & Hogendoorn, 2018). Most importantly, we showed that the degree of saccade error increased with increasing saccadic latency. This indicates that the visuomotor system was extrapolating the (stationary) target's position as if it was actually moving, confirming that mislocalization in this illusion results from an extrapolation process. It can be readily appreciated from Figure 1 that if processing delays are larger (for whatever reason), then the original trajectory will be extrapolated further into the future, the object will move even further along its actual trajectory, and the total position error will be greater. The transient peak in velocity as the system adjusts will therefore also be greater, in turn leading to a larger flash-grab effect.

Where in the visual hierarchy the neural mechanisms responsible for extrapolation operate is still unknown. In animals, predictive neural mechanisms have been identified at multiple levels of the visual system, including the retina (Berry et al., 1999; Hosoya, Baccus, & Meister, 2005; Schwartz et al., 2007), lateral geniculate nucleus (Sillito, Jones, Gerstein, & West, 1994), primary visual cortex (Jancke et al., 2004), and V4 (Sundberg, Fallah, & Reynolds, 2006). In humans, we recently demonstrated that the visual brain predicts the position of a moving object using an EEG classification paradigm (Hogendoorn & Burkitt, 2018a). This study revealed that for an object in apparent motion, the neural representation of the object's position is preactivated when the object moves along a predictable trajectory. However, this was only true for neural representations evoked around 130 ms after stimulus presentation, whereas the latency of earlier neural position representations was not modulated by prediction. In contrast, a previous EEG study of the flash-grab effect revealed that the target's illusory position was represented in the EEG signal as early as 81 ms poststimulus (Hogendoorn, Verstraten, & Cavanagh, 2015).


Figure 1. Schematic illustration of extrapolation in the flash-grab effect. In each panel, the lower plot shows position as a function of time, and the upper plot shows velocity as a function of time. Solid gray traces indicate the properties of the physical stimulus as presented on the screen, and dotted black traces indicate (predictive) neural representations of the same stimulus, as demonstrated empirically in the retina by Schwartz et al. (2007). (A) In order to accurately localize a moving object despite neural transmission delays, the visual system uses concurrent velocity signals to extrapolate the real-time position of the object (blue lines). (B) When an object unexpectedly reverses direction, at any given level of representation, some time elapses before the reversal is detected. During that time, the object will continue to be (erroneously) extrapolated into positions where it is never presented, creating a prediction error. (C) As the represented position shifts from the predicted trajectory to a new trajectory, the rapid shift in represented position creates a brief spike in the represented velocity (dotted red trace). If a (stationary) flash is presented at the same time as the reversal, then the position of the flash will interact with the (large) transient velocity signal and be mislocalised, resulting in the flash-grab effect (red lines).
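The dynamics illustrated in Figure 1 can be made concrete with a short simulation. The following Python sketch is illustrative only and is not the authors' model: the 50 ms delay, 20°/s speed, and 20 ms catch-up window are assumed values, chosen simply to show how a delayed but extrapolated representation overshoots an unexpected reversal and then produces a transient velocity spike.

```python
import numpy as np

# Illustrative values only (not fitted parameters from the paper).
dt = 0.001                      # time step (s)
t = np.arange(0.0, 0.6, dt)     # 600 ms of simulated time
delay = 0.050                   # assumed detection/transmission delay (s)
speed = 20.0                    # assumed angular speed of the stimulus (deg/s)
t_rev = 0.3                     # time of the unexpected reversal (s)

# Physical position: constant motion that reverses direction at t_rev.
physical = np.where(t < t_rev, speed * t, speed * t_rev - speed * (t - t_rev))

# What a given processing stage receives: the stimulus, one delay late.
delayed_pos = np.interp(t - delay, t, physical)
delayed_vel = np.gradient(delayed_pos, dt)

# Extrapolated (represented) position: delayed position + velocity * delay.
represented = delayed_pos + delayed_vel * delay
overshoot = represented.max() - physical.max()   # extrapolation past the reversal point

# Let the representation catch up over ~20 ms rather than instantaneously (an
# assumption), then inspect its velocity: the transient spike of Figure 1C.
window = int(0.020 / dt)
smoothed = np.convolve(represented, np.ones(window) / window, mode="same")
velocity_spike = np.abs(np.gradient(smoothed, dt)).max()

print(f"overshoot beyond the reversal point: {overshoot:.2f} deg")
print(f"peak represented speed:              {velocity_spike:.0f} deg/s")
```

Under these assumptions the overshoot is approximately speed × delay (about 1° here), and the size of the velocity transient grows with that overshoot, which is the relationship the text uses to link longer processing delays to a larger flash-grab effect.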

These two studies therefore seem to reveal extrapolation processes at different stages in the visual hierarchy. Hence, the limited evidence from humans is consistent with the evidence from animal neurophysiology.

With the cortical EEG reflecting the target's extrapolated position already at about 80 ms poststimulus, the question is, where along the route from retina to cortex does this extrapolation take place? The vast majority of visual information reaches the cortex through the retino-geniculo-cortical pathway, passing from retina to the lateral geniculate nucleus of the thalamus before flowing on to the primary visual cortex (V1). Although there are alternative pathways to the cortex (a point to which we return in the discussion), given the severely limited timeframe, it is likely that the visual extrapolation mechanisms responsible for the flash-grab effect operate along the geniculate pathway. This function would parallel the predictive mechanisms in the retina, LGN, and V1 revealed in animals, but it remains unknown whether (and if so, which of) these areas similarly carry out extrapolation in the human visual system.

In order to answer this question, here we make use of the fact that visual information from the two eyes does not converge until V1. Neurons that carry information either from the retina to LGN, or from LGN to V1, carry information from only one eye, with the first binocular neurons in the visual pathway located in V1 itself (Parker, 2007). We use the flash-grab effect, and employ dichoptic presentation to separate the different components of the flash-grab stimulus across the two eyes. In so doing, we prevent those components of the stimulus interacting at an early (monocular) stage of the visual hierarchy. We manipulate which components of the flash-grab stimulus sequence (motion prior to the flash, the flash itself, and the motion following the flash) are presented to which eye, and measure the strength of the resulting illusion. If the shift in the perceived position of a target presented in one eye is reduced when the flanking motion sequence is presented to the opposite eye, then this would be evidence that extrapolation mechanisms operate already at monocular stages of processing.


Here, we show that this is the case. The flash-grab effect is indeed attenuated when the flash is not presented in the same eye as both the preceding and the subsequent motion sequences. This makes a strong case for the existence of neural extrapolation mechanisms in early, monocular stages in the visual hierarchy. The fact that the illusion was not entirely eliminated in these conditions indicates that extrapolation also occurs in later binocular areas. Altogether, the results therefore point towards extrapolation computations being carried out at multiple stages of the early visual pathway.

Methods

Observers

Twenty observers performed the experiment. All observers had normal or corrected-to-normal vision and gave informed consent prior to participating. The experiment was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee of the Melbourne School of Psychological Sciences. Data from three observers were excluded from the analysis because target detection was lower than 50%. The remaining observers successfully detected an average of 88% of targets across conditions.

Apparatus

Stimuli were presented on an ASUS ROG Swift PG258Q monitor running at 100 Hz with a resolution of 1920 × 1080 pixels, controlled by a Dell Precision computer. The experiment was presented using MATLAB (MathWorks, Natick, MA) and Psychtoolbox 3.0.8 extensions (Brainard, 1997). A mirror-stereoscope set-up (including chin rest) was placed 50 cm away from the screen.

Stimulus

All stimuli were presented on a gray background. The stimuli consisted of two annuli (presented one to each eye), composed of 16 patches, which showed an alternating black and white pattern (see Figure 2A) and rotated at an angular velocity of 200° per second. The annuli had inner and outer radii of 4.3 and 6.1 degrees of visual angle (dva) respectively. The two annuli were viewed through a mirror-stereoscope and fused into a single percept (Figure 2B). A black square was presented around both annuli throughout the experiment to assist in maintaining binocular fusion. The square was 9.3 dva wide, drawn with a linewidth of 0.8 dva. Fixation dots were presented in the center of both annuli (diameter: 0.6 dva). To give the observers some reference as to where they perceived the target, and to aid binocular fusion and avoid torsional eye movements, a white line was presented on the vertical meridian just below both annuli (width: 0.06 dva; height: 0.5 dva; 3.4 dva from fixation).

Procedure

On each trial, observers viewed a rotating annulus for 1,000, 1,100, 1,200, 1,300, 1,400, or 1,500 ms (from now on referred to as the first motion sequence). During the very last frame of the motion sequence, a target (a red circle with a diameter of 0.6 dva) was presented at one of three possible target locations: 160°, 180°, or 200° polar angle offset from the top of the annulus, for a single frame (10 ms). Next, the direction of motion reversed and the annulus continued to rotate in the opposite direction for 400 ms, after which it gradually started to turn gray. The annulus was fully gray 100 ms later (these 500 ms are from now on referred to as the second motion sequence). This was done to ensure that participants were not distracted by the segments of the annulus when giving their response. At the end of each trial, observers used a mouse to report the position where they perceived the target. An image of the target was drawn at the cursor location for both eyes, and moved with the mouse cursor across the screen. When observers did not perceive the target, they were instructed to click at the location of the fixation dot.

Experimental design

Although observers perceived the same series of events on every trial, we used a mirror stereoscope to manipulate the information presented to each eye across five different conditions (Figure 2C). (a) In the Binocular condition, all the information (first motion sequence, the target, and the second motion sequence) was presented to both eyes; (b) in the Monocular condition, all information was presented to one eye only; (c) in the Interocular condition, both the first and second motion sequence were presented to one eye, while the target was presented to the other eye; (d) in the Before Reversal condition, the first motion sequence and the target were presented to one eye, while the second motion sequence was presented to the other eye; and (e) lastly, in the After Reversal condition, the first motion sequence was presented to one eye, while the target and the second motion sequence were presented to the other eye.


The first motion sequence (and all consecutive events) occurred in the left and right eye with equal probability. The experiment consisted of nine blocks, with 110 trials in each block. All conditions were randomly interleaved within each block. On 10% of the trials, no target was presented. These trials served as catch trials (a target was erroneously reported on 1.8% of them).
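As a concrete summary of this design, the sketch below encodes the five dichoptic conditions and the trial structure described above. It is an illustrative Python stand-in, not the authors' MATLAB/Psychtoolbox code; all names (CONDITIONS, build_block, the dictionary keys) are invented for the sketch.

```python
import random

# "same" = the randomly chosen primary eye, "other" = the fellow eye.
CONDITIONS = {
    "Binocular":       {"motion1": "both", "target": "both",  "motion2": "both"},
    "Monocular":       {"motion1": "same", "target": "same",  "motion2": "same"},
    "Interocular":     {"motion1": "same", "target": "other", "motion2": "same"},
    "Before Reversal": {"motion1": "same", "target": "same",  "motion2": "other"},
    "After Reversal":  {"motion1": "same", "target": "other", "motion2": "other"},
}

def build_block(n_trials=110, catch_proportion=0.10):
    """One block: conditions randomly interleaved, roughly 10% catch trials
    (no target), and the primary eye chosen left/right with equal probability."""
    trials = []
    for _ in range(n_trials):
        trials.append({
            "condition": random.choice(list(CONDITIONS)),
            "primary_eye": random.choice(["left", "right"]),
            "catch": random.random() < catch_proportion,
            "first_motion_ms": random.choice([1000, 1100, 1200, 1300, 1400, 1500]),
            "target_angle_deg": random.choice([160, 180, 200]),
        })
    return trials

experiment = [build_block() for _ in range(9)]   # nine blocks of 110 trials each
```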

Results

Observers viewed a flash-grab sequence (motion-flash-motion) in one of five conditions, in each case reporting the perceived position of the flashed target (Figure 2). The strength of the illusion was calculated as the polar angle between the reported position of the target and the target's real position, with errors in the direction of the second motion sequence (i.e., post-reversal) taken as positive. Mean illusion strength in each condition is plotted in Figure 3, with and without baseline-correcting for variability in the mean strength of the illusion across observers. All statistical analyses were carried out on the nonbaselined data.

First, one-sample t tests revealed that mislocalization was evident in all conditions (all p < 0.001). A repeated measures analysis of variance subsequently revealed a highly significant effect of condition, F(4, 64) = 10.3, p = 1.7 × 10⁻⁶, partial η² = 0.391. To further interpret the results, we made planned comparisons between the conditions using paired-samples t tests. Each condition was compared to the baseline Binocular condition, and additional planned pairwise comparisons were made between the three split conditions. The Monocular condition did not differ from the Binocular condition, t(16) = 1.15, p = 0.26, Cohen's d = 0.28. Conversely, mislocalization was significantly reduced in the Interocular, t(16) = 3.43, p = 0.003, Cohen's d = 0.83; Before Reversal, t(16) = 4.67, p = 0.0003, Cohen's d = 1.13; and After Reversal, t(16) = 2.27, p = 0.037, Cohen's d = 0.55, conditions. Finally, mislocalization did not differ significantly between the Interocular and After Reversal conditions, t(16) = 1.0, p = 0.31, Cohen's d = 0.25, although the Before Reversal condition was significantly reduced relative to the After Reversal condition, t(16) = 3.09, p = 0.007, Cohen's d = 0.75, and there was a trend suggesting that the Before Reversal condition might also produce less mislocalization than the Interocular condition, t(16) = 1.98, p = 0.066, Cohen's d = 0.75.

The strength of the illusion was maximal when the entire stimulus sequence was presented to either one or both eyes. Importantly, the strength of the illusion was reduced when the motion and the flash were presented in separate eyes (Interocular condition) as well as when the first and second motion sequences were presented to separate eyes (Before Reversal and After Reversal conditions). Maximal reduction (the weakest illusion) was evident in the Before Reversal condition in which the flash was presented to the eye that received the first motion sequence.

Figure 2. Stimulus and procedure. (A) Observers (N = 17) viewed a flash-grab sequence consisting of a rotating annulus that unexpectedly reversed its direction of motion. At the reversal, a red target disc was presented at one of three possible target locations, and observers reported the perceived location of the target after the trial using a mouse. (B) Using a mirror stereoscope, we manipulated the information presented to each eye. (C) Stimuli were presented in five different conditions. Binocular condition: All information is presented to both eyes. Monocular condition: All information is presented to one eye. Interocular condition: The moving annulus is presented to one eye, while the target is presented to the other eye. Before reversal condition: The first motion sequence and the target are presented to one eye, after which the annulus is presented to the other eye (rotating in the opposite direction). After reversal condition: The first motion sequence is presented to one eye, after which both the annulus (rotating in the opposite direction) and the target are presented in the other eye.
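The dependent measure described at the start of the Results can be written in a few lines. This is a minimal sketch with invented function and argument names: it takes the polar-angle difference between reported and true target positions and signs it so that errors in the direction of the second (post-reversal) motion sequence are positive.

```python
def illusion_strength(reported_deg, true_deg, reversal_direction):
    """Signed mislocalization in degrees of polar angle.

    reversal_direction is +1 or -1: the direction of the second (post-reversal)
    motion sequence. Errors in that direction count as positive.
    (Function and argument names are illustrative, not from the paper.)
    """
    error = (reported_deg - true_deg + 180.0) % 360.0 - 180.0   # wrap to (-180, 180]
    return error * reversal_direction

# Example: target flashed at 180 deg polar angle, reported at 195 deg, with the
# post-reversal motion in the positive direction -> a +15 deg illusory shift.
print(illusion_strength(195.0, 180.0, +1))   # 15.0
```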


Figure 3. Results. (A) Mean mislocalization for individual observers in each of the five dichoptic presentation conditions. Each combination of marker shape and color represents an individual observer. Solid black lines indicate means across observers. (B) The same data after baseline-correction (subtracting the overall mean of each observer from each of the conditions for that observer). This reveals that although observers vary widely in the magnitude of the illusion, they all demonstrate a comparable pattern of illusion strength: In the three conditions in which the complete stimulus sequence was not presented to the same eye (Interocular, Before Reversal, and After Reversal conditions), the strength of the illusion was significantly attenuated. Statistical comparisons with paired-sample t tests are unaffected by the baseline correction and are illustrated only in Panel B for clarity. Vertical text indicates comparisons against the Binocular condition.
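A minimal sketch of the analysis implied by Figure 3 and the Results is given below. It uses placeholder data of the right shape (17 observers by 5 conditions) rather than the actual measurements: baseline correction subtracts each observer's overall mean, and paired-samples t tests with Cohen's d are computed on the within-observer differences (one common convention; the paper does not state which d formula was used).

```python
import numpy as np
from scipy import stats

# Placeholder data standing in for mean mislocalization per observer/condition.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(17, 5))
conditions = ["Binocular", "Monocular", "Interocular", "Before Reversal", "After Reversal"]

# Baseline correction as in Figure 3B: subtract each observer's overall mean.
baselined = data - data.mean(axis=1, keepdims=True)

def paired_comparison(a, b):
    """Paired-samples t test plus Cohen's d for paired data
    (mean difference divided by the SD of the differences)."""
    t, p = stats.ttest_rel(a, b)
    diff = a - b
    return t, p, diff.mean() / diff.std(ddof=1)

# Example: Binocular vs. Interocular. Because baseline correction removes the
# same constant from both members of each pair, it does not change the test.
t, p, d = paired_comparison(data[:, 0], data[:, 2])
print(f"t(16) = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```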

Discussion

We have previously demonstrated that the flash-grab effect (Cavanagh & Anstis, 2013) involves neural extrapolation mechanisms that operate very early on in the visual pathway (Hogendoorn et al., 2015; Van Heusden et al., 2018). Here, we used dichoptic presentation to further narrow down when and where these mechanisms operate. Because visual input from the two eyes does not converge until primary visual cortex, separating the components of the flash-grab effect allowed us to discriminate whether motion extrapolation takes place at early, monocular stages (possibly retinal or subcortical), or at later binocular stages (V1 and/or beyond). The results reveal that dichoptic presentation attenuates, but does not eliminate the illusion, indicating that extrapolation mechanisms operate in both monocular and binocular visual processing.

This finding is important for two reasons. Firstly, it is consistent with animal work showing predictive mechanisms in monocular parts of the early visual pathway, including the retina (Berry et al., 1999; Hosoya et al., 2005; Schwartz et al., 2007) and LGN (Sillito et al., 1994). The present results reflect a similar involvement of predictive mechanisms in humans at both early, monocular stages (e.g., retina or LGN) and later, binocular stages (e.g., V1 and beyond; Hubel & Wiesel, 1965, 1968).

Secondly, both monocular and binocular neural populations contributed to the effect. This indicates that extrapolation occurs at multiple hierarchical (rather than single) processing stages. Extrapolation, or prediction, at multiple stages of sensory processing is the central property of hierarchical predictive coding, an influential theoretical and computational account of neural sensory processing (Huang & Rao, 2011; Rao & Ballard, 1999). In this model, successive layers of neurons "predict" their own input through feedback connections to earlier layers, feeding only the prediction error forward to higher layers (Rao & Ballard, 1999). For example, in the visual system, a high-level neuron might represent a Gabor patch at a given position, spatial frequency, and orientation, and "predict" the local luminance of the lower level neurons (with smaller, simpler receptive fields) that project to it. The lower level neuron receiving the prediction then essentially compares the "prediction" to its input, and only feeds forward the deviations from that prediction—i.e., any properties of the stimulus not captured, or predicted, by the activity of the high-level neuron representing the Gabor. Conceptually, such a hierarchy would converge on patterns of connectivity and activation that minimize total prediction error in the system. This would minimize metabolic requirements of sensory signaling, while optimizing information-theoretic properties of the network (a principle that has been dubbed the Free Energy Principle; Friston, 2005, 2010).

Importantly, the "predictions" in current models of predictive coding are predictive only in the hierarchical sense, but not in the temporal sense of predicting future activity (Bastos et al., 2012; Spratling, 2012, 2017); but see Friston (2005) for a discussion of predictive coding in time, and Garrido, Kilner, Stephan, and Friston (2009) for an empirical demonstration applied to expectation and mismatch negativity.


However, neural transmission delays mean that for any time-variant input (such as visual motion), prediction errors at a given neural population are minimized by the higher area predicting that population's future, rather than current, input. In a toy example, Area 1 sends visual information about the position of a moving object to Area 2, which in turn sends a "prediction" back to Area 1. That prediction is compared with input in Area 1 and any mismatch error is recursively minimized by adjusting the feedback signal to Area 1 to line up with its input at the time the signal arrives there (for details, see Hogendoorn & Burkitt, 2018b). Minimizing error therefore requires compensating for the delays incurred in both feed-forward and feed-back signaling. In the case of visual motion, compensating for these delays can be achieved by extrapolation: Simply multiplying the instantaneous velocity of an object by the expected delay (feed-forward and feedback) yields a spatiotemporal prediction which is predictive in both the hierarchical and temporal sense. Indeed, several authors have proposed neural mechanisms for motion prediction, based on adaptation (Erlhagen, 2003) and Bayes-optimal motion-position estimation (Khoei et al., 2017; Kwon et al., 2015). However, these have not been related to the broader hierarchical predictive coding framework.
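The toy Area 1 / Area 2 exchange described above can be simulated directly. The sketch below illustrates the general idea under assumed delays and is not the model of Hogendoorn and Burkitt (2018b): each stage receives a delayed copy of the signal, and the feedback "prediction" is extrapolated by velocity times the feed-forward plus feedback delay so that it lines up with the lower area's input when it arrives.

```python
import numpy as np

# Illustrative values; a sketch of the general idea only.
dt, delay, speed = 0.001, 0.020, 20.0            # time step (s), one-way delay (s), deg/s
t = np.arange(0.0, 0.5, dt)
stimulus = speed * t                              # position of a moving object (deg)

def delayed(signal, lag):
    """The signal as seen `lag` seconds later (transmission delay)."""
    return np.interp(t - lag, t, signal)

def extrapolated(signal, lag):
    """Compensate a delay of `lag` seconds: position + instantaneous velocity * lag."""
    return signal + np.gradient(signal, dt) * lag

area1_input = delayed(stimulus, delay)            # input reaches Area 1 one delay late
area2_input = delayed(area1_input, delay)         # Area 1 -> Area 2 costs another delay

# Area 2's feedback "prediction" must bridge the feed-forward + feedback round
# trip (2 * delay) to line up with Area 1's input when it arrives there.
prediction_at_area1 = delayed(extrapolated(area2_input, 2 * delay), delay)
prediction_error = area1_input - prediction_at_area1

# For a predictable (constant-velocity) trajectory the error is ~0; an abrupt
# reversal would instead produce a transient error like the one in Figure 1C.
print(round(np.abs(prediction_error[200:]).max(), 6))   # ~0 after start-up
```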


Because delays are incurred between each successive stage in the processing hierarchy, extrapolation must similarly occur at each stage if total prediction error is to be minimized. Extrapolation at each stage would require information about rate of change (i.e., velocity) at each stage. This is consistent with known properties of the early visual system: In lower vertebrates, velocity is extracted already in the retina (Amthor & Grzywacz, 1993), and although the proportion of direction-selective retinal ganglion cells in higher vertebrates is reduced (Bach & Hoffmann, 2000), in these animals direction-selective cells have been reported in the lateral geniculate nucleus (Niell, 2013). V1 itself of course also represents velocity (Hubel & Wiesel, 1962). Our finding that both monocular and binocular stages in the visual processing hierarchy carry out extrapolation is therefore anatomically plausible, and consistent with a version of hierarchical predictive coding that takes into account neural transmission delays.

Resulting as it does from an unexpected reversal of a moving background pattern, the flash-grab illusion is thought to occur due to a violation of expected motion (Cavanagh & Anstis, 2013; Hogendoorn et al., 2015). In the predictive coding framework, this amounts to a prediction error: A higher level area extrapolates the position of the moving background, but by the time that predictive signal arrives at the lower level area, the stimulus has reversed and the prediction (having been extrapolated in the initial direction, as indicated schematically in Figure 1) is very far from the new input. The resulting prediction error means that the represented position subsequently shifts rapidly over time, yielding a spike in velocity in the direction of the new motion (Figure 1C). As per the mechanism proposed by Eagleman and Sejnowski (2007), and principally consistent with Kwon et al. (2015), this velocity signal biases the perceived position of the (stationary) target that is briefly flashed superimposed on the background. In this sense, the flash-grab effect can be thought of as a direct reflection of prediction error. Two things about this interpretation remain to be elucidated. Firstly, even when no flash is presented, the reversal point of the sector edge on which the flash would otherwise be presented still undershoots the true physical point. Conversely, the neural representation of the edge, as measured in the retina (Schwartz et al., 2007) and proposed here to underlie the flash-grab effect, does not (Figure 1). The fate of these neural representations that do not reach awareness remains to be elucidated. In a similar vein, this sequence of neural representations generates a spike in the velocity signal (Figure 1C). We argue here that this causes a concurrent flash to be mislocalized, but perhaps this velocity spike also has other perceptual consequences. One interpretation could be that this velocity signal actually masks the final section of the position signal that represents the overshoot, comparable to the mechanisms proposed for saccadic suppression during eye movements (Ibbotson & Cloherty, 2009; Ibbotson, Crowder, Cloherty, Price, & Mustari, 2008), but this remains to be further explored.

The pattern of illusion strength in the three split conditions gives some further insight into the mechanisms that are likely to play a role. In the Binocular and Monocular conditions, in which the entire stimulus sequence is presented within a single eye, the violation of the background's motion direction can be detected at an early monocular stage. The prediction error therefore arises early, and is available to influence the neural representation of the target's position at both monocular (since the target is presented in the same eye) and binocular stages. This results in maximal prediction error (evident as maximal illusion strength, Figure 3). In the Interocular condition, the violation can be detected monocularly, but with the target being presented to the other eye, the target's representation can only be influenced when the monocular channels converge at a later binocular stage, thereby reducing the magnitude of the illusion. In the two other split conditions (Before Reversal and After Reversal), because the two motion sequences are presented to different eyes, the violation is only detected at a later binocular stage. Because receptive fields are generally larger further down the hierarchy, such that the discrepant extrapolated and actual positions of the target are more likely to fall within the same cell's receptive field as one looks further down the hierarchy, a given violation might be expected to yield a smaller error further down the hierarchy. Consistent with this, in our results, the illusion is attenuated in the two split conditions (paired-samples t test of the two split conditions averaged together vs. binocular; t(16) = 4.3, p < 0.001, Cohen's d = 0.96). Finally, the illusion is more strongly reduced in the Before Reversal than After Reversal condition, t(16) = 3.1, p = 0.007, Cohen's d = 0.75. This is a consequence of the so-called Fröhlich effect (Kerzel, 2010), in which the onset position of a moving object is shifted in the direction of that object's subsequent trajectory. In the After Reversal condition, the second motion sequence does not violate a monocular motion prediction per se, but it still generates a monocular error signal due to the motion onset. In the Before Reversal condition, the same prediction error arises in the opposite eye to the target. Because this can only influence the target's position at the later binocular stage, this again leads to a smaller mislocalization illusion.

The flash-grab effect has alternatively been explained in terms of trajectory shortening (Cavanagh & Anstis, 2013). It has been reported that the perceived trajectory of an object that reverses its direction is shortened (Sinico, Parovel, Casco, & Anstis, 2009), and that the perceived shift in the endpoint of the trajectory is linked to the position shift induced by the flash-grab effect (Cavanagh & Anstis, 2013). The trajectory-shortening explanation is formulated at a more abstract, computational level of description and therefore cannot offer any insight about how the effects should vary under monocular, binocular, or dichoptic presentation. Cavanagh and Anstis (2013) argued that the flash-grab effect critically depends on attention, which might be taken as corresponding to a late, presumably binocular neural locus. As the available information at these stages would not be affected by dichoptic presentation, this interpretation in itself therefore does not provide a parsimonious explanation of why the illusion would be attenuated in these conditions.

The proposition that motion and position signals interact already in monocular channels is consistent with a recent report studying motion-induced position shifts (the illusory displacement of the envelope of a Gabor patch when its carrier wave is moving; De Valois & De Valois, 1991). Hisakata, Hayashi, and Murakami (2016) observed that this illusory displacement occurs even when the carrier and the envelope are presented at widely divergent disparities, suggesting a disparity-insensitive monocular mechanism. They further showed that (as previously reported by Anstis, 1989) when illusory displacements are induced in the two eyes, the resulting illusory disparity yields an illusory depth percept, further supporting the involvement of motion-position interactions at monocular processing stages.

One limitation of the current study is that the three dichoptic conditions required binocular fusion, whereas the monocular and binocular presentations did not. Nevertheless, we do not believe this is a significant confound for a number of reasons. Firstly, a binocularly presented fixation point and large, high-contrast squares around the stimulus were presented to assist with binocular fusion, and during debrief none of the observers reported any difficulty with fusion. Furthermore, difficulty with binocular fusion mostly occurs when conflicting high-contrast stimuli are presented to each eye, which was never the case in our stimulus sequences. Instead, individual stimulus components were presented to one eye in the absence of any contrast energy in the other eye, a situation which does not negatively affect fusion (Alais & Blake, 2004). Finally, it is not clear how problems with binocular fusion would explain the observed differences between the different dichoptic conditions. Nevertheless, we cannot entirely rule out the possibility that differences in binocular fusion may have played a role in our experimental conditions.

In sum, we have used a dichoptic version of the flash-grab effect to study the monocular and binocular contributions to motion extrapolation in human visual motion perception. The results reveal that extrapolation mechanisms operate at both monocular and binocular processing stages—a finding that is consistent with an extension of the hierarchical predictive coding framework that accounts for neural transmission delays. The results further suggest that prediction errors in this framework can manifest directly as perceptual illusions.

Keywords: motion extrapolation, prediction, predictive coding

Acknowledgments

EvH and HH were supported by the Australian Government through the Australian Research Council's Discovery Projects funding scheme (project DP180102268). MG acknowledges a University of Queensland Fellowship (2016000071).

Commercial relationships: none.
Corresponding author: Hinze Hogendoorn.
Email: [email protected].
Address: Melbourne School of Psychological Sciences, The University of Melbourne, Parkville VIC, Australia.

References

Alais, D., & Blake, R. (2004). Binocular rivalry. Cambridge, MA: MIT Press.


Amthor, F. R., & Grzywacz, N. M. (1993). Directional selectivity in vertebrate retinal ganglion cells. Reviews of Oculomotor Research, 5, 79–100. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8420563

Anstis, S. M. (1989). Kinetic edges become displaced, segregated, and invisible. In D. Lam (Ed.), Neural mechanisms of visual perception (Vol. 2, pp. 247–260). Houston, TX: Gulf Publishing.

Bach, M., & Hoffmann, M. B. (2000). Visual motion detection in man is governed by non-retinal mechanisms. Vision Research, 40(18), 2379–2385, https://doi.org/10.1016/S0042-6989(00)00106-1.

Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012, November 21). Canonical microcircuits for predictive coding. Neuron, 76(4), 695–711, https://doi.org/10.1016/j.neuron.2012.10.038.

Berry, M. J., Brivanlou, I. H., Jordan, T. A., & Meister, M. (1999, March 25). Anticipation of moving stimuli by the retina. Nature, 398(6725), 334–338, https://doi.org/10.1038/18678.

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433–436. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9176952

Brenner, E., Smeets, J. B., & de Lussanet, M. H. (1998). Hitting moving targets. Continuous control of the acceleration of the hand on the basis of the target's velocity. Experimental Brain Research, 122(4), 467–474. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9827866

Cai, R. H., & Schlag, J. (2001). Asynchronous feature binding and the flash-lag illusion. Investigative Ophthalmology and Visual Science, 42, S711.

Cavanagh, P., & Anstis, S. M. (2013). The flash grab effect. Vision Research, 91, 8–20, https://doi.org/10.1016/j.visres.2013.07.007.

De Valois, R. L., & De Valois, K. K. (1991). Vernier acuity with stationary moving Gabors. Vision Research, 31(9), 1619–1626, https://doi.org/10.1016/0042-6989(91)90138-U.

Eagleman, D. M. (2008, April 14). Prediction and postdiction: Two frameworks with the goal of delay compensation. Behavioral and Brain Sciences, 31(2), 205–206, https://doi.org/10.1017/S0140525X08003889.

Eagleman, D. M., & Sejnowski, T. J. (2000, March 17). Motion integration and postdiction in visual awareness. Science, 287(5460), 2036–2038. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10720334

Eagleman, D. M., & Sejnowski, T. J. (2007). Motion signals bias localization judgments: A unified explanation for the flash-lag, flash-drag, flash-jump, and Frohlich illusions. Journal of Vision, 7(4):3, 1–12, https://doi.org/10.1167/7.4.3.

Erlhagen, W. (2003). Internal models for visual perception. Biological Cybernetics, 88(5), 409–417, https://doi.org/10.1007/s00422-002-0387-1.

Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836, https://doi.org/10.1098/rstb.2005.1622.

Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138, https://doi.org/10.1038/nrn2787.

Garrido, M. I., Kilner, J. M., Stephan, K. E., & Friston, K. J. (2009). The mismatch negativity: A review of underlying mechanisms. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology, 120(3), 453–463, https://doi.org/10.1016/j.clinph.2008.11.029.

Hisakata, R., Hayashi, D., & Murakami, I. (2016). Motion-induced position shift in stereoscopic and dichoptic viewing. Journal of Vision, 16(13):3, 1–13, https://doi.org/10.1167/16.13.3.

Hogendoorn, H., & Burkitt, A. N. (2018a). Predictive coding of visual object position ahead of moving objects revealed by time-resolved EEG decoding. NeuroImage, 171, 55–61, https://doi.org/10.1016/j.neuroimage.2017.12.063.

Hogendoorn, H., & Burkitt, A. N. (2018b). Predictive coding with neural transmission delays: A real-time temporal alignment hypothesis. BioRxiv, https://doi.org/10.1101/453183.

Hogendoorn, H., Verstraten, F. A. J., & Cavanagh, P. (2015). Strikingly rapid neural basis of motion-induced position shifts revealed by high temporal-resolution EEG pattern classification. Vision Research, 113(Part A), 1–10, https://doi.org/10.1016/j.visres.2015.05.005.

Hosoya, T., Baccus, S. A., & Meister, M. (2005, July 7). Dynamic predictive coding by the retina. Nature, 436(7047), 71–77, https://doi.org/10.1038/nature03689.

Huang, Y., & Rao, R. P. N. (2011). Predictive coding. Wiley Interdisciplinary Reviews: Cognitive Science, 2(5), 580–593, https://doi.org/10.1002/wcs.142.

Hubel, D. H., & Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1), 106–154. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/14449617


Hubel, D. H., & Wiesel, T. (1965). Binocular interaction in striate cortex of kittens reared with artificial squint. Journal of Neurophysiology, 28(6), 1041–1059, https://doi.org/10.1152/jn.1965.28.6.1041.

Hubel, D. H., & Wiesel, T. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195(1), 215–243.

Ibbotson, M. R., & Cloherty, S. L. (2009). Visual perception: Saccadic omission—suppression or temporal masking? Current Biology, 19(12), R493–R496, https://doi.org/10.1016/j.cub.2009.05.010.

Ibbotson, M. R., Crowder, N. A., Cloherty, S. L., Price, N. S. C., & Mustari, M. J. (2008). Saccadic modulation of neural responses: Possible roles in saccadic suppression, enhancement, and time compression. Journal of Neuroscience, 28(43), 10952–10960, https://doi.org/10.1523/JNEUROSCI.3950-08.2008.

Jancke, D., Erlhagen, W., Schöner, G., & Dinse, H. R. (2004). Shorter latencies for motion trajectories than for flashes in population responses of cat primary visual cortex. Journal of Physiology, 556(3), 971–982, https://doi.org/10.1113/jphysiol.2003.058941.

Kerzel, D. (2010). The Fröhlich effect: Past and present. In R. Nijhawan & B. Khurana (Eds.), Space and time in perception and action (pp. 321–337). Cambridge, UK: Cambridge University Press, https://doi.org/10.1017/CBO9780511750540.019.

Kerzel, D., & Gegenfurtner, K. R. (2003). Neuronal processing delays are compensated in the sensorimotor branch of the visual system. Current Biology, 13(22), 1975–1978, https://doi.org/10.1016/j.cub.2003.10.054.

Khoei, M. A., Masson, G. S., & Perrinet, L. U. (2017). The flash-lag effect as a motion-based predictive shift. PLOS Computational Biology, 13(1), e1005068, https://doi.org/10.1371/journal.pcbi.1005068.

Krekelberg, B. (2000, August 18). The position of moving objects. Science, 289(5482), 1107a–1107, https://doi.org/10.1126/science.289.5482.1107a.

Krekelberg, B., & Lappe, M. (2001). Neuronal latencies and the position of moving objects. Trends in Neurosciences, 24(6), 335–339, https://doi.org/10.1016/S0166-2236(00)01795-1.

Kwon, O.-S., Tadin, D., & Knill, D. C. (2015). Unifying account of visual motion and position perception. Proceedings of the National Academy of Sciences, USA, 112(26), 8142–8147, https://doi.org/10.1073/pnas.1500361112.

Lotter, W., Kreiman, G., & Cox, D. (2018). A neural network trained to predict future video frames mimics critical properties of biological neuronal responses and perception, 1–18. Retrieved from http://arxiv.org/abs/1805.10734

Maus, G. W., & Nijhawan, R. (2008). Motion extrapolation into the blind spot. Psychological Science, 19(11), 1087–1091, https://doi.org/10.1111/j.1467-9280.2008.02205.x.

Niell, C. M. (2013). Vision: More than expected in the early visual system. Current Biology, 23(16), R681–R684, https://doi.org/10.1016/J.CUB.2013.07.049.

Nijhawan, R. (1994, July 28). Motion extrapolation in catching. Nature, 370(6487), 256–257, https://doi.org/10.1038/370256b0.

Nijhawan, R. (2008). Visual prediction: Psychophysics and neurophysiology of compensation for time delays. Behavioral and Brain Sciences, 31(2), 179–239, https://doi.org/10.1017/S0140525X08003804.

Parker, A. J. (2007). Binocular depth perception and the cerebral cortex. Nature Reviews Neuroscience, 8(5), 379–391, https://doi.org/10.1038/nrn2131.

Patel, S. S., Ogmen, H., Bedell, H. E., & Sampath, V. (2000, November 10). Flash-lag effect: Differential latency, not postdiction. Science, 290(5494), 1051. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11184992

Ramachandran, V. S., & Anstis, S. M. (1990). Illusory displacement of equiluminous kinetic edges. Perception, 19(5), 611–616, https://doi.org/10.1068/p190611.

Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87, https://doi.org/10.1038/4580.

Schwartz, G., Taylor, S., Fisher, C., Harris, R., & Berry, M. J., II. (2007). Synchronized firing among retinal ganglion cells signals motion reversal. Neuron, 55(6), 958–969, https://doi.org/10.1016/j.neuron.2007.07.042.

Sillito, A. M., Jones, H. E., Gerstein, G. L., & West, D. C. (1994). Feature-linked synchronization of thalamic relay cell firing induced by feedback from the visual cortex. Nature, 369(6480), 479–482, https://doi.org/10.1038/369479a0.

Sinico, M., Parovel, G., Casco, C., & Anstis, S. (2009). Perceived shrinkage of motion paths. Journal of Experimental Psychology: Human Perception and Performance, 35(4), 948–957, https://doi.org/10.1037/a0014257.


Spratling, M. W. (2012). Predictive coding accounts for V1 response properties recorded using reverse correlation. Biological Cybernetics, 106(1), 37–49, https://doi.org/10.1007/s00422-012-0477-7.

Spratling, M. W. (2017). A review of predictive coding algorithms. Brain and Cognition, 112, 92–97, https://doi.org/10.1016/j.bandc.2015.11.003.

Sundberg, K. A., Fallah, M., & Reynolds, J. H. (2006). A motion-dependent distortion of retinotopy in area V4. Neuron, 49(3), 447–457, https://doi.org/10.1016/j.neuron.2005.12.023.

Van Heusden, E., Rolfs, M., Cavanagh, P., & Hogendoorn, H. (2018). Motion extrapolation for eye movements predicts perceived motion-induced position shifts. Journal of Neuroscience, 38(38), 8243–8250, https://doi.org/10.1523/JNEUROSCI.0736-18.2018.

Whitney, D., & Cavanagh, P. (2000). Motion distorts visual space: Shifting the perceived position of remote stationary objects. Nature Neuroscience, 3(9), 954–959, https://doi.org/10.1038/78878.

Whitney, D., & Murakami, I. (1998). Latency difference, not spatial extrapolation. Nature Neuroscience, 1(8), 656–657, https://doi.org/10.1038/3659.
