Sensory Cue Integration

SECTION I

Introduction to Section I: Theory and Fundamentals

The chapters in Section I formalize the computational problems that need to be solved for successful cue combination. They focus on the issue of uncertainty and the Bayesian ways of solving such problems. Through the sequence of chapters in Section I, the reader will get a thorough overview and introduction into the current Bayesian formalization of cue combination.

This section highlights the fundamental similarity between seemingly distinct problems of cue combination. The computational objectives and the algorithms that can be used to find the optimal solution do not depend on the modality or kind of cue that is considered. The solutions to progressively more complicated problems are largely derived using essentially the same statistical techniques.

The first five chapters of Section I are concerned with Bayesian theories of optimal cue combination. The book starts with an introductory chapter by Landy, Banks, and Knill that gives an overview of the basic philosophy and the mathematics that is used to calculate how an ideal observer would combine cues. These models form the backbone of much of the cue-combination research presented in this book. The following computational chapters provide more detailed insights into the basic techniques, computational tools, and behavioral evidence necessary to test the predictions arising from these models. Wei and Körding focus on ideal-observer models for cases where it is not a priori certain that the cues belong together. In this particular case, the nervous system needs to determine whether the cues belong together or, in other words, to determine their causal relationship. Battaglia, Kersten, and Schrater focus on object perception. They ask how the kinds of causal knowledge we have about the way objects are made can be used to constrain estimates of attributes such as shape and size. Vijayakumar, Hospedales, and Haith extend these ideas to model cue combination for sensorimotor integration. This is complicated because sensorimotor tasks depend on a set of variables that must be estimated simultaneously. Lastly, Sahani and Whiteley extend these concepts to cue integration in cluttered environments. They point out that complicated visual scenes are a special case of situations in which we are uncertain about the causal relationship between cues. These chapters use a coherent probabilistic language to develop methods appropriate for a wide range of problems.

The last three chapters of Section I highlight limitations of the standard ideas of probabilistic cue combination. They point out interesting ways in which observers fall short of optimal predictions. Backus explains how novel cues can be recruited for perception. In the next chapter of the section, Domini and Caudek propose an alternative to the Bayesian ideal observer for combining cues to depth. The simple algorithm they present results in an affine estimate of depth (i.e., depth up to an unknown scale factor) and yet, perhaps surprisingly, is closely related to the Bayesian weighted-linear model discussed in Chapter 1. Finally, Rosas and Wichmann discuss limits of sensory cue integration, suggesting that simple ideas of optimality may be inappropriate in a complex world.

CHAPTER 1

Ideal-Observer Models of Cue Integration

Michael S. Landy, Martin S. Banks, and David C. Knill

When an organism estimates a property of the environment so as to make a decision ("Do I flee or do I fight?") or plan an action ("How do I grab that salt shaker without tipping my wine glass along the way?"), there are typically multiple sources of information (signals or "cues") that are useful. These may include different features of the input from one sense, such as vision, where a variety of cues—texture, motion, binocular disparity, and so forth—aid the estimation of the three-dimensional (3D) layout of the environment and shapes of objects within it. Information may also derive from multiple senses, such as visual and haptic information about object size, or visual and auditory cues about the location of a sound. In most cases, the organism can make more accurate estimates of environmental properties or more beneficial decisions by integrating these multiple sources of information. In this chapter, we review models of cue integration and discuss benefits and possible pitfalls in applying these ideas to models of behavior.

Consider the problem of estimating the 3D orientation (i.e., slant and tilt) of a smooth surface (Hillis, Ernst, Banks, & Landy, 2002; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003; Rosas, Wagemans, Ernst, & Wichmann, 2005). An estimate of surface orientation is useful for guiding a variety of actions, ranging from reaching for and grasping an object (Knill, 2005) to judging whether one can safely walk or crawl down an incline (Adolph, 1997). Errors in the estimate may lead to failures of execution of the motor plan (and a fumbled grasp) or incorrect motor decisions (and a risky descent possibly leading to a fall). Thus, estimation accuracy can be very important, so the observer should use all sources of information effectively.

The sensory information available to an observer may come in the form of multiple visual cues (the pattern of binocular disparity, linear perspective and foreshortening, shading, etc.) as well as haptic cues (feeling the surface with the hand, testing the slope with a foot). If one of the cues always provided the observer with a perfect estimate, there would be no need to incorporate information from other cues. But cues are often imperfectly related to environmental properties because of variability in the mapping between the cue value and a given property and because of errors in the nervous system's measurement of the cue value. Thus, measured cue values will vary somewhat unpredictably across viewing conditions and scenes. For example, stereopsis provides more accurate estimates of surface orientation for near than for far surfaces. This is due to the geometry underlying binocular disparity: A small amount of measurement error translates into a larger depth error at long distances than at short ones. In addition, estimates may be based on assumptions about the scene and will be flawed if those assumptions are invalid. For example, the use of texture perspective cues is generally based on the assumption that texture is homogeneously distributed across the surface, so estimates based on this assumption will be incorrect if the texture itself varies across the surface. For example, viewing a frontoparallel photograph of a slanted, textured surface could yield the erroneous estimate that the photograph is slanted. Unlike stereopsis, the reliability of texture perspective as a cue to surface orientation does not diminish with viewing distance.
Because of this uncertain relationship between a cue measurement and the environmental property to be estimated, the observer can generally improve the reliability of an estimate of an environmental property by combining multiple cues in a rational fashion. The combination rule needs to take into account the uncertainties associated with the individual cues, and those depend on many factors.

Along with the benefit of improving the reliability of perceptual estimates, there is also a clear benefit of knowing how uncertain the final estimate is and how to make decisions given that uncertainty. Consider, for example, estimating the distance to a precipitous drop-off (Maloney, 2002). An observer can estimate that distance most reliably by using all available cues, but knowing the uncertainty of that estimate can be crucial for guiding future behavior. If the future task is to toss a ball as close as possible to the drop-off, one would use the most likely distance estimate to plan the toss; the plan would be unaffected by the uncertainty of the distance estimate. If, however, the future task is to walk blindfolded toward the drop-off, the decision of how far to meander toward the drop would most certainly be influenced by the uncertainty of the distance estimate.

Much of the research in this area has focused on the question of whether cue integration is optimal. This focus has been fruitful for a variety of reasons. First, to determine whether the nervous system is performing optimally requires a clear, quantitative specification of the task, the stimuli, and the relationship between the stimulus and the specified environmental property. As Gibson (1966) argued, it forces the researcher to investigate and define the information available for the task. As Marr (1982) put it, it forces one to construct a quantitative, predictive account of perceptual performance. Second, for tasks that have been important for survival, it seems quite plausible that the organism has evolved mechanisms that utilize the available information optimally. Therefore, the hypothesis that sensory information is used optimally in tasks that are important to the organism is a reasonable starting point. Indeed, given the efficacy of natural selection and developmental learning mechanisms, it seems unlikely to us that the nervous system would perform suboptimally in an important task with stimuli that are good exemplars of the natural environment (as opposed to impoverished or unusual stimuli that are only encountered in the laboratory). Third, using optimality as a starting point, the observation of suboptimal behavior can be particularly informative. It can indicate flaws in our characterization of the perceptual problem posed to or solved by the observer; for example, it could indicate that the perceptual system is optimized for tasks other than one we have studied or that the assumptions made in our formulation of an ideal-observer model fail to capture the problem posed to observers in naturalistic situations. Of course, there remains the possibility that we have characterized the sensory information and the task correctly, but the nervous system simply has not developed the mechanisms for performing optimally (e.g., Domini & Braunstein, 1998; Todd, 2004). We expect that such occurrences are rare, but emerging scientific investigations will ultimately determine this.

In this way, "ideal-observer" analysis is a critical step in the iterative scientific process of studying perceptual computations. At perhaps a deeper level, ideal-observer models help us to understand the computational structure of what are generally complex problems posed to observers. This can in turn lead to understanding complex behavior patterns by relating them to the features of the problems from which they arise (e.g., statistics of natural environments or noise characteristics of sensory systems). Ideal-observer models provide a framework for constructing quantitative, predictive accounts of perceptual performance at Marr's computational level for describing the brain (Marr, 1982).

Several studies have found that humans combine sensory signals in an optimal fashion, taking into account the variation of cue reliability with viewing conditions, and resulting in estimates with maximum reliability (e.g., Alais & Burr, 2004; Ernst & Banks, 2002; Hillis et al., 2004; Knill & Saunders, 2003; Landy & Kojima, 2001; Tassinari, Hudson, & Landy, 2006). These results suggest that human observers are optimal for a wide variety of perceptual and sensorimotor tasks.

This chapter is intended to provide a general introduction to the field of cue combination from the perspective of optimal cue integration. We work through a number of qualitatively different problems, and we hope thereby to illustrate how building ideal observers helps formulate the scientific questions that need to be answered before we can understand how the brain solves these problems. We begin with a simple example of integration leading to a linear model of cue integration. This is followed by a summary of a general approach to optimality: Bayesian estimation and decision theory. We then review situations in which realistic generative models of sensory data lead to nonlinear ideal-observer models. Subsequent sections review empirical studies of cue combination and issues they raise, as well as open questions in the field.

LINEAR MODELS FOR MAXIMUM RELIABILITY

There is a wide variety of approaches to cue integration. The specific approach depends on the assumptions the modeler makes about the sources of uncertainty in sensory signals as well as what the observer is trying to optimize. Quantitative empirical evidence can then determine whether those assumptions are valid.

The simplest such models result in linear cue integration. For the case of Gaussian noise, linear cue integration is optimal for an observer who tries to maximize the precision (i.e., minimize the variance) of the estimate made based on the cues. Suppose you have samples x_i, i = 1, ..., n, of n independent, Gaussian random variables X_i that share a common mean μ and have variances σ_i². The minimum-variance unbiased estimator of μ is a weighted average

\hat{x} = \sum_{i=1}^{n} w_i x_i,   (1.1)

where the weight w_i of cue i is proportional to that cue's reliability r_i (defined as its inverse variance, r_i = 1/σ_i²):

w_i = \frac{r_i}{\sum_{j=1}^{n} r_j}   (1.2)

(Cochran, 1937). The reliability r of this integrated estimate is

r = \sum_{i=1}^{n} r_i.   (1.3)

As a result, the variance of the integrated estimate is generally lower than the variance of the individual estimates and never worse than the least variable of them. Thus, if an observer has access to unbiased estimates of a particular world property from each cue, and the cues are Gaussian distributed and conditionally independent (meaning that for a given value of the world property being estimated, errors in the estimates derived from each cue are independent), the minimum-variance estimate is a weighted average of the individual estimates from each cue (Landy, Maloney, Johnston, & Young, 1995; Maloney & Landy, 1989).

To form this estimate, an observer needs to represent and compute with estimates of cue uncertainty. The estimates could be implicit in the neural population code derived from the sensory features associated with a cue or might be explicitly computed, for example, by measuring the stability of each cue's estimates over repeated views of the scene. They could also be assessed online by using ancillary information (viewing distance, amount of self-motion, etc.) that impacts cue reliability (Landy et al., 1995). Estimates of reliability need not be explicitly represented by the nervous system, but they might be implicit in the form of the neural population code (Ma, Beck, Latham, & Pouget, 2006).
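To make Eqs. 1.1–1.3 concrete, here is a minimal numerical sketch of reliability-weighted averaging (our illustration, not code from the chapter; the cue estimates and noise levels are invented):

```python
import numpy as np

def combine_cues(estimates, sigmas):
    """Reliability-weighted average of independent, unbiased cue estimates.

    estimates: per-cue estimates x_i of the same world property
    sigmas:    per-cue standard deviations sigma_i of those estimates
    Returns the combined estimate (Eq. 1.1) and its variance (from Eq. 1.3).
    """
    estimates = np.asarray(estimates, dtype=float)
    reliabilities = 1.0 / np.asarray(sigmas, dtype=float) ** 2  # r_i = 1 / sigma_i^2
    weights = reliabilities / reliabilities.sum()               # Eq. 1.2
    combined = np.dot(weights, estimates)                       # Eq. 1.1
    combined_variance = 1.0 / reliabilities.sum()               # Eq. 1.3 (r = sum of r_i)
    return combined, combined_variance

# Hypothetical example: a visual and a haptic estimate of object size (in mm).
size_hat, var_hat = combine_cues(estimates=[52.0, 55.0], sigmas=[2.0, 4.0])
print(size_hat, var_hat)  # combined variance is below the smaller single-cue variance
```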

If the variability in different cue estimates is correlated, the minimum-variance unbiased estimator will not necessarily be a weighted average. For some distributions, it is a nonlinear function of the individual estimates; for others, including the Gaussian, it is still a weighted average, but the weights take into account the covariance of the cues (Oruç, Maloney, & Landy, 2003).

BAYESIAN ESTIMATION AND DECISION MAKING

The linear model has dominated cue-integration research and has provided important insights into human perceptual and sensorimotor processing. However, most perceptual and sensorimotor problems encountered by an observer in the natural world cannot be accurately characterized by the linear model. In many cases, the linear model provides a reasonable "local" approximation to the ideal observer. In those cases, the complexity of the problem is reduced in the laboratory setting and this reduction is assumed to be known by the observer. To characterize the complex problems presented to observers in the real world, a more general computational framework is needed. Bayesian decision theory provides such a framework.

Priors, Likelihoods, and Posteriors

In the Bayesian framework, the information provided by sensory information to estimate a scene property or make a decision related to that property is represented by a "posterior" probability distribution

P(s \mid d) = \frac{P(d \mid s) P(s)}{P(d)},   (1.4)

where s represents the scene property or properties of interest (possibly multidimensional) and d is a vector of sensory data. In this formulation, it is important to delineate what is known by the observer from what is unknown. The data, d, are given to and therefore known by the observer. The scene properties, s, are unknown. The probability distribution, P(s|d), represents the probabilities of different values of s being "true," given the observed data. If the distribution is narrowly concentrated around one value of s, it represents reliable data; if broad, it represents unreliable data. If it is narrow in one dimension and broad in others, it reflects a situation in which the information provided by d reliably determines s along the narrow dimension but does not along the other dimensions.

Bayes' rule (Eq. 1.4) shows how to compute the posterior distribution from prior knowledge about the statistics of s—represented by the prior distribution P(s) (that is, which values of s are more likely in the environment than others)—and knowledge about how likely scenes with different values of s are to give rise to the observed data d, which is represented by the likelihood function P(d|s). Because d is given, the likelihood is a function of the conditioning variable and does not behave like a probability distribution (i.e., it need not integrate to one), and hence it is often notated as L(s|d). The third term, the denominator P(d), is a constant, normalizing term (so that the full expression integrates to one), and it can generally be ignored in formulating an estimation procedure. From a computational point of view, if one has a good "generative" model for how the data are generated by different scenes (e.g., the geometry of disparity and the noise associated with measuring disparity) and a good model of the statistics of scenes, one can use Bayes' rule to compute the posterior distribution, P(s|d), and hence derive a full representation of the information provided by some observed sensory data about scene properties of interest.

Gain/Loss Functions, Estimation, and Decision Making

Having computed the posterior distribution, the Bayesian decision maker next chooses a course of action. For that action to be optimal, one must have a definition of optimality. That is, one must have a loss function that defines the consequences of the decision maker's action. Optimality is defined as making decisions that minimize expected loss. For Bayesian estimation, the observer's task is to choose an estimate, and often the loss is defined as a function of estimation error (the difference between the chosen estimate and the true value of the parameter in the world). In other situations, the observer makes a discrete decision (e.g., categorizing the stimulus as signal or noise) or forms a motor plan (e.g., a planned trajectory to grasp an observed object).

Bayesian decision theory prescribes the optimal choice of action based on several ingredients. First, one needs a model of the environment: that is, a set of possible states of the world or scenes and a prior distribution across them (random variable S with prior distribution P(s)). This world leads to noisy sensory data d conditioned on a particular state of the world (with distribution P(d|s)). The task of the observer is to choose an optimal action a(d), which might be an estimate of a scene parameter, a button press in an experiment, or a plan for movement in a visuomotor task. For experimental tasks or estimation, the action is the final output of the decision-making task upon which gain or loss is based. In other situations, like visuomotor control, the outcome itself—the executed movement—may be stochastic. So we distinguish the outcome of the plan (e.g., the movement trajectory) t as distinct from the selected action a(d) (with distribution P(t|a(d))). The final ingredient is typically called the loss function, although we also use the negative of loss, or gain g(t, s). Note that g is a function only of the actual scene s and actual outcome of the decision t. An optimal choice of action is one that maximizes expected gain:

a_{opt} = \arg\max_a EG(a), \quad \text{where}

EG(a) = \int g(t, s) \, P(t \mid a(d)) \, P(d \mid s) \, P(s) \, dt \, dd \, ds.   (1.5)

It is worth reviewing some special cases of this general method. For estimation, the final output is the estimate itself (t ≡ a(d)). If the prior distribution is uniform over the domain of interest (P(s) = c, a constant) and the gain function only rewards perfectly correct estimates (a delta function centered on the correct value of the parameter), then Eq. 1.5 results in maximum-likelihood estimation: that is, choosing the mode of P(d|s) over possible scenes s. If the prior distribution is not uniform, the optimal method is maximum a posteriori (MAP) estimation, that is, choosing the mode of the posterior P(s|d). If the gain function is not a delta function, but treats estimation errors symmetrically (i.e., is a function of \hat{s} − s, where \hat{s} is the estimate and s is the true value in the scene), the optimal estimation procedure corresponds to first convolving the gain function with the posterior distribution, and then choosing the estimate corresponding to the peak of that function. For example, the oft-used squared-error loss function leads the optimal observer to use a minimum-variance criterion and hence the mean of the posterior as the estimate.

Bayesian Decision Theory and Cue Integration

The most straightforward application of Bayesian decision theory to cue integration involves the case in which the sensory data associated with each cue are conditionally independent. In that case, we can write the likelihood function for all of the data as the product of likelihood functions for the data associated with each cue,

P(d_1, \ldots, d_n \mid s) = \prod_{i=1}^{n} P(d_i \mid s),   (1.6)

where d_i is a data vector representing the sensory data associated with cue i (e.g., disparity for the stereo cue) and s is the scene variable being estimated. Combining Eqs. 1.4 and 1.6, we have

P(s \mid d_1, \ldots, d_n) \propto P(s) \prod_{i=1}^{n} P(d_i \mid s),   (1.7)

where we have dropped the constant denominator term for simplicity.
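The computation in Eqs. 1.4, 1.6, and 1.7 can be carried out directly on a discretized grid of scene values. The sketch below is our illustration (not the chapter's code); the prior, the two cue likelihoods, and all numbers are hypothetical:

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Discretize the scene variable s (say, slant in degrees).
s = np.linspace(0.0, 90.0, 901)
ds = s[1] - s[0]

# Prior P(s) and two conditionally independent cue likelihoods P(d_i | s),
# each summarized here by a Gaussian centered on the value its data suggest.
prior = gaussian(s, mu=30.0, sigma=20.0)
like_texture = gaussian(s, mu=40.0, sigma=8.0)    # hypothetical texture measurement
like_disparity = gaussian(s, mu=34.0, sigma=4.0)  # hypothetical disparity measurement

# Eqs. 1.6-1.7: multiply the likelihoods and the prior, then normalize
# (the normalization plays the role of P(d) in Eq. 1.4).
posterior = prior * like_texture * like_disparity
posterior /= posterior.sum() * ds

map_estimate = s[np.argmax(posterior)]             # mode of the posterior (MAP)
posterior_mean = np.sum(s * posterior) * ds        # optimal under squared-error loss
print(map_estimate, posterior_mean)
```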

If the individual likelihood functions and the prior distribution are Gaussian, with variances σ_i² and σ_prior², then the posterior distribution will be Gaussian with mean and variance identical to the minimum-variance estimate; that is, for Gaussian distributions, the MAP estimate and the mean of the posterior both yield a linear estimation procedure identical to that of the minimum-variance unbiased estimator expressed in Eqs. 1.1–1.3. If the prior distribution is flat or significantly broader than the likelihood function, the posterior is simply the product of the individual cue likelihoods and the mode and mean correspond to the maximum-likelihood estimate of s. If the Gaussian assumption holds, but the data associated with the different cues are not conditionally independent, the MAP estimate will remain linear, but the cue weights have to take into account the covariance structure of the data, resulting in the same weighted linear combination as the minimum-variance, unbiased estimate (Oruç et al., 2003).

While a linear system can characterize the optimal estimator when the estimates are Gaussian distributed and conditionally independent, the Bayesian formulation offers an equally simple, but much more general formulation. In essence, it replaces the averaging of estimates with the combining of information as represented by multiplying likelihood functions and priors. It also replaces the notion of perceptual estimates as point representations (single, specific values) with a notion of perceptual estimates as probability distributions. This allows one to separate information (as represented by the posterior distribution) from the task, as represented by a gain function.

Figure 1.1 illustrates Bayesian integration in two simple cases: estimation of a scalar variable (size) from a pair of cues (visual and haptic) with Gaussian likelihood functions, and estimation of a two-dimensional variable (slant and tilt) from a pair of cues, one of which is decidedly non-Gaussian (skew symmetry). While the latter may appear more complex than the former, the ideal observer operates similarly by multiplying likelihood functions and priors.
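As a quick check of that equivalence, the following sketch (ours, with made-up cue values) multiplies two Gaussian likelihoods on a grid under a flat prior and compares the result with the closed-form weighted average of Eqs. 1.1–1.3:

```python
import numpy as np

s = np.linspace(-20.0, 60.0, 8001)
ds = s[1] - s[0]
mus = np.array([10.0, 25.0])     # hypothetical single-cue estimates
sigmas = np.array([3.0, 6.0])    # hypothetical single-cue standard deviations

# Posterior with a flat prior: product of the two Gaussian likelihoods (Eq. 1.7).
posterior = np.ones_like(s)
for mu, sigma in zip(mus, sigmas):
    posterior *= np.exp(-0.5 * ((s - mu) / sigma) ** 2)
posterior /= posterior.sum() * ds
grid_mean = np.sum(s * posterior) * ds
grid_var = np.sum((s - grid_mean) ** 2 * posterior) * ds

# Closed form from Eqs. 1.1-1.3.
r = 1.0 / sigmas ** 2
w = r / r.sum()
print(grid_mean, np.dot(w, mus))   # the two means agree
print(grid_var, 1.0 / r.sum())     # the two variances agree
```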
NONLINEAR MODELS: GENERATIVE MODELS AND HIDDEN VARIABLES

We now turn to conditions under which optimal cue integration is not linear. We will describe three qualitatively different features of cue-integration problems that make the linear model inappropriate. One such situation is when the information provided by two cues interacts because each cue disambiguates scene or viewing variables that the other cue requires to determine a scene property. Another problem is that the information provided by many cues, particularly visual depth cues, depends on what prior assumptions about the world hold true, and cues can interact by jointly determining the appropriate world model. A special case of this is a situation in which an observer has to decide whether different sensory cues should or should not be combined into one estimate at all.

Cue Disambiguation

The raw sensory data from different cues are often incommensurate in the sense that they specify a scene property in different coordinate frames of reference. For example, auditory cues provide information about the location of a sound source in head-centered coordinates, whereas visual cues provide information in retinal coordinates. To apply a linear scheme for combining these sources of information, one would first need to use an estimate of gaze direction relative to the head to convert visual position estimates to head-centered coordinates or auditory position estimates to retinal coordinates, so that the two location estimates are in the same coordinates. Similarly, visual depth estimates based on relative motion should theoretically be scaled by an estimate of the viewing distance to provide an estimate of metric depth (i.e., an estimate in centimeters). On the other hand, depth derived from disparity needs to be scaled approximately by the square of the viewing distance to put it in the same units. Landy et al. (1995) called this preliminary conversion into common units promotion.

A normative Bayesian model finesses the problem in a very elegant way. Figure 1.2 illustrates the structure of the computations as applied to disparity and velocity cues to relative depth. The key observation is that the generative model for the sensory data associated with both the disparity and velocity cues depends not only on the scene property being estimated (relative depth) but also on the viewing distance to the fixated point. We will refer to viewing distance as a hidden variable in the problem (statisticians refer to these kinds of variables as nuisance parameters; for more discussion of the role of hidden variables and marginalization in visual cue integration, see Knill, 2003).

Figure 1.1 Bayesian integration of sensory cues. (A) Two cues to object size, visual and haptic, each have Gaussian likelihoods (as in Ernst & Banks, 2002). The resulting joint likelihood is Gaussian with mean and variance as predicted by Eqs. 1.1–1.3. (B) Two visual cues to surface orientation are provided: skew symmetry (a figural cue) and stereo disparity (as in Saunders & Knill, 2001). Surface orientation is parameterized as slant and tilt angles. Skew-symmetric figures appear as figures slanted in depth because the brain assumes that the figures are projected from bilaterally symmetric figures in the world. The information provided by skew symmetry is given by the angle between the projected symmetry axes of a figure, shown here as solid lines superimposed on the figure. Assuming that visual measurements of the orientations of these axes in the image are corrupted by Gaussian noise, one can compute a likelihood function for three-dimensional (3D) surface orientation from skew. The result, as shown here, is highly non-Gaussian. The shape of the likelihood function is highly dependent on the spin of the figure around its 3D surface normal. Top row of graphs: skew likelihood for the figure shown at the top. Middle row: two stereo likelihoods centered on larger (left) and smaller (right) values of slant. Bottom row: the combined likelihood that results when the skew likelihood is multiplied by the stereoscopic information from binocular disparities, assuming the prior on surface orientation is flat. This leads to the prediction that perceptual biases will depend on the spin of the figure. It also leads to the somewhat counterintuitive prediction illustrated here that changing the slant suggested by stereo disparities should change the perceived tilt of symmetric figures. This is exactly the pattern of behavior shown by subjects.
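The two-dimensional case in panel B can be sketched numerically in the same way as the scalar case: build each cue's likelihood over a slant–tilt grid and multiply. The toy example below is ours, not the model behind Figure 1.1; the curved "figural" likelihood is an invented stand-in for a skew-symmetry cue:

```python
import numpy as np

slant = np.linspace(0.0, 90.0, 181)      # degrees
tilt = np.linspace(-45.0, 45.0, 181)     # degrees
S, T = np.meshgrid(slant, tilt, indexing="ij")

# A deliberately non-Gaussian likelihood over (slant, tilt): slant and tilt trade
# off along a curved ridge, loosely mimicking a figural cue such as skew symmetry.
ridge = (T - 0.5 * (S - 40.0)) / 12.0
figural_like = np.exp(-0.5 * ridge ** 2)

# A Gaussian stereo likelihood centered on a particular slant, uninformative about tilt.
stereo_like = np.exp(-0.5 * ((S - 55.0) / 6.0) ** 2)

posterior = figural_like * stereo_like   # flat prior assumed
posterior /= posterior.sum()
i, j = np.unravel_index(np.argmax(posterior), posterior.shape)
print("combined slant, tilt:", slant[i], tilt[j])
# Shifting the stereo-specified slant moves the combined tilt along the ridge,
# the kind of interaction described in the Figure 1.1 caption.
```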

Figure 1.2 Cue disambiguation. (A) Likelihood function for the motion cue as a function of depth and viewing distance, P(d_vel | s_depth, s_distance). The depth implied by a given retinal velocity is proportional to the viewing distance. (B) Likelihood function for the disparity cue, P(d_disp | s_depth, s_distance). The depth implied by a given retinal disparity is approximately proportional to the square of the viewing distance. (C) A Gaussian prior on viewing distance. (D) Combined likelihood function P(d_disp, d_vel, s_distance | s_depth). The right-hand side of the plot illustrates the likelihood for depth alone, P(d_disp, d_vel | s_depth), integrating out the unknown distance.

The generative model for both the relative-disparity and relative-velocity measurements requires that both relative depth and viewing distance be specified. This allows one to compute likelihood functions for both cues, d_vel and d_disp, P(d_vel | s_depth, s_distance) and P(d_disp | s_depth, s_distance) (assuming one knows the noise characteristics of the disparity and motion sensors). Assuming that the noises in the two sensor systems are independent, we can write the likelihood function for the two cues as the product of the likelihood functions for the individual cues,

P(d_{disp}, d_{vel} \mid s_{depth}, s_{distance}) = P(d_{disp} \mid s_{depth}, s_{distance}) \, P(d_{vel} \mid s_{depth}, s_{distance}).   (1.8)

This is not quite what we want, however. What we want is the likelihood function for depth alone, P(d_disp, d_vel | s_depth). This is derived by integrating ("marginalizing") over the hidden variable, s_distance:

P(d_{disp}, d_{vel} \mid s_{depth}) = \int P(d_{disp}, d_{vel}, s_{distance} \mid s_{depth}) \, ds_{distance}
= \int P(d_{disp}, d_{vel} \mid s_{distance}, s_{depth}) \, P(s_{distance}) \, ds_{distance}
= \int P(d_{disp} \mid s_{depth}, s_{distance}) \, P(d_{vel} \mid s_{depth}, s_{distance}) \, P(s_{distance}) \, ds_{distance}.   (1.9)

(Note that the second step required that depth and distance be independent.)
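Numerically, the marginalization in Eq. 1.9 is just a weighted sum over a grid of viewing distances. Here is a minimal sketch (ours; the generative mappings, measurements, and noise levels are invented stand-ins for the disparity and velocity geometry):

```python
import numpy as np

depth = np.linspace(0.5, 10.0, 200)        # relative depth (arbitrary units)
distance = np.linspace(20.0, 300.0, 400)   # viewing distance (cm), the hidden variable
dz = distance[1] - distance[0]
D, Z = np.meshgrid(depth, distance, indexing="ij")

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Toy generative model: measured velocity ~ depth / distance, and measured
# disparity ~ depth / distance**2 (up to constants), each with sensor noise.
d_vel, d_disp = 0.020, 0.00040                       # hypothetical measurements
like_vel = gauss(d_vel, D / Z, 0.004)                # P(d_vel  | s_depth, s_distance)
like_disp = gauss(d_disp, D / Z ** 2, 0.0001)        # P(d_disp | s_depth, s_distance)

prior_distance = gauss(Z, 100.0, 40.0)               # Gaussian prior on viewing distance

# Eq. 1.9: integrate the joint over the hidden variable s_distance.
like_depth = np.sum(like_vel * like_disp * prior_distance, axis=1) * dz
print("most likely relative depth:", depth[np.argmax(like_depth)])
```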

The important thing to note is that the joint likelihood for s_depth is not the product of the individual cue likelihoods, P(d_disp | s_depth) and P(d_vel | s_depth). Rather, we had to expand the representational space for the scene to include viewing distance, express both likelihood functions in that space, multiply the likelihoods in the expanded space, and then integrate over the hidden variable to obtain a final likelihood. If we had a nonuniform prior on relative depth, we would then multiply the likelihood function by the prior and normalize to obtain the posterior distribution. As illustrated in Figure 1.2, both cues are consistent with a large range of relative depths (depending on the viewing distance assumed), but because the cues depend differently on viewing distance, when combined they can disambiguate both relative depth and viewing distance (Richards, 1985).

An alternative to this approach would be to estimate the viewing distance from ancillary information (e.g., vergence signals from the oculomotor system). With these parameters fixed, optimal cue integration will again be linear. However, this approach is almost certainly suboptimal because it ignores the noise in the ancillary signals. The optimal approach is to incorporate the information from ancillary signals in the same Bayesian formulation. In this case, extraretinal vergence signals specify a likelihood function in depth-distance space that is simply stretched out along the depth dimension (because those signals say nothing about relative depth), much like the prior in the lower-left panel of Fig. 1.2. In this way, vergence signals disambiguate viewing distance only in so much as the noise in the signals allows. If that noise is high, the disambiguating effects of the nonlinear interaction between the relative-disparity and relative-motion signals will dominate the perceptual estimate.

Robust Estimation and Mixture Priors

One might ask how a normative system should behave when cues suggest very different values for some scene property. Consider a case in which disparity indicates a frontoparallel surface, but the texture pattern in the image suggests a surface slanted away from frontoparallel by 60°. A linear system would choose some intermediate slant as its best estimate, but if the relative reliabilities of the two cues (i.e., the inverses of the variances of the associated likelihood functions) were similar, this estimate would be at a slant (say 30°) that is wildly inconsistent with both cues.

On the face of it, this appears like a standard problem in robust statistics. For example, the mean of a set of samples can be influenced strongly by a single outlier, and robust, nonlinear statistical methods, such as the trimmed mean, are intended to alleviate such problems (Hampel, 1974; Huber, 1981). The trimmed mean and related methods reduce the weight of a given data point as the value of that data point becomes increasingly discrepant from the bulk of the sample. The application of robust statistical methods to cue integration is difficult, however, because one is usually dealing with a small number of cues rather than a large sample, so it is often unclear which cue should be treated as the discrepant outlier.

A discrepant cue may result from a particularly noisy sample, but it may also indicate that the estimate from that cue was fallacious due to a mistaken assumption (Landy et al., 1995). The second problem is more common than the first. The observation that outliers may arise from fallacious assumptions suggests a reconceptualization of the outlier problem. Consider the case of pictorial depth cues. All pictorial depth cues rely on prior assumptions about objects in the world (texture relies on homogeneity, linear perspective on parallelism, relative motion on rigidity, etc.). A notable and very simple example is that provided by the compression cue (Fig. 1.3A). The visual system interprets figures that are very compressed in one direction as being slanted in 3D in that direction. For example, the visual system uses the aspect ratio of ellipses in the retinal image as a cue to the 3D slant of a figure, so much so that it gives nearly equal weight to that cue and disparity in a variety of viewing conditions (Hillis et al., 2004; Knill & Saunders, 2003). Of course, the aspect ratio of an ellipse on the retina is only useful if one can assume that the figure from which it projects is a circle. This is usually a reasonable assumption because most ellipses in an image are circular in the world. When disparities suggest a slant differing by only a small amount from that suggested by the compression cue, it makes sense to combine the two cues linearly. When disparities suggest a very different slant, however, the discrepancy provides evidence that one is viewing a noncircular ellipse. In this situation, an observer should down-weight the compression cue or even ignore it.

Figure 1.3 illustrates how these observations are incorporated into a Bayesian model (see Chapter 9 and Knill, 2007b, for details). The generative model for the aspect ratio of an ellipse in the image depends on both the 3D slant of a surface and the aspect ratio of the ellipse in the world. The aspect ratio of the ellipse in the world is a hidden variable and must be integrated out to derive the likelihood for slant. The prior distribution on ellipse aspect ratios plays a critical role here. The true prior is a mixture of distributions, each corresponding to different categories of shapes in the world.


Figure 1.3 Bayesian model of slant from texture (Knill, 2003). (A) Given the shape of an ellipse in the retinal image, the likelihood function for slant is a mixture of likelihood functions derived from different prior models on the aspect ratios of ellipses in the world. The illustrated likelihoods were derived by assuming that noise associated with sensory measurements of aspect ratio has a standard deviation of 0.03, that the prior distribution of aspect ratios of randomly shaped ellipses in the world has a standard deviation of 0.25, and that 90% of ellipses in the world are circles. The mixture of narrow and broad likelihood functions creates a likelihood function with long tails, as shown in the blow-up. (B) Combination of a long-tailed likelihood function from compression (blue) and a Gaussian likelihood function from disparity (red) yields a joint likelihood function that peaks between the two individual estimates when the individual estimates are similar, much as with the linear, Gaussian model of cue integration. (C) When the cue conflict is increased, the heavy tail of the compression likelihood results in a joint likelihood that peaks at the disparity estimate, effectively vetoing the compression cue.

A simple first-order model is that the prior distribution is a mixture of a delta function at one (i.e., all of the probability massed at one) representing circles and a broader distribution over other possible aspect ratios representing randomly shaped ellipses. In Figure 1.3, the width of the likelihood for the circle model is due to sensory noise in measurement of ellipse aspect ratio. The result is a likelihood function that is a mixture of two likelihood functions—one derived for circles, in which the uncertainty in slant is caused only by noise in sensory measurements of shape on the retina, and one derived for randomly shaped ellipses, in which the uncertainty in slant is a combination of sensory noise and the variance in aspect ratios of ellipses in the world. The result is a likelihood function for the compression cue that is peaked at the slant consistent with a circle interpretation of the measured aspect ratio but has broad tails (Fig. 1.3A).
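The long-tailed compression likelihood can be sketched as a mixture of a "circle" component and a "random ellipse" component. The code below is our simplified illustration of that idea, not Knill's actual model; the generative mapping (image aspect ratio ≈ world aspect ratio × cos slant) is an approximation, and the numbers echo those in the Figure 1.3 caption:

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

slant = np.linspace(0.0, 89.0, 891)            # degrees
cos_s = np.cos(np.deg2rad(slant))
measured_aspect = np.cos(np.deg2rad(50.0))     # hypothetical image aspect ratio

sigma_noise = 0.03   # sensory noise on aspect ratio (as in the caption)
sigma_world = 0.25   # spread of aspect ratios of non-circular ellipses
p_circle = 0.9       # proportion of ellipses in the world that are circles

# Circle component: image aspect ratio = cos(slant) + sensory noise.
like_circle = gauss(measured_aspect, cos_s, sigma_noise)
# Random-ellipse component: the world aspect ratio is (approximately) integrated
# out, which broadens the likelihood.
like_ellipse = gauss(measured_aspect, cos_s,
                     np.sqrt(sigma_noise ** 2 + (sigma_world * cos_s) ** 2))

# Mixture: a narrow peak from the circle model plus long tails from the ellipse model.
like_compression = p_circle * like_circle + (1 - p_circle) * like_ellipse

# Combine with a Gaussian disparity likelihood; with a large conflict the long tail
# lets the joint likelihood peak near the disparity estimate (cf. Fig. 1.3C).
like_disparity = gauss(slant, 20.0, 5.0)
joint = like_compression * like_disparity
print("joint peak at slant:", slant[np.argmax(joint)])
```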
The likelihood function for both the compression cue and the disparity cue results from multiplying the likelihood function for disparity (which presumably does not have broad tails; but see Girshick and Banks, 2009, for evidence that the disparity likelihood also has broad tails) with the likelihood function for the compression cue. The resulting likelihood function peaks at a point either between the peaks of the two cue likelihood functions when they are close to one another (small cue conflicts, Fig. 1.3B) or very near the peak of the disparity likelihood function when they are not close (large cue conflicts, Fig. 1.3C). The latter condition appears behaviorally as a down-weighting or vetoing of the compression cue. Thus, multiplying likelihood functions can result in a form of model selection, thereby determining which prior constraint is used to interpret a cue. Similar behavior can be predicted for many different depth cues because they also derive their informativeness from a mixture of prior constraints that hold for different categories of objects. This behavior of integrating with small cue conflicts and vetoing with large ones is a form of model switching and has been observed with disparity-perspective conflict stimuli (Girshick & Banks, 2009; Knill, 2007b) and with auditory-visual conflict stimuli (Wallace et al., 2004).

Causal Inference

The development of the linear cue-combination model is based on the assumption that the individual cues are all estimating the same feature of the world (e.g., the depth or location of the same object). However, the observer may not know for sure that the cues derive from the same source in the world. The observer has to first infer whether the scene that gave rise to the sensory input consists of one or two sources (i.e., one or two objects) before determining whether the sources should be integrated. That is, the observer is faced with inferring the structure of the scene, not merely producing a single estimate.

Consider the problem of interpreting auditory and visual location cues. When presented with both a visual and an auditory stimulus, an observer should take into account the possibility that the two signals come from different sources in the world. If they come from one source, it is sensible to integrate them. If they come from different sources, integration would be counterproductive. As we mentioned in the previous section, behavior consistent with model switching has been observed in auditory-visual integration experiments (Wallace et al., 2004). Specifically, when auditory and visual stimuli are presented in nearby locations, subjects' estimates of the auditory stimulus are pulled toward the visual stimulus (the ventriloquist effect). When they are presented far apart, the auditory and visual signals appear to be separate sources in the world and do not affect one another.

Recent work has approached this causal-inference problem using Bayesian inference of structural models (see Chapters 2, 3, 4, and 13). These models typically begin with a representation of the causal structure of the sensory input in the form of a Bayes net (Pearl, 1988). For example, Körding and colleagues (2007) used a structural model to analyze data on auditory-visual cue interactions in location judgments. The structural model (Fig. 1.4) is a probabilistic description of a generative model of the scene. According to this model, the generation of auditory and visual signals can be thought of as a two-step process. First, a weighted coin flip determines whether the scene consists of one cause (with probability p_common, left-hand branch) or separate causes for the auditory and visual stimuli (right-hand branch). If there is one cause, the location of that cause, x, is then determined (as a random sample from a prior distribution of locations), and the source at that location then gives rise independently to visual and auditory signals. If there are two causes, each has its own independently chosen location, giving rise to unrelated signals.

Figure 1.4 A causal-inference model of the ventriloquist effect. The stimulus either comes from a common source or from two independent sources (governed by probability p_common). If there is a common source, the auditory and visual cues both depend on that common source's location. If not, each cue depends upon an independent location.

An observer has to invert the generative model and infer the locations of the visual and auditory sources (and whether they are one and the same). While there are numerous, mathematically equivalent ways to formulate the ideal observer, the formulation that is consistent with the others in this chapter is one in which an observer computes a posterior distribution on both the auditory and visual locations, x_a and x_v. The prior in this case is a mixture of a delta function along the diagonal in x_a − x_v space (corresponding to situations in which the auditory and visual signals derive from the same source) and a broad distribution over the entire space (corresponding to situations in which the locations are independent). In this formulation, the prior distribution has broad tails, but the result is similar. If the two signals indicate locations near one another, the posterior is peaked at a point on the diagonal corresponding to a position between the two. If they are far apart, it peaks at the same point as the likelihood function. The joint likelihood function for the location of the visual and auditory sources can be described by the same sort of mixture model used in the earlier slant example (for further discussion of causal and mixture models, see Chapters 2, 3, 4, 12, and 13).
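A compact, grid-based sketch of this kind of causal-inference computation is given below. It is our simplified illustration in the spirit of Körding et al. (2007), not their published code; the measurements, noise levels, and prior parameters are all invented:

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-40.0, 40.0, 2001)   # candidate source locations (degrees)
dx = x[1] - x[0]

x_v, x_a = 0.0, 12.0                 # hypothetical visual and auditory measurements
sigma_v, sigma_a = 2.0, 8.0          # sensory noise for each modality
sigma_prior = 20.0                   # prior spread of source locations
p_common = 0.5                       # prior probability of a single common cause

prior_x = gauss(x, 0.0, sigma_prior)

# One common cause: integrate the product of both likelihoods over the shared location.
like_common = np.sum(gauss(x_v, x, sigma_v) * gauss(x_a, x, sigma_a) * prior_x) * dx

# Two independent causes: each signal is explained by its own location.
like_v = np.sum(gauss(x_v, x, sigma_v) * prior_x) * dx
like_a = np.sum(gauss(x_a, x, sigma_a) * prior_x) * dx
like_separate = like_v * like_a

post_common = (p_common * like_common) / (
    p_common * like_common + (1 - p_common) * like_separate)
print("posterior probability of a common cause:", post_common)

# Model-averaged estimate of the auditory location: fused with vision when a common
# cause is likely, pulled back toward the auditory measurement when it is not.
fused = np.sum(x * gauss(x_v, x, sigma_v) * gauss(x_a, x, sigma_a) * prior_x) * dx / like_common
auditory_only = np.sum(x * gauss(x_a, x, sigma_a) * prior_x) * dx / like_a
x_a_hat = post_common * fused + (1 - post_common) * auditory_only
print("estimated auditory location:", x_a_hat)
```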

Conclusions (Theory)

The previous theoretical discussion has a number of important take-home messages. First, Bayesian decision theory provides a completely general normative framework for cue integration. A linear approximation can characterize the average behavior of an optimal integrator in limited circumstances, but many realistic problems require the full machinery of Bayesian inference. This implies that the same framework can be used to build models of human performance, for example, by constructing and testing model priors that are incorporated into human perceptual mechanisms or by modeling the tasks human perceptual systems are designed to solve using appropriate gain functions. Second, the representational framework used to model specific problems depends critically on the structure of the information available and the observer's task. In the aforementioned examples, appropriate representational primitives include "average" cue estimates (in linear models), additive mixtures of likelihood functions, and graphical models. Finally, constructing normative models of cue integration serves to highlight the qualitative structure of specific problems. This implies that normative models can suggest the appropriate scientific questions that need to be answered to understand, at a computational level, how the brain solves specific problems. In some cases, this may mean that appropriate questions revolve around the weights that observers use to integrate cues. In others, they may revolve around the mixture components of the priors people use. In still others, they center on the causal structure assumed by observers in their models of the generative process that gives rise to sensory data.

THEORY MEETS DATA

Methodology

A variety of experimental techniques has been used to test theories of cue integration. Many researchers have used variants of the perturbation-analysis technique introduced by Young and colleagues (Landy et al., 1995; Maloney & Landy, 1989; Young, Landy, & Maloney, 1993) and later extended to intersensory cue combination (Ernst & Banks, 2002). Consider the combination of visual and haptic cues to size (Ernst & Banks, 2002). The visual stimuli are stereoscopic random-dot displays that depict a raised bar on a flat background (Fig. 1.5). The haptic stimuli are also a raised bar presented with force-feedback devices attached to the index finger and thumb. Four kinds of stimuli are used: visual-only (the stimulus is seen but not felt); haptic-only (felt but not seen); two-cue, consistent stimuli (seen and felt, and both cues depict the same size); and two-cue, inconsistent stimuli (in which the visual stimulus depicts one size x_v and the haptic stimulus indicates a different size x_h). Subjects are presented with two stimuli sequentially and indicate which was larger. For example, a subject is shown two visual-only stimuli that depict bars with heights x_v and x_v + Δx_v. The threshold value of Δx_v (the just-noticeable difference or JND) is used to estimate the underlying single-cue noise σ_v. An analogous single-cue experiment is used to estimate the haptic-cue noise σ_h. Interleaved with the visual-only and haptic-only trials, the two-cue stimuli are also presented. On such trials, subjects discriminate the perceived size of an inconsistent-cues stimulus in which the size depicted by haptics is perturbed from that depicted visually, x_h = x_v + Δx, as compared to a consistent-cues stimulus in which x′_h = x′_v = x′. The size x′ of the consistent-cues stimulus is varied to find the point of subjective equality (PSE), that is, the pair of stimuli ((x_h, x_v) and (x′_h, x′_v)) that are perceived as being equal in size. Linear, weighted cue integration implies that x′ is a linear function of Δx with slope w_h (the weight applied to the perturbed cue). The weight may be predicted independently from the estimates of the individual cue variances and Eq. 1.2.
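The logic of the perturbation analysis can be summarized in a few lines (our sketch with hypothetical numbers; it is not the analysis code from Ernst & Banks, 2002):

```python
import numpy as np

# Single-cue JNDs give estimates of the single-cue noise levels.
sigma_v = 0.8   # visual size noise (mm), hypothetical
sigma_h = 1.6   # haptic size noise (mm), hypothetical

# Predicted weights from Eq. 1.2.
r_v, r_h = 1 / sigma_v ** 2, 1 / sigma_h ** 2
w_h = r_h / (r_v + r_h)          # predicted haptic weight (= predicted PSE slope)

# Predicted PSEs for inconsistent-cues stimuli: the perceived size is
# x_v + w_h * delta, so the matching consistent-cues size x' is linear in delta
# with slope w_h.
x_v = 55.0
deltas = np.array([-6.0, -3.0, 0.0, 3.0, 6.0])   # haptic perturbations (mm)
predicted_pse = x_v + w_h * deltas

slope = np.polyfit(deltas, predicted_pse, 1)[0]
print("predicted haptic weight:", w_h, " slope recovered from PSEs:", slope)
```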
Figure 1.5 Multisensory stimulus used by Ernst and Banks (2002). A raised bar was presented visually as a random-dot stimulus with binocular disparity displaying a bar height x_v and, in inconsistent-cues stimuli, haptically with a height x_h = x_v + Δx.

There are a few issues with this method. First, one might argue that the artificial stimuli create cue conflicts that exceed those experienced under natural conditions and therefore that observers might use a different integration method than would be used in the natural environment. One can ask whether the results are similar across conflict sizes to determine whether this is a serious concern in a particular set of conditions. Second, the method does not necessarily detect a situation in which the results have been affected by other unmodeled information such as another cue or a prior. Consider, for example, an experiment in which texture and motion cues to depth were manipulated, and the perturbation method was used to estimate the two cue weights (Young et al., 1993). The observers estimated depth by using texture and motion cues, but they may also have incorporated other cues such as blur and accommodation that specify flatness and/or a Bayesian prior favoring flatness (Watt, Akeley, Ernst, & Banks, 2005). As a result of using these other cues, observers should perceive all of the stimuli as flatter than specified by texture and motion, and therefore the texture and motion weights should sum to less than one (w_t + w_m < 1). This perceptual flattening would occur equally with both the consistent- and inconsistent-cues stimuli, and therefore would not affect points of subjective equality. In particular, the inconsistent-cues stimulus for which Δx = 0 is identical to the consistent-cues stimulus in which x_t = x_m = x, and thus these two stimuli must be subjectively equivalent (except for measurement error). In this experimental design, the consistent-cues stimuli are used as a "yardstick" to measure the perceived depth of the inconsistent-cues stimulus. To uncover a bias in the percept, the yardstick must be qualitatively different from the inconsistent-cues stimulus. In the texture-motion case, for example, when the inconsistent-cues stimuli have reduced texture or motion reliability (by adding noise to texture shapes or velocities), but the consistent-cues stimuli do not have the added stimulus noise, the relative flattening of the noisy stimuli becomes apparent, and the separately measured weights sum to less than one (Young et al., 1993).

This experimental design is still useful if the observer incorporates a nonuniform prior into the computation of the environmental property of interest. For example, suppose the observer has a Gaussian prior on perceived depth centered on zero depth (i.e., a prior for flatness) and that the reliability r_i of each experimenter-manipulated cue i is unchanging across experimental conditions. The prior has the form of a probability distribution, but it is a fixed (nonstochastic) contributor to the computation. That is, as measured in stimulus units, the use of the prior will have no effect on the estimation of single-cue JNDs, nor on the estimation of relative cue weights. All percepts will be biased toward zero depth, but that will occur equally for the two discriminanda in each phase of the experiment and should not affect the results. Thus, the prior has no effect when the cue reliabilities {r_i} are the same for the two stimuli being compared. The prior does have an effect when observers compare stimuli that differ in reliability: The stimulus with lower reliability displays a stronger bias toward the mean of the prior (Stocker & Simoncelli, 2006). This would occur in comparisons of single-cue to two-cue stimuli because cue integration typically increases reliability. Critically, it would also occur in conditions in which cue reliability depends on the resulting estimate so that reliability for each cue varies from trial to trial as, for example, occurs in the estimation of slant from texture (Knill, 1998).

Overview of Results

Many studies have supported optimal linear cue integration as a model of human perception for stimuli involving relatively small cue conflicts. By and large, these studies have confirmed the two main predictions of the model: With small cue conflicts, cue weights are proportional to cue reliability, and the reliability for stimuli with multiple cues is equal to the sum of individual cue reliabilities. Such studies have been carried out for combinations of visual cues to depth, slant, shape (Hillis et al., 2002; Hillis et al., 2004; Johnston, Cumming, & Landy, 1994; Knill & Saunders, 2003; Young et al., 1993), and location (Landy & Kojima, 2001). Multisensory studies have also been consistent with the model, including combinations of visual and haptic cues to size (Ernst & Banks, 2002; Gepshtein & Banks, 2003; Hillis et al., 2002) and visual and auditory cues to location (Alais & Burr, 2004). Some studies have found suboptimal choices of cue weights (Battaglia, Jacobs, & Aslin, 2003; Rosas et al., 2005; Rosas, Wichmann, & Wagemans, 2007).

Cue promotion is an issue for many cue-integration problems. Consider, for example, the visual estimation of depth. Stereo stimuli are misperceived in a manner that suggests that near viewing distances are overestimated, and far viewing distances underestimated, for the purposes of scaling depth from retinal disparity (Gogel, 1990; Johnston, 1991; Rogers & Bradshaw, 1995; Watt et al., 2005). This misscaling could be ameliorated by combining disparity and relative-motion cues to shape.

Such studies have been carried out for combinations of visual cues to depth, slant, shape (Hillis et al., 2002; Hillis et al., 2004; Johnston, Cumming, & Landy, 1994; Knill & Saunders, 2003; Young et al., 1993), and location (Landy & Kojima, 2001). Multisensory studies have also been consistent with the model, including combinations of visual and haptic cues to size (Ernst & Banks, 2002; Gepshtein & Banks, 2003; Hillis et al., 2002) and visual and auditory cues to location (Alais & Burr, 2004). Some studies have found suboptimal choices of cue weights (Battaglia, Jacobs, & Aslin, 2003; Rosas et al., 2005; Rosas, Wichmann, & Wagemans, 2007).

Cue promotion is an issue for many cue-integration problems. Consider, for example, the visual estimation of depth. Stereo stimuli are misperceived in a manner that suggests that near viewing distances are overestimated, and far viewing distances underestimated, for the purposes of scaling depth from retinal disparity (Gogel, 1990; Johnston, 1991; Rogers & Bradshaw, 1995; Watt et al., 2005). This misscaling could be ameliorated by combining disparity and relative-motion cues to shape. However, the evidence for this particular cue interaction has been equivocal (Brenner & Landy, 1999; Johnston et al., 1994; Landy & Brenner, 2001). It is important to note that people are essentially veridical at taking distance into account—little if any overestimation of near distances and little if any underestimation of far distances—when all cues to flatness are eliminated (Watt et al., 2005). In other words, failures to observe depth constancy may be due to the influence of unmodeled flatness cues such as blur and accommodation.

There is some evidence for robustness in intrasensory cue combination; that is, evidence that individual cues are down-weighted as they become too discrepant from estimates based on other cues. Most laboratory studies involve only two experimenter-manipulated cues with small conflicts, but some have looked at cue integration with large discrepancies. Bayesian estimation using a mixture prior can lead to robust behavior. For example, Knill (2007b; also see Chapter 9) has described two models for estimation of slant from texture: a more constrained and accurate model that assumes the texture is isotropic and a second that does not make this assumption. By assuming a mixture prior over scenes (between isotropic and nonisotropic surface textures), one can predict a smooth switch from the predictions of one model to the other as the presented surface texture becomes increasingly nonisotropic. Human performance appears to be consistent with the predictions of this mixture-prior model (Knill, 2007b). Recently, Girshick and Banks (2009) confirmed Knill's result that observers' percepts are intermediate between cue values for disparity and texture when the discrepancy between the two cues is small, and that percepts migrate toward one cue when the discrepancy is large. Like Knill, they found that the cue dictating the large-conflict percept was consistently disparity in some conditions. But unlike Knill, they also found other conditions in which the large-conflict percept was consistently dictated by texture. Girshick and Banks showed that their data were well predicted by a Bayesian model in which both the texture and disparity likelihoods had broader tails than Gaussians.
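The qualitative pattern—averaging for small conflicts, migration toward one cue for large conflicts—is easy to reproduce numerically. The sketch below is only an illustration (it is not the model fit by Knill, 2007b, or by Girshick & Banks, 2009): each likelihood is assumed to be a two-component Gaussian mixture with heavier-than-Gaussian tails, and all numbers are invented.

```python
import numpy as np

def heavy_tailed_like(x, s_grid, sigma, p_broad=0.1, scale=8.0):
    """Likelihood over candidate slants for a measurement x: a mixture of a
    narrow Gaussian and a much broader one, giving heavier tails than a Gaussian."""
    narrow = np.exp(-0.5 * ((x - s_grid) / sigma) ** 2) / sigma
    broad = np.exp(-0.5 * ((x - s_grid) / (scale * sigma)) ** 2) / (scale * sigma)
    return (1 - p_broad) * narrow + p_broad * broad

s_grid = np.linspace(-60.0, 60.0, 4801)      # candidate slants (deg)
sigma_disp, sigma_tex = 3.0, 6.0             # invented single-cue uncertainties

for conflict in (2.0, 10.0, 20.0, 30.0, 40.0):
    # Disparity signals +conflict/2, texture signals -conflict/2.
    post = (heavy_tailed_like(+conflict / 2, s_grid, sigma_disp) *
            heavy_tailed_like(-conflict / 2, s_grid, sigma_tex))
    s_map = s_grid[np.argmax(post)]
    # 0.5 = midway between the two cues; 1.0 = percept sits at the disparity value.
    # Small conflicts give roughly the reliability-weighted average (about 0.8 here);
    # large conflicts drive the peak of the posterior toward the more reliable cue.
    print(conflict, round(0.5 + s_map / conflict, 2))
```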

Empirical studies suggest that integration is impeded when the display indicates the two cues do not come from the same source. For example, optimal cue integration is found in combinations of visual and haptic cues to object size (Ernst & Banks, 2002), but if the haptic object is in a different location from the visual object, observers no longer integrate the two estimates (Gepshtein, Burge, Ernst, & Banks, 2005). Bayesian structural models have been successful at modeling phenomena like this, for example, in the ventriloquist effect (Körding et al., 2007).

Humans also appear to be optimal or nearly so in movement tasks involving experimenter-imposed rewards and penalties for movement outcome (Trommershäuser, Maloney, & Landy, 2003a, 2003b, 2008). Yet when analogous tasks are carried out involving visual estimation and integration of visual cues, many observers use suboptimal strategies (Landy, Goutcher, Trommershäuser, & Mamassian, 2007). As we argued earlier, observers should be more likely to approach optimal behavior in tasks that are important for survival. It seems reasonable that accurate visuomotor planning in risky environments is such a task and that the visual analog of these movement-planning tasks is not such a task. There remain many open questions in determining the limits of optimal behavior by humans in perceptual and visuomotor decision-making tasks with experimenter-imposed loss functions (i.e., decision making under risk).

ISSUES AND CONCERNS

Realism and Unmodeled Cues

Perceptual systems evolved to perform useful tasks in the natural environment. Accordingly, these systems were designed to make accurate estimates of environmental properties in settings in which many cues are present and large cue conflicts are rare. The lack of realism and the dearth of sensory cues in the laboratory give the experimenter greater stimulus control, but they may place the perceiver in situations for which the nervous system is ill suited and therefore may perform suboptimally.

Bülthoff (1991) describes an experiment in which the perceived depth of a monocularly viewed display is gauged by comparison with a stereo display. Depth from texture alone and depth from shading alone were both underestimated, but when the two pictorial cues were combined, depth was approximately veridical. The depth values appeared to sum rather than average in the two-cue display. Bülthoff and Yuille (1991) interpreted this as an example of "strong fusion" of cues (in contrast with the "weak fusion" of weighted averaging). However, these were impoverished displays and contained other visual cues (blur, accommodation, etc.) that indicated the display was flat. Bayesian cue integration predicts that the addition of cues to a display will have the effect of reducing the weight given to these cues to flatness (because increasing the amount of information about depth increases the reliability of that information), resulting in greater perceived depth than that with either of the experimenter-controlled cues alone.

There is now clear evidence that display cues to flatness can provide a substantial contribution to perceived depth. Buckley and Frisby (1993) observed a striking effect that illustrates the importance of considering unmodeled cues in general and specifically the role of cues from the computer display itself. Their observers viewed raised ridges presented as real objects or as computer-graphic images. In one experiment, the stimuli were stereograms viewed on a computer display. Disparity- and texture-specified depths were varied independently and observers indicated the amount of perceived depth. The data revealed clear effects of both cues. Disparity dominated when the texture-specified depth was large, and texture dominated when the texture depth was small. In the framework of the linear cue-combination model, the disparity and texture weights changed depending on the texture-specified depth.

Buckley and Frisby next asked whether the results would differ if the stimuli were real objects. They constructed 3D ridges consisting of a textured card wrapped onto a wooden form. Disparity-specified depth was varied by using forms of different shapes. Texture-specified depth was varied by distorting the texture pattern on the card so that the projected pattern created the desired texture depth once the card was bent onto the form. The results differed dramatically: Now the disparity-specified depth dominated the percept. Buckley and Frisby speculated that unmodeled focus cues—blur and accommodation—played an important role in the difference between the computer-display and real results.

We can quantify their argument by translating it into the framework of the linear model. There are three depth cues in their experiments: disparity, texture, and focus cues; focus cues specify flatness on the computer-display images and the true shape on the real objects.

The real-ridge experiment is easier to interpret, so we start there. In the linear Gaussian model, perceived depth is based on the sum of all available depth cues, each weighted according to its reliability:

\hat{d} = w_d d_d + w_t d_t + w_f d_f, \qquad w_d + w_t + w_f = 1,   (1.10)

where the subscripts refer to the cues of disparity, texture, and focus. The depth specified by the focus cues was equal to the depth specified by disparity: d_f = d_d. Thus, Eq. 1.10 becomes:

\hat{d} = (w_d + w_f) d_d + w_t d_t.   (1.11)

The texture cue d_t had a constant value k for each curve in their data figure (their Fig. 3); therefore,

\hat{d} = (w_d + w_f) d_d + (1 - w_d - w_f) k.   (1.12)

For this reason, when perceived depth is plotted against disparity-specified depth (d_d), the slope corresponds to the sum of the weights given to the disparity and focus cues: w_d + w_f. The experimentally observed slope was ∼0.95. Thus, the texture weight w_t was small in the real-ridge experiment.

In the computer-display experiment, focus cues always signaled a flat surface (d_f = 0); therefore,

\hat{d} = w_d d_d + w_t d_t = w_d d_d + (1 - w_d - w_f) k.   (1.13)

Thus, the slope of the data in their figures was an estimate of the disparity weight w_d. The slope was always lower in the computer-display data than in the real data, and this probably reflects the influence of focus cues.
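The slope argument can be made concrete with a short sketch (the weights below are invented for illustration; they are not estimates from Buckley and Frisby's data). It generates predicted perceived depth from Eq. 1.10 for the real-ridge case (focus cues agree with disparity) and for the computer-display case (focus cues signal flatness), and reads off the slope against disparity-specified depth in each case.

```python
import numpy as np

# Hypothetical cue weights (sum to 1); not fitted to any data set.
w_d, w_t, w_f = 0.55, 0.05, 0.40
k = 30.0                                   # constant texture-specified depth for one curve

d_d = np.linspace(0.0, 60.0, 7)            # disparity-specified depth

# Real ridges: focus cues specify the same depth as disparity (d_f = d_d), Eq. 1.12.
d_hat_real = (w_d + w_f) * d_d + (1 - w_d - w_f) * k

# Computer display: focus cues specify a flat surface (d_f = 0), Eq. 1.13.
d_hat_crt = w_d * d_d + (1 - w_d - w_f) * k

# Slope of perceived depth against disparity-specified depth in each condition.
slope_real = np.polyfit(d_d, d_hat_real, 1)[0]    # = w_d + w_f
slope_crt = np.polyfit(d_d, d_hat_crt, 1)[0]      # = w_d
print(round(slope_real, 2), round(slope_crt, 2))  # 0.95 vs. 0.55 with these example weights
```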
Frisby, Buckley, and Horsman (1995) further explored the cause of increased reliance on disparity cues with real as opposed to computer-displayed stimuli. Observers viewed the real-ridge and computer-display stimuli through pinholes, which greatly increased depth of focus, thereby rendering all distances roughly equally well focused. The computer-display data were unaffected by the pinholes, but the real-ridge data were significantly affected: Those data became similar to the computer-display data. This result makes good sense. Viewing through a pinhole renders the blur in the retinal image similar for a wide range of distances and causes the eye to adopt a fixed focal distance. This causes no change in the signals arising from stereograms on a flat display, so the computer-display results were unaffected. The increased depth of focus does cause a change in the signals arising from real 3D objects—focus cues now signal flatness as they did with computer-displayed images—so the real-ridge results became similar to the computer-display results. The work of Frisby and colleagues, therefore, demonstrates a clear effect of focus cues. Tangentially, their work shows that using pinholes is not an adequate method for eliminating the influence of focus cues.

Computer-displayed images are far and away the most frequent means of presenting visual stimuli in depth-perception research. Very frequently, the potential influence of unmodeled cues is not considered and so, as we have seen in the earlier analysis, the interpretation of empirical observations can be suspect. One worries that many findings in the depth-perception literature have been misinterpreted and therefore that theories constructed to explain those findings are working toward the wrong goal.

Watt et al. (2005) and Hoffman, Girshick, Akeley, and Banks (2008) explicitly examined the role of focus cues in depth perception. They found that differences in the distance specified by disparity and the physical distance of the stimuli (which determines blur and the stimulus to accommodation) had a systematic effect on perceived distance and therefore had a consistent and predictable effect on the perception of the 3D shape of an object.

Estimation of Uncertainty

The standard experimental procedure for testing optimality includes measurements of the reliability of individual cues (σ ∝ JND). For some cue-integration problems, such as the combination of auditory, haptic, and/or visual cues to spatial location, this is relatively straightforward. However, for intramodal cue integration, difficulties arise in isolating a cue. And as we argued earlier in the analysis of the Buckley and Frisby (1993) study, this can lead to errors in interpretation. Consider, for example, the estimation of surface slant from the visual cues of surface texture and disparity. It is easy to isolate the texture cue by viewing the stimulus monocularly. In contrast, it is impossible to produce disparity without surface markings from which disparities can be estimated.

The best one can do in these situations is to generate stimuli in which the information provided by one of the two cues is demonstrably so unreliable as to be useless for performing the psychophysical task. For the stereo-texture example, Hillis et al. (2004) and Knill and Saunders (2003) did this by using sparse random-dot textures when measuring slant-from-disparity thresholds. While a random-dot texture generates strong disparity cues to shape, the texture cue provided by perspective distortions (changes in dot density) is so unreliable that its contribution is likely to be small (Hillis et al., 2004; Knill & Saunders, 2003). Hillis and colleagues (2004) showed this by examining cases in the two-cue experiment in which the texture weight was nearly zero; in those cases, a texture signal was present, but the percepts were dictated by the disparity signal. They found that those two-cue JNDs were the same as the disparity-only JNDs. The close correspondence supports the assumption that the disparity-alone discrimination thresholds provided an estimate of the appropriate reliability for the two-cue experiment. Knill and Saunders (2003) took a different approach. In their study, the random-dot textures used in the stimuli to measure slant-from-stereo thresholds were projected from the same slant as indicated by disparities and thus contained perspective distortions that could in theory be used to judge slant. They showed, however, that when discriminating the slants of these stimuli viewed monocularly, subjects were at chance regardless of the pedestal slant or the difference in slant between two stimuli.

Single-cue discrimination experiments are typically used to estimate the uncertainty associated with individual cues. Suppose that, in addition to the uncertainty inherent in the individual cues (which we model as additive noise sources N1 and N2), there is uncertainty due to additional noise late in the process, which we term decision noise (Nd, Fig. 1.6). Suppose further that this noise corrupts the estimate after the cues are combined but prior to any decisions involving this estimate. If this is the case, the single-cue experiments will provide estimates of the sum of the cue uncertainty and the uncertainty created by the added late noise (e.g., σ_1^2 + σ_d^2). The optimal cue weights are still those defined by Eq. 1.2 (based on the individual cue reliabilities, e.g., r_1 = 1/σ_1^2). By using the results of the single-cue discrimination experiments, the experimenter will estimate the single-cue reliabilities as, for example, r_1' = 1/(σ_1^2 + σ_d^2). The resulting predictions of optimal cue weights based on Eq. 1.2 will be biased toward equality (weights of 0.5 for each cue in the slant/disparity experiment). Fortunately, decision noise affects PSEs and JNDs in a predictable way, and so one can gauge the degree to which decision noise affected the measurements. Both Knill and Saunders (2003) and Hillis and colleagues (2004) concluded the effects of decision noise were negligible.

[Figure 1.6 (schematic: cue measurements x1 and x2 are perturbed by early noise sources N1 and N2, multiplied by weights w1 and w2, summed, and then perturbed by a late noise source Nd). Illustration of a model that incorporates both early, single-cue noise terms (N1 and N2) as well as a late, postcombination, decision-noise term (Nd).]
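The size of this bias is easy to compute. A brief sketch (hypothetical noise levels, not the values measured in the cited studies): the true weights follow Eq. 1.2 from the cue variances alone, whereas weights predicted from single-cue thresholds that also contain late decision noise use σ_i^2 + σ_d^2 and are pulled toward 0.5.

```python
# Hypothetical standard deviations for two cues and for late decision noise.
sigma1, sigma2, sigma_d = 1.0, 2.0, 1.5

# True optimal weight for cue 1 (Eq. 1.2), based on the cue variances alone.
r1, r2 = 1 / sigma1**2, 1 / sigma2**2
w1_true = r1 / (r1 + r2)

# Weight predicted from single-cue discrimination thresholds, which lump the
# decision noise in with each cue (sigma_i^2 + sigma_d^2).
r1_meas = 1 / (sigma1**2 + sigma_d**2)
r2_meas = 1 / (sigma2**2 + sigma_d**2)
w1_pred = r1_meas / (r1_meas + r2_meas)

print(round(w1_true, 3), round(w1_pred, 3))   # 0.8 vs. 0.658: prediction biased toward 0.5
```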
Another important assumption of these methods is that subjects use the same information in the stimulus to make the single-cue judgments as the multiple-cue judgments. This can be a particular concern when the single-cue stimuli do not generate a compelling percept of whatever one is studying. In the depth domain, one has to be concerned about interpreting thresholds derived from monocular displays with limited depth information, particularly when presented on computer displays. One way around this is to use subjects' ability to discriminate changes in the sensory features (e.g., texture compression) that are the source of information in a cue and use an ideal-observer model to map the measured sensory uncertainty onto the consequent uncertainty in perceptual judgments from that cue. Good examples of this outside the cue-integration literature are work on how motion acuity limits structure-from-motion judgments (Eagle & Blake, 1995) and heading judgments from optic flow (Crowell & Banks, 1996), and how disparity acuity limits judgments of surface orientation at different eccentricities and distances from the horopter (Greenwald & Knill, 2009). In a visuomotor context, Saunders and Knill (2004) used estimates of position and motion acuity to parameterize an optimal feedback-control model and showed that the resulting model did a good job at predicting how subjects integrate position and motion information about the moving hand to control pointing movements. Knill (2007b) used psychophysical measures of aspect-ratio discrimination thresholds to parameterize a Bayesian model for integrating figural compression and disparity cues to slant, but he did not test optimality with the model.

Finally, it is important to note that the implications of optimal integration differ for displays in which a cue is missing (e.g., the focus cue when viewing a display through a pinhole) and displays in which the cue is present but is fixed. When two cues are present rather than just one, both contribute to perceived depth, but both of their uncertainties contribute to the uncertainty of the result. For example, Eq. 1.3 implies that the JNDs for depth should satisfy

\frac{1}{\mathrm{JND}_c^2} = \frac{1}{\mathrm{JND}_1^2} + \frac{1}{\mathrm{JND}_2^2},   (1.14)

where JND_c is the threshold for discriminating consistent-cues, two-cue stimuli, and JND_1 and JND_2 are the individual uncertainties of the two cues (proportional to the standard deviation of estimates based on each cue), typically measured by isolating each cue.

Suppose an experimenter fails to isolate each cue when measuring the single-cue thresholds and, instead, single-cue thresholds are measured with both cues present in the stimulus, with one cue held fixed while discrimination threshold is measured for the other, variable cue. For such an experiment, the relationship between thresholds measured for each cue is different. Subjects' JNDs should instead satisfy¹

\frac{1}{\mathrm{JND}_c} = \frac{1}{\mathrm{JND}_1'} + \frac{1}{\mathrm{JND}_2'},   (1.15)

where JND_1' and JND_2' are the JNDs measured for each cue in the presence of the other, fixed cue. In fact, this relationship applies regardless of the weights that subjects give to the cues, optimal or not. It only depends on the linearity assumption.

¹To see this, assume that JNDs are defined to be the standard deviation of the noise that must be matched by the change in perceived depth to reach threshold. Thus, in the normal perturbation experiment in which single-cue JNDs are measured in isolation, JND_1 = σ_1, JND_2 = σ_2, and by Eq. 1.3, JND_c = σ_c = 1/\sqrt{1/σ_1^2 + 1/σ_2^2}, from which Eq. 1.14 follows. In the Bradshaw and Rogers (1996) experiment, when only cue 1 is manipulated and cue 2 is fixed, the weighted combination of the cues must overcome the combined noise, so that w_1Δ_1 + w_2Δ_2 = w_1 JND_1' + w_2(0) = σ_c and similarly for JND_2', where Δ_i is the difference in cue value for cue i between the two discriminanda at threshold. Because the weights are assumed to sum to 1, Eq. 1.15 follows.

Bradshaw and Rogers (1996) ran such an experiment, measuring the JNDs of the two constituent cues in the presence of the other cue, but the depth indicated by the second cue was fixed at zero (flat). That is, both cues' noise sources were involved. Bradshaw and Rogers interpreted the resulting improvement in JND for two-cue displays as indicative of a nonlinear interaction of the cues. But their data were, in fact, reasonably consistent with the predictions of Eq. 1.15, that is, with the predictions of linear cue combination (optimal or not).
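Both threshold relationships can be checked directly. The following sketch (an informal check with arbitrary noise levels, using the JND definition given in footnote 1) computes the predicted thresholds for the isolated-cue and fixed-cue designs and verifies Eqs. 1.14 and 1.15.

```python
import numpy as np

sigma1, sigma2 = 1.0, 2.0                    # hypothetical single-cue noise SDs
r1, r2 = 1 / sigma1**2, 1 / sigma2**2
w1, w2 = r1 / (r1 + r2), r2 / (r1 + r2)      # linear weights (optimal here; Eq. 1.15 needs only w1 + w2 = 1)

sigma_c = np.sqrt(1 / (r1 + r2))             # SD of the combined estimate

# JNDs defined as in footnote 1: the cue change whose effect on the combined
# estimate matches the SD of the relevant noise.
jnd1, jnd2 = sigma1, sigma2                  # each cue measured in isolation
jnd_c = sigma_c                              # consistent-cues, two-cue stimulus
jnd1_fixed = sigma_c / w1                    # cue 1 varied while cue 2 is present but fixed
jnd2_fixed = sigma_c / w2

# Eq. 1.14: reciprocal *squared* JNDs add when single-cue JNDs are measured in isolation.
print(np.isclose(1 / jnd_c**2, 1 / jnd1**2 + 1 / jnd2**2))      # True
# Eq. 1.15: reciprocal (unsquared) JNDs add when the other cue is present but fixed.
print(np.isclose(1 / jnd_c, 1 / jnd1_fixed + 1 / jnd2_fixed))   # True
```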
Estimator Bias

In introducing the cue-combination models in the theory section of this chapter, we made the common assumption that perceptual estimates derived from different cues are unbiased; that is, we assumed that for any given value of a physical stimulus variable (e.g., depth or slant), the average perceptual estimate of that variable is equal to the true value. This assumption is generally incorporated in descriptions of optimal models because it simplifies exposition: Disregarding bias allows one to focus only on minimizing variance as an optimality criterion. However, it seems to us that it is generally impossible to determine whether sensory estimates derived from different cues prior to integration are unbiased relative to ground truth, largely because we only have access to the outputs of systems that operate on perceptual estimates (decision or motor processes) that may themselves introduce unknown biases. It is possible, however, to determine whether they are internally consistent, that is, whether their estimates agree with one another on average.

If the estimators in the optimal combination model (Eq. 1.1) are not internally calibrated, problems may arise. Consider presenting a 3D stimulus with a slant of 0° to the eye and hand. Let us say that vision and touch are equally reliable, but that vision is biased by 20° because the person is wearing spectacles. The internal inconsistency introduces a serious problem: If the stimulus is seen but not felt, its perceived slant will be 20°. If it is felt but not seen, the percept will be 0°. If it is seen and felt, the percept will be 10° (assuming equal cue weights in Eq. 1.1). The internal inconsistency of the estimators has undermined one of the great achievements of perception: the ability to perceive a given environmental property as constant despite changes in the proximal stimuli used to estimate the property. Thus, it is clearly important for sensory estimators to maintain internal consistency with respect to one another (see Chapter 12).
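The arithmetic of this example can be spelled out in a few lines (the numbers and the equal-weight assumption are those given in the text; the code itself is only an illustration of the bookkeeping).

```python
def combined_percept(available_cues, estimates, weights):
    """Weighted linear combination (Eq. 1.1) of whichever single-cue estimates are
    available, with the weights renormalized over the available cues."""
    total = sum(weights[c] for c in available_cues)
    return sum(weights[c] * estimates[c] for c in available_cues) / total

true_slant = 0.0
estimates = {"vision": true_slant + 20.0,   # vision biased by 20 deg (spectacles)
             "touch": true_slant + 0.0}     # touch unbiased
weights = {"vision": 0.5, "touch": 0.5}     # equally reliable cues

for cues in (["vision"], ["touch"], ["vision", "touch"]):
    # Prints 20.0, 0.0, and 10.0 deg: the percept of the same surface changes with
    # which cues happen to be available, a failure of perceptual constancy.
    print(cues, combined_percept(cues, estimates, weights))
```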

There is a rich literature on how sensory estimators maintain internal consistency and external accuracy (Burian, 1943; Miles, 1948; Morrison, 1972; Ogle, 1950). The problem is referred to as sensory recalibration. Adams, Banks, and van Ee (2001) studied recalibration of estimates of slant from texture and slant from disparity by exposing people to a horizontally magnifying lens in front of one eye. The lens was worn continuously for 6 days. People were tested before, during, and after wearing the lens with three types of stimuli: slanted planes specified by texture and viewed monocularly, slanted planes specified by disparity and viewed binocularly, and two-cue, disparity-texture stimuli viewed binocularly. The introduction of the lens caused a change in the disparities measured at the two eyes such that a binocularly viewed plane that was previously perceived as frontoparallel was now perceived as slanted by ∼10°. The apparent slant of monocularly viewed planes did not change. Thus, the introduction of the lens had created a conflict between the perceived slants for disparity- and texture-based stimuli even when they specified the same physical slant. Over the six days, observers adapted until frontoparallel planes, whether they were defined by texture alone, disparity alone, or both, were again perceived as frontoparallel. When the lens was removed, everyone experienced a negative after-effect: A disparity-defined frontoparallel plane appeared slanted in the opposite direction (and a texture-defined plane did not). The negative after-effect also went away in a few days as the observers adapted back to the original no-lens condition. These observations clearly show that the visual system maintains internal consistency between the perceived slants of disparity and texture stimuli even when the two cues are put into large conflict by optical manipulation. Because the estimators maintain calibration with respect to one another, the visual system can achieve greater accuracy and precision by appropriate cue combination as described earlier in the chapter.

Girshick and Banks (2009) also obtained persuasive data that disparity and texture estimators maintain calibration relative to one another. They measured the slants of single-cue stimuli that matched the apparent slant of two-cue stimuli. Specifically, they measured the slant of a disparity-only stimulus that matched the perceived slant of a two-cue, disparity-texture conflict stimulus and they measured the slant of a texture-only stimulus that matched the perceived slant of the same disparity-texture conflict stimulus. The disparity-only stimulus was a sparse random-dot textured plane viewed binocularly and the texture-only stimulus was a Voronoi-textured plane viewed monocularly. On each trial, one interval contained a two-cue stimulus, and the other contained one of the two single-cue stimuli. Observers indicated the one containing the greater perceived slant. No feedback was provided. The slant of the single-cue stimulus was varied according to a staircase procedure to find the value that appeared the same as the two-cue stimulus. Figure 1.7 shows the results. Each data point represents the disparity- and texture-specified slants that yielded the same perceived slant as a particular two-cue, disparity-texture stimulus. Clearly, the disparity- and texture-specified slants were highly correlated and one was not biased relative to the other, showing that the disparity and texture estimators were calibrated relative to one another.

[Figure 1.7 (scatter plot of texture-specified slant against disparity-specified slant, both axes running from −75° to 75°). The slants of single-cue stimuli that matched the apparent slant of two-cue stimuli. The data are from Experiment 2 of Girshick and Banks (2009). On each trial, two stimuli were presented in sequence. One was a single-cue stimulus: either texture only or disparity only. The other was a two-cue stimulus: texture plus disparity. The two-cue, texture-disparity stimulus had various amounts of conflict between the slants specified by the two cues; some conflicts were as large as 75°. After each trial, observers indicated which of the two stimuli had the greater perceived slant. Each data point represents the disparity- and texture-specified slants that yielded the same perceived slant as a particular two-cue, disparity-texture stimulus. Different symbols represent different conflicts between the texture- and disparity-specified slants and different observers. (Adapted from Girshick & Banks, 2009.)]

Variable Cue Weights

This discussion brings up one last point: Cue weights need not be constant, independent of the value of the parameter being estimated. Effective cue reliability can vary with conditions, including with changes in the parameter itself. This phenomenon has been observed, for example, with estimation of surface slant. The JND for discrimination of surface slant from texture varies substantially with base slant (Knill, 1998) and the JND for slant from disparity varies with base slant as well (Hillis et al., 2004). Of course, JNDs can also vary with other stimulus parameters. For example, the reliability of slant estimates based on disparity varies with viewing distance (Hillis et al., 2004). As a result, one predicts changes in the relative weights of texture and disparity with changes in base slant (Hillis et al., 2004; Knill & Saunders, 2003) and distance (Hillis et al., 2004). For a large, slanted surface, one predicts changes in cue weights for different locations along the surface itself (Hillis et al., 2004).
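One way to picture this is to recompute the weights from condition-dependent thresholds. In the sketch below, the JND functions are invented placeholders (they are not the functions measured by Knill, 1998, or Hillis et al., 2004); the point is only that weights computed from Eq. 1.2, with σ proportional to the JND, track whatever the local reliabilities happen to be.

```python
def jnd_texture(base_slant_deg):
    """Invented placeholder: texture-based slant JNDs shrink at larger base slants."""
    return 12.0 - 0.10 * base_slant_deg

def jnd_disparity(base_slant_deg, distance_m=0.5):
    """Invented placeholder: disparity-based JNDs grow with base slant and viewing distance."""
    return (3.0 + 0.02 * base_slant_deg) * (distance_m / 0.5) ** 2

for base_slant in (0.0, 30.0, 60.0):
    r_t = 1 / jnd_texture(base_slant) ** 2        # reliability, taking sigma proportional to JND
    r_d = 1 / jnd_disparity(base_slant) ** 2
    w_t = r_t / (r_t + r_d)                       # Eq. 1.2: reliability-based texture weight
    print(base_slant, round(w_t, 2))              # rises with base slant (about 0.06, 0.14, 0.33 here)
```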

The interesting point is that the optimal cue weights can change rapidly, from moment to moment or from location to location, sensitive to local conditions. These optimal weight settings are in response to changes in estimated cue reliability. This raises the question of how human observers estimate and represent cue reliability. One suggestion is that a neural population code can simultaneously encode both the estimate and its associated uncertainty (see Chapter 21 and Beck, Ma, Latham, & Pouget, 2007; Ma et al., 2006).

Simulation of the Observer

In the development of the linear model for a Bayesian observer, we pointed out that observers make measurements for each cue, then form the product of likelihood functions derived from these measurements and the prior distribution (Eq. 1.7). This results in the linear rule for Gaussian distributions. One can prove this by multiplying the likelihood functions corresponding to the expected cue measurements and the prior. But real observers do not have access to the expected cue measurement on any given trial. Rather, they have samples and must derive (and multiply) likelihood functions based on those samples. For symmetric likelihood functions like Gaussians, the predictions do not differ from those based on the expected measurements. However, for non-Gaussian likelihood functions or priors (e.g., mixture priors), one is forced to consider the variability of the cue estimates in formulating predictions.
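In practice, this usually means generating predictions by simulating the observer trial by trial, as in the generic sketch below (an illustration with arbitrary parameters, not a model from the chapter): on each simulated trial, noisy cue measurements are drawn, a posterior is formed numerically from those samples and a mixture prior, and the distribution of the resulting estimates is what gets compared to behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
s_grid = np.linspace(-40.0, 40.0, 2001)       # hypothesis grid for the estimated property

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / sigma

# Mixture prior: mostly "special" scenes near zero plus a broad alternative (arbitrary values).
prior = 0.7 * gauss(s_grid, 0.0, 2.0) + 0.3 * gauss(s_grid, 0.0, 20.0)

true_s, sigma1, sigma2 = 8.0, 3.0, 6.0        # arbitrary true value and cue noise levels
estimates = []
for _ in range(2000):                         # simulate many trials
    x1 = true_s + sigma1 * rng.standard_normal()    # sampled measurement from cue 1
    x2 = true_s + sigma2 * rng.standard_normal()    # sampled measurement from cue 2
    post = gauss(s_grid, x1, sigma1) * gauss(s_grid, x2, sigma2) * prior
    post /= post.sum()
    estimates.append((s_grid * post).sum())   # posterior-mean estimate on this trial

# The mean and spread of the simulated estimates are what the model predicts for behavior.
print(round(float(np.mean(estimates)), 2), round(float(np.std(estimates)), 2))
```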
In formulating Bayesian models incorporating both likelihoods and priors, one must confront the issue of where the prior comes from and how to estimate it. Three different approaches have been used in recent years. In one line of research, natural-image statistics are gathered and used to estimate a given prior distribution, and then human behavior is compared to the performance of a Bayesian ideal observer using that prior (see Chapter 11 and Elder & Goldberg, 2002; Fowlkes, Martin, & Malik, 2007; Geisler, Perry, Super, & Gallogly, 2001). An alternative approach is to ask what prior distribution is consistent with observers' behavior independent of whether it accurately reflects the statistics of the environment. The technique of Stocker and Simoncelli (2006) provides such an approach, by taking advantage of the differential effect of a prior on stimuli that differ in reliability. Others have fit parametric models of the prior distribution to psychophysical data (Knill, 2007b; Mamassian & Landy, 2001). Finally, Knill (2007a) has examined how the visual system adapts its internal prior to the current statistics of the environment.

OPEN QUESTIONS

The research on cue integration has been wide ranging, and it has led to interesting data and many successful models. Nevertheless, there is plenty of room for further progress. Here is a short list of interesting open questions:

1. How is cue reliability estimated and represented in the nervous system? Observers seem to be able to estimate cue reliability in novel environments, so this is presumably not learned in specific environments and then applied when one of the learned environments is encountered again. Clearly, cue reliability depends on many factors, and thus estimation of reliability is itself a problem of cue integration.

2. Are there general methods the perceptual system uses to determine when cues should be integrated and when, instead, they should be kept separate and attributed to different environmental causes? This problem can be cast as a statistical problem of causal inference.

3. How optimal is cue integration with respect to the information that is available in the environment? Scientists tend to classify environmental properties into distinct categories. The classical list of depth cues is an example. There are surely many other sources of depth information that observers' brains know about, but scientists' brains do not. A rigorous analysis of the linkages between information in natural scenes and human perceptual behavior should reveal previously unappreciated cues.

4. When human cue integration is demonstrably suboptimal, what design considerations does the suboptimality reflect? Are there examples in which the task and required mechanisms have been characterized correctly and the task is undeniably important to the organism, yet perception is nonetheless suboptimal?

5. There are now many examples in which Bayesian priors are invoked to explain aspects of human perception: a prior for slowness, light from above, shape convexity, and many more. Do these priors actually correspond to the probability distributions encountered in the natural environment?

REFERENCES

Adams, W. J., Banks, M. S., & van Ee, R. (2001). Adaptation to three-dimensional distortions in human vision. Nature Neuroscience, 4, 1063–1064.

Adolph, K. E. (1997). Learning in the development of infant locomotion. Monographs of the Society for Research in Child Development, 62, 1–158.

Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14, 257–262.

Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 20, 1391–1397.

Beck, J., Ma, W. J., Latham, P. E., & Pouget, A. (2007). Probabilistic population codes and the exponential family of distributions. Progress in Brain Research, 165, 509–519.

Bradshaw, M. F., & Rogers, B. J. (1996). The interaction of binocular disparity and motion parallax in the computation of depth. Vision Research, 36, 3457–3468.

Brenner, E., & Landy, M. S. (1999). Interaction between the perceived shape of two objects. Vision Research, 39, 3834–3848.

Buckley, D., & Frisby, J. P. (1993). Interaction of stereo, texture and outline cues in the shape perception of three-dimensional ridges. Vision Research, 33, 919–933.

Bülthoff, H. H. (1991). Shape from X: Psychophysics and computation. In M. S. Landy & J. A. Movshon (Eds.), Computational models of visual processing (pp. 305–330). Cambridge, MA: MIT Press.

Bülthoff, H. H., & Yuille, A. L. (1991). Shape-from-X: Psychophysics and computation. In P. S. Schencker (Ed.), Sensor fusion III: 3-D perception and recognition, Proceedings of the SPIE (Vol. 1383, pp. 235–246). Bellingham, WA: SPIE.

Burian, H. M. (1943). Influence of prolonged wearing of meridional size lenses on spatial localization. Archives of Ophthalmology, 30, 645–666.

Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical Society, 4(Suppl.), 102–118.

Crowell, J. A., & Banks, M. S. (1996). Ideal observer for heading judgments. Vision Research, 36, 471–490.

Domini, F., & Braunstein, M. L. (1998). Recovery of 3-D structure from motion is neither Euclidean nor affine. Journal of Experimental Psychology: Human Perception and Performance, 24, 1273–1295.

Eagle, R. A., & Blake, A. (1995). Two-dimensional constraints on three-dimensional structure from motion tasks. Vision Research, 35, 2927–2941.

Elder, J. H., & Goldberg, R. M. (2002). Ecological statistics of Gestalt laws for the perceptual organization of contours. Journal of Vision, 2, 324–353.

Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.

Fowlkes, C. C., Martin, D. R., & Malik, J. (2007). Local figure-ground cues are valid for natural images. Journal of Vision, 7(8):2, 1–9.

Frisby, J. P., Buckley, D., & Horsman, J. M. (1995). Integration of stereo, texture, and outline cues during pinhole viewing of real ridge-shaped objects and stereograms of ridges. Perception, 24, 181–198.

Geisler, W. S., Perry, J. S., Super, B. J., & Gallogly, D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41, 711–724.

Gepshtein, S., & Banks, M. S. (2003). Viewing geometry determines how vision and haptics combine in size perception. Current Biology, 13, 483–488.

Gepshtein, S., Burge, J., Ernst, M. O., & Banks, M. S. (2005). The combination of vision and touch depends on spatial proximity. Journal of Vision, 5, 1013–1023.

Gibson, J. J. (1966). The senses considered as perceptual systems. Boston, MA: Houghton-Mifflin.

Girshick, A. R., & Banks, M. S. (2009). Probabilistic combination of slant information: Weighted averaging and robustness as optimal percepts. Journal of Vision, 9(9):8, 1–20.

Gogel, W. C. (1990). A theory of phenomenal geometry and its applications. Perception and Psychophysics, 48, 105–123.

Greenwald, H. S., & Knill, D. C. (2009). Cue integration outside central fixation: A study of grasping in depth. Journal of Vision, 9(2):11, 1–16.

Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69, 383–393.

Hillis, J. M., Ernst, M. O., Banks, M. S., & Landy, M. S. (2002). Combining sensory information: Mandatory fusion within, but not between, senses. Science, 298, 1627–1630.

Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4, 967–992.

Hoffman, D. M., Girshick, A. R., Akeley, K., & Banks, M. S. (2008). Vergence-accommodation conflicts hinder visual performance and cause visual fatigue. Journal of Vision, 8(3):33, 1–30.

Huber, P. J. (1981). Robust statistics. New York, NY: Wiley.

Johnston, E. B. (1991). Systematic distortions of shape from stereopsis. Vision Research, 31, 1351–1360.

Johnston, E. B., Cumming, B. G., & Landy, M. S. (1994). Integration of stereopsis and motion shape cues. Vision Research, 34, 2259–2275.

Knill, D. C. (1998). Discrimination of planar surface slant from texture: Human and ideal observers compared. Vision Research, 38, 1683–1711.

Knill, D. C. (2003). Mixture models and the probabilistic structure of depth cues. Vision Research, 43, 831–854.

Knill, D. C. (2005). Reaching for visual cues to depth: The brain combines depth cues differently for motor control and perception. Journal of Vision, 5, 103–115.

Knill, D. C. (2007a). Learning Bayesian priors for depth perception. Journal of Vision, 7(8):13, 1–20.

Knill, D. C. (2007b). Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):5, 1–24.

Knill, D. C., & Saunders, J. A. (2003). Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research, 43, 2539–2558.

Körding, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS ONE, 2, e943.

Landy, M. S., & Brenner, E. (2001). Motion-disparity interaction and the scaling of stereoscopic disparity. In L. R. Harris & M. R. M. Jenkin (Eds.), Vision and attention (pp. 129–151). New York, NY: Springer-Verlag.

Landy, M. S., Goutcher, R., Trommershäuser, J., & Mamassian, P. (2007). Visual estimation under risk. Journal of Vision, 7(6):4, 1–15.

Landy, M. S., & Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 18, 2307–2320.

Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35, 389–412.

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9, 1432–1438.

Maloney, L. T. (2002). Statistical decision theory and biological vision. In D. Heyer & R. Mausfeld (Eds.), Perception and the physical world: Psychological and philosophical issues in perception (pp. 145–189). New York, NY: Wiley.

Maloney, L. T., & Landy, M. S. (1989). A statistical framework for robust fusion of depth information. In W. A. Pearlman (Ed.), Visual communications and image processing IV. Proceedings of the SPIE (Vol. 1199, pp. 1154–1163). Bellingham, WA: SPIE.

Mamassian, P., & Landy, M. S. (2001). Interaction of visual prior constraints. Vision Research, 41, 2653–2668.

Marr, D. (1982). Vision. San Francisco, CA: W. H. Freeman.

Miles, P. W. (1948). A comparison of aniseikonic test instruments and prolonged induction of artificial aniseikonia. American Journal of Ophthalmology, 31, 687–696.

Morrison, L. (1972). Further studies on the adaptation to artificially-induced aniseikonia. British Journal of Physiological Optics, 27, 84–101.

Ogle, K. N. (1950). Researches in binocular vision. Philadelphia, PA: Saunders.

Oruç, I., Maloney, L. T., & Landy, M. S. (2003). Weighted linear cue combination with possibly correlated error. Vision Research, 43, 2451–2468.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann.

Richards, W. (1985). Structure from stereo and motion. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 2, 343–349.

Rogers, B. J., & Bradshaw, M. F. (1995). Disparity scaling and the perception of frontoparallel surfaces. Perception, 24, 155–179.

Rosas, P., Wagemans, J., Ernst, M. O., & Wichmann, F. A. (2005). Texture and haptic cues in slant discrimination: Reliability-based cue weighting without statistically optimal cue combination. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 22, 801–809.

Rosas, P., Wichmann, F. A., & Wagemans, J. (2007). Texture and object motion in slant discrimination: Failure of reliability-based weighting of cues may be evidence for strong fusion. Journal of Vision, 7(6):3, 1–21.

Saunders, J. A., & Knill, D. C. (2001). Perception of 3D surface orientation from skew symmetry. Vision Research, 41, 3163–3183.

Saunders, J. A., & Knill, D. C. (2004). Visual feedback control of hand movements. Journal of Neuroscience, 24, 3223–3234.

Stocker, A. A., & Simoncelli, E. P. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9, 578–585.

Tassinari, H., Hudson, T. E., & Landy, M. S. (2006). Combining priors and noisy visual cues in a rapid pointing task. Journal of Neuroscience, 26, 10154–10163.

Todd, J. T. (2004). The visual perception of 3D shape. Trends in Cognitive Sciences, 8, 115–121.

Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2003a). Statistical decision theory and the selection of rapid, goal-directed movements. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 20, 1419–1433.

Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2003b). Statistical decision theory and trade-offs in the control of motor response. Spatial Vision, 16, 255–275.

Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2008). Decision making, movement planning and statistical decision theory. Trends in Cognitive Sciences, 12, 291–297.

Wallace, M. T., Roberson, G. E., Hairston, W. D., Stein, B. E., Vaughan, J. W., & Schirillo, J. A. (2004). Unifying multisensory signals across time and space. Experimental Brain Research, 158, 252–258.

Watt, S. J., Akeley, K., Ernst, M. O., & Banks, M. S. (2005). Focus cues affect perceived depth. Journal of Vision, 5, 834–862.

Young, M. J., Landy, M. S., & Maloney, L. T. (1993). A perturbation analysis of depth perception from combinations of texture and motion cues. Vision Research, 33, 2685–2696.