
Behavior Research Methods, Instruments, & Computers
1986, 18 (6), 493-506

Image-processing techniques for the display of images and the modeling of visual processes

TERRY CAELLI
The University of Alberta, Edmonton, Alberta, Canada

With the rapid increase in power of image-processing technology, applications in the areas of robot and human vision are inevitable. In this review, I consider how such techniques make it possible to study new dimensions of human vision, and I suggest explanations of visual phenomena that, until now, have not been considered in detail.

The question as to whether technological advancement precedes scientific insight remains unanswered and, indeed, may be so equivocal a nonspecific question that it is not worth answering. In the area of visual perception, however, it is certainly the case that specific questions, and their answers, have been encountered within the evolution of technological developments. The parallel between models for visual information processing and the evolution of computer architecture in the 1960s, and the recent use of linear systems theory to model spatiotemporal vision (Caelli, 1984; Marr, 1982; Watson & Ahumada, 1985), are but two areas in which developments in technology and science have been used to study vision. The use of point-plot displays to study the visual processing of simple spatiotemporal signals, and of vector generators and raster displays for more complex images, are examples of new devices that have changed the questions of interest to visual scientists.

There is, however, a deeper parallel between computer vision and human vision: both areas are partitioned into similar topics and hierarchies. The historical evolution of such subareas is more probably due to the earlier psychophysical predilections and methodologies. Table 1 shows a typical representation of areas and associated problems across both disciplines. Such areas are not independent; for example, feature-extraction processes are useful in pattern recognition.

It should also be noted that this division of areas is by no means the only valid one, particularly when specific dependencies exist between areas. For example, the idea that figure/ground perception, or texture segmentation, must precede pattern recognition is not self-evident. Indeed, processes of pattern recognition are intrinsically involved in aspects of figure/ground segmentation. In this paper, less equivocal definitions of the types of processes and paradigms involved in vision are sought in order to dispel such confusion.

In Table 1, the "top-down" division of visual processes is used to mean that prior knowledge about the environment, signal, or task is required before the perceptual problem can be solved. This is to be contrasted with the "bottom-up" division, for which the processing operations, per se, suffice. For example, the detection of a signal (having undergone some transformation) embedded in an image requires the prior specification of the signal and its possible states. Texture segmentation, by definition, does not require the prior specification of textures (or micropatterns): the segmentation process must produce textured regions of its own accord.

The aim of Sections 1 through 5 is to illustrate some of the paradigms, representations, and models within each of the subareas defined in Table 1. Again, of particular interest is the parallel between developments in vision research and computational vision. Common to all the issues to be discussed is the basic representation of the images to be processed by both systems: a finite-resolution or bandlimited achromatic (in this paper) image depicted as a 2^n x 2^n array of locations (pixels) for which each luminance value is limited to a finite number of (perceptually) resolvable intensities (typically 256 intensities, or 8 bits). Clearly the value-to-intensity transformation must reflect the specific psychophysical response function under study. However, general logarithmic or linear transformations are more typically employed when using such digital display devices for human studies. For images subtending a visual angle of 2°, where the (presumed) pixel resolution is approximately 0.5 min, a resolution of 256 x 256 pixels would be required. This indicates that the underlying dimensionality of such a stimulus is 256 x 256 = 65,536, a formidable order. Consequently, the aim of most techniques is to reduce the effective dimensionality required to solve specific problems.

This research was supported by Grant A2568 from the Natural Sciences and Engineering Research Council of Canada. Requests for reprints should be sent to: Terry Caelli, Department of Psychology, or Alberta Centre for Machine Intelligence and Robotics, The University of Alberta, Edmonton, AB, Canada T6G 2E9.

Copyright 1986 Psychonomic Society, Inc.

Table 1
Some Fundamental Problems Associated with the Major Areas of Interest in Computational Vision and Visual Perception

Bottom-Up Processing

Section 1. What is the best representation for an image to satisfy specific biological or computational constraints?
  Computational vision: Image domain: predictive compression; orthogonal decomposition; Fourier representations.
  Visual perception: Retinal and cortical encoding principles; psychophysical perceptive fields.

Section 2. What processes lie behind texture segmentation or figure/ground perception?
  Computational vision: Segmentation: region grouping, surface gradient extraction.
  Visual perception: Figure/ground perception; texture discrimination.

Section 3. What image or pattern features are ideal for the encoding of spatial structures?
  Computational vision: Edge extraction; pattern-feature extraction for object recognition.
  Visual perception: Feature detectors for pattern recognition; edge encoding; criteria for perceptual form.

Top-Down Processing

Section 4. What processes best represent the detection of signals embedded in images?
  Computational vision: Pattern recognition via transform methods, feature correlation, matched filtering.
  Visual perception: Pattern recognition: correlated outputs of feature analyzers, Fourier representations.

Section 5. How do visual systems recognize patterns under translations, rotation, and size changes?
  Computational vision: Invariance coding; circular harmonic decomposition; log-polar methods.
  Visual perception: Problem of stimulus equivalence; recognition under transformations.

*The section number refers to the section of this paper in which the topic is discussed.

FUNDAMENTAL PROBLEMS

1. Issues of Image Encoding

One of the most fundamental problems in human vision is that of determining how the observer encodes an image. We can restrict our definition of an image to, for instance, a 256 x 256 8-bit format within the foveal region; this constitutes up to 524,288 bits of information, which is a large amount to be processed within a saccade. Solutions to this problem generally fall into one of two schools: explanations based on image models (in the image domain) or those based on representations in some transform domain. In both cases, however, the aim is to uncover encoding units of much lower dimensionality than defined by the pixels alone.

The predominant image-domain model, in biological vision, is that based on the perceptron, a model formalized by Minsky and Papert (1969). In this model, spatial information is envisaged to be encoded by large numbers of "receptive fields," or, in the psychophysical world, those "perceptive fields" that cover more than one pixel. The responses of such detectors are determined by the cross-correlation (or convolution, depending on how the detector profile is defined) of the detector's profile with the incoming image. Being localized in space, such a detector's response is defined by:

    R_i(x0,y0) = ∫∫ d_i(α,β) I(α − x0, β − y0) dα dβ,    (1)

where d_i(α,β) corresponds to the detector centered at (x0,y0), I(x,y) corresponds to the image, and (x,y) to a locally Euclidean retinotopic coordinate system. If we consider many such detectors centered at (x0,y0), then the linear perceptron model allows an encoding of the intensity at (x0,y0) by

    I(x0,y0) = Σ_{i=1}^{n} R_i(x0,y0).    (2)

Whether it is at the retinal or cortical levels of analysis, this approach proposes that, relative to sampling considerations (see Geisler & Hamilton, 1986), such detectors are distributed over the image such that the response at any position (x,y) is determined by:

    R_i(x,y) = ∫∫ d_i(α,β) I(α − x, β − y) dα dβ.    (3)

In this case, any pixel intensity is usually envisaged to be "internally" encoded by

    I(x,y) = Σ_{i=1}^{n} R_i(x,y).
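A minimal numerical sketch of this detector-response scheme may help. The 32 x 32 grid, the lateral-inhibition detector profile, and the circular (wrap-around) boundary handling are all illustrative assumptions; the sketch also shows that the direct summation agrees with the spectral-product route that Equation 4, below, formalizes.

```python
import numpy as np

# Detector responses as in Equation 3: a detector profile is
# cross-correlated with the image at every position.
rng = np.random.default_rng(0)
N = 32
image = rng.random((N, N))

# A small lateral-inhibition profile embedded in the full N x N grid
# (an illustrative detector, not one of the profiles of Figure 1)
detector = np.zeros((N, N))
detector[:3, :3] = [[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]]

# Direct summation: R(x, y) = sum_{u,v} d(u, v) * I(u + x, v + y),
# with circular boundaries for simplicity
R_direct = np.zeros((N, N))
for x in range(N):
    for y in range(N):
        shifted = np.roll(np.roll(image, -x, axis=0), -y, axis=1)
        R_direct[x, y] = np.sum(detector * shifted)

# The same responses via products of Fourier transforms (cf. Equation 4);
# cross-correlation requires the conjugate of the detector's transform
R_fft = np.real(np.fft.ifft2(np.conj(np.fft.fft2(detector)) * np.fft.fft2(image)))

assert np.allclose(R_direct, R_fft)
```

For a symmetric detector profile the conjugate can be dropped, and cross-correlation and convolution coincide.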

Figure 1. Sets of hypothetical detectors corresponding to feasible retinal ganglion cell receptive fields (Column 1) and simple cell receptive fields.

[Figure 2 panels: FACE, TEXTURE, SPECTRUM]
Figure 2. Top row shows two original images digitized as 256 x 256 4-bit pixels and the amplitude spectrum for the texture. Rows 2 through 5 represent the inverse Fourier transforms of the amplitude-averaging process illustrated for the texture case in the right-hand column. The unit radial frequency widths (octaves: RF) and orientation tuning (in degrees) are shown on the left (RF, θ). From "Coding images in the frequency domain: Filter design and energy processing characteristics of the human visual system" by T. Caelli and M. Hubner, 1983, IEEE Transactions on Systems, Man, and Cybernetics, 13, 1018-1021. Reprinted with permission.

...components form discrete numbers of "channels" (see R. de Valois & K. de Valois, 1980; Sekuler, 1974). Usually the outputs of such channels are restricted to (scalar) energy values (Braddick, Campbell, & Atkinson, 1978) and, as such, by no means uniquely encode the incoming image. In fact, using this criterion, Caelli and Hubner (1983) found that at least 90 channels would be required in the two-dimensional frequency domain (without even considering phase encoding) to capture the input signal (Figure 2). Although this may be possible for a parallel-processing biological system, it does not make good science to have an encoding model with at least 90 parameters.

Such results suggest that perceptrons do not produce a more efficient image code than do the individual pixels, and that if efficiency is required, then one may as well remain in the image domain. In contrast to this situation, there are many techniques in digital image processing that produce considerably more efficient image-encoding strategies. For illustrative purposes, we consider one: predictive compression.

Predictive compression differs from the above approaches insofar as it does not posit the existence of fixed, invariant detectors that encode the image. Rather, the encoder is a correlator, which determines the degree to which an individual pixel can be predicted from its neighbors (see Rosenfeld & Kak, 1982, for more details). Since nonwhite (nonrandom) images, by definition, carry highly correlated information, this technique can produce considerable reduction in bit codes (Caelli & Hubner, 1984).

The technology required to study issues in visual or computer image encoding includes standard image-processing systems that have digitizing hardware (a 512 x 512 8-bit "grabber"), frame buffers, and high-quality (raster) monitors. In addition, a sufficiently fast processor (or dedicated peripheral device) is required to compute the convolutions for the perceptron-based models. The detector response functions (Equation 3) can also be enacted by fast Fourier transform techniques, since Equation 3 is equivalent to:

    R_i(x,y) = F^{-1}{F(d_i(x,y)) · F(I(x,y))},    (4)

where F and F^{-1} correspond to Fourier and inverse Fourier transforms, respectively. Discrete forms of the Fourier transform are enacted by the well-known Cooley-Tukey method (Cooley & Tukey, 1965), which, for large kernels (detector profiles), is considerably faster than enacting Equation 3 directly. However, it should be noted that with the increasing speed, versatility, and "parallelness" of array processors, this may not be the case in the near future.

As is evident in Figure 2, one direct psychophysical paradigm for studying the superthreshold sensitivity to the compression or detection of spatial information is that of discrimination. If a specific encoding dimension is critical to the perception of an image, then the perceived difference between an original image and a version compressed (by a fixed amount) along this dimension should be greater than parametrically equivalent compressions along others. A good example of this is compression in Fourier power compared with phase, where it has been conclusively shown (Caelli, 1983; Caelli & Bevan, 1983) that phase is the more important processing parameter. In turn, it can also be shown that phase compressions have more deforming effects on the image luminance profiles; thus, Caelli and past collaborators have concluded that it is the pixel, or image-domain, parameters that are of paramount importance, even after 20 years of attempts to show that visual encoding may be accomplished in a transform domain.

This does not imply that the visual system is not differentially sensitive to different luminance gradients. As shown in Figure 3, it is clear that bandpass components contain the more perceptually salient features of an image, since the predominant face is the one belonging to that frequency range when two images are digitally superimposed (see Caelli & Yuzyk, 1985, for more details). Indeed, such demonstrations show that image encoding cannot be divorced from the selective sensitivity of the visual system for specific features, such as edges (this topic is covered in Section 3).

2. Image Segmentation: Figure/Ground Processing?

Although image encoding is concerned with image representation in general, image or texture segmentation research has the specific aim of developing models for how biological and computer visual systems may segment arbitrary images into regions that are defined by their "granularity" or surface properties. Texture segmentation also differs from pattern recognition insofar as no a priori information about the segmentation areas is supplied before segmentation occurs.

Psychophysical studies of texture segmentation have usually been restricted to artificial textures composed of pseudowhite noise or textures generated from micropatterns (Beck, 1966; Julesz, 1962, 1981). In all such studies, the aim has been to enumerate the specific types of texture properties that determine perceptual segmentation, and, over the past 20 years, it has become apparent that the process seems to occur in the visual system through the registration of differences between the outputs of detectors, sometimes referred to as "textons" (Julesz, 1981). The idea that classification of textured regions comes about by such a process was studied earlier by Richards and Pollit (1974) with one-dimensional granularities and recently by Caelli (1985). One typical computational scheme for texture segmentation and an example are shown in Figure 4.

From the example in Figure 4, three processes seem necessary to represent how segmentation may be accomplished: (1) a form of convolution, where specified detectors are cross-correlated with the incoming texture, as in Equation 3; (2) a form of impletion, or perceptual "relaxation," where strong and contiguous responses are allowed to spread while isolated or weak detector responses are inhibited; and (3) grouping, where the vector responses of each position (pixel) are correlated such that positions with similar detector outputs are grouped into the same textured region. Below a certain correlation threshold, the positions are classified into different regions.

Figure 3. Face-face montage. Row 1 consists of Face 1 images after filtering (0, 8, 16, 32, 64 picture cycles). Column 1 consists of Face 2 images after filtering (0, 8, 16, 32, 64 picture cycles). The remaining images are the result of factorially combining filtered Face 1 and Face 2 images.

Questions about the number and shape of the detectors involved are important; however, their answers do not solve the problem of texture segmentation. Indeed, Figure 4 illustrates one difference between computer vision and much of behavioral research insofar as the former requires that a full segmentation be supplied.

Even the issue of detector specification can be clarified if one adopts the objective definition of orthogonality; that is, two detector profiles (d_i, d_j) are orthogonal if, and only if, their scalar product is zero:

    d_i · d_j = Σ_x Σ_y d_i(x,y) d_j(x,y) = 0.    (5)

Earlier attempts to define the basic set of orthogonal detectors (see, e.g., Caelli, Julesz, & Gilbert, 1978) did not use this form of independence, at least as an objective benchmark. Indeed, claims that such texture features as crossing-lines and terminators are distinctively different from orientated edge or bar detectors (Julesz, 1981; Treisman & Gelade, 1980) are not consistent with the above definition of orthogonality, and are not even necessary to predict the very results used to argue for such features via Equation 3 (see Caelli, 1985, for details). Such distinctions simply demonstrate the differences between descriptions developed for spatial codes in the vernacular and objective formal languages as suggested by Equation 5.
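Equation 5 can be checked directly. The two 3 x 3 profiles below are standard horizontal and vertical gradient masks, chosen for illustration; they are not the specific detectors of Caelli et al. (1978).

```python
import numpy as np

# Equation 5: two detector profiles are orthogonal when their scalar
# (dot) product over all positions is zero.
d_horizontal = np.array([[-1, -1, -1],
                         [ 0,  0,  0],
                         [ 1,  1,  1]], dtype=float)
d_vertical = d_horizontal.T          # the same profile rotated 90 degrees

def scalar_product(di, dj):
    """Sum of the position-wise products of two detector profiles."""
    return float(np.sum(di * dj))

assert scalar_product(d_horizontal, d_vertical) == 0.0   # orthogonal pair
assert scalar_product(d_horizontal, d_horizontal) > 0.0  # nonzero self-energy
```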

[Figure 4 schematic: (a) input texture, (b) convolution, (c) impletion, (d) grouping]
Figure 4. Texture segmentation model proposed in "Three processing characteristics of visual texture segmentation" by T. Caelli, 1985, Spatial Vision, 1, 19-30 (reprinted with permission). (a) Basic detectors, their spatial coupling, and selectivity, which result in the "filling in" of textured regions and segmentation effects. (b) An example in which the input texture (left) is processed by eight edge/bar detectors (outputs represented in the first two columns), after which impletion is activated to result in "filling in" by active units and suppression of inactive units. The final output represents the grouping of positions according to the detector outputs (from Caelli, 1985).

The technological requirements for texture displays and experiments are dependent upon the type of textures under investigation. Point or vector-type black/white micropattern textures can be generated on point-plot CRTs, as in earlier displays (see, e.g., Caelli et al., 1978). However, fully gray-scaled, or even chromatic, textures require raster monitors and image-processing systems as described in the previous section.

3. Image-Feature Extraction

Consistent with the view that the critical information in images for pattern recognition, classification, and so forth lies in what can be loosely termed the image's "structural features," attempts have been made over the past century to define such image components in an objective way. Similarly, much research has been done to determine whether the visual system is particularly sensitive to a subset of such features and whether such features are sufficient to predict many aspects of spatial vision.

From the detection of Gaussian windowed gratings (Watson, Barlow, & Robson, 1983) to the masking of one image by another (Caelli & Yuzyk, 1985), it is clear that the human visual system is more sensitive to image bandpass information than is physically present and measured by the image power spectrum. That is, image energy usually decreases with increases in spatial frequency (luminance gradients); however, visual sensitivity is typically "inverted-U" shaped with pure grating patterns (as in contrast sensitivity measures) or images in general. This follows naturally from the predominance of known detectors within the vertebrate visual system that have little response to constant luminance and specific sensitivities to edges, bars, and so forth (see Hubel & Wiesel, 1977, for a review).

Techniques for edge extraction are consequently based upon at least some form of bandpass filtering and thresholding operation. The choice of bandpass is usually made to minimize the degree of high-frequency edge "noise," but must be sufficiently highpass to extract adequate luminance gradients (detail). One popular way of enacting such a bandpass filter is to use a ∇²G_σ operator consisting of a Gaussian lowpass filter (G_σ) whose bandwidth is determined by σ (see Caelli & Nagendran, in press). By definition, the blurring process (G_σ) attenuates the high-frequency components, and the ∇² operator, being a Laplacian differential operator, enhances the remaining detail. This operation is analytically defined by

    ∇²I = ∂²I/∂x² + ∂²I/∂y²,    (6)

which, in finite-difference form, is equivalent to cross-correlating the image with the kernel:

Figure 5. Input signal.

Figure 6. Figure 6 is the same as Figure 5, except for the use of relaxation, where only regions of high edge density are retained in different degrees. Here, b, c, and d correspond to edge-only images with no extra constraints (top right), only retaining edge points after averaging and thresholding within 4- and 12-pixel radii.

    [  0  -1   0 ]
    [ -1   4  -1 ]    (7)
    [  0  -1   0 ]

Indeed, when two such ∇²s are put together (one at 45° to the other), one sees the usual lateral inhibition (L.I.) operator:

           [ -1  -1  -1 ]
    L.I. = [ -1   8  -1 ]    (8)
           [ -1  -1  -1 ]

which, again, serves to differentiate the image. An example of edge extraction, based on ∇²G_σ, is shown in Figure 5 in a more elaborate form. Figures 5b and 5c represent the thresholded outputs of two ∇²G_σ filters at two different values of σ: the Gaussian space constant is set at 4- and 6-pixel widths. All edge responses greater than ∇²G_σ = 0 are retained. Figure 5d represents the logical AND between these two edge outputs, and the effect is to retain the stronger edges and, thus, suppress "edge noise" (Caelli & Nagendran, in press).

There are many other versions of edge operators in use, although most involve some form of image differentiation (see Rosenfeld & Kak, 1982) and thresholding operation. The two most common edge-decision criteria are the simple thresholding operation illustrated in Figures 5 and 6 and the use of zero-crossings in the ∇²G images. Indeed, the psychophysical evidence for the perceptual uniqueness of any of these criteria is lacking (Caelli & Yuzyk, 1986; Watt & Morgan, 1985).

These operations are simple examples of image-processing techniques that may reflect human visual information-processing characteristics and offer new insights into feature encoding. For example, the pattern features for the character "S" shown in Figure 6 are not derived from an explicit list of curvature detectors, but rather from a process that results in these critical features for a specific pattern. As will be shown in Sections 4 and 5, this process approach to feature generation significantly decreases the dimensionality of the pattern-recognition problem.

Image-processing systems not only allow one to extract such edge-only images quickly by dedicated pipe-line pixel processors (see Caelli & Nagendran, in press), but also enable more realistic experiments on feature processing.
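The two-scale ∇²G_σ scheme of Figure 5 can be sketched as follows. The grid size, space constants, step-edge input, and circular boundary handling are illustrative assumptions, not the parameters of the original experiments.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian lowpass filter G_sigma (circular boundaries)."""
    size = int(6 * sigma) | 1                   # odd kernel width
    ax = np.arange(size) - size // 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    out = image.astype(float)
    for axis in (0, 1):                         # blur each axis in turn
        acc = np.zeros_like(out)
        for k, w in zip(ax, g):
            acc += w * np.roll(out, k, axis=axis)
        out = acc
    return out

def laplacian(image):
    """Finite-difference Laplacian, i.e., the kernel of Equation 7."""
    return (4 * image
            - np.roll(image, 1, 0) - np.roll(image, -1, 0)
            - np.roll(image, 1, 1) - np.roll(image, -1, 1))

# A vertical step edge: the del^2 G responses at two space constants are
# thresholded at zero and ANDed, so only edges present at both scales survive
image = np.zeros((32, 32))
image[:, 16:] = 1.0
edges = np.logical_and(laplacian(gaussian_blur(image, 2.0)) > 0,
                       laplacian(gaussian_blur(image, 3.0)) > 0)
assert edges.any() and not edges.all()          # responses only near the edge
```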
One example is the alignment of image-edge information with an original image where, at raster speed (every 33 msec), one can move (pan and scroll) the edge version over the original until subjective alignment is reached (see Caelli & Yuzyk, 1986, for details).

By introducing an operation called "relaxation," such edge-only information can be reduced to edge regions of high density (as shown in Figure 6). Such relaxation techniques have wide use in computer vision and are based upon an iterative scheme that enhances edge responses if neighborhood responses are strong and that inhibits the response if neighborhood outputs are weak (see Hummel & Zucker, 1983).

4. Pattern Recognition: Detecting Signals in Images

The division in Table 1 between "bottom-up" and "top-down" processing is to be interpreted in terms of the degree to which prior knowledge is introduced into the solution algorithms. In the three topics briefly discussed already, the only "prior knowledge" involved was the computational processes or image-representation strategies; however, with pattern recognition (Section 4) and invariance coding (Section 5), increasing amounts of prior knowledge are required. This knowledge controls the processing and, thus, is termed "top-down."

The simplest example of pattern recognition involves the detection of a signal f(x,y) embedded in an image g(x,y), where

    g(x,y) = f(x,y) + n(x,y),    (9)

for n(x,y) corresponding to the background. Here, the prior information consists of at least some signal and background specifications, and the question is what filter (of g) best reveals the presence of f in the image (the "matched filtering" problem; see Rosenfeld & Kak, 1982).
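Equation 9 amounts to a simple generative model. A toy instance, in which the signal shape, the noise level, and the Gaussian white background are all assumed purely for illustration:

```python
import numpy as np

# Equation 9: the image g is a known signal f embedded in a background n.
rng = np.random.default_rng(2)
f = np.zeros((16, 16))
f[6:10, 6:10] = 1.0                            # the signal (a bright square)
n = 0.1 * rng.standard_normal((16, 16))        # Gaussian white background
g = f + n                                      # the image to be searched
assert g.shape == f.shape
```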

Figure 7. The normalized cross-correlator: Top left images show image with the center (small) signal embedded, and top right images show the normalized cross-correlation output image. Here the brightness indexes the likelihood of the signal's presence. Bottom images show performance of two observers in localizing such signals using a "mouse." The densities are derived from 200 trials per observer. From "On the localization of signals in images" by T. Caelli and M. Nawrot, 1986, manuscript submitted for publication. Printed with permission.

The solution to this problem is determined from what is termed the Cauchy-Schwarz inequality (see Hall, 1979, p. 481); that is, the cross-correlation between f and g,

    C_fg(x,y) = ∫∫ f(α,β) g(x + α, y + β) dα dβ,

is upper-bounded by:

    C_fg(x,y) ≤ √[∫∫ f²(α,β) dα dβ] · √[∫∫ g²(x + α, y + β) dα dβ].    (10)

If the signal energy (E_f) is defined by

    E_f = ∫∫ f²(α,β) dα dβ,

and the equivalent image energy is defined by

    E_g(x,y) = ∫∫ g²(x + α, y + β) dα dβ,

then Equation 10 becomes

    C_fg(x,y) / √[E_g(x,y)] ≤ √(E_f),    (11)

with equality if, and only if, f = λg (Rosenfeld & Kak, 1982). The normalized cross-correlation images defined by Equation 11 are illustrated in Figure 7. A cross-correlator determines the likelihood of the signal's presence over the full image: both the what and the where. When n(x,y) in Equation 9 is white noise, having (1) a flat power spectrum, (2) a random phase spectrum, and (3) a Gaussian pixel intensity distribution, then E_g is constant and the cross-correlator is optimal for detection (Rosenfeld & Kak, 1982).

When the background [n(x,y) in Equation 9] becomes more variable (neighborhood pixel variances and/or means increasingly vary), the optimality of Equation 11 decreases and additional filters have to be developed to improve detection. For example, in a recent paper, Liu and Caelli (in press) estimated a bandpass filter (h) that minimizes the difference between the signal autocorrelation (C_ff) and cross-correlation (C_fg) images:

    min_h Σ_x Σ_y {C_ff − h * C_fg}².    (12)

Figure 8. Edge-only cross-correlation.
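The normalized cross-correlator of Equation 11 can be sketched by brute force. The 8 x 8 signal, its location, and the noise-free embedding below are all illustrative assumptions.

```python
import numpy as np

def normalized_cross_correlation(image, signal):
    """Equation 11: C_fg(x,y) / sqrt(E_g(x,y)), bounded above by sqrt(E_f),
    with equality where the image window is a multiple of the signal."""
    H, W = image.shape
    h, w = signal.shape
    E_f = np.sum(signal**2)                       # signal energy
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = image[y:y + h, x:x + w]
            E_g = np.sum(patch**2)                # local image energy
            if E_g > 0:
                out[y, x] = np.sum(signal * patch) / np.sqrt(E_g)
    return out, np.sqrt(E_f)

rng = np.random.default_rng(1)
signal = rng.random((8, 8))
image = rng.random((64, 64))
image[20:28, 30:38] = signal                      # embed the signal

C, bound = normalized_cross_correlation(image, signal)
y, x = np.unravel_index(np.argmax(C), C.shape)
assert (y, x) == (20, 30)                         # the peak localizes the signal
assert np.isclose(C[y, x], bound)                 # Cauchy-Schwarz equality at the match
```

The peak thus supplies both the what (its height, relative to the bound) and the where (its position).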

Such filters serve to increase detectability and decrease false alarms. Furthermore, the use of edge-only information enhances the slope of the correlation image as shown in Figure 8, where the edge-only Snoopy pattern is detected and localized by the peak of the edge-only cross-correlation function.

5. Invariance Coding

Cross-correlation techniques for pattern recognition are shift-invariant insofar as the peak value of Equation 11 is invariant to the location of the signal. It is not invariant, however, to rotation and size changes of the signal. We call the problem of developing techniques to solve this problem, and that of determining how the visual system accomplishes the same, the invariance coding problem, an example of the problem of stimulus equivalence.

Figure 9. Cartesian edge-only signal (top) and image (bottom) on left, and log-polar versions on right. In the log-polar version, the horizontal axis corresponds to orientation and the vertical axis to log-polar length. The center is chosen with respect to the center of the letter B in both cases.

One approach is to convert rotation and size changes into translations in the log-polar domain defined by

    r' = log(r − r0),
    θ = tan⁻¹[(y − y0) / (x − x0)],    (13)

for r = √(x² + y²) and r0 = √(x0² + y0²), where (x0,y0) corresponds to the center of the transform. As is evident from Equation 13, rotations and size (scale) changes of the pattern become simple translations in the log-polar image, since, for a size change r1 = a(r − r0),

    ln r1 = ln(r − r0) + ln a.

Also, for a rotation through angle φ, θ1 = θ + φ.
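The invariances claimed in Equation 13 can be verified for a single point. Taking the transform center at the origin with r0 = 0 is a simplifying assumption.

```python
import numpy as np

def to_log_polar(x, y):
    """Equation 13 with (x0, y0) at the origin and r0 = 0."""
    r = np.hypot(x, y)
    return np.log(r), np.arctan2(y, x)

x, y = 3.0, 4.0
log_r, theta = to_log_polar(x, y)

# Scaling the pattern by a = 2 about the center: a pure radial shift of log 2
log_r_scaled, theta_scaled = to_log_polar(2 * x, 2 * y)
assert np.isclose(log_r_scaled - log_r, np.log(2))
assert np.isclose(theta_scaled, theta)            # orientation unchanged

# Rotating the pattern by phi = 30 degrees: a pure angular shift of phi
phi = np.deg2rad(30)
xr = x * np.cos(phi) - y * np.sin(phi)
yr = x * np.sin(phi) + y * np.cos(phi)
log_r_rot, theta_rot = to_log_polar(xr, yr)
assert np.isclose(log_r_rot, log_r)               # radius unchanged
assert np.isclose(theta_rot - theta, phi)
```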

Other approaches to this problem involve the use of Researchover the past two decades clearly indicatesthat transform techniques that, in various degrees, retain the "edges" of mostinterestto humanobserversare those properties of the image. Theclosestto the conformal map­ that are produced from edge and bardetectors extracted ping techniquediscussedabove is that of determiningthe from the middlefrequency range of an image. Secondly, two-dimensional log-polar circular harmonic decompo­ evidence for edge information from the joint output of sition of the image (Ferrier, 1986). Essentially, the idea detectors enhances the visually predominant edges and is that if the imageis decomposed in terms of concentric­ also suppresses edge noise contained within each edge circle/spiral-type •• gratings," then the power spectrum filter in isolation. Finally, via relaxation, I showed that of the circular harmonic Fourier spectrum is size- and pattern-eritical features can be extracted without the rotation-invariant. Second, as with the normal Fourier knowledgeof a priori feature lists. Such processes have spectrum using vertical gratings, the phase spectrumen­ importantimplications for pattern recognition basedupon codes the actual sizeand rotationstate of the image (Fer­ the correlationbetweensuchfeatures and their spatial dis­ rier, 1986): the positionalinformationin this conformal tribution. space. 
CONCLUSIONS: TECHNICAL IMPLEMENTATIONS

In Sections 1 through 5, I have presented some recent applications of both image-processing technology and models to the study of human visual information processing. My aim was to demonstrate that a new class of images (naturally occurring achromatic images) can be carefully manipulated and controlled to test many critical hypotheses about information encoding, feature extraction, and pattern recognition. Central to this approach is the image-processing system that allows one to digitize, analyze, and display images in formats up to 512x512 pixels at relatively little expense.

In the area of image encoding, I briefly examined the question of whether image- or transform-domain encoding techniques are more efficient or more representative of human image processing. I concluded that the only benefit of spectral representations would be to enable the processes of encoding via the outputs of perceptive fields, by enacting convolutions through spectral products (the convolution theorem; see Rosenfeld & Kak, 1982). I also noted, however, that this encoding required full spectral products, and not simply the specification of detectors in the amplitude (power) domain alone. This being the case, then, there would be a formal equivalence between feature detector models and Fourier filters or "channels," since the latter would constitute detectors localized in space and have precisely defined phase responses.

For visual texture segmentation, I examined the idea that texture regions are classified according to the correlation between the responses of detectors within regions. These detectors include the usual edge and bar detectors, and one may determine the objective orthogonality of such textons via the usual definition of a zero scalar (dot) product. Such definitions and models endeavor to restrain ambiguous speculations about the detectors involved in texture segmentation.

The issue as to the types of image features that predominate in visual information processing, in general, follows naturally from the attempt to define the detectors operating in texture segmentation. In Section 3, I examined one important type of image feature: edges.

The difference between these three processes (Sections 1-3 and Table 1) and pattern recognition is that, in the latter, considerable prior knowledge and constraints must apply. This is particularly true with invariance coding and even higher order processes in computational vision. Once pattern features are determined, the signal specified, and the possibilities of log-polar transforms implemented, the recognition of patterns of arbitrary scale, orientation, and position is quite feasible. However, the adaptability of such processes, the strategies involved, and so forth, all require further experimental work before adequate parallels between computer and human vision can be made.

In conclusion, then, I have illustrated how concepts and techniques in computational vision may be applied to the study of human vision. It is my hope that these brief examples may suggest many new experiments and models for human visual information processing. I believe that such image-processing systems allow us to study images that are more "ecologically valid" than the dots, black/white point-plot images, or even gratings that are so frequently used in visual psychophysics.
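The spectral-product encoding just summarized rests on the convolution theorem: filtering an image with a perceptive-field profile is equivalent to multiplying their spectra. A minimal sketch in modern Python/NumPy (not from the original article; the image and kernel are arbitrary stand-ins) illustrates both the equivalence and why amplitude-only specification does not suffice:

```python
# Sketch of the convolution theorem: detector ("perceptive field") outputs
# computed as spectral products, then checked against direct convolution.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))                 # stand-in for a digitized image

# Toy odd-symmetric edge-detector kernel, zero-padded to the image size.
kernel = np.zeros((64, 64))
kernel[:3, :3] = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Detector output via the full spectral product (amplitude AND phase).
response = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

def circular_conv_at(img, ker, y, x):
    """Direct circular convolution at one pixel, for verification."""
    h, w = img.shape
    return sum(ker[dy, dx] * img[(y - dy) % h, (x - dx) % w]
               for dy in range(h) for dx in range(w))

assert np.isclose(response[10, 17], circular_conv_at(image, kernel, 10, 17))

# Keeping only the amplitude (power) spectrum of the image discards phase,
# so it no longer reproduces the detector outputs:
amplitude_only = np.real(
    np.fft.ifft2(np.abs(np.fft.fft2(image)) * np.fft.fft2(kernel)))
assert not np.allclose(response, amplitude_only)
```

The two assertions make the text's point directly: the full spectral product is exactly convolution, while an amplitude-domain-only code is not.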

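The "zero scalar (dot) product" definition of objectively orthogonal textons can also be made concrete. A hedged sketch, assuming hypothetical one-dimensional odd-symmetric (edge-like) and even-symmetric (bar-like) detector profiles, neither taken from the paper:

```python
# Sketch: two detector profiles count as "objectively orthogonal" textons
# when their scalar (dot) product is zero.
import numpy as np

x = np.linspace(-3.0, 3.0, 101)           # symmetric sample grid
edge = -x * np.exp(-x**2)                 # odd-symmetric, edge-like profile
bar = (1.0 - x**2) * np.exp(-x**2)        # even-symmetric, bar-like profile

# An odd profile times an even profile is odd, so the products cancel in
# pairs across the symmetric grid and the dot product vanishes.
assert abs(np.dot(edge, bar)) < 1e-10

# Each profile still has positive energy (nonzero self-product), of course.
assert np.dot(edge, edge) > 0 and np.dot(bar, bar) > 0
```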
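Likewise, the log-polar transforms mentioned in connection with invariance coding reduce magnification and rotation to translations: under log-polar sampling, scaling a pattern shifts it along the log-radius axis and rotating it shifts it along the angle axis. A toy sketch (the sampling function and test pattern are illustrative assumptions; grid parameters are chosen so a 2x magnification is exactly a one-row shift):

```python
# Toy log-polar sampling: rows index log2-radius, columns index angle.
# Under this mapping, magnification and rotation become translations.
import numpy as np

def log_polar_sample(f, n_r, n_theta, log2_r_min, log2_r_max):
    """Sample f(x, y) on a log-polar grid centred on the origin."""
    out = np.empty((n_r, n_theta))
    log2_rs = np.linspace(log2_r_min, log2_r_max, n_r)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    for i, lr in enumerate(log2_rs):
        r = 2.0 ** lr
        for j, t in enumerate(thetas):
            out[i, j] = f(r * np.cos(t), r * np.sin(t))
    return out

f = lambda x, y: np.cos(7.0 * x) * np.sin(5.0 * y)   # arbitrary test pattern

A = log_polar_sample(f, n_r=6, n_theta=16, log2_r_min=-5, log2_r_max=0)

# Magnifying the pattern by 2 shifts it one row along the log2-radius axis.
B = log_polar_sample(lambda x, y: f(x / 2.0, y / 2.0),
                     n_r=6, n_theta=16, log2_r_min=-5, log2_r_max=0)
assert np.allclose(B[1:], A[:-1])

# Rotating the pattern by 90 degrees shifts it 4 columns (4 angle steps).
C = log_polar_sample(lambda x, y: f(y, -x),
                     n_r=6, n_theta=16, log2_r_min=-5, log2_r_max=0)
assert np.allclose(C, np.roll(A, 4, axis=1))
```

Because scale and orientation changes become shifts, matching in this domain needs only translation-invariant machinery, which is the feasibility claim made above.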
REFERENCES

Beck, J. (1966). Perceptual grouping produced by changes in orientation and shape. Science, 154, 538-540.
Braddick, O., Campbell, F., & Atkinson, J. (1978). Channels in vision: Basic aspects. In R. Held, H. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology: Perception (Vol. 8). Berlin: Springer-Verlag.
Burgess, A., & Ghandeharian, H. (1984). Visual signal detection: I. Ability to use phase information. Journal of the Optical Society of America A, 1, 900-905.
Caelli, T. (1983). Energy processing and coding factors in texture discrimination and image processing. Perception & Psychophysics, 34, 349-355.
Caelli, T. (1984). On the specification of coding principles for visual image processing. In P. C. Dodwell & T. Caelli (Eds.), Figural synthesis. Hillsdale, NJ: Erlbaum.
Caelli, T. (1985). Three processing characteristics of visual texture segmentation. Spatial Vision, 1, 19-30.
Caelli, T., & Bevan, P. (1983). Probing the spatial frequency spectrum for orientation sensitivity with textures. Vision Research, 23, 39-45.
Caelli, T., & Hubner, M. (1983). Coding images in the frequency domain: Filter design and energy processing characteristics of the human visual system. IEEE Transactions on Systems, Man, & Cybernetics, SMC-13, 1018-1021.
Caelli, T., & Hubner, M. (1984). On the number of intensity levels detected in textures. Perception, 13, 21-31.
Caelli, T., Julesz, B., & Gilbert, E. (1978). On perceptual analyzers underlying visual texture discrimination: Part 2. Biological Cybernetics, 29, 201-214.
Caelli, T., & Nagendran, S. (in press). Fast edge-only matching techniques for robot pattern recognition. Computer Vision, Graphics, & Image Processing.
Caelli, T., & Nawrot, M. (1986). On the localization of signals in images. Manuscript submitted for publication.
Caelli, T., & Yuzyk, J. (1985). What is perceived when two images are combined? Perception, 14, 41-48.
Caelli, T., & Yuzyk, J. (1986). On the extraction and alignment of image edges. Spatial Vision, 1, 205-218.
Cooley, J. W., & Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297-301.
Daugman, J. (1984). Spatial visual channels in the Fourier plane. Vision Research, 24, 891-910.
De Valois, R., & De Valois, K. (1980). Spatial vision. Annual Review of Psychology, 31, 309-341.
Ferrier, N. (1986). Two-dimensional circular harmonic decompositions and recognition of patterns under rotations and size changes. Unpublished master's thesis, University of Alberta, Edmonton, Alberta, Canada.
Geisler, W., & Hamilton, D. (1986). Sampling-theory analysis of spatial vision. Journal of the Optical Society of America A, 3, 62-70.
Hall, E. L. (1979). Computer image processing and recognition. New York: Academic Press.
Hubel, D., & Wiesel, T. (1977). Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London B, 198, 1-59.
Hummel, R., & Zucker, S. (1983). On the foundations of relaxation labeling processes. IEEE Transactions on Pattern Analysis & Machine Intelligence, PAMI-5, 267-287.
Julesz, B. (1962). Visual pattern discrimination. IRE Transactions on Information Theory, 8, 84-92.
Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature, 290, 91-97.
Liu, Z. Q., & Caelli, T. (1986). Adaptive post-filter matching technique for detecting signals in nonwhite noise. Applied Optics, 25, 1622-1626.
Marr, D. (1982). Vision. San Francisco: Freeman.
Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. Cambridge, MA: MIT Press.
Papoulis, A. (1968). Systems and transforms with applications in optics. New York: McGraw-Hill.
Richards, W., & Polit, A. (1974). Texture matching. Kybernetik, 16, 155-162.
Rosenfeld, A., & Kak, A. (1982). Digital picture processing. New York: Academic Press.
Sekuler, R. (1974). Spatial vision. Annual Review of Psychology, 25, 292-332.
Treisman, A., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Watson, A., & Ahumada, A. (1985). Model of human visual-motion sensing. Journal of the Optical Society of America A, 2, 322-342.
Watson, A., Barlow, H., & Robson, J. (1983). What does the eye see best? Nature, 302, 419-422.
Watt, R., & Morgan, M. (1985). A theory of the primitive spatial code in human vision. Vision Research, 25, 1661-1674.
Wilson, H., & Gelb, D. (1984). Modified line-element theory for spatial-frequency and width discrimination. Journal of the Optical Society of America A, 1, 124-131.