NBSIR 82-2582

AN OVERVIEW OF COMPUTER VISION

William B. Gevarter*

U.S. DEPARTMENT OF COMMERCE
National Bureau of Standards
National Engineering Laboratory
Center for Manufacturing Engineering
Industrial Systems Division
Metrology Building, Room A127
Washington, DC 20234

September 1982

Prepared for: National Aeronautics and Space Administration Headquarters, Washington, DC 20546

U.S. DEPARTMENT OF COMMERCE, Malcolm Baldrige, Secretary
NATIONAL BUREAU OF STANDARDS, Ernest Ambler, Director

* Research Associate at the National Bureau of Standards, sponsored by NASA Headquarters


Preface

Computer Vision*

Computer Vision -- visual perception employing computers -- shares with "Expert Systems" the role of being one of the most popular topics in Artificial Intelligence today. Commercial vision systems have already begun to be used in manufacturing and robotic systems for inspection and guidance tasks. Other systems, at various stages of development, are beginning to be employed in military, cartographic and image interpretation applications.

This report reviews the basic approaches to such systems, the techniques utilized, applications, existing systems, the state-of-the-art of the technology, issues and research requirements, who is doing it and who is funding it, and finally, future trends and expectations.

The computer vision field is multifaceted, having many participants with diverse viewpoints, and many papers have been written. However, the field is still in the early stages of development -- organizing principles have not yet crystallized, and the associated technology has not yet been rationalized. Thus, this report is not as smooth and even as would be desirable.

Nevertheless, this overview should prove useful to engineering and research managers, potential users and others who will be affected by this field as it unfolds.

* This report is in support of the more general NBS/NASA report, An Overview of Artificial Intelligence and Robotics.


Acknowledgments

I wish to thank the many individuals and organizations who have contributed to this report by furnishing information and suggestions. I particularly would like to thank Drs. T. Binford of Stanford, M. Brady of M.I.T., M. Tenenbaum and H. Barrow of Fairchild, and Gennery et al. of JPL for furnishing source material that was essential to the development of this report. In addition, I would like to thank the staff of the NBS Industrial Systems Division, CDR R. Ohlander of DARPA, Drs. A. Rosenfeld and M. Shneier of U. of MD., M. Fischler of SRI, E. Sacerdoti of M.I.C. and J. Wilder of Object Recognition Systems for reviewing this report and suggesting corrections and modifications.

However, any remaining errors or omissions must remain the responsibility of the author. I would also like to thank Margie Johnson for typing and facilitating the publication of this NBS series of reports on Artificial Intelligence and Robotics.

It is not the intent of the National Bureau of Standards to recommend or endorse any of the manufacturers or organizations named in this report, but simply to attempt to provide an overview of the computer vision field. However, in such a diverse and rapidly changing field, important activities and products may not have been mentioned. Lack of such mention does not in any way imply that they are not also worthwhile. The author would appreciate having any such omissions or oversights called to his attention so that they can be considered for future reports.

Table of Contents

Preface

I. Introduction
II. Definition
III. Origins of Computer Vision
IV. Relation to Human Vision
V. Applications
VI. Basis for a General Purpose Image Understanding System
VII. Basic Paradigms for Computer Vision
   A. Hierarchical -- Bottom-up Approach
   B. Hierarchical -- Top-down Approach
   C. Heterarchical Approach
   D. Blackboard Approach
VIII. Levels of Representation
IX. Research in Model-Based Vision Systems
X. Industrial Vision Systems
   A. General Characteristics
   B. Examples of Efforts in Industrial Visual Inspection Systems
   C. Examples of Efforts in Industrial Visual Recognition and Location Systems
   D. Commercially Available Industrial Vision Systems
XI. Who is Doing It
   A. Research Oriented
   B. Commercial Vision Manufacturers
XII. Who is Funding It
XIII. Summary of the State-of-the-Art
   A. Human Vision
   B. Low and Intermediate Levels of Processing
   C. Industrial Vision Systems
   D. General Vision Systems
      1. Introduction
      2. Difficulties
      3. Techniques
      4. Conclusions
   E. Visual Tracking
   F. Overview
XIV. Current Problems and Issues
   A. General
   B. Techniques
      1. Low Level Processing
      2. Middle Level Processing
      3. High Level Processing
   C. Representation and Modeling
   D. System Paradigms and Design
   E. Knowledge Acquisition -- Teaching and Programming
   F. Sensing
   G. Planning
XV. Research Needed
   A. General
   B. Techniques
      1. Low Level Processing
      2. Middle Level Processing
      3. High Level Processing
   C. Representation and Modeling
   D. System Paradigms and Design
   E. Knowledge Acquisition -- Teaching and Programming
   F. Sensing
   G. Planning
XVI. Future Trends
   A. Techniques
   B. Hardware and Architecture
   C. AI and General Vision Systems
   D. Modeling and Programming
   E. Knowledge Acquisition
   F. Sensing
   G. Industrial Vision Systems
   H. Future Applications
   I. Conclusion

References

Appendices

A. Low Level Features and Representations
   A. Pixels
   B. Texture
   C. Regions
   D. Edges and Lines
   E. Corners
B. Extracting Edges and Areas
   A. Extracting Edges
      1. Linear Matched Filtering
      2. Non-Linear Filtering
      3. Local Thresholding
      4. Surface Fitting
      5. Rotationally Insensitive Operators
      6. Line Following
      7. Global Methods
   B. Edge Finding Variations
   C. Linking Edge Elements and Thinning Resultant Lines
   D. Remarks on Edge Finding
   E. Extracting Regions
C. Segmentation and Interpretation
   A. The Computer Vision Paradigm
   B. An Early Bottom-up System
   C. Problems with Bottom-up Systems
   D. Interpretation-Guided Segmentation
   E. Use of General World Knowledge to Guide Segmentation
D. 2-D Representation, Description and Recognition
   A. Pyramids
   B. Quadtrees
   C. Statistical Features of a Region
   D. Boundary Curves
   E. Run-Length Encoding
   F. Skeleton Representations and Generalized Ribbons
   G. Representation by a Concatenation of Primitive Forms
   H. Relational Graphs
   I. Recognition
E. Recovery of Intrinsic Image Characteristics
   A. Basic Approach
   B. Shape from Shading
   C. Stereoscopic Approach
   D. Photometric Stereo
   E. Shape from Texture
   F. Shape from Contour
   G. Shape and Velocity from Motion
F. Higher Levels of Representation
   A. Volumetric Models
      1. Generalized Cones
      2. Wire Frame Models
      3. Polyhedral Models
      4. Combining 1D, 2D, and 3D Primitives
      5. Planes and Ellipsoids
      6. Sets of Prototype Volumes
   B. Symbolic Descriptions
   C. Procedural Models
G. Higher Levels of Interpretation
H. Tracking
I. Additional Tables of Model-Based Vision
J. Tables of Commercially Available Systems
K. Glossary
L. Publication Sources for Further Information

Figures

1. A Framework for Early and Intermediate States in a Theory of Visual Information Processing
2. An Example of a 2-1/2D Sketch
3. Examples of Applications of Computer Vision Now Underway
4. Model-Based Interpretation of Images
5. Basic Image Understanding Paradigms
6. Computational Architecture for a General-Purpose Vision System
7. Organization of a Visual System
8. Intensity Variations at Step Edges
9. A Set of Intrinsic Images Derived from a Single Monochrome Intensity Image

Tables

I. Examples of Non-Linear Filtering for Extracting Edge and Line Elements
II-1. Model-Based Vision Systems -- ACRONYM
II-2. Model-Based Vision Systems -- VISIONS
III. [Additional] Model-Based Vision Systems
IV. Example Efforts in Industrial Visual Inspection Systems
V. Example Efforts in Industrial Visual Recognition and Location Systems
VI. Commercially Available Industrial Vision Systems
VII. University Organizations Engaged in Computer Vision Research
VIII. Commercial Vision System Developers
IX. Visual Tracking Approaches
X. Example Future Applications for Computer Vision Systems

Computer Vision

I. Introduction

Following the lead of Cohen and Feigenbaum (1982, p. 127), we may consider computer vision to be the information-processing task of understanding a scene from its projected images. Other fields, such as image processing and pattern recognition, also utilize computers in vision tasks. However, we can distinguish the fields by categorizing them as follows:

Image processing is a signal processing task that transforms an input image into a more desirable output image through processes such as noise reduction, contrast enhancement and registration.

Pattern recognition is a classification task that classifies images into predetermined categories.


Computer vision is an image understanding task that automatically builds a description not only of the image itself, but of the three dimensional scene that it depicts. The term scene analysis has been used in the past to emphasize the distinction between processing two dimensional images, as in pattern classification, and seeking information about three-dimensional scenes.

In this report, we will emphasize the Artificial Intelligence (AI) aspects of vision and therefore will dwell on image understanding. Image understanding includes among its techniques many of the methods found in image processing and pattern recognition. However, it also includes geometric modeling, and AI knowledge representation and cognitive processing techniques.

Hiatt (1981, p. 3) observes, "For practical purposes, investigators of computer vision often define seeing as gathering visual data for the purpose of making complex decisions. Computer vision is, accordingly, a major adjunct to the study of artificial intelligence." Arden (1980, p. 482) adds, "A view widely held by psychologists is that perception is an active process in which hypotheses are formed about the nature of the environment and sensory information is sought that will confirm or refute these hypotheses. This view of perception, as a form of problem-solving at least at some stage, is held by many researchers in artificial intelligence." Thus, computer vision, with its many current and potential applications, is a major Artificial Intelligence (AI) topic today. The following chapters are an attempt to provide an overview of this important and growing field. In addition to reviewing the conceptual basis for computer vision and its associated techniques, we will also review their implementation in vision systems, both research and commercial.

Chapters II, III and IV further define computer vision, reviewing its origins and its relation to human vision. Chapter V briefly indicates applications of computer vision.

Chapter VI outlines a basis for a general purpose computer vision system, in the process providing a structure for comprehending systems with lesser aspirations.


Chapter VII reviews the basic control structures suitable for vision systems. Chapter VIII examines the successive levels of representation found in computer vision systems.

Vision systems, both research and industrial, are covered in Chapters IX and X. Information on who the principal participants are in the computer vision field is given in Chapters XI and XII. The state-of-the-art, current problems and issues, research requirements and future trends are presented in Chapters XIII to XVI.

Reviews of the representation methods and processing techniques used in computer vision are given in the appendices.

Appendix A reviews representations for low level image features such as pixels, edges, regions, etc.

Appendix B reviews techniques (such as filtering and thresholding) for extracting edges and regions.

Appendix C discusses methods for symbiotically combining segmentation with interpretation.

Appendix D provides an overview of methods (such as statistical features, boundary curves, primitive forms, and relational graphs) for succinctly representing image features and utilizing the resulting representation for recognition.

Appendix E reviews the various methods for extracting intrinsic image characteristics such as surface shapes, ranges and orientations from 2-D images. Also included is a discussion of extracting shape and velocity from successive images of objects in motion.

Appendix F provides an overview of higher levels of representation -- both volumetric and procedural models, and symbolic descriptions such as relational graphs.

Appendix G reviews how intrinsic images can be given higher level interpretations by segmenting intrinsic surface characteristics into objects (either by model or symbolic description matching), yielding object recognitions or scene descriptions.

Appendix H reviews real-time visual tracking, needed for guidance, assembly and other tasks.

A glossary of terms in computer vision is given in Appendix K. Publication sources for further information are listed in Appendix L.


II. Definition

Computer (computational or machine) vision can be defined as perception by a computer based on visual sensory input.

Horn (1979, pp. 70-71) characterizes machine vision (from a robotic orientation) as follows:

An optical system forms an image of some three-dimensional [3-D] arrangement of parts. The two-dimensional [2-D] image is sensed and converted into machine readable format. It is the purpose of the machine vision system to derive information from this image useful in the execution of the given task. In the simplest case the information sought will concern only the location and orientation of an isolated object -- more commonly, objects have to be recognized and their spatial relationships determined. This can be viewed as a process in which a description of the scene being viewed is developed from the raw image. The description has to be appropriate to the particular application. That is, irrelevant visual features should be discarded, while needed relationships between parts of objects must be deduced from their optical projection.

Barrow and Tenenbaum (1981, p. 573) enlarge on this from a more general viewpoint, stating:

Vision is an information-processing task with well-defined input and output. The input consists of arrays of brightness values, representing projections of a three-dimensional scene recorded by a camera or comparable imaging device. Several input arrays may provide information in several spectral bands (color) or from multiple viewpoints (stereo or time sequence). The desired output is a concise description of the three-dimensional scene depicted in the image, the exact nature of which depends upon the goals and expectations of the observer. It generally involves a description of objects and their interrelationships but may also include such information as the three-dimensional structure of surfaces, their physical characteristics (shape, texture, color, material), and the locations of shadows and light sources...

In this report, we will follow the lead of Ballard and Brown (1982, p. 2) and define Computer Vision as "the enterprise of automating and integrating a wide range of processes and representations used for vision perception." The emphasis will be on generating a description or an understanding of the scene from which the image was obtained. The next chapter will enlarge on this point of view.


III. Origins of Computer Vision

Computer vision is based largely on ideas from three related fields: image processing, pattern recognition and scene analysis.

Rosenfeld (1981, p. 596) states that, "In image processing, the input and the output are both images with the output an improved version of the input." In preprocessing we have gray-scale modification (usually to normalize scene brightness and contrast), sharpening (to restore the weakened high spatial frequencies) and smoothing to remove noise in the image. If two images have to be compared, they may have to be registered (i.e., geometrically transformed to make them congruent) before matching them.
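To make these operations concrete, a minimal sketch follows (Python with numpy assumed; the function names and the 3x3 averaging mask are illustrative choices, not part of Rosenfeld's description):

```python
import numpy as np

def stretch_contrast(image):
    """Gray-scale modification: linearly rescale intensities to [0, 255]."""
    lo, hi = image.min(), image.max()
    return (image - lo) * (255.0 / max(hi - lo, 1e-9))

def smooth(image):
    """Noise smoothing: replace each pixel by the mean of its 3x3 neighborhood."""
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy:1 + dy + image.shape[0],
                          1 + dx:1 + dx + image.shape[1]]
    return out / 9.0

noisy = np.random.default_rng(0).integers(80, 120, size=(64, 64)).astype(float)
cleaned = stretch_contrast(smooth(noisy))   # smoothed, then contrast-normalized
```

Registration, the remaining step, would similarly be a geometric transform (shift, rotation, warp) estimated so that two images align before comparison.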

In pattern recognition the input is the image, but the output is a description of the image based on a priori knowledge of expected patterns. The computer usually starts with a list of brightness values associated with the array of hundreds of thousands of points corresponding to the image. Recognizing a pattern means replacing this mass of undigested data with a much simpler, more useful description. However, it is usually impractical to search directly for examples of the patterns we are interested in within this array of intensity values. Instead, it is often more convenient to first search for examples of simpler patterns (such as edges and regions), referred to as features. A simplified description of the image constructed from these features can then be used as the basis for pattern recognition*.

Scene analysis is concerned with the transformation of simple features into abstract descriptions relating to objects that cannot be simply recognized based on pattern matching.

Brady (1981a, pp. 4-5), referring to scene analysis as image understanding (IU), expands on the differences between pattern recognition and IU, observing that typically pattern recognition systems are concerned with recognizing the input as one of a usually small set of possibilities. Pattern recognition systems are mostly concerned with images of basically two dimensional objects. When the images are of three dimensional objects, such as engine parts, they are effectively treated as two dimensional by considering each stable position as a separate object. In contrast, IU has dealt extensively with three dimensional images.

More significantly, pattern recognition systems typically operate directly on the image. IU approaches to most visual processes (e.g., stereo, texture, shape from shading) operate not on the image but on symbolic representations that have been computed by earlier processing.

Arden (1980, pp. 482-483), taking a historical perspective, contrasts the pattern-recognition and the IU or AI approach as follows:

* Pratt (1978, pp. 568-569) indicates that in many cases, for simple objects in uncluttered imagery, it is feasible to extract needed features by transformations of the images (e.g., using a two dimensional Fourier transform). The resulting feature space can be partitioned into regions for classification into objects, based on prototypes.
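To illustrate the transform-and-classify idea in Pratt's footnote, here is a hypothetical sketch (Python with numpy; the choice of the k x k lowest-frequency magnitudes as features and the two toy prototypes are invented for this example):

```python
import numpy as np

def fourier_features(image, k=4):
    """Feature vector: magnitudes of the k x k lowest spatial frequencies."""
    return np.abs(np.fft.fft2(image))[:k, :k].ravel()

def classify(image, prototypes):
    """Label the image by the nearest prototype in the feature space."""
    f = fourier_features(image)
    return min(prototypes, key=lambda label: np.linalg.norm(f - prototypes[label]))

# Prototypes built from sample images of two simple, uncluttered objects.
square = np.zeros((32, 32)); square[8:24, 8:24] = 1.0
disk = np.zeros((32, 32))
yy, xx = np.mgrid[:32, :32]
disk[(yy - 16) ** 2 + (xx - 16) ** 2 <= 64] = 1.0
prototypes = {"square": fourier_features(square), "disk": fourier_features(disk)}
print(classify(disk, prototypes))   # -> disk
```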

Since the early sixties there has been a marked divergence between the pattern-recognition and AI approaches to computer analysis of images. The former approach has continued to stress the use of ad hoc image features in combination with statistical classification techniques. More recently, use has been made of "syntactic" methods in which images are recognized by a "parsing" process as being built up hierarchically of primitive constituents. By contrast, the AI approach has employed problem-solving methodologies based on extensive use of knowledge about the class of images, or "scenes," to be analyzed...

Much of the work on computer vision has dealt with images of scenes containing solid objects viewed from nearby. These are the sort of images with which a robot vision system must cope in using vision to guide its motor activities, including manipulation and locomotion. The analysis of such images is usually called "scene analysis," to distinguish it from the analysis of images that are essentially two- dimensional, such as photomicrographs (which show cross-sections), radiographs (which show projections), satellite imagery (in which terrain relief is negligible), documents, diagrams, maps, and so on. The methods of computer vision, however, apply equally to these latter classes of images; the term need not be restricted to three-dimensional scene analysis.

In this report we will only treat image processing and low-level vision to the extent needed for image understanding. Pattern recognition, which has broken off from AI and become a separate field, will also be given minimum treatment.

To a large extent, the terms scene analysis, image understanding, and computer vision have become synonymous. The more advanced vision systems have a strong AI flavor, being heavily concerned with symbolic processes for representing and manipulating knowledge in a problem solving mode. Though vision systems that primarily depend on pattern recognition techniques are also treated in this report, the intent is to concentrate on the knowledge-based scene analysis (IU) approach which is the major focus in AI computational vision.

In the next chapter we will briefly look at the relation that human vision has to the AI approach to computer vision.

IV. Relation to Human Vision

MIT's Marr and Nishihara (1978, p. 42) take the view that "Artificial Intelligence is (or ought to be) the study of information processing problems that characteristically have their roots in some aspect of biological information processing." They developed a computational theory of vision based on their study of human vision. Figure 1 represents the transition from the raw image through the primal sketch to the 2-1/2D sketch (indicated in Figure 2), which contains information on local surface orientations, boundaries, and depths.

The primal sketch, reminiscent of an artist's hurried drawing, is a primitive but rich description of the way the intensities change over the visual field. It can be represented by a set of short line segments separating regions of different brightnesses. A list of the properties of the line segments, such as location, length, and orientation for each segment, can be used to represent the primal sketch.
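One concrete (and entirely hypothetical) way to hold such a description in a program is a list of segment records carrying exactly the properties named above:

```python
from dataclasses import dataclass
from math import atan2, hypot

@dataclass
class SketchSegment:
    """One primal-sketch element: a short line separating brightness regions."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def location(self):
        """Midpoint of the segment."""
        return ((self.x0 + self.x1) / 2.0, (self.y0 + self.y1) / 2.0)

    @property
    def length(self):
        return hypot(self.x1 - self.x0, self.y1 - self.y0)

    @property
    def orientation(self):
        """Angle of the segment, in radians."""
        return atan2(self.y1 - self.y0, self.x1 - self.x0)

# A (toy) primal sketch is then just a list of such segments.
primal_sketch = [SketchSegment(0, 0, 4, 3), SketchSegment(4, 3, 4, 9)]
```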

The late Dr. Marr and his associates' development of a human visual information processing theory (Marr, 1982) has had a substantial impact on computational vision.

Figure 1

A Framework for Early and Intermediate States in a Theory of Visual Information Processing

Intensity Representations Visible Surface Representations

The computations begin with representations of the intensities in an image -- first the image itself (e.g., the gray-level intensity array) and then the primal sketch, a representation of spatial variations in intensity. Next comes the operation of a set of modules, each employing certain aspects of the information contained in the image to derive information about local orientation, local depth, and the boundaries of surfaces. From this is constructed the so-called 2-1/2 dimensional sketch. Note that no "high-level" information is yet brought to bear: the computations proceed by utilizing only what is available in the image itself.

Source: Marr and Nishihara, 1978, p. 42.

Figure 2

An Example of a 2-1/2D Sketch


A candidate for the so-called 2-1/2-dimensional sketch, which encompasses local determinations of the depth and orientation of surfaces in an image, as derived from processes that operate upon the primal sketch or some other representation of changes in gray-level intensity. The lengths of the needles represent the degree of tilt at various points in the surface; the orientations of the needles represent the directions of tilt... Dotted lines show contours of surface discontinuity. No explicit representation of depth appears in this figure.

Source: Marr and Nishihara, 1978, p. 41.


Barrow and Tenenbaum (1981, pp. 579-580) also seek insights into the organization of a high-performance, general-purpose visual system from observations of the behavior of the human visual system. They observe that a person looking at a natural scene, such as a landscape, is aware of many intermediate levels of description, such as surfaces, volumes, and shadows. Over a wide range of viewpoint and illumination, a person can readily estimate quite accurately such local surface characteristics as reflectance, color, texture, distance, and orientation, as well as such global characteristics as size and shape. Boundaries are seen not merely as intensity discontinuities, but as physically significant events -- discontinuities in distance, orientation, reflectance, incident illumination, and so forth. Humans also experience immediate global perceptions: the type of scene (landscape), the dominant orientations of the support plane and the gravitational vertical, the direction of illumination, and the viewpoint with respect to these. Thus, what a person sees are intrinsic characteristics of three-dimensional surfaces, not transient features of a two-dimensional image as observed under a particular set of viewing conditions.

They also note that perception by humans of surfaces and surface boundaries does not appear to depend critically upon contrast, nor upon familiarity with the specific objects depicted.

There are strong indications (cf. Gevarter, 1977) that the interpretive planning areas of the human brain set up a context for processing the input data. (This is captured by Minsky's (1975) AI "frame" concept for knowledge representation.) The brain then uses visual and other cues from the environment to draw in past knowledge to generate an internal representation and interpretation of the scene. This knowledge-based, expectation-guided approach to vision is now appearing in the advanced AI computer vision systems (discussed in later chapters).

Barrow and Tenenbaum suggest that insights gained by studying human vision, coupled with experience resulting from building machine vision systems, can provide the basis for a computational model of visual processing. Their approach to a general purpose computer vision system will be pursued in Chapter VI, but now we pause to motivate this pursuit by briefly reviewing applications of computer vision already underway in this rapidly growing field.

V. Applications

Brady (1981a, p. 2) states that, "There is currently a surge of interest in image understanding on the part of industry and the military." Current computer vision applications, primarily taken from Brady (1981a, pp. 3-4), are listed in Figure 3.


Figure 3: Examples of Applications of Computer Vision Now Underway

AUTOMATION OF INDUSTRIAL PROCESSES

Object acquisition by robot arms, for example sorting or packing items arriving on conveyor belts.

Automatic guidance of seam welders and cutting tools.

VLSI-related processes, such as lead bonding, chip alignment and packaging.

Monitoring, filtering, and thereby containing the flood of data from oil drill sites or from seismographs.

Providing visual feedback for automatic assembly and repair.

INSPECTION TASKS

The inspection of printed circuit boards for spurs, shorts, and bad connections. Checking the results of casting processes for impurities and fractures.

Screening medical images such as chromosome slides, cancer smears, x-ray and ultrasound images, tomography.

Routine screening of plant samples.

Inspection of alpha-numerics on labels and manufactured items.

Checking packaging and contents in pharmaceutical and food industries.

Inspection of glass items for cracks, bubbles, etc.

REMOTE SENSING

Cartography; the automatic generation of hill-shaded maps, and the registration of satellite images with terrain maps.

Monitoring traffic along roads, docks, and at airfields.

Management of land resources such as water, forestry, soil erosion, and crop growth.

Exploration of remote or hostile regions for fossil fuels and mineral ore deposits.

Figure 3 (cont.)

MAKING COMPUTER POWER MORE ACCESSIBLE

Management information systems that have a communication channel considerably wider than current systems that are addressed by typing or pointing.

Document readers (for those who still use paper).

Design aids for architects and mechanical engineers.

MILITARY APPLICATIONS

Tracking moving objects.

Automatic navigation based on passive sensing.

Target acquisition and range finding.

AIDS FOR THE PARTIALLY SIGHTED

Systems that read a document and speak what they read.

Automatic "guide dog" navigation systems.


VI. Basis for a General Purpose Image Understanding System

Barrow and Tenenbaum (1981, p. 573) observe that in going from a scene to an image (an array of brightness values), the image encodes much information about the scene, but the information is confounded in the single brightness value at each point. In projecting onto the two-dimensional image, information about the three-dimensional structure of the scene is lost. In order to decode brightness values and recover a scene description, it is necessary to employ a priori knowledge embodied in models of the scene domain, the illumination, and the imaging process.

Scene models can be devised to describe the three-dimensional world in terms of surfaces and objects.

Illumination models can be utilized to describe the primary light sources, their positions, spatial extents, intensities, colors, and so forth.

Sensor models describe the photometric and geometric properties of the sensor, which can be used to predict how a particular scene, observed from a particular viewpoint and under particular illumination conditions, is transformed into the two-dimensional array of brightness values that constitutes the input.

As indicated by Figure 4, computer vision is an active process that uses these models to interpret the sensory data. To accommodate the diversity of appearance found in real imagery, a high-performance general-purpose system must embody a great deal of knowledge in its models.


Figure 4

Model-based Interpretation of Images

Source: Barrow and Tenenbaum, 1981, p. 573.

The next three chapters review the work in devising computer vision systems. Chapter VII discusses paradigms for computer vision systems. Chapter VIII presents the levels of representation appropriate to high performance systems. Chapter IX reviews research efforts in building such systems.


VII. Basic Paradigms for Computer Vision *

In broad terms, an image understanding system starts with the array of pixel amplitudes that define the computer image, and using stored models (either specific or generic) determines the content of a scene. Typically, various symbolic features such as lines and areas are first determined from the image. These are then compared with similar features associated with stored models to find a match, when specific objects are being sought. In more generic cases, it is necessary to determine various characteristics of the scene, and using generic models, determine from geometric shapes and other factors (such as allowable relationships between objects) the nature of the scene content.

A variety of paradigms have been proposed to accomplish these tasks in image understanding systems. These paradigms are based on a common set of broadly defined processing and manipulating elements: feature extraction, symbolic representation, and semantic interpretation. The paradigms differ primarily in how these elements (defined below) are organized and controlled, and in the degree of artificial intelligence and knowledge employed.

A. Hierarchical -- Bottom-up Approach

Figure 5A is a block diagram of a hierarchical paradigm of an image understanding system that employs a bottom-up processing approach. First, primitive features are extracted from the array of picture element intensities that constitute the observed image.

*This chapter is primarily based on Pratt, 1978, pp. 570-574.

Figure 5

Basic Image Understanding Paradigms

A. Hierarchical Bottom-up Approach


B. Hierarchical Top-down Approach


C. Heterarchical Approach

D. Blackboard Approach

Source: Pratt, 1978, pp. 570-574.

Examples of such features are picture element ("pixel") amplitudes, edge point locations and textural descriptors.

Next this set of features is passed on to the symbolic representation stage, where the features are grouped into symbolic representations. For example, edge points are grouped into line segments or closed curves, and adjacent region segments of common attributes are combined. The resultant symbol set of lines, regions, etc., in combination with a priori stored models, is then operated upon (i.e., semantically interpreted) to produce an application-dependent scene description.

Bottom-up refers to the sequential processing and control operation of the system starting with the input image. The key to success in this approach lies in a sequential reduction in dimensionality from stage to stage -- vital as the relative processing complexity is generally greater at each succeeding stage. The hierarchical bottom-up approach can be developed successfully for domains with simple scenes made up of only a limited number of previously known objects.

B. Hierarchical Top-down Approach

This approach (usually called hypothesize and test), shown in Figure 5B, is goal directed, the interpretation stage being guided in its analysis by trial or test descriptions of a scene. An example would be using matched filtering to search for a specific object or structure within the scene. Matched filtering is normally performed at the pixel level by cross correlation of an object template with an observed image field. It is often computationally advantageous, because of the reduced dimensionality, to perform the interpretation at a higher level in the chain by correlating image features or symbols rather than pixels.
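A brute-force sketch of pixel-level matched filtering follows (Python with numpy assumed; a practical system would normalize for local brightness and search far more efficiently):

```python
import numpy as np

def matched_filter(image, template):
    """Slide the template over the image; return the offset of peak correlation."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Cross correlation of the template with one image window.
            score = np.sum(image[r:r + th, c:c + tw] * template)
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

image = np.zeros((20, 20)); image[5:9, 12:16] = 1.0   # a bright 4x4 "object"
template = np.ones((4, 4))
print(matched_filter(image, template))                # -> (5, 12)
```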


C. Heterarchical Approach

Hierarchical image understanding systems are normally designed for specific applications. They thus tend to lack adaptability. A large amount of processing is also usually required. Pratt (1978, pp. 572-573) observes that often much of this processing is wasted in the generation of features and symbols not required for the analysis of a particular scene. A technique to avoid this problem is to establish a central monitor to observe the overall performance of the image understanding system and then issue commands to the various system elements to modify their operation to maximize system performance and efficiency.

Figure 5C is a block diagram of an image understanding system that achieves heterarchical operation by distributed feedback control. If the semantic interpretation stage in the model experiences difficulty in working with its input symbol set, control can be fed back to the symbolic representation stage to request a new set of symbols. This action in turn may result in a command to the feature extraction stage requesting a modified set of features. When required, direct feedback control is also possible between the semantic interpreter and feature extractor. This paradigm provides an important auxiliary benefit in addition to flexibility. That is, the dimensionality of the feature and symbol sets can be kept at minimum levels because the sets can be restructured on command.
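The control flow can be suggested with a short sketch; the stage interfaces, the confidence score, and the threshold-adjustment rule below are hypothetical scaffolding rather than Pratt's design:

```python
def heterarchical_loop(image, extract, symbolize, interpret, max_rounds=5):
    """Distributed feedback control: rerun the lower stages until the
    semantic interpreter is satisfied with its input symbol set."""
    params = {"edge_threshold": 0.5}
    description = None
    for _ in range(max_rounds):
        features = extract(image, params)       # feature extraction stage
        symbols = symbolize(features)           # symbolic representation stage
        description, confidence = interpret(symbols)
        if confidence >= 0.9:                   # interpretation succeeded
            return description
        params["edge_threshold"] *= 0.8         # feedback: ask for richer features
    return description                          # best effort after max_rounds
```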

D. Blackboard Approach

Another image understanding system configuration, called the blackboard model, has been proposed by Reddy and Newell (1975). Figure 5D is a simplified representation of this approach, in which the various system elements communicate with each other via a common working data storage called the blackboard. Whenever any element performs a task, its output is put into the common data storage, which is independently accessible by all other elements. The individual elements can be directed by a central control, or they can be designed to act autonomously to further the common system goal as required. The blackboard system is particularly attractive in cases where several hypotheses must be considered simultaneously and their components need to be kept track of at various levels of representation.
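A toy rendering of the idea, using nothing beyond plain Python (the three knowledge sources and what they post are invented for illustration):

```python
class Blackboard:
    """Common working data storage, independently accessible by all elements."""
    def __init__(self):
        self.entries = {}                  # level of representation -> result

    def post(self, level, value):
        self.entries[level] = value

def edge_finder(bb):
    if "image" in bb.entries and "edges" not in bb.entries:
        bb.post("edges", "edge list derived from the image")

def region_grouper(bb):
    if "edges" in bb.entries and "regions" not in bb.entries:
        bb.post("regions", "regions grouped from the edges")

def interpreter(bb):
    if "regions" in bb.entries and "scene" not in bb.entries:
        bb.post("scene", "scene description built from the regions")

bb = Blackboard()
bb.post("image", "raw pixel array")
elements = [interpreter, region_grouper, edge_finder]   # order is irrelevant
while "scene" not in bb.entries:
    for element in elements:               # each element acts autonomously
        element(bb)
print(bb.entries["scene"])
```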

VIII. Levels of Representation

A computer vision system, like human vision, is commonly considered to be naturally structured as a succession of levels of representation. Tenenbaum et al. (1979, pp. 242-243) suggest the following levels (listed from low to high):

Images
Pictorial features
Intrinsic surfaces and bodies
3-D surfaces and bodies
Space map
Symbolic relationships

Tenenbaum et al. contrast this with current industrial vision systems, which rely heavily on detailed models of particular objects to accomplish tasks, employing levels of:

Images
Pictorial features (edges and regions)
2-D feature attributes
Objects (specific 2-D views)

Current industrial systems usually begin by thresholding the original gray-level image to obtain a binary array. Pictorial features (regions or edges) are then extracted from the gray-level or binary image and equated with surfaces or surface boundaries. The 2-D attributes of these pseudo-surface features are then symbolically matched against 2-D models (representing specific views of expected objects) to achieve recognition. As these industrial systems rely on prototype 2-D representations of anticipated objects, they are very limited for use in more general environments.
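The first two steps of that pipeline can be sketched as follows (Python with numpy assumed; industrial systems of this period typically performed them in special-purpose hardware):

```python
import numpy as np

def to_binary(gray, threshold=128):
    """Threshold the gray-level image into a binary (object/background) array."""
    return gray > threshold

def label_regions(binary):
    """Extract pictorial features: label 4-connected regions by flood fill."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and labels[sy, sx] == 0:
                count += 1
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and binary[y, x] and labels[y, x] == 0:
                        labels[y, x] = count
                        stack += [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
    return labels, count

gray = np.zeros((10, 10)); gray[2:5, 2:5] = 200; gray[6:9, 6:9] = 220
labels, count = label_regions(to_binary(gray))
print(count)   # -> 2 regions, whose 2-D attributes can then be matched to models
```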

Barrow and Tenenbaum (1981, pp. 580-581) suggest the levels given in Figure 6 as those appropriate to a general-purpose vision system. The processing steps in the figure that transform each level of representation to the next require knowledge from models of the physics of the imaging process, the illumination and the scene. At the lower levels, these models help resolve the ambiguity associated with going from a three dimensional world to a two dimensional image. At the higher levels, these models provide a foundation for organizing surface fragments into recognizable objects.


Figure 6: Computational Architecture for a General-Purpose Vision System

Source: Barrow and Tenenbaum, 1981, p. 58.

The input models required to do the processing at each level are shown at the right. On the left are shown the tasks for which vision can be used at each level of processing.

Tenenbaum et al. (1979, pp. 254-255) sketch in Figure 7 another way to view the organization of a vision system.

They divide the figure into two parts. The first is image oriented (iconic), domain independent, and based on the image data (data driven). The second part of the figure is symbolic, dependent on the domain and the particular goal of the vision process.

The first portion takes the image, which consists of an array of intensity picture elements ("pixels," e.g., 1000x1000), and converts it into image features such as edges and regions. These are then converted into a set of parallel "intrinsic images", one each for distance (range), surface orientation, reflectance*, etc.

The second part of the system segments these into volumes and surfaces dependent on our knowledge of the domain and the goal of the computation. Again using domain knowledge and the constraints associated with the relations among objects in this domain, objects are identified and the scene analyzed consistent with the system goal.

*Fraction of normal incident illumination reflected.

Figure 7

Organization of a Visual System


Source: Tenenbaum et al., 1979, p. 255.


Reviews of the representation methods and techniques for performing the operations indicated in Figure 7 are given in the appendices.

The next chapter (Chapter IX) provides an overview of research in model-based vision systems. These systems endeavor to start with an image and produce, using a priori models, a desired description of the original scene, thereby spanning the complete hierarchy of Figure 7. The systems are constructed using the various representations, techniques and models reviewed in the appendices.


IX. Research in Model-based Vision Systems

Most research efforts in vision have been directed at exploring various aspects of vision, or toward generating particular processing modules for a step in the vision process, rather than at devising general purpose vision systems. However, there are currently two major U.S. efforts in general purpose vision systems: the ACRONYM system at Stanford University under the leadership of T. Binford, and the VISIONS system at the University of Massachusetts at Amherst under A. Hanson and E. Riseman.

The ACRONYM system, outlined in Table II-1, is designed to be a general purpose, model-based system that does its major reasoning at the level of volumes rather than images. The system basically takes a hierarchical top-down approach, as in Figure 5B. ACRONYM has four essential parts: modeling, prediction, description and interpretation. The user provides ACRONYM with models of objects (modeled in terms of volume primitives called generalized cones) and their spatial relationships, as well as generic models and their subclass relationships. These are both stored in graph form. The program automatically predicts which image features to expect. Description is a bottom-up process that generates a model-independent description of the image. Interpretation relates this description to the prediction to produce a three-dimensional understanding of the scene.
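The flavor of such volumetric models can be suggested with a toy structure: each generalized cone is a cross section swept along an axis, and an object is a graph of cones plus their spatial relationships. The encoding below is illustrative only; it is not ACRONYM's actual representation:

```python
from dataclasses import dataclass, field

@dataclass
class GeneralizedCone:
    """Volume primitive: a cross section swept along an axis (spine)."""
    name: str
    axis_length: float
    radii: tuple                 # (radius at base, radius at tip)

@dataclass
class ObjectModel:
    """Object = graph of cone primitives plus relations between them."""
    cones: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)    # (part_a, part_b, relation)

# A crude aircraft model, in the spirit of ACRONYM's aerial-image examples.
plane = ObjectModel()
plane.cones["fuselage"] = GeneralizedCone("fuselage", 30.0, (2.0, 1.0))
plane.cones["wing"] = GeneralizedCone("wing", 20.0, (3.0, 1.5))
plane.relations.append(("wing", "fuselage", "attached perpendicular at midpoint"))
```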

The VISIONS system, outlined in Table II-2, can be considered to be a working tool to test various image understanding modules and approaches. Rather than using specific models, its high level knowledge is in the form of framelike "schemas" which represent expectations and expected relationships in particular scene situations. VISIONS is based on monocular images and does its reasoning at the level of images rather than volumes.

Other research efforts in model-based vision systems are summarized in Table III in Appendix I.

It will be observed that each system is individually crafted by the developer to reflect the developer's background, interests and domain requirements. All, except ACRONYM (and to an extent MOSAIC), use image (2-D) models and are viewpoint dependent. Models are mostly described by semantic networks, though feature vectors are also utilized. The systems capitalize on their choice to limit their observations to only a few objects by using predominantly a top-down interpretation of images, relying heavily on prediction.

35 ' « « «

Table II-1: Model-Based Vision Systems -- ACRONYM
(Table contents are not legible in this copy.)

Table II-2: Model-Based Vision Systems -- VISIONS
(Table contents are not legible in this copy.)
37 X. Industrial Vision Systems

A. General Characteristics

The prominent aspect of industrial vision systems , in distinction to more general vision systems, is that they operate in a relatively known and structured environment. In addition, the situation (such as placement of cameras and lighting) can be configured to simplify the computer vision problem. Usually, the number and nature of possible objects will tend to be restricted, and the visual system will be tailored to the function performed.

Thus many of them are based on a pattern recognition , rather than an image understanding approach. Industrial vision systems are character istically used for such activities as inspection, manipulation and assembly.

In an inspection task, the focus is on deviations from a standard, and usually little or no information is needed for identification. A manipulator controller, designed to pick parts off a conveyor, needs to be able to determine the identity, orientation and position of parts, but needs to know little of their precise shape except, perhaps at the grasp point. A visual controller for an arc welder will have its focus on the seam properties and needs little information about the appearance of the parts.

Kruger and Thompson (1981, p. 1525), in discussing the design of industrial vision systems, state:

the complexity of most perceptual tasks requires that the problem be decomposed into manageable subunits. Thus major design decisions include the function of each module, the computational techniques and data representations imbedded in each module, and the control structures that relate modules and transfer

38 .

information between them. Most computer vision systems use a hierarchical organization...

A popular organization for industrial computer vision is a two-stage hierarchy with a bottom-up control flow. The lower

level segments the image into regions correspond ing to object

surfaces. The higher level uses this segmentation to identify objects from their surface descriptions.

In practice, most successful systems incorporate aspects of both bottom-up and top-down control. The bottom-up processing is used to extract prominent features of a part to determine its position. Then, top-down control is used to direct a search to determine if the part satisfies an inspection criterion.

Industrial inspection and assembly operations are well suited to model-based analysis, because of the well-defined geometric descriptions associated with manufactured items.

CAD/CAM technology allows the specification of objects using

either volumetric or surface-based models. These geometrically based models are particularly appropriate to the hypothesis- verify approach, in which low-level image features are extracted

and matched to an appropriate computer generated 2-D representation

In addition to geometric models, objects may also be represented by graphs. In this case, recognition becomes a

graph-matching process.

More commonly at present, rather than using geometric models or graphs, industrial vision systems are taught by being

presented sample parts to be recognized in each of their expected

stable states. Aspects of the resulting images are typically

39 stored as templates, and recognition becomes template matching.

The objects can also be represented in terms of their characteristic features, such as area, number of holes, etc., and the resulting feature vector stored to be matched (via a search process) to the corresponding extracted feature vector of the image during system operation.

To simplify industrial vision systems, the input is usually reduced to a binary (black and white) image, so that objects appear as silhouettes. Simplicity is important in industrial vision systems because the computation time is limited, as most systems are expected to operate in near real time.

B. Examples of Efforts in Industrial Visual Inspection Systems

Table IV (based largely on Kruger and Thompson, 1981) lists some example efforts of vision systems designed for inspection.

The systems listed are primarily for the inspection of printed circuit boards and IC chips, with template matching being the predominant inspection approach.

Kruger and Thompson (1981, p. 1529) note that: "Automated visual analysis has also been applied to the inspection of surface properties such as roughness, scratching and other potential defects. The best successes have come with highly specialized illumination and sensing systems, specifically tailored for a particular application. Recently, greater sophistication in the modeling of the imaging process has led to prototype surface inspection systems with the promise of increased generality."

Chin (1982) has recently published an extensive bibliography on automated visual inspection techniques and applications.

Table IV

Example Efforts in Industrial Visual Inspection Systems

Baird (1978) -- GM
  Purpose and Sample Domains: Inspection; automated manufacture of power transistor-pair IC chips
  Approach: The inspection process consists of 1) detection of the IC location and orientation on the heat-sink substrate, and 2) quality control assessment after acquisition. A gradient edge detector is used to compile a histogram of all edge directions in the inspection field; the peak of this histogram indicates the approximate orientation of the chip. Next, the corners of the IC are located by template matching; if any corners are not located, the IC is rejected. Cracked or fractured chips are eliminated by a simple contrast thresholding operation.
  Modeling and Representation: The inspection field consists of a 50x50 pixel region digitized to 16 gray levels; templates for IC corners

Chin, Harlow & Dwyer (1977) -- U. of Maryland
  Purpose and Sample Domains: Inspection; PCB's
  Approach: Training phase -- an operator-interactive model-building procedure is used to train the inspection system: using an interactive camera/display system, the edges of the binary image of a prototype PCB are detected, smoothed to reduce noise, and encoded into a compact data structure. Inspection phase -- matching (against the prestored edges and the graph model) is used to detect flaws in test images.
  Modeling and Representation: Edges, graph model

Krakauer & Pavlidis (1979) -- Princeton University
  Purpose and Sample Domains: Inspection; mass-produced PCB's
  Approach: Ingenious use of binary template matching, using a limited number of well-chosen templates accessed via a rapid lookup technique
  Modeling and Representation: Binary templates

Jarvis (1980) -- Bell Labs
  Purpose and Sample Domains: Inspection; mass-produced PCB's
  Approach: 1) Local pattern matching to stored binary templates; 2) supplemental tests for suspicious regions -- computation of conductor area, length of the conductor-substrate boundary, and the ratio of area to length. Processing is done with simple logical operations.
  Modeling and Representation: List of 5x5 pixel binary templates

Hsieh & Fu (1979, 1980) -- Purdue University
  Purpose and Sample Domains: Design and inspection guidance; multi-layered IC chips
  Approach: The inspection paradigm is (for the most part) top-down and model-driven, using a tree-like syntactic approach. Various inspection algorithms are called for by a controller, which monitors the whole vision process. First, the image goes through a series of task- and context-dependent filters to reduce ambiguities; then, eight special-purpose defect detectors are used as required.
  Modeling and Representation: Inspection and wirebonding specifications in the form of a descriptive data base; six subpattern masks

C. Examples of Efforts in Industrial Visual Recognition and Location Systems

Table V (again largely derived from Kruger and Thompson, 1981) lists some example efforts of vision systems designed for industrial part recognition and location. All these systems use a bottom-up approach. It will be observed that (except for Vamos, 1979, and Albus et al., 1982) these systems utilize template or feature vector matching. Vamos works from a 3D wire frame model, utilizing computer graphics techniques to transform a model projection into alignment with observed lines in the image.

Albus' Machine Vision Group in the NBS Industrial Systems Division is using simplified 3D surface models of machined parts to generate expectancy images from needed viewpoints. The group is seeking to achieve real-time, hierarchical, multi-sensory, interactive robot guidance.

D. Commercially Available Industrial Vision Systems*

Table VI in Appendix J lists many of the industrial vision systems that are currently commercially available. Most of the systems require special lighting.

It will be observed that many of the systems designed for verification and inspection use pattern recognition, rather than AI, techniques. The systems tend to be bottom-up because of the speed requirements to achieve real-time operation. Often unique edge and feature extraction algorithms are programmed in hardware or firmware.

*Additional information can be found in Gevarter (1982A).

Table V

Example Efforts in Industrial Visual Recognition and Location Systems

[The first page of Table V is illegible in the source. Its entries apparently included the SRI Vision Module referred to in the text below; legible fragments also mention a multiplexed, time-shared computer architecture accommodating separate cameras.]

Table V (continued)

Example Efforts in Industrial Visual Recognition and Location Systems

Holland, Rossol & Ward (1979), Consight I -- GM
  Purpose and Sample Domains: Industrial part location, recognition and manipulation; engine parts
  Approach: Two linear light sources superimpose a line of light on a conveyor belt, perpendicular to its direction of motion. The two lines separate as a part passes by, the degree of separation being proportional to the part thickness; the point of separation determines the part boundary. The scene is imaged with a linear array camera and a silhouette is automatically generated. Uses the same feature vector approach as the SRI Module.
  Modeling and Representation: Feature vector of part image characteristics

NBS: Albus et al. (1982) -- National Bureau of Standards
  Purpose and Sample Domains: Visual servoing for robot guidance (real-time identification, location and manipulation); machined parts
  Approach: Employs a point light source, a sheets-of-structured-light generator, and a camera, all mounted on the wrist of a robot arm. Uses alternate frames of: 1) regular point source illumination of the entire object, and 2) two parallel planes of structured light. The system determines location and orientation by triangulation (based on the relative height of the intersection of the light sheets with the observed part), and recognition by the shape and size of the outline the planes of light make as they intersect the part. It uses this information to interpret the outline seen in the image produced by the point source illumination. Analysis of the vision input is performed with a hierarchically organized group of microprocessors. At each level of the hierarchy, the analytic process is guided by an expectancy-generating modeling process. The modeling process is in turn driven by a store of a priori knowledge, by knowledge of the robot's movements, and by feedback from the analytic process. Each level of the hierarchy provides output to guide a corresponding level of the robot's hierarchical control system.
  Modeling and Representation: Quadratic approximations to surfaces of idealized 3-D objects

Perkins (1978) -- GM
  Purpose and Sample Domains: Industrial parts recognition; engine components
  Approach: Operates on 32 gray levels, using a bottom-up scene segmentation approach: 1) reduce the 256x256 pixel image to an "edge gradient" image; 2) link edges with similar gradient magnitudes to form chains; 3) characterize chains as either straight lines or circular arcs (this reduces the 65,000-pixel image to about 50 concurves). The system matches observed concurves with model-generated concurves, using a preset control structure to select the order in which combinations of model and scene concurves are to be matched; it starts by matching one model and one scene concurve, and the stored model is then spatially transformed and rotated to fit the associated scene concurves. The system is interactively trained by generating concurves of sample parts. It can identify parts partially occluded by other parts.
  Modeling and Representation: Concurve models of sample parts

Yachida and Tsuji (1978) -- Osaka Univ.
  Purpose and Sample Domains: Industrial parts recognition; nonoccluded parts of a small gasoline engine
  Approach: Uses boundary detection and isolation of parts in a binary image, similar to the SRI Vision Module. Recognition is based on a structured step-by-step analysis with the previously stored models, using a series of special feature detectors: a hole detector, a line finder, a texture detector, and a small hole detector. System training involves interactive man-machine examination of the identification task.
  Modeling and Representation: Stable orientation models of parts -- part name, orientation, list of primitive features, polar coordinate boundary representation

Vamos (1979) -- Hungarian Acad. of Sciences
  Purpose and Sample Domains: Recognition of 3D objects for assembly (bearing housings; sheet metal parts to be painted); neural nets in microscopic section in neural research
  Approach: Finds edges using a simplified version of the Hueckel operator with only two linear templates; lines are then fitted to the edges. The wire-frame model is transformed (with hidden line elimination) to correspond to the image, yielding recognition and part orientation. Objects are interactively taught to the system either by building a geometric model or by a computer-aided transformation of viewed samples.
  Modeling and Representation: 3D wire frame models

The more sophisticated systems tend to utilize variations and improvements on the SRI Vision Module described in Table V. A few systems make good use of structured light for 3D sensing; a number of efforts in the guidance of arc welding take this form.
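The geometry underlying such structured-light sensing reduces to simple triangulation. In the sketch below (Python; the sheet angle and measured stripe offset are invented numbers illustrating the geometry, not any particular system), the sideways displacement of a projected light stripe is converted to surface height:

    # Structured-light triangulation sketch (angle and offset invented).
    import math

    def height_from_offset(stripe_offset, sheet_angle_deg):
        """A light sheet projected at sheet_angle_deg from the vertical
        is displaced sideways by h * tan(angle) where it falls on a
        surface of height h; invert that relation to recover h."""
        return stripe_offset / math.tan(math.radians(sheet_angle_deg))

    # A stripe displaced 5 mm, with the sheet at 45 degrees:
    print(height_from_offset(5.0, 45.0))   # 5.0 mm of part height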

XI. Who is Doing It

Rosenfeld, at the University of Maryland, issues a yearly bibliography, arranged by subject matter, related to the computer processing of pictorial information. The issue covering 1981 (Rosenfeld, 1982) includes nearly 1000 references.

The following is a list by category of the U.S. "principal players" in computer vision.

A. Research Oriented

1. Universities

These are shown in Table VII.

2. Non-Profits

SRI International, AI Center
JPL

3. U.S. Government

NBS, Industrial Systems Div., Gaithersburg, MD
NOSC (Naval Ocean Systems Center), San Diego, CA
NIH (National Institutes of Health)

B. Commercial Vision Systems Developers

A partial listing is given in Table VIII. It has been reported that hundreds of companies are now involved in vision systems.

Table VII

University Organizations Engaged in Computer Vision Research

Artificial Intelligence and Computer Science Laboratories Funded Under the DARPA IU Program (the source distinguishes C.S. labs, A.I. labs, and other departments):

  CMU (Robotics Institute)
  U of MD
  MIT
  U of Mass. (Comp. & Info. Sci. Dept.)
  Stanford U
  U of Rochester
  USC (Information Processing Institute)
  U of Rhode Island (Robotics Res. Lab)

Other Active Universities:

  U of Texas at Austin
  VPI
  Purdue
  U of PA
  U of IL
  Wayne State U
  JHU (E.E. Dept.)
  RPI (Elec. & Sys. Engr. Dept.)

Table VIII

Commercial Vision System Developers

[The entries of Table VIII are illegible in the source.]

XII. Who Is Funding It

To date, the principal source of funding for computer vision research has been the U.S. Government, which is estimated to spend in the order of $10 million a year in this area.

The major U.S. Government program has been the DARPA Image Understanding Program. Other government agencies funding vision research are:

NSF (National Science Foundation)
NIH (National Institutes of Health)
NBS (National Bureau of Standards)
ONR (Office of Naval Research)
DMA (Defense Mapping Agency)
NASA (National Aeronautics and Space Administration)
USGS (U.S. Geological Survey)
AFOSR (Air Force Office of Scientific Research)

It is estimated that DARPA (Defense Advanced Research Projects Agency) spends in the order of $2.5 million a year on computer vision research. DARPA thrusts include automatic stereo and terrain mapping, autonomous navigation, robot vision, symbolic representation, autonomous expert image systems, and photo analysis aids. DARPA helps support a number of Image Understanding laboratories at universities where I.U. work at all levels is performed.

DMA has entered into a very active program in image and scene analysis. Their goal is to achieve "fully automated" production for mapping, charting and geodesy by 1995, in which the primary role of human beings will be to validate the inputs and the extracted output information. They intend to commence by furnishing computer vision aids to the cartographer and to achieve the desired high-level automation via an evolutionary route. Their current approach is to focus a portion of DARPA's image understanding effort on producing an Image Understanding Testbed for integrating and evaluating current and emerging computer vision techniques and systems. An initial version of this Testbed has been constructed at SRI in Menlo Park, CA (Hanson and Fischler, 1982). The future emphasis of the Testbed will be on expert systems that facilitate the application of IU research results to cartographic problems.

NSF spends roughly $1.5 million a year on a variety of research topics in computer vision.

NIH spends a substantial sum in obtaining and evaluating images for a variety of medical research applications. This has included efforts in semi-automatic cancer screening, computer-assisted photometry, tomography, image formation and imaging equipment, and various other medical application related areas.

As the focus is application oriented, rather than computer vision oriented, it is difficult to pinpoint the portion that can be considered computer vision research. However, a rough guess might put the figure at one to two million dollars a year.

It has been estimated that NASA spending on image processing and evaluation approaches one hundred million dollars a year. To help support this effort, NASA funds somewhat less than one million dollars a year on research in computer vision*. NASA spends roughly half this sum to support research at JPL in vision systems to guide robot manipulation.

*Additional funds have been spent on image processing and analysis hardware, such as the Massively Parallel Processor (MPP) at the NASA Goddard Space Flight Center.

The National Bureau of Standards has an ongoing in-house robotics vision research effort, which currently is in the order of one half million dollars a year. Collectively, other government agencies probably spend another one to two million dollars per year for research in this area. It is estimated that perhaps an additional one to two million dollars a year is spent by government contractors using IRAD (Independent Research and Development) funds associated with their prime contracts.

XIII. Summary of the State-of-the-Art

A. Human Vision

Human vision is the only available example of a general purpose vision system. However, thus far not many AI researchers have taken an interest in the computations performed by natural visual systems, though this situation is changing.

The MIT vision group (among others) believes that, to a first approximation, the human visual system is subdivided into modules specializing in visual tasks. There is also evidence that people do global processing first and use it to constrain local processing.

Considerable information now exists about lower level visual processing in humans. However, as we progress up the human visual computing hierarchy, the exact nature of the appropriate representations becomes subject to dispute. Thus, overall human visual perception is still very far from being understood.

B. Low and Intermediate Levels of Processing

Though methods for powerful high-level visual understanding are still in the process of being determined, insights into low-level vision are emerging. Alan Mackworth, from the University of British Columbia, observed at IJCAI-81 that there is an exciting convergence in the theory of low level vision from the major vision centers, such as MIT, CMU, SRI and Stanford. The basic physics of imaging, and the nature of constraints in vision and their use in computation, is fairly well understood. Detailed programs for vision modules, such as "shape from shading" and "optical flow," have begun to appear. Also, the representational issues are now better understood.

However, even for well understood low-level operations such as edge detection, there has been no convergence among the many techniques proposed, and no method stands out as the best. In general, edge detectors are still unreliable, though Marr and Hildreth's approach, based on the zero crossings of the second derivative of the image intensity, appears promising. Brady (1981B, p. 3) states that operators designed to extract the "important" intensity changes in an image are still more an art than a science. Approaches to edge detection consist mostly of convolving images with local operators tuned to particular applications. These operators fare badly outside their limited domain or in the presence of noise.

Barrow and Tenenbaum (1981, p. 576) note that the direct approach to image segmentation is inherently unreliable. A number of research groups successfully circumvented this problem by integrating segmentation and interpretation. However, this approach is not suitable for a general purpose vision system as it is based on advance knowledge of the objects to be expected.

In industrial vision, the primary technique for achieving robust edge finding and segmentation is to use special lighting and convert to a silhouette binary image in which edges and regions are readily distinguishable.

At intermediate levels, edge classification and labelling have been very successfully used in the blocks world. Barrow and Tenenbaum (1981, p. 573) believe that the various techniques developed for dealing with the blocks world could be integrated into a complete, highly competent vision system for that domain. Thus far, however, no such system has actually been built.

Binford (1982), in reviewing existing research in model-based vision systems, observed that most systems first segment regions and then describe their shape. None of the systems makes effective use of texture for segmentation and description. In general, shape description is primitive, and interpretation systems have not yet made full use of even these limited capabilities.

As yet, the extraction of useful information from color is extremely rudimentary. The perceptual use of motion (optical flow) has been a focus of attention recently, but findings are preliminary.

For low level processing, many recent algorithms take the form of parallel computations involving local interactions. One popular approach having this character is "relaxation." These locally parallel architectures are well suited to rapid parallel processing using special purpose VLSI chips.
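A toy relaxation computation (Python; the initial label probabilities and the compatibility weight are invented) illustrates the flavor of these locally parallel updates -- each pixel repeatedly revises its label probability using only its neighbors' current estimates:

    # Toy relaxation labeling (probabilities and weights invented).
    import numpy as np

    # Initial probability that each of 7 pixels is "object"; the middle
    # estimate is noisy.
    p = np.array([0.9, 0.9, 0.9, 0.4, 0.9, 0.9, 0.9])
    COMPAT = 0.8        # neighboring pixels tend to share labels

    for _ in range(10):                 # purely local, parallel updates
        left = np.roll(p, 1);  left[0] = p[0]
        right = np.roll(p, -1); right[-1] = p[-1]
        support = COMPAT * (left + right) / 2 + (1 - COMPAT) * 0.5
        q_obj = p * support
        q_bg = (1 - p) * (1 - support)
        p = q_obj / (q_obj + q_bg)      # renormalize

    print(np.round(p, 2))   # the noisy estimate is pulled toward "object"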

C. Industrial Vision Systems

Barrow and Tenenbaum (1981, p. 572) observe that:

Significant progress has been made in recent years on practical applications of machine vision. Systems have been developed that achieve useful levels of performance on complex real imagery in tasks such as inspection of industrial parts, interpretation of aerial imagery, and analysis of chest X-rays. Virtually all such systems are special purpose, being heavily dependent on domain-specific constraints and techniques. For example, industrial vision systems usually require high contrast to obtain binary images and use overhead cameras to minimize variations in object appearance.

A much more pessimistic view is taken by Kruger and Thompson (1981, p. 1524), who state that:

Despite substantial research efforts, the study of computer vision is still in its infancy... Significant reductions in complexity are possible if automated perception is limited to an industrial environment. Even here, however, we still lack a clear understanding of the fundamental problems that must be addressed if computer vision is to have a major impact on manufacturing.

Hiatt (1981, pp. 2, 3) observes that in industry, robot vision systems are limited to simple repetitive processes, and that the classic bin of jumbled parts problem still overwhelms industrial vision systems. However, Birk and Kelley at the University of Rhode Island have devised algorithms to successfully pick out parts from a bin on up to 90% of the computer-vision robot’s machine cycles.

Kruger and Thompson (1981, p. 1537) observe that, "The current state of the art precludes the construction of one general-purpose computer vision system with applicability to all industrial vision tasks.... Current systems use no common primitives for formal representations of object properties. There is also no common programming language for these applications. [Current industrial vision systems are limited in their flexibility in allowing users to reprogram the system to new situations.] This situation will likely improve as computer vision becomes more integrated into the production process."

In adapting concepts generated in the research laboratory to industrial vision applications, many important additional factors come into play, such as speed, cost and complexity. It has also been found that the lighting and optics play a key role in the robustness of an industrial system. Most potential industrial vision applications cannot be reduced to working with binary silhouettes, due to texture and other real-life environmental factors; thus, gray-scale capability is an important ingredient.

Unfortunately, at present many prospective users have inadequate in-house capability to do the systems planning and integration needed to successfully adapt computer vision to their operations. This has inhibited the industrial use of sophisticated vision systems. The vision manufacturers are now beginning to try to remedy this situation by starting to provide easier user programming, friendlier user interfaces, and systems engineering support to prospective users.

It has been estimated that as of mid-1982, though fewer than 50 sophisticated industrial vision systems were actually in use, approximately 1000 simple line-scan inspection systems were in regular operation. Though special purpose systems have thus far been the most effective, successful vision applications are now becoming commonplace and are expanding. Many firms are now entering the industrial vision field, with technical leap-frogging being common due to rapidly changing technology.

D. General Purpose Vision Systems

1. Introduction

Though many practical image recognition systems have been developed, Hiatt (1981, pp. 2, 8) observes that, "In current vision applications, the type of scene to be processed and acted upon is usually carefully defined and limited to the capability of the machine... General purpose computer vision has not yet been solved in practice." This domain specificity makes each new application expensive and time consuming to develop. Thus, there is a clear need for computer vision systems capable of dealing with a variety of industrial applications, particularly those with less structured environments.

Barrow and Tenenbaum (1981, p. 572) note that "Developing general-purpose computer vision systems has proved surprisingly difficult and complex. This has been particularly frustrating for vision researchers, who daily experience the apparent ease and spontaneity of human perception. Research in the last few years, however, has provided new insights into the computational nature of vision that could lead to systems capable of high performance in a broad range of visual domains."

Brady (1981A) observes that there has been a research shift toward topics corresponding to identifiable modules in the human vision system, and away from particular domains of application. The consequence has been a sharp decline in the construction of entire vision systems.

2. Difficulties

Barrow and Tenenbaum (1981, p. 574) emphasize that

Model-based interpretation of image data is an enormously complex computational task. The variety of possible scene configurations and viewpoints is so great that an exhaustive search through the space of possible interpretations is out of the question. Only the most promising or most important alternative interpretations can be pursued. Selection of candidate interpretations depends both upon information derived from the input image, and upon the observer’s goals and expectations. A delicate balance must be struck between data-directed and goal-directed search to avoid oversight (not seeing things that are really present) and hallucination (seeing things that are not).

Gennery et al. (1981, pp. 10-1, 10-3, 3-6) observe that

The statement "Vision is hard" is found often in the computer vision literature. There are several reasons for the difficulty. In the first place, an image contains an enormous amount of information, much of it irrelevant to the task at hand, and it is an imperfect projection of the real world, containing noise and distortion. From this the relevant information must be extracted. In the second place, the transformation from the image to the real world is highly ambiguous. Thus world knowledge must be relied on to resolve the ambiguities. (This is especially true in monocular vision of three-dimensional scenes, but it is also true to a lesser extent in stereo vision.) In the third place, an object seen may only vaguely resemble others of its generic type or even itself at other times or under other conditions. In the fourth place, in a powerful vision system an object must be recognized out of a large number of possible objects or generic types.

These facts appear to manifest themselves in two ways in practice. First, vision requires an enormous amount of computing. Second, it seems that the computational methods needed are very complicated, and it is unknown today what the right methods will be... Some experimental systems hold promise for recognition of generic three-dimensional objects, although they require a large amount of computing time on existing computers. Some special-purpose hardware is becoming available, which enables some very low-level computations to be performed rapidly. Even in these cases, however, a variety of techniques are in use, with no consensus about which are the best. This becomes even truer as we move to the higher-level, more general, or more advanced areas. Furthermore, many of the approaches that have been used are ad hoc, with little promise of generality.

...two tasks that are beyond the capability of any existing computer vision system are the recognition of parts in a jumble in a bin and the operation of a robot vehicle in a complicated outdoor environment.

Rosenfeld (1981, p. 3) observes that "Image processing and scene analysis have definitely saturated the capacity of computers."

In relation to earth observation imagery for resources management, Alan Mackworth of UBC stated at IJCAI-81 that it will be necessary to alter the popular current multi-spectral paradigm that pixel meaning can be determined by intensity alone -- it doesn't work. It is necessary to understand spatial organization, meaning and context. The spatial constraints are very important. There is no chance of getting a general purpose vision system to understand satellite imagery alone -- it is necessary to use a system-generated "sketch map" to interact with the scene.

3. Techniques

Brady (1981B, p. 4) observes that, "Most AI workers have... abandoned the idea that visual perception can profitably be studied in the context of a priori commitment to a particular program or machine architecture." Binford (1982) believes "...that building a vision system is 1% a system effort of the sort which are familiar in computer science, and 99% basic science."

The research emphasis has moved to developing techniques (vision modules) for extracting intrinsic images (shape from shading, shape from texture, etc.). Brady (1981A, p. 6) observes that, "Representations have been developed that make explicit the information computed by a module... [This] leads to a view of visual perception as the process of constructing instances of a sequence of representations."

Gennery et al. (1981, p. 3-1) note that at higher levels of description it becomes difficult to judge which approaches are best. As a result, a wide variety of techniques have been used.

Brady (1981, p. 99) observes that though it appears that the most difficult visual problem is the perception or planning of movements through cluttered space, a solid start has been made on this problem by Lozano-Perez (1981).

4. Conclusions

Binford (1982), in reviewing current model-based research vision systems, concludes that most systems have not attempted to be general vision systems, though ACRONYM does demonstrate some progress toward this goal. The performance of existing vision systems is strongly limited by the performance of their segmentation modules, by their weak use of world knowledge, and by their weak descriptions, which make little use of shape. The systems primarily relate image relations to image observables, in general lacking the ability to relate three-dimensional space models to images. Existing systems show little emphasis on basic vision problems in systems building.

Binford observes that until recently, systems efforts have been small and short-lived, generally only a few man-years of effort. Focused and continuous efforts are necessary but not sufficient for system building. The system programming effort alone in building a vision system is enormous.

With the exception of ACRONYM (and to an extent MOSAIC), the systems surveyed depend on image models and relations, and therefore are strongly viewpoint-dependent. To generalize to viewpoint-insensitive interpretations would require three-dimensional modeling and interpretation, as in ACRONYM.

Binford found that the systems jump to conclusions based on flimsy evidence which would probably not distinguish many objects in a complex visual environment. The systems typically use the hypothesis-verification paradigm. Hypothesis generation is the crucial part, made easy in the top-down case. The systems succeed best with quasi-2D scenes, for example aerial photographs, industrial scenes from a fixed viewpoint, X-ray images, and ground level photos from a fixed viewpoint. Even ACRONYM, which incorporates viewpoint-insensitive mechanisms, has been demonstrated only on aerial images, although it appears applicable to ground level photographs as well.

Binford concludes that though the results of these and other efforts are encouraging as first demonstrations, as general vision systems they nevertheless have a long way to go.

Tenenbaum and Barrow (1981, p. 594) in discussing the general computer vision problem conclude that:

We are beginning to understand the computational nature of vision at a fundamental level, independent of implementation. This understanding provides new insights into limitations of early scene analysis systems and a solid scientific foundation upon which future general-purpose high-performance computer vision systems can be built...

The competence of a vision system ultimately rests upon the representations it uses to describe the world and the models available for manipulating and transforming descriptions. Many levels of description are necessary to achieve human performance requiring models of scene domains, objects, surfaces, illumination, sensors, and the geometry and photometry of imaging.

A vision system is naturally structured as a sequence of levels of representation. The initial levels are primarily iconic (edges, regions, gradients) because that is the nature of the information available directly from an image. The highest levels are primarily symbolic (surfaces, objects, scenes), because that is the nature of the information that is sought. Intermediate levels are constrained by the information available from preceding levels and that required by subsequent levels. In particular, physical and three-dimensional surface characteristics provide a critical transition from iconic to symbolic representations.

Early levels of processing in a vision system are primarily data driven, while higher levels are controlled by goals and expectations. At intermediate levels, some combination of data-driven (bottom-up) and goal-driven (top-down) operation is needed both to compensate for errors, and to avoid computational overload. Although the detailed nature of processing is dependent on representation and therefore considerably different at low and high levels, it is significant that at virtually all levels processing appears to be inherently parallel, and thus amenable to implementation by networks of computational elements (e.g., neurons or VLSI chips)...

While no such [general-purpose computer vision] system yet exists, most of the pieces have been experimentally demonstrated. Thus it would not be unreasonable to attempt to construct one within the current state of the art. Of course, many details still remain unresolved, especially at the higher levels of processing.

E. Visual Tracking

Real-time tracking of objects is important to manipulation and guidance. The state-of-the-art in visual tracking is reviewed in Appendix H. Though some success has been achieved under limited conditions, it remains an important area for research.

F. Overview

In conclusion, we might observe that computer vision can be viewed as a set of very difficult problems. However, commercial vision systems are available and are operating successfully in specialized environments on low-level problems of verification, inspection, measurement, recognition, and determination of object location or orientation.

There is now a much better understanding of the computer vision problem than there was just a few years ago. A major focus of the current research effort is in extracting 3D shape from intrinsic image characteristics.

Though quite a number of high level research vision systems have been explored, no general vision system is available today or is imminent. Major current efforts in this area are ACRONYM at Stanford U. and VISIONS at the U. of Mass.


XIV. Current Problems and Issues

A. General

Some of the general issues are:

• Can general vision be reduced to computer analysis?
  - What assumptions about the world are restrictive enough?
  - How much data is required?

• Need to incorporate generic aspects of perception (Binford, 1982):
  - Similarity, not spatial congruence, is the paradigm of interpretation in nature.
  - Humans are always seeing things they haven't seen before.

• 3D interpretation of images versus 2D: Binford (1982) believes that general vision systems depend on building three-dimensional descriptions -- that prediction, description and interpretation take place largely in three dimensions.

• How necessary is it to follow the central paradigm (Figure 6 in this report) to achieve high-level vision? Is it essential to employ a key intermediate representation such as the 2-1/2D sketch or intrinsic images? Is it possible to obtain these using only local constraints?

• Is the hierarchical vision paradigm (Figure 6), which implies complete segmentation and labeling, inappropriate for natural scenes? Is a more isomorphic representation needed, such as a map which implicitly captures the detail and relations and is more appropriate for natural computations? For such isomorphic representations, is the serial digital computer inappropriate and another calculating medium, such as a network, needed? (See Fischler, 1978, 1981.)

• Methods and hardware to reduce the software generation costs and processing time for computer vision.*

*Nudd provides a good overview of computer processing requirements for computer vision and of appropriate architectures and hardware to implement them.

• Lack of interface standards for connecting computer vision systems to robots and industrial machines.

• Active vs passive sensing in vision systems.

• Relative merits of binary versus gray-scale imagery.

• Most issues are still poorly understood.

B. Techniques

1. Low Level Processing

Many of the unsolved problems in computer vision are at this level.

• The whole issue of constructing the primal sketch from zero crossings of the second derivative of the intensity is far from resolved.

• Direct edge finding and region segmentation are still unreliable for general vision.

• A key insight is that local information is usually inadequate to guide segmentation and interpretation in a general scene. Global structure such as the shading gradient is required. To what extent can modeling and using physical constraints in scene analysis provide global restrictions which can guide segmentation and assist in classification? Possible examples include 1) utilization of shadows to locate lighting sources and to pinpoint objects casting shadows, and 2) use of sky-land boundaries as a global constraint. Another global approach for man-made scenes is to employ the camera model and geometric perspective to detect vanishing points associated with parallel lines in an urban scene. Fischler et al. (1982) indicate that the detection of clusters of parallel lines by finding their vanishing points can be used to automatically screen large amounts of imagery for man-made structures. (A sketch of the vanishing-point computation follows this list.)

• How to best utilize and avoid difficulties with texture in natural scenes is still unsolved.

• Rectification of images prior to stereo matching remains a problem.
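The vanishing-point observation above can be made concrete with homogeneous coordinates, in which the line through two points, and the intersection of two lines, are both cross products. In the sketch below (Python; the image lines are invented), three image lines that are parallel in the scene converge near a common vanishing point:

    # Vanishing-point sketch (image lines invented).
    import numpy as np

    def line(p, q):
        """Homogeneous line through two image points."""
        return np.cross([*p, 1.0], [*q, 1.0])

    def intersect(l1, l2):
        """Intersection of two homogeneous lines, in image coordinates
        (assumes the lines are not parallel in the image)."""
        h = np.cross(l1, l2)
        return h[:2] / h[2]

    # Three image lines that are parallel in the scene (e.g., building
    # edges), all converging toward the point (10, 0):
    l_a = line((0.0, -1.0), (5.0, -0.5))
    l_b = line((0.0, 1.0), (5.0, 0.5))
    l_c = line((0.0, 2.0), (5.0, 1.0))

    candidates = [intersect(l_a, l_b), intersect(l_a, l_c),
                  intersect(l_b, l_c)]
    print(np.round(np.mean(candidates, axis=0), 2))   # [10.  0.]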

2. Middle Level Processing

A key problem remaining in computer vision is bridging the gap between pictorial features (e.g., edges and regions) and 3D objects.


• Techniques for analyzing time-varying imagery.

• Limits of the intrinsic image approach: It is not clear that we can reliably obtain intrinsic images from images of real scenes via the methods outlined in this report. Alternative approaches when available, such as stereo or active ranging sensors, may be preferable for extracting intrinsic characteristics.

• How best to deal with shadows and occlusions.

3. Higher Level Processing

• Relation of higher level vision to AI.

• Modules that operate on the surface orientation map to produce object representations.

• Generic interpretation in terms of object classes.

• Semantic interpretation.

• Semantic search techniques for use in matching schemes using semantic segmentations and indexing.

• Identification, for interpretation, of which geometric parameters are causally (functionally) rather than statistically determined (Binford, 1982).

C. Representation and Modeling

• Representations for complex and amorphous shapes (e.g., a tree, a crumpled sweater, a flowing stream).

• Proper level for the dividing line between iconic representations at the lower levels and symbolic at the higher levels, and how much these representations should overlap.

• How to index efficiently into a database containing a large number of models.

• What sort of features should be extracted from the scene (edges, corners, regions, surface orientation, etc.) and how should objects be modeled (wire-frame models, generalized cylinders, etc.)?

D. System Paradigms and Design

• Is the relaxation process the most attractive approach at the lower levels, where global aspects are not directly considered?

• How far can parallel methods (like relaxation) be pushed at all levels?

• Is a combination of top-down and bottom-up the preferred approach for complex vision tasks?

• Under what circumstances is the blackboard approach to be preferred? Note that hierarchical image understanding systems suffer from lack of adaptability and also require a large amount of processing.

E. Knowledge Acquisition - Teaching and Programming

• Methods for knowledge acquisition at all levels.

• Methods for learning and tracking of generic types.

• How to make the system versatile by having it programmable at a very high level.

• Design of a very-high-level programming language especially for vision. Little has been done to date in this area.

F. Sensing

• Active vs passive sensing.

• Best methods for using structured light.

• Methods for acquiring 3D directly.

• Is scanning laser radar the wave of the future?

G. Planning

• Methods to incorporate planning into robotic systems utilizing vision.

XV. Research Needed

A. General

• Need to understand human vision, as it is our best example of a general purpose vision system.

• Need research in general purpose systems capable of high performance in a wide variety of visual domains.

• Need to be able to use generic recognition.

• Methods to reduce software costs of computer vision and reduce processing time.

• Image processing techniques for greater capability.

• Interface standards.

• Methods for visual guidance in cluttered spaces.

• Improved understanding of the extent and use of domain specific information in visual perception.

• Need to determine how to best utilize range information.

• Need to develop global methods (e.g., utilizing shading) either to bypass or to help guide the current hierarchical paradigm.

B. Techniques

1. Low level processing

• More reliable and faster edge and region finders for general scenes.

• Ways to extract motion measures from sequences of intensity arrays.

• Reliable stereo disparity modules.

• Determination of surface properties, such as color, smoothness, coatings, etc.

2. Middle level processing

• Techniques for analyzing time-varying imagery.

• Methods to bridge the gap between edges and regions and 3D objects.

• Improved methods of extracting intrinsic images.


3. High level processing

• Better understanding of how to use texture.

• Modules that operate on the surface orientation map to produce object representations.

• Methods for generic and semantic interpretation.

C. Representation and Modeling

• Representations for complex and amorphous shapes.

• Techniques for indexing into a large data base of models.

• Mathematical methods to model texture conveniently.

• More precise representations of surface orientation maps at different levels of resolution.

• Methods to group properties at each level of resolution for each representation, so that a hierarchical structure can be imposed upon the representations.

• Determination of the conditions under which binary imagery is most favorable and those under which gray-scale imagery is to be preferred.

• Choosing which features to extract from a scene.

• Methods for modeling 3D objects.

D. System Paradigms and Design

• Need to explore the extension of relaxation processes to multiple levels in the pyramid of description and interpretation.

• Methods of local parallel processing which can discover global information through propagation.

• Efficient methods and techniques for maintaining concurrently a number of images of a scene in various stages of processing, so that these explicitly represented images can interact with each other and with higher and lower levels of processing as processing proceeds. This is especially pertinent to globally consistent relaxation processing.

• Need investigation of paradigms, other than the conventional hierarchical paradigms, such as the "blackboard" and other paradigms being explored in Artificial Intelligence areas such as expert systems.


• Faster pre- and post-processing hardware (e.g., special digital circuits to evaluate intensity gradients).

E. Knowledge Acquisition - Teaching and Programming

• Need better techniques for rapid reprogramming of vision systems by the user.

• Inspection and assembly vision system software approaches that can easily be modified to adapt to new situations. It would be desirable to incorporate control structures that specify tests to be performed and possible alternate paths of action. Need high-level programming languages designed especially for vision.

• Need methods for learning and teaching of generic types.

• Methods for knowledge acquisition at all levels.

F. Sensing

• Techniques for rapid 3D sensing -- ranging by lidar (scanning laser radar), defocusing, stereo and triangulation.

• Improved methods for use of structured light for 3D evaluation and shape identification.

• Methods for exploiting multiple light sources.

• Higher-resolution and selective-resolution transducers.

G. Planning

• Methods for incorporating vision into robotic planning.


XVI. Future Trends

As the field of computer vision unfolds, we expect to see the following future trends*.

A. Techniques

Though most industrial vision systems have used binary representations, we can expect increased use of gray scales because of their potential for handling scenes with cluttered backgrounds and uncontrolled lighting. Recent theoretical work on monocular shape interpretation from images (shape from shading, texture, etc.) makes it appear promising that general mechanisms for generating spatial observations from images will be available within the next 2 to 5 years to support general vision systems.

Successful techniques (such as stereo and motion parallax) for deriving shape and/or motion from multiple images should also be available within 2 to 5 years.
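For illustration, the following sketch (Python; the synthetic image pair and window size are invented) shows the simplest form of such a stereo technique -- block matching, in which each left-image window is compared against shifted right-image windows along the same scan line, the best-matching shift (disparity) being inversely related to depth:

    # Stereo block-matching sketch (synthetic images; window invented).
    import numpy as np

    rng = np.random.default_rng(0)
    left = rng.random((10, 30))                      # textured left image
    right = np.roll(left, -4, axis=1)                # scene shifted 4 pixels

    def disparity(left, right, row, col, win=2, max_d=8):
        """Find the horizontal shift minimizing the sum of squared
        differences between a left window and right-image candidates."""
        patch = left[row - win:row + win + 1, col - win:col + win + 1]
        errors = [np.sum((patch - right[row - win:row + win + 1,
                                        col - d - win:col - d + win + 1]) ** 2)
                  for d in range(max_d + 1)]
        return int(np.argmin(errors))

    print(disparity(left, right, row=5, col=15))     # 4 (the true shift)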

The mathematics of Image Understanding will continue to become more sophisticated.

The links now growing between Image Understanding and theories of human vision will continue to enlarge.

Brady (1981B, p. 11) predicts that there will be considerable advances in current vision issues "... over the next few decades, probably resulting in changes in our conception of computing and vision at least as large as those which have occurred over the past decade."

B. Hardware and Architecture

We are now seeing hardware and software emerging that enables real-time operation in simple situations. Within the next 3 to 5 years we should see hardware and software that will enable similar real-time operation for robotics and other activities requiring recognition, and position and orientation information.

*These trends have been largely derived from statements by Brady (1981A, 1981B), Binford (1982), Kruger and Thompson (1981), Agin (1980), Arden (1980), Rosenfeld (1981), Hiatt (1981), and Barrow and Tenenbaum (1981).


Fast raster-based pipeline preprocessing hardware to compute low-level features in local regions of an entire scene is now becoming available and should find general use in commercial vision systems in 2 to 4 years.

Since processing at virtually all visual levels seems inherently parallel, parallel processing is a wave of the future (but not the entire answer). Parallel processing research hardware systems (such as ZMOB at the U. of MD, and the MPP for NASA Goddard) have already been built, and appropriate algorithms are being developed.

Three possible parallel processing architectures are array processing, pipeline processing and multi-processing. Multi-processing looks most promising, as it allows data from several data streams of an image to interact with each other to yield a high-level representation.

Relaxation and constraint analysis techniques are on the increase and will be increasingly reflected in future architectures.

C. A.I. and General Vision Systems

Computer vision will be a key factor in achieving many artificial intelligence applications. The goal is to move from special-purpose visual processing to general-purpose computer vision. Work to date in model-based systems has made a tentative beginning, but the long-run goal is to be able to deal with unfamiliar or unexpected input*. Reasoning in terms of generic models and reasoning by analogy are two approaches being pursued. However, it is anticipated that it will be a decade or more before substantial progress will be made.

*As computer vision systems move toward this goal, they will increasingly incorporate Expert System components using multiple knowledge sources. Gevarter (1982B) provides An Overview of Expert Systems, in which ACRONYM and VISIONS are considered to be examples of Expert Systems.


Barrow and Tenenbaum (1981, p. 594) indicate that no general vision system now exists, but most of the ”... pieces have been experimentally demonstrated. Thus it would not be unreasonable to attempt to construct one within the current state-of-the-art.”

D. Modeling and Programming

• Now emerging is 3D modeling, arising largely from CAD/CAM technology. 3D CAD/CAM data bases will be integrated with industrial vision systems to realistically generate synthesized images for matching with visual inputs.

• Illumination models, shading and surface property models will be increasingly incorporated into visual systems. Volumetric models, which allow prediction and interpretation at the level of volumes rather than images, will see greater utilization.

• High level vision programming languages (such as Automatix’s RAIL) that can be integrated with robot and industrial manufacturing languages are now beginning to appear and will become commonplace within 5 years.

• Generic representations for amorphous objects (such as trees) have been experimentally utilized and should become generally available within 5 years.

E. Knowledge Acquisition

• Strategies for indexing into a large database of models should be available within the next 2 to 5 years.

• "Training by being told" will supplement "training by example" as computer graphics techniques and vision programming languages become more common.

F. Sensing

• An important area of development is 3D sensing. Several current industrial vision systems are already employing structured light for 3D sensing. A number of new innovative techniques in this area are expected to appear in the next 5 years.

• More active vision sensors, such as lidar, are now being explored, but are unlikely to find substantial industrial application until the last half of this decade.

• A number of other innovative techniques in 3D sensing are now being developed. Among these are the use of multiple light sources, multiple views, and shape from motion. Some of these techniques may see commercial application within the next two years.

• Kruger and Thompson (1981, p. 33) observe that "By taking several views from particular positions and with carefully controlled illumination, it is possible to separate and independently measure the different surface properties." Industrial vision systems for inspection that use this technique will probably appear within the next several years. (A sketch of this idea follows this list.)

• It is anticipated that within two years solid-state cameras and convolvers will become available that will make stereo machine vision a reality.
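One concrete form of the multiple-view, controlled-illumination idea quoted above is photometric stereo. In the sketch below (Python; the light directions, albedo, and simulated intensities are invented), a Lambertian surface patch is measured under three known lights, and the scaled surface normal is recovered by solving a 3x3 linear system:

    # Photometric-stereo sketch (lights, albedo, and normal invented).
    import numpy as np

    L = np.array([[0.0, 0.0, 1.0],     # three known light directions
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    L /= np.linalg.norm(L, axis=1, keepdims=True)

    true_normal = np.array([0.0, 0.0, 1.0])
    albedo = 0.8
    intensities = albedo * L @ true_normal    # simulated measurements

    g = np.linalg.solve(L, intensities)       # g = albedo * normal
    print(round(float(np.linalg.norm(g)), 2), g / np.linalg.norm(g))
    # -> 0.8 [0. 0. 1.]  (albedo and surface orientation recovered)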

G. Industrial Vision Systems

• We will see increased use of advanced vision techniques in industrial vision systems, including gray scale imagery.

• We are now observing a shortening time lag between research advances and their applications in industry. It is anticipated that in the future this lag may be as little as one to two years.

• Advanced electronics hardware is increasing the capabilities and speed of industrial vision systems while simultaneously reducing costs.

• Because of low start-up costs and the importance of vision to industrial and other applications, new companies and organizations are rapidly entering the vision field.

• It has been estimated that more than 200 companies are now playing a role in the vision field. A shakeout appears likely as the field settles down, but innovation will continue to encourage new entrants.

• It is anticipated that special lighting and active sensing will play an increasing role in industrial vision.

Better human/machine interfaces simplifying user reprogramming are now appearing and will become dominant in sophisticated applications within 5 years.

• Common programming languages and improved interface standards will, within the next 3 to 10 years, enable easier integration of vision with robots and into the industrial environment.

H. Future Applications

It is anticipated that about one quarter of all industrial robots will be equipped with some form of vision system by 1990.

Arden (1980, p. 487) observes that "Increasingly, computer-vision techniques are being applied to real-world problems. This is particularly true of device assembly, circuit board layout, and inspection in the field of industrial automation. Although much of the work is still going on, several convincing demonstration programs have been written, and it is expected that computer vision will soon begin to have a significant impact in industry. At the same time, the computer-vision approach will increasingly be applied to the analysis of images by computer, areas which up to now have been the domain of researchers in pattern recognition — for example, the analysis of handwriting, photomicrographs and radiographs, and satellite imagery."

It is likely that on the order of 90% of all industrial inspection activities requiring vision will be done with computer vision systems within the next decade.

• New vision system applications in a wide variety of areas, as yet unexplored, will begin to appear within this decade. An example of such a system might be visual traffic monitors at intersections that could perceive cars, pedestrians, etc., in motion, and control the flow of traffic accordingly.

• Computer vision will play a large role in future military applications. The Defense Mapping Agency intends to achieve fully automated production for mapping, charting and geodesy by 1995, utilizing "expert system"-guided computer vision facilities. Other future computer vision military applications include autonomous navigation and guidance for vehicles and missiles, target detection, and the interpretation of aerial images for general surveillance purposes and for local battlefield surveillance. Computer vision will also play a large role in future battlefield robots.

Table X gives Binford's (1982) forecast for computer vision system applications.

Table X*

Example Future Applications for Computer Vision Systems

Short term (1-2 years)

• Industrial Vision Systems

• Cartography: Semi-automated stereo for terrain mapping

Mid term (2-3 years)

• Cartography: Semi-automated stereo mapping of complex cultural sites

• Photointerpretation - Monitoring of selected objects in restricted situations

Long term (3-5 years)

• 3D Systems for:

- warehousing
- handling unoriented parts
- inspection of non-laminar parts

• Cartography - automatic feature classification

• Photointerpretation - Automatic classification of a greater variety of objects with greater detail

Greater than 5 years

• Robotic operations in hazardous environments

• Autonomous navigation

• Vehicle Guidance

• Medical

• Aids to handicapped

More than a decade

• Home robots

• General robotic activities

• Observations of extra-terrestrial bodies

*Based on Binford (1982)

I. Conclusion

In conclusion, the amount of activity and the many researchers in the computer vision field suggest that within the next 5 to 10 years, we should see some startling advances in practical computer vision, though the availability of practical general vision systems still remains a long way off.

REFERENCES

The following abbreviations for some journals and conference proceedings are used:

AI Artificial Intelligence

CGIP Computer Graphics and Image Processing

T-PAMI IEEE Transactions on Pattern Analysis and Machine Intelligence

1AAAI First Annual National Conference on Artificial Intelligence, The American Association for Artificial Intelligence, Stanford University, August 1980

AAAI-82 Proceedings of the National Conference on Artificial Intelligence, Univ. of Pittsburgh, August 1982

4ICPR Fourth International Joint Conference on Pattern Recognition, Tokyo, November 1978

3IJCAI Third International Joint Conference on Artificial Intelligence, Stanford University, August 1973

5IJCAI Fifth International Joint Conference on Artificial Intelligence, Cambridge, Mass., August 1977

6IJCAI Sixth International Joint Conference on Artificial Intelligence, Tokyo, August 1979

IJCAI-81 Seventh International Joint Conference on Artificial Intelligence, August 1981

References

Albus, J., Kent, E., Nashman, M., Mansbach, P., and Palombo, L., "A 6-D Vision System," Proceedings of SPIE Technical Symposium East '82, Arlington, VA, May 3-7, 1982.

Aggarwal, J. K., Duda, R. O., and Rosenfeld, A. (eds.), Computer Methods in Image Analysis, IEEE Press & Wiley, 1977.

Agin, G. J. and Binford, T. O., "Computer Descriptions of Curved Objects," 3IJCAI, 1973, pp. 629-640.

Agin, G. J., "Computer Vision Systems for Industrial Inspection and Assembly," Computer, May 1980.

Arden, B. W. (ed.), What Can Be Automated? Cambridge: M.I.T. Press, 1980, pp. 482-487.

Baird, M. L., "Sight-1: A Computer Vision System for Automated IC Chip Manufacture," IEEE Trans. Sys. Man Cybern., vol. SMC-8, 1978.

Ballard, D. H., Brown, C. M., and Feldman, J. A., "An Approach to Knowledge-Directed Image Analysis," in Hanson and Riseman (1978a), pp. 271-281.

Ballard, D. H. and Brown, C. M., Computer Vision, Englewood Cliffs: Prentice-Hall, 1982.

Barrow, H. G. and Tenenbaum, J. M., "MSYS: A System for Reasoning About Scenes," SRI AI Center, Tech. Note 121, March 1976.

Barrow, H. G. and Tenenbaum, J. M., "Recovering Intrinsic Scene Characteristics from Images," in Hanson and Riseman (1978a), pp. 3-26.

Barrow, H. G., "Artificial Intelligence: State-of-the-Art," SRI International, Menlo Park, CA, Tech. Note 198, October 1979.

Barrow, H. G. and Tenenbaum, J. M., "Interpreting Line Drawings as Three-Dimensional Surfaces," 1AAAI, 1980, pp. 11-14.

Barrow, H. G. and Tenenbaum, J. M., "Computational Vision," Proceedings of the IEEE, Vol. 69, No. 5, May 1981, pp. 572-595.

Binford, T. O., "Inferring Surfaces from Images," Artificial Intelligence, Vol. 17(1-3), 1981, pp. 205-244.

Binford, T. O., "Survey of Model-Based Image Analysis Systems," Robotics Research, Vol. 1, No. 1, Spring 1982.

Bolles, R. C., "Verification Vision Within a Programmable Assembly System," AIM-295, STAN-CS-77-591, Computer Science Dept., Stanford University, 1976.

Bolles, R. C., "Robust Feature Matching Through Maximal Cliques," SPIE, Vol. 182, Imaging Applications for Automated Industrial Inspection and Assembly, 1979.

Brady, M., "Computational Approaches to Image Understanding," M.I.T. A.I. Memo No. 653, October 1981A. (Also in Computing Surveys, Vol. 14, No. 1, Mar. 1982, pp. 3-71.)

Brady, M., "The Changing Shape of Computer Vision," Artificial Intelligence, 17(1-3), 1981B, pp. 1-15.

Brooks, R., Greiner, R., and Binford, T. O., "The ACRONYM Model-Based Vision System," 6IJCAI, 1979, pp. 105-113.

Brooks, R., "Symbolic Reasoning Among 3-D Models and 2-D Images," AI 17, 1981, pp. 285-348.

Brooks, T. L., "Supervisory Manipulation Based on the Concepts of Absolute vs. Relative and Fixed vs. Moving Tasks," Proceedings of the International Computer Technology Conference, sponsored by ASME, San Francisco, August 1980, pp. 185-196.

Chin, R. T., "Automated Visual Inspection Techniques and Applications: A Bibliography," Pattern Recognition, Vol. 15(4), 1982, pp. 343-357.

Chin, R., Harlow, C. A., and Dwyer, S. J., "Automated Inspection Techniques," in Proc. Assembly IV Conf. Exposition (Detroit, MI), Nov. 1977.

Clowes, M. B., "On Seeing Things," AI 2, 1971, pp. 79-116.

Cohen, P. R., and Feigenbaum, E. A., Chap. XIII, "Vision," The Handbook of Artificial Intelligence, Vol. III, Los Altos, CA: Kaufmann, 1982.

Draper, S. W., "The Use of Gradient and Dual Space in Line-Drawing Interpretation," AI 17, Aug. 1981, pp. 461-508.

Duda, R. O. and Hart, P. E., Pattern Classification and Scene Analysis, Wiley, 1973.

Eberlein, R. B., "An Iterative Gradient Edge Detection Algorithm," CGIP 5, 1976, pp. 245-253.

Faugeras, O. and Price, K., "Semantic Description of Aerial Images Using Stochastic Labelling," Proc. ARPA Image Understanding Workshop, Univ. of Md., April 1980.

Fennema, C. L. and Thompson, W. B., "Velocity Determination in Scenes Containing Several Moving Objects," CGIP 9, 1979, pp. 301-315.

Fischler, M. A., "On the Representation of Natural Scenes," in Computer Vision Systems, Hanson and Riseman (eds.) (1978a), pp. 47-51.

Fischler, M. A., "Computational Structures for Machine Perception," in Advanced Computer Concepts, J. C. Solinsky (ed.), La Jolla Institute, 1981.

Fischler, M. A., and Bolles, R. C., "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," CACM, Vol. 24(6), June 1981, pp. 381-395.

Fischler, M. A., Tenenbaum, J. M., and Wolf, H. C., "Detection of Roads and Linear Structures in Low-Resolution Aerial Imagery Using a Multisource Knowledge Integration Technique," CGIP, Vol. 15(3), March 1981, pp. 201-223.

Fischler, M. A., Barnard, S. T., Bolles, R. C., Lowry, M., Quam, L., Smith, G., and Witkin, A., "Modeling and Using Physical Constraints in Scene Analysis," AAAI-82, Aug. 1982, pp. 30-35.

Garvey, T. D., "Perceptual Strategies for Purposive Vision," SRI AI Center Tech. Note 117, 1976.

Jarvis, J. F., "Automated Visual Inspection of Printed Wiring Boards by Local Pattern Matching," IEEE Trans. Patt. Anal. Machine Intelligence, vol. 2, pp. 77-82, Jan. 1980.

Gennery, D. B., "Modelling the Environment of an Exploring Vehicle by Means of Stereo Vision," AIM-339, STAN-CS-80-805, Computer Science Dept., Stanford University, 1980.

Gennery, D. B., et al., "Computer Vision," JPL Publ. 81-92, Nov. 1, 1981.

Gevarter, W. B., "A Diagram of the Human Brain as a Model for Artificial Intelligence," Proc. of the IEEE International Conference on Cybernetics and Society, Wash., D.C., Sept. 1977.

Gevarter, W. B., An Overview of Artificial Intelligence and Robotics, Vol. II - Robotics, NBSIR 82-2479, National Bureau of Standards, Wash., D.C., March 1982A.

Gevarter, W. B., An Overview of Expert Systems, NBSIR 82-2505, National Bureau of Standards, Wash., D.C., May 1982B.

Gilbert, A. L., Giles, M. K., Flachs, G. M., Rogers, R. B., and U, Y. H., "A Real-Time Video Tracking System," T-PAMI 2, 1980, pp. 47-56.

Griffin, M. D., Cunningham, R. T., and Eskenazi, R., "Vision-Based Guidance of an Automated Roving Vehicle," AIAA Paper 78-1294, Proceedings of the AIAA Guidance and Control Conference, Palo Alto, August 1978.

Hanson, A. J. and Fischler, M. A., "The DARPA/DMA Image Understanding Testbed," Proceedings of the Image Understanding Workshop, Stanford U., Sept. 1982, pp. 342-351.

Hanson, A. R. and Riseman, E. M. (eds.), Computer Vision Systems, New York: Academic Press, 1978a.

Hanson, A. R. and Riseman, E. M. (1978b), "Segmentation of Natural Scenes," in Hanson and Riseman (1978a), pp. 129-163.

Hanson, A. R. and Riseman, E. M. (1978c), "VISIONS: A Computer System for Interpreting Scenes," in Hanson and Riseman (1978a), pp. 303-333.

Herman, M., Kanade, T., and Kuroe, S., "Incremental Acquisition of a Three-Dimensional Scene Model from Images," Proc. of the DARPA I.U. Workshop, Stanford U., Sept. 1982, pp. 175-192.

Hiatt, B., "Toward Machines that See," Mosaic, Nov/Dec 1981, pp. 2-8.

Hirzinger, G. and Snyder, W., "Analysis of Time-Varying Imagery for Tracking Moving Objects," Proceedings of the International Computer Technology Conference, sponsored by ASME, San Francisco, August 1980, pp. 26-33.

Holland, S. W., Rossol, L., and Ward, M. R., "Consight-I: A Vision-Controlled Robot System for Transferring Parts from Belt Conveyors," in Computer Vision and Sensor-Based Robots, G. G. Dodd and L. Rossol (Eds.), New York: Plenum Press, 1979, pp. 81-100.

Horn, B. K. P., "Artificial Intelligence and the Science of Image Understanding," in Computer Vision and Sensor-Based Robots, G. G. Dodd and L. Rossol (Eds.), N.Y.: Plenum Press, 1979, pp. 69-77.

Horn, B. K. P. and Schunck, B. G., "Determining Optical Flow," Artificial Intelligence, 1981, 17(1-3).

Hsieh, Y. Y. and Fu, K. S., "A Method for Automatic IC Chip Alignment and Wire Bonding," in Proc. IEEE Computer Soc. Conf. Pattern Recognition and Image Processing (Chicago, IL), 1979.

Hsieh, Y. Y. and Fu, K. S., "An Automatic Visual Inspection System for Integrated Circuit Chips," Comput. Graphics Image Processing, vol. 14, 1980, pp. 293-343.

Huffman, D. A., "Impossible Objects as Nonsense Sentences," in Machine Intelligence 6, B. Meltzer and D. Michie (eds.), Halsted Press, 1971, pp. 295-323. (Also in Aggarwal et al. (1977), pp. 338-366.)

Ikeuchi, K. and Horn, B. K. P., "Numerical Shape from Shading and Occluding Boundaries," Artificial Intelligence, 1981, 17.

Jarvis, J. F., "A Method for Automating the Visual Inspection of Printed Wiring Boards," in Proc. SITEL-ULG Seminar on Pattern Recognition (Univ. of Liège, Sart-Tilman, Belgium), Nov. 1977.

Kanade, T., "Model Representations and Control Structures in Image Understanding," Proc. of IJCAI-77, Cambridge, Aug. 1977, pp. 1074-1082.

Kanade, T., "Recovery of the Three-Dimensional Shape of an Object from a Single View," AI 17, Aug. 1981, pp. 409-460.

Kashioka, S., Ejiri, M., and Sakamoto, Y., "A Transistor Wire-Bonding System Utilizing Multiple Local Pattern Matching," IEEE Trans. Syst. Man Cybern., vol. SMC-6, 1976.

Kelly, M. D., "Edge Detection in Pictures by Computer Using Planning," in Machine Intelligence 6, B. Meltzer and D. Michie (eds.), Halsted Press, 1971, pp. 397-409. (Also in Aggarwal et al. (1977), pp. 228-244.)

Krakauer, L. and Pavlidis, T., "Visual Printed Wiring Board Fault Detection by a Geometrical Method," in Proc. COMPSAC (Chicago, IL), Nov. 1979, pp. 260-265.

Kruger, R. P. and Thompson, W. B., "A Technical and Economic Assessment of Computer Vision for Industrial Inspection and Robotic Assembly," Proceedings of the IEEE, Vol. 69, No. 12, Dec. 1981, pp. 1524-1538.

Landgrebe, D. A., "Analysis Technology for Land Remote Sensing," Proc. of the IEEE, Vol. 69, No. 5, May 1981, pp. 628-642.

Levine, M., "A Knowledge-Based Computer Vision System," in Computer Vision Systems, Hanson, A. and Riseman, E. (eds.), Academic Press, New York, 1978.

Lowe, D. G., and Binford, T. O., "The Interpretation of Geometric Structure from Image Boundaries," Proc. Image Understanding Workshop, L. S. Baumann (ed.), 1981, pp. 39-46.

MacVicar-Whelan, P. J., and Binford, T. O., "Line Finding with Subpixel Precision," Proc. Image Understanding Workshop, L. S. Baumann (ed.), 1981, pp. 26-31.

Marr, D., "Representing Visual Information," in Computer Vision Systems, A. Hanson and E. M. Riseman (Eds.), New York: Academic Press, 1978, pp. 61-80.

Marr, D. C., Vision, San Francisco: W. H. Freeman, 1982.

Marr, D. and Nishihara, H., "Visual Information Processing: Artificial Intelligence and the Sensorium of Sight," Technology Review, October 1978, pp. 28-47.

Marr, D. and Hildreth, E., "Theory of Edge Detection," AI Memo No. 518, Artificial Intelligence Laboratory, Mass. Institute of Technology, April 1979.

Martelli, A., "An Application of Heuristic Search Methods to Edge and Contour Detection," CACM 19, 1976, pp. 78-83. (Also in Aggarwal et al. (1977), pp. 217-227.)

Minsky, M. L., "A Framework for Representing Knowledge," in P. H. Winston (Ed.), The Psychology of Computer Vision, New York: McGraw-Hill, 1975, pp. 211-277.

Nagao, M., Matsuyama, T., and Ikeda, Y., "Region Extraction and Shape Analysis of Aerial Photographs," Proc. 4ICPR, 1978, p. 620.

Nagao, M. and Matsuyama, T., A Structural Analysis of Complex Aerial Photographs, Plenum, 1980.

Nagel, R. N., VanderBrug, G. J., Albus, J. S., and Lowenfeld, E., "Experiments in Part Acquisition Using Robot Vision," SME Tech. Paper MS79-784, 1979.

Nevatia, R. and Babu, K. R., "Linear Feature Extraction and Description," 6IJCAI, 1979, pp. 639-641.

Nevatia, R., Machine Perception, Englewood Cliffs, NJ: Prentice-Hall, 1982.

Nitzan, D., Rosen, C., Agin, G., Bolles, R., Gleason, G., Hill, J., McGhie, D., Prajoux, R., Park, W., and Sword, A., "Machine Intelligence Research Applied to Industrial Automation," 9th Report, SRI International, August 1979.

Nudd, G. R., "Image Understanding Architectures," Proc. of the National Computer Conf., 1980, pp. 377-386.

Ohta, Y., "A Region-Oriented Image-Analysis System by Computer," Thesis, Dept. of Information Science, Kyoto University, 1980.

Parma, C. C., Hanson, A. R., and Riseman, E. M., "Experiments in Schema-Driven Interpretation of a Natural Scene," Univ. of Mass. COINS Tech. Rept. 80-10, 1980.

Perkins, W. A., "A Model-Based Vision System for Industrial Parts," IEEE Trans. Comput., vol. C-27, 1978.

Pinkney, H. F. L., "Theory and Development of an On-Line 30 Hz Video Photogrammetry System for Real-Time 3-Dimensional Control," Proceedings of the ISP Symposium on Photogrammetry for Industry, Stockholm, August 1978.

Pratt, W. K., Digital Image Processing, New York: Wiley, 1978, pp. 568-587.

Prewitt, J. M. S., "Object Enhancement and Extraction," in Picture Processing and Psychopictorics (B. S. Lipkin and A. Rosenfeld, eds.), New York: Academic Press, 1970, pp. 75-149.

Reddy, R. and Newell, A., "Image Understanding: Potential Research Approaches," ARPA Image Understanding Workshop, Washington, DC, 1975.

Roach, J. W. and Aggarwal, J. K., "Computer Tracking of Objects Moving in Space," T-PAMI 1, 1979, pp. 127-135.

Roberts, L. G., "Machine Perception of Three-Dimensional Solids," in Optical and Electro-Optical Information Processing, J. T. Tippett et al. (Eds.), Cambridge, MA: MIT Press, 1965.

Rosenfeld, A., "Image Pattern Recognition," Proceedings of the IEEE, Vol. 69, No. 5, May 1981, pp. 596-605.

Rosenfeld, A., "Picture Processing: 1981," Computer Vision Lab Rept. TR-1134, U. of MD, College Park, Jan. 1982. (Also in CGIP, May 1982.)

Rubin, S., "The ARGOS Image Understanding System," Proc. ARPA IU Workshop, Nov. 1978; also Thesis, Carnegie-Mellon University, 1978.

Sarris, V., "The Development of Robot Vision," Charles River Associates, Boston, Mass., 1982.

Saund, E., Gennery, D. B., and Cunningham, R. T., "Visual Tracking in Stereo," Joint Automatic Control Conference, sponsored by ASME, University of Virginia, June 1981.

Shapiro, L. G., Moriarty, J. D., Mulgaonkar, P. G., and Haralick, R. M., "Sticks, Plates, and Blobs: A Three-Dimensional Object Representation for Scene Analysis," 1AAAI, 1980, pp. 28-30.

Shirai, Y., "Analyzing Intensity Arrays Using Knowledge About Scenes," in Winston (1975), pp. 93-114.

Shirai, Y., "Recognition of Real-World Objects Using Edge Cue," in Hanson and Riseman (1978a), pp. 353-362.

Stevens, K. A., "The Visual Interpretation of Surface Contours," AI 17, August 1981, pp. 47-74.

Tenenbaum, J. M., et al., "Prospects for Industrial Vision," in Computer Vision and Sensor-Based Robots, G. G. Dodd and L. Rossol (Eds.), N.Y.: Plenum Press, 1979, pp. 239-259.

Tsugawa, S., Yatabe, T., Hirose, T., and Matsumoto, S., "An Automobile with Artificial Intelligence," 6IJCAI, 1979, pp. 893-895.

Ullman, S., The Interpretation of Visual Motion, Cambridge, Mass.: MIT Press, 1979.

Vamos, T., Bathor, M., and Mero, L., "A Knowledge-Based Interactive Robot-Vision System," 6IJCAI, 1979, pp. 920-922.

Waltz, D. L., "Generating Semantic Descriptions from Drawings of Scenes with Shadows," M.I.T. Tech. Rept. AI-TR-271, Cambridge, MA, Nov. 1972.

Waltz, D., "Understanding Line Drawings of Scenes with Shadows," in Winston (1975), pp. 19-91.

Wesley, M. A., Lozano-Perez, T., Lieberman, L. I., Lavin, M. A., and Grossman, D. D., "A Geometric Modeling System for Automated Mechanical Assembly," IBM Journal of Research and Development 24, 1980, pp. 64-74.

Winston, P. H. (ed.), The Psychology of Computer Vision, McGraw-Hill, 1975.

Woodham, R. J., "Analyzing Curved Surfaces Using Reflectance Map Techniques," in Artificial Intelligence: An MIT Perspective, P. H. Winston and R. H. Brown (eds.), MIT Press, 1979, pp. 161-182.

Woodham, R. J., "Analyzing Images of Curved Surfaces," AI 17, August 1981.

Yachida, M. and Tsuji, S., "A Versatile Machine Vision System for Complex Industrial Parts," IEEE Trans. Comput., vol. C-26, 1977.


APPENDICES


APPENDIX A*

LOW LEVEL FEATURES

The scene to be analyzed is usually sensed by a digital camera or other similar device, the output of which is normally a digitized image having an array of brightness values. For some purposes these brightness values can be operated upon directly to obtain desired information about the scene, but it is usual to extract low level features for further computer processing. The following sections describe the low level features usually considered for extraction.

A. Pixels (Picture Elements)

Pixels are the individual elements in a digitized array.

They usually represent brightness and perhaps color in a projection from a three dimensional scene, but could also represent distance in a range image.

B. Texture

Texture is a local variation in pixel values that repeats in a regular or random way across a portion of an image or object.

Texture can sometimes be used to identify the object being sensed, or it can be used for approximating range and surface orientation in a known object. However, it can also be a noise source in processing the image.

C. Regions

A region is a set of connected pixels that show a common property such as average gray level, color or texture in an image.

*This appendix is based largely on Gennery et al. (1981).

D. Edges and Lines

An edge is a step in pixel values (exceeding some threshold) between two regions of relatively uniform values. A line is defined as a thin region of roughly uniform pixel values between two regions of different but roughly equal pixel values. Line representations are extracted from edges.

E. Corners

A corner is an abrupt change in direction of a curve.

Corners are useful in data compression approaches to representing straight edges, and as points for feature matching.

APPENDIX B

EXTRACTING EDGES AND AREAS

A natural first step in analyzing a scene is to convert it into a sketch, that is, to find the edges that separate regions of differing brightnesses. Edges correspond to abrupt changes in brightness. Such changes can be identified as places where the first derivative of the brightness is suddenly high or the second derivative crosses zero (see Figure 8). There are various schemes for doing this, all in some way related to taking brightness differences between adjacent points.

A. Extracting Edges

The basic methods for extracting edge and line elements from images are*:

1. Linear Matched Filtering:

Successively convolve** image windows with a template of the desired feature and seek the maximum value.

2. Non-linear Filtering:

Convolve windows in the image with a local operator (a weighting function that approximates first or second derivatives by first or second differences). Examples of operators for doing this are shown in Table I, and a code sketch follows the table. In general, each point in the image is convolved with directional operators in as many directions as needed. The resultant outputs at each point are combined to determine the gradient vector (the orientation and magnitude of the intensity changes).

*This section is based largely on methods described in Rosenfeld (1981, p. 601), Gennery, et al., (1981, pp. 2-8 to 2-14), and Brady (1981A). Additional material can be found in Ballard and Brown (1982), Binford (1981), and Nevatia (1982).

**Convolve means superimposing an n x n operator over an n x n pixel area (window) in the image, multiplying corresponding points together and summing the result.

Figure 8: Intensity Variations at Step Edges.

Table I

Examples of Non-Linear Filtering* for Extracting Edge and Line Elements

I. Edge Operators (detect the first derivative of brightness, df/dx)

   Sobel operators (edge mask):

        -1  0  1
        -2  0  2
        -1  0  1

   Edge criteria: pixels which yield maximum values.
   Remarks: operators tuned for a limited range of operation.

   Nevatia and Babu operators (one 5 x 5 mask for each 30-degree angle):

        -100 -100  0  100  100
        -100 -100  0  100  100
        -100 -100  0  100  100
        -100 -100  0  100  100
        -100 -100  0  100  100

   Edge criteria: for each pixel, find the angle operators that yield the
   maximum value; then thin, threshold and link (line fit).

II. Bar Operators (detect the second derivative of brightness, d2f/dx2)

   Bar mask:

        -1  2  -1
        -1  2  -1
        -1  2  -1
        -1  2  -1

   Edge criteria: look for zero crossings.
   Remarks: sensitive to noise.

*Convolving an image window (about a pixel) with operators such as those indicated. The operators shown are for finding vertical lines.
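As a concrete sketch of this step, in modern notation (Python with NumPy; the function names are illustrative, not from any system described in this report), the following fragment convolves an image with the two Sobel masks and combines the outputs into a gradient magnitude and orientation at each interior pixel:

    import numpy as np

    SOBEL_X = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)   # responds to vertical edges
    SOBEL_Y = SOBEL_X.T                              # responds to horizontal edges

    def convolve2d(image, mask):
        """Convolve a 2-D image with a small mask (no padding; output shrinks)."""
        m, n = mask.shape
        rows = image.shape[0] - m + 1
        cols = image.shape[1] - n + 1
        out = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                # superimpose the mask over the window, multiply pointwise, sum
                out[r, c] = np.sum(image[r:r+m, c:c+n] * mask)
        return out

    def sobel_gradient(image):
        """Return gradient magnitude and orientation (radians) per pixel."""
        gx = convolve2d(image, SOBEL_X)
        gy = convolve2d(image, SOBEL_Y)
        return np.hypot(gx, gy), np.arctan2(gy, gx)

The explicit loops simply mirror the window-by-window definition of convolution given in the footnote above; a practical system would use a faster convolution routine or special hardware.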


3. Local Thresholding

Apply local thresholding and discard responses that do not lie on borders (between upper and lower threshold regions) and link responses that do.

4. Surface Fitting - The Hueckel Operator

Fit a surface to the neighborhood of each pixel and compute the maximum gradient of the surface. Consider as edge points those pixels having surface maximum gradients above a selected threshold value. This approach was first devised by Prewitt (1970). The Hueckel operator is a popular method for doing this.

5. Rotationally Insensitive Operators:

The Laplacian operator (∇², related to the magnitude of the derivative of the intensity gradient) is insensitive to the direction of a line and yields edge elements at pixel points where the Laplacian is zero. Thus discrete approximations to the Laplacian have proved useful in line finding.

6. Line Following:

Shirai (1975) devised a line following method that used a pair of parameters that varied according to how continuously and smoothly elements were found. These parameters determined thresholds for accepting a new element according to how close it was to the linear continuation of the current line being tracked.

7. Global Methods

Martelli (1976) devised a global heuristic search that operates directly on the brightness values. A cost function is optimized depending on the curvature of the candidate line and the degree to which the candidate line succeeds in dividing the image into regions of different brightnesses.

Kelly (1971) used a hierarchical refinement approach, first finding lines in a coarse image and using the results to guide line finding in a higher resolution image.

Eberlein (1976) utilized a relaxation approach for linking edges found by a local detector, depending on how the edges agreed with their local neighbors. This was a parallel method that merged the elements into a continuous line.

The Hough transform (Duda and Hart, 1973) is a global parallel method for finding straight or curved lines. For a straight line, using results from local edge detectors, the perpendicular distance (ρ) from the line element to the origin and the angle (θ) of the normal to the line are determined and mapped into (ρ, θ) space. Peak clusters in (ρ, θ) space are considered to be straight lines.
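A minimal sketch of the straight-line Hough transform, assuming edge points have already been extracted (parameter counts and the vote threshold are illustrative):

    import numpy as np

    def hough_lines(edge_points, image_diag, n_theta=180, n_rho=200, min_votes=50):
        """Accumulate votes in (rho, theta) space; return peak cells as lines."""
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        rho_max = float(image_diag)              # |rho| is bounded by the diagonal
        accumulator = np.zeros((n_rho, n_theta), dtype=int)
        for (x, y) in edge_points:
            for j, theta in enumerate(thetas):
                # perpendicular distance from the origin to the line through (x, y)
                rho = x * np.cos(theta) + y * np.sin(theta)
                i = int((rho + rho_max) / (2 * rho_max) * (n_rho - 1))
                accumulator[i, j] += 1
        peaks = np.argwhere(accumulator >= min_votes)
        return [((i / (n_rho - 1)) * 2 * rho_max - rho_max, thetas[j])
                for i, j in peaks]

Each returned (rho, theta) pair names one detected straight line; a fuller implementation would also use the local gradient orientation to restrict the theta range voted on, as the text suggests.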

Fischler, Tenenbaum and Wolf (1981) describe a new paradigm for detecting and precisely delineating roads and similar "line-like" structures appearing in low-resolution aerial imagery. The approach combines "local information from multiple, and possibly incommensurate, sources, including various line and edge detection operators, map knowledge about the likely path of roads through an image, and generic knowledge about roads (e.g., connectivity, curvature, and width constraints). The final interpretation of the scene is achieved by using either a graph search or dynamic programming techniques to optimize a global figure of merit."

B. Edge Finding Variations

There are many approaches which can be considered to be variations, combinations and extensions of the basic approaches to edge finding considered in section A.

For example, Marr and Hildreth (1979) utilized the fact that different edges are found depending upon the size of the edge masks. They also observed that bar masks seem to give more reliable information than edge masks. They used bar masks of different panel widths and combined their outputs to reduce the effects of noise and to compute the fuzziness of an edge. They extended this method based on their observations that intensity changes are localized in space and in (spatial) frequency. They note that using a Gaussian filter* optimizes localization in both domains simultaneously. They thus convolved the original image with the Laplacian of the Gaussian smoothing filter for each spatial frequency used. Edges were considered to occur where zero crossings from several spatial frequency channels concurred.
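A sketch of this scheme for a single spatial-frequency channel, using SciPy's Laplacian-of-Gaussian filter (combining several channels, as Marr and Hildreth did, is omitted for brevity):

    import numpy as np
    from scipy.ndimage import gaussian_laplace

    def marr_hildreth_edges(image, sigma=2.0):
        """Mark pixels where the Laplacian-of-Gaussian response crosses zero."""
        log = gaussian_laplace(image.astype(float), sigma)
        edges = np.zeros(image.shape, dtype=bool)
        # a zero crossing occurs where the sign changes between neighbors
        edges[:-1, :] |= (log[:-1, :] * log[1:, :]) < 0
        edges[:, :-1] |= (log[:, :-1] * log[:, 1:]) < 0
        return edges

The parameter sigma controls the spatial-frequency channel: large values find coarse, fuzzy edges, small values fine ones.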

C. Linking Edge Elements and Thinning Resultant Lines

Due to imperfections in edge-element finding techniques, situations where edges are poorly defined, and noise in the image, the primal sketch will usually consist of discontinuous and somewhat scattered edge elements. Various schemes (heuristics) exist to connect these edge elements together to form lines.

*An averaging procedure about a pixel in which the influence of neighboring pixels falls off with distance, according to a Gaussian distribution.

Long edges or lines can be found either by using an edge detector (as discussed in the previous section) and linking the resultant edge elements into a long smooth curve (filling in gaps and ignoring stray elements), or by a procedure which accomplishes a similar result by operating directly on the image data. In either case, if the algorithm operates sequentially by proceeding along the curve as it links edge elements or pixels, it is often called a line follower (or tracker), edge follower, or curve follower. However, algorithms have also been devised that operate on an effectively parallel or gestalt basis.

Eberlein's (1976) relaxation method yields a thin line naturally upon convergence. Nevatia and Babu's (1979) approach accepts as edge portions those candidate edge elements found that have a maximal gradient value compared to adjacent pixels with a similar gradient orientation.

When deriving curves from edge data, it is often desirable to thin the resulting contours. Thinning methods reduce the contours to a single-pixel width by discarding redundant edges while maintaining the continuity of the contours. Some methods such as Eberlein’s or Nevatia and Babu's include thinning as an inherent part of their operation.

D. Remarks on Edge Finding

Binford (1981) states that it is important to distinguish between detection of an intensity change and its subsequent localization. Thus, he considers the zero crossing of the second derivative of the intensity good for localization of feature points but not for detection; while the maximum of the first derivative is good for detection, but not for localization.

Combining the two effects and using linear interpolation, MacVicar-Whelan and Binford (1981) report being able to localize edges to sub-pixel accuracy.

Gennery et al. (1981) state that for poor quality images, the performance of all the various detectors degrades, but in different ways. None can be considered to be the last word in edge detectors.

E. Extracting Regions

Many of the edge finding approaches are designed to perform best when the edges can be approximated reasonably by a series of linked straight lines. In natural scenes, this approach can lead to difficulties.

An alternative approach to edge finding is to partition an image into regions of approximately uniform brightness corresponding to surfaces. Unlike edge linking, "region growing" does not require the assumption that the boundaries are straight.

Region growing can be accomplished by initially partitioning the image into elementary regions of constant brightness, and then successively merging adjacent regions having sufficiently small brightness differences, until only boundaries with strong contrast remain. The merging can be done somewhat in parallel by computing merge merits for all pairs of adjacent regions, and merging all pairs that have mutually highest merit. Another advantage of region growing over edge finding is that this technique generalizes more readily to characteristics other than brightness, such as texture, color, size and shape, which are important in natural scenes.

The simplest vision systems use a global threshold to obtain a binary image, an approach commonly used in industrial vision systems. Thresholding can also be local or even dynamic. When histograms* of image pixel intensities are used, it is usual to dissect the image by thresholding at a value in a valley of the histogram so as to give strong peaks on either side of the threshold value. This region splitting approach can be applied recursively until no more regions can be split. Ohlander et al. (1978) used this approach, computing histograms in each of nine colors and thresholding on the parameter that yielded the best histogram for splitting.
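A minimal sketch of the histogram-valley dissection just described, for a single gray-level parameter (Ohlander's recursive, multi-color version would reapply this to each resulting region; the bin count is an illustrative choice):

    import numpy as np

    def valley_threshold(image, n_bins=64):
        """Pick a threshold at the deepest interior valley of the histogram."""
        hist, bin_edges = np.histogram(image, bins=n_bins)
        best, best_depth = None, 0
        for i in range(1, n_bins - 1):
            left_peak = hist[:i].max()
            right_peak = hist[i+1:].max()
            depth = min(left_peak, right_peak) - hist[i]   # valley depth
            if depth > best_depth:
                best, best_depth = i, depth
        if best is None:
            return None                     # no usable valley: region is uniform
        return bin_edges[best + 1]

    def dissect(image):
        """Split into a binary image at the valley, if one exists."""
        t = valley_threshold(image)
        return None if t is None else image >= t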

NASA has employed spectral analysis for segmenting regions in LANDSAT imagery (cf. Landgrebe, 1981).

*Frequency counts of the occurrence of each intensity in an image.


APPENDIX C

SEGMENTATION AND INTERPRETATION

A. The Computer Vision Paradigm

Starting with an image of a scene, the goal of a computer vision system is to identify the objects and their relationships in the scene. To accomplish this, it is customary for the system to segment the image into surfaces or edges associated with the objects, and then use the resulting information, together with domain knowledge, to generate the desired scene description. In this appendix we will review the techniques used to do this segmentation.

B. An Early Bottom-up System

A landmark program in machine perception was developed by Roberts (1965) to recognize various three-dimensional polyhedral object configurations. Roberts employed an image-dissector camera to look at blocks-world scenes involving blocks, wedges, hexagonal prisms, or objects formed by sticking these together.

His program could determine the location, orientation, and dimensions of the objects. The program could demonstrate its "understanding" by displaying a drawing of the scene observed from any desired viewpoint.

Roberts' program first found the places in the images where brightness or shading changed abruptly, corresponding to points on the edges of the object. Then by linking these points, it produced a line drawing of the scene. The line drawing was interpreted by finding triangles, quadrilaterals, and hexagons, which suggested possible objects (triangles suggest wedges, etc.) and eventually accounted for all the lines and junctions as edges and corners of objects. From the resulting appearance of the object in the image, the program was able to compute its dimensions, location, and orientation.

C. Problems with Bottom-Up Systems

Barrow and Tenenbaum (1981, p. 570) note that a major problem with the sequential program organization used by Roberts and many of his successors:

...is the inherent unreliability of segmentation. Some surface boundaries may be missed because the contrast across them is low, while shadows, reflections, and markings may introduce extra lines and regions. The interpretation phase, when presented with a corrupted segmentation, may be unable to produce an explanation, and hence cause the entire system to fail.

Partitioning an arbitrary image into regions corresponding to objects or object surfaces is fundamentally impossible without exploiting scene models. First, there is no basis for deciding which image features are significant at the level of objects and which are not. Second, there is no good pictorial criterion for filling in missing features. Third, the very notion of an object is ill defined, being largely determined by convention and experience.

D. Interpretation-Guided Segmentation

Several research teams tried to overcome these problems by integrating the segmentation and interpretation phases. One simple approach used was to try to recognize objects from partial matches obtained using models and then to try to verify the results by attempting to find evidence that supported image features previously missed.

Techniques were also developed for region-based systems, the general approach (sketched in code following the list) being:

1. For regions with uniform attributes such as intensity, color or texture, assign sets of possible object interpretations based on knowledge of possible object surfaces and the contextual constraints associated with assignments in adjacent regions. For instance, a road cannot be surrounded by sky.

2. Merge adjacent regions with comparable interpretations.

3. Reevaluate interpretations based on the contextual constraints associated with the new adjacent regions.

4. Continue alternating merging and interpreting until all adjacent regions have disjoint interpretations unviolated by contextual constraints.
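The loop above can be sketched schematically as follows; the label sets and the context_ok constraint test are placeholders for the domain knowledge a real system would supply (e.g., rejecting "road" when the surrounding regions can only be "sky"):

    def interpret_regions(regions, adjacency, context_ok):
        """Interpretation-guided merging (schematic).
        regions:   {region_id: set of candidate object labels}
        adjacency: {region_id: set of neighboring region ids}
        context_ok(label, neighbor_labels) encodes contextual constraints."""
        changed = True
        while changed:
            changed = False
            # reevaluate: prune labels that violate contextual constraints,
            # but never empty a region's label set entirely
            for rid in list(regions):
                neighbor_labels = set().union(*[regions[n] for n in adjacency[rid]])
                kept = {l for l in regions[rid] if context_ok(l, neighbor_labels)}
                if kept and kept != regions[rid]:
                    regions[rid] = kept
                    changed = True
            # merge one adjacent pair whose interpretations overlap
            for rid in list(regions):
                for n in adjacency[rid]:
                    common = regions[rid] & regions[n]
                    if common:
                        n_neighbors = adjacency.pop(n)
                        regions.pop(n)
                        regions[rid] = common
                        adjacency[rid] = (adjacency[rid] | n_neighbors) - {rid, n}
                        for m in n_neighbors - {rid}:
                            adjacency[m].discard(n)   # rewire n's neighbors to rid
                            adjacency[m].add(rid)
                        changed = True
                        break
                if changed:
                    break
        return regions

The loop terminates because merging only removes regions and pruning only shrinks label sets; it stops exactly when all adjacent regions have disjoint, constraint-satisfying interpretations.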

Both line-based and region-based interpretation-guided segmentation systems have been devised that have performed well in a variety of complex scene domains. However, the approach is not suitable for a general-purpose vision system, as it depends on prior knowledge of expected objects. Unknown objects cannot be recognized or even described. Thus for unknown objects, levels of scene description below the level of complete objects are needed.

E. Use of General World Knowledge to Guide Segmentation

Marr and Nishihara (1978) observe that as the primal sketch is typically a large and unwieldy collection of data, the next step is to decode it, traditionally by "... a process called segmentation whose purpose is to divide a primal sketch, or more generally an image, into regions that are meaningful, perhaps as physical objects." It makes sense to use any general knowledge that might help in the interpretation. An example of such knowledge is information on the physical nature of the edges of objects.

Huffman (1971) and Clowes (1971) devised an approach to enable the interpretation of perfect line drawings of polyhedral objects without having to resort to heuristics. They recognized that each line in the picture represented either a convex edge, a concave edge, or an occluding edge in a three-dimensional scene. From this, they constructed a catalog of possible vertices with allowable line labellings. A scene could then be analyzed by starting at one vertex and proceeding through the line drawing performing a tree search, limiting the number of possible line labellings at each step according to the catalog, until a consistent labelling for the entire scene is obtained. Waltz (1975) further extended this technique to include shadows and cracks. His catalog included several thousand possible vertex types. He used a relaxation-type procedure to decide on the correct labelling for each line according to the possibilities in the catalog. The resultant procedure converges rapidly (usually to a unique interpretation) regardless of the complexity of the scene.
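The tree search over catalog labellings can be sketched as a small backtracking procedure. The three labels and the toy catalog in the example are stand-ins: a real Huffman-Clowes catalog enumerates the legal label tuples for each junction type, and Waltz's version held several thousand entries:

    # Each junction lists its incident lines and the allowable label tuples
    # for those lines ('+' convex, '-' concave, '>' occluding).
    def label_lines(lines, junctions, assignment=None):
        """Backtracking tree search for a consistent labelling of all lines."""
        if assignment is None:
            assignment = {}
        if len(assignment) == len(lines):
            return assignment
        line = next(l for l in lines if l not in assignment)
        for label in ('+', '-', '>'):
            assignment[line] = label
            # every junction must still admit at least one catalog tuple
            # consistent with the labels assigned so far
            ok = all(any(all(assignment.get(l, t[k]) == t[k]
                             for k, l in enumerate(j['lines']))
                         for t in j['catalog'])
                     for j in junctions)
            if ok:
                result = label_lines(lines, junctions, assignment)
                if result:
                    return result
            del assignment[line]
        return None

    # Example: two lines meeting at one junction whose catalog allows only
    # ('+', '>') or ('-', '-'):
    #   label_lines(['a', 'b'],
    #               [{'lines': ['a', 'b'], 'catalog': [('+', '>'), ('-', '-')]}])
    #   -> {'a': '+', 'b': '>'}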

A move toward a more general approach to the problem of interpreting feature point segments as lines and edges has recently been made by Binford (1981) and Lowe and Binford (1981). In their scheme, a segment is interpreted as a space curve, and constraints are formulated based on coincidence and on those situations in which a curve corresponds to a true edge or bounding contour.


A comprehensive approach to deriving a physical sketch of a scene from one or more images has been taken by Fischler et al. (1982). They use a priori knowledge of global and extended constraints to guide the segmentation and interpretation process. Their approach involves modeling physically meaningful information such as the imaging process, the scene geometry and elements of the scene content. They utilize knowledge about such factors as the camera model, vanishing points, geometric distortion, ground plane, geometric horizon, skyline, semantic context (urban or rural scene, etc.), physical surface models and edge classification.

APPENDIX D

2-D REPRESENTATION, DESCRIPTION AND RECOGNITION

This appendix presents a number of 2-D representations and descriptions useful for further processing and recognition.

A. Pyramids

A pyramid data structure represents an image at several levels of resolution simultaneously. The base of the pyramid is the original full resolution image, usually assumed to be an n x n square array. The next level of the pyramid is typically formed by partitioning the image into non-overlapping 2 by 2 cells and mapping (usually by average gray level) the four pixels in each cell to a single pixel in the next level*. This is repeated, level by level, until the image is compressed into a single pixel at the top level. The usefulness of pyramids lies in being able to extract features at an appropriate level of resolution.

*Other partitionings and mappings are common.
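A minimal sketch of this construction by 2 x 2 gray-level averaging, assuming the side length n is a power of two:

    import numpy as np

    def build_pyramid(image):
        """Return levels from full resolution down to a single pixel."""
        levels = [image.astype(float)]
        while levels[-1].shape[0] > 1:
            a = levels[-1]
            # average each non-overlapping 2x2 cell into one pixel
            reduced = (a[0::2, 0::2] + a[0::2, 1::2] +
                       a[1::2, 0::2] + a[1::2, 1::2]) / 4.0
            levels.append(reduced)
        return levels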

B. Quadtrees

A quadtree representation of an n x n image is obtained in a top-down manner by recursively splitting the image into quadrants, the quadrants into subquadrants, etc. The process continues until all pixels in a quadrant are uniform with respect to some feature (such as gray level). The terminal leaves of a quadtree are uniform regions of varying sizes, making the quadtree a useful first phase in segmenting an image into regions.
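A sketch of the recursive splitting, returning nested dictionaries whose leaves are uniform blocks (the gray-level uniformity test and tolerance are illustrative choices):

    def quadtree(image, x=0, y=0, size=None, tol=0):
        """Recursively split a square 2^k x 2^k image into uniform quadrants."""
        if size is None:
            size = image.shape[0]
        block = image[y:y+size, x:x+size]
        if size == 1 or block.max() - block.min() <= tol:
            # leaf: a uniform region of this size
            return {'x': x, 'y': y, 'size': size, 'value': float(block.mean())}
        h = size // 2
        return {'x': x, 'y': y, 'size': size,
                'children': [quadtree(image, x,     y,     h, tol),
                             quadtree(image, x + h, y,     h, tol),
                             quadtree(image, x,     y + h, h, tol),
                             quadtree(image, x + h, y + h, h, tol)]}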

C. Statistical Features of a Region

Once an image has been segmented, a description of each region, or blob, can be generated as a list of statistical features. These features typically include perimeter, area, c.g. (center of gravity), first and second order moments, color, etc. The individual blob descriptors are linked to form a tree data structure which represents nesting relationships: the parent of any blob in the tree is the adjacent blob which completely surrounds it. Recognition is performed by matching the statistical features with those of stored prototypes. The SRI Vision Module and GM's CONSIGHT use this approach.
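A sketch of computing a few such statistical features for one blob, given the row and column coordinates of its pixels (the feature list is abbreviated; perimeter and color are omitted):

    import numpy as np

    def blob_features(rows, cols):
        """Area, centroid (c.g.) and second-order central moments of a blob."""
        rows = np.asarray(rows, dtype=float)
        cols = np.asarray(cols, dtype=float)
        area = rows.size
        cg = (rows.mean(), cols.mean())
        mu_rr = ((rows - cg[0]) ** 2).mean()      # second-order central moments
        mu_cc = ((cols - cg[1]) ** 2).mean()
        mu_rc = ((rows - cg[0]) * (cols - cg[1])).mean()
        return {'area': area, 'cg': cg, 'moments': (mu_rr, mu_cc, mu_rc)}

Recognition would then compare such feature vectors against those of the stored prototypes.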

D. Boundary Curves

The boundary of a region can be represented by a chain of straight lines and arcs. The resulting compressed boundary descriptions are sometimes referred to as "chain codes" or "concurves." Gennery et al. (1981, p. 3-2) note that "The main advantage of the concurve representation is that objects may be recognized on the basis of partial views by matching a subset of the lines and arcs in a model concurve with the image data."

E. Run-Length Encoding

For a binary image, it is possible to segment the image into

edges and regions by sequentially scanning the image and

recording the edge points (where pixels change from zero to one

or vica versa). This process of reducing a binary image to a set of edge points is called run-length encoding, and has been

successfully used in the SRI Vision Module and a number of

sophisticated commerical vision systems derived from that module.
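A minimal sketch of run-length encoding a binary image, recording for each row the columns at which the pixel value changes:

    def run_length_encode(binary_image):
        """For each row, list (start_column, run_value) transition points."""
        runs = []
        for row in binary_image:
            current = row[0]
            transitions = [(0, int(current))]
            for col in range(1, len(row)):
                if row[col] != current:          # edge point: value changed
                    current = row[col]
                    transitions.append((col, int(current)))
            runs.append(transitions)
        return runs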

F. Skeleton Representations and Generalized Ribbons

In this approach, a planar region is represented by a skeleton which consists of the medial line (the locus of points equidistant from the boundaries of the region) and the perpendicular distance from the boundary at each point on the medial line. In some cases, a complex region can be constructed as the union of these generalized ribbons (the 2-D version of the generalized cones described in Appendix F).

G. Representation by a Concatenation of Primitive Forms

A region can be built up from a collection of squares, rectangles or other shapes. The "Maximal Block" approach uses a union of squares of various sizes.

H. Relational Graphs

An image that has been segmented into regions can be described in terms of a relational graph, whose nodes represent regions and whose arcs represent properties (such as shape and size) and relations (such as "in front of" and "adjacent to"). Corresponding views of known objects can be similarly represented, and recognition can be achieved by matching the graphs.

I. Recognition

Recognition consists of matching a description derived from an image to a description of a stored model. Recognition can be accomplished by correlation, which for binary data reduces to template matching. A more elaborate approach is statistical pattern classification using features such as those described in Section C. Relaxation and syntactic analysis approaches (described elsewhere) have also been used. Fischler and Bolles (1981) suggest "random sample consensus" as a paradigm for selecting the model that provides the best match to the data and for computing the best values of the free parameters.

APPENDIX E

RECOVERY OF INTRINSIC SURFACE CHARACTERISTICS

A. Basic Approach

As indicated earlier, to assist in finding 3-D surfaces and volumes for interpretation, it is helpful in many cases to go beyond the 2-D representation of edges and regions to a representation proposed by Marr (1978) of MIT, called the 2.5-D sketch, consisting of surface distances and orientations. Such a sketch can be constructed from the surface characteristics which are intrinsic to the scene and are not dependent upon idiosyncrasies of the sensor's viewpoint.

Barrow and Tenenbaum (1981, pp. 581-582) indicate that these intrinsic characteristics of surfaces are appropriately represented as a set of arrays in registration with the image array. Each array corresponds to a particular intrinsic characteristic such as surface reflectance, surface orientation, incident illumination and range. Each array contains values for its intrinsic characteristic at the surface element visible at the corresponding point in the sensed image. It also explicitly indicates boundaries due to discontinuities in the value or gradient of the characteristic. Such arrays have been referred to as intrinsic images.

Figure 9 is an artist's conception of one possible set of intrinsic images, corresponding to a monochrome image of a simple scene. The images are shown as line drawings, but in fact would contain values at every point. The solid lines represent discontinuities in the scene characteristic; the dashed lines represent discontinuities in its derivative. The distance image gives the line of sight range from the center of projection to each visible point in the scene. The reflectance image gives the albedo (the ratio of total reflected to total incident illumination) at each point. The orientation image consists of vectors representing the direction of the surface normal at every point. The integrated incident illumination from all sources is given by the illumination image.

Figure 9: A set of intrinsic images derived from a single monochrome intensity image. (Source: Barrow and Tenenbaum, 1981, p. 582.)


The central problem in recovering the intrinsic characteristics from the image is that the desired information is confounded in the sensory data. The observed light intensity at a single point could result from an infinitude of combinations of illumination, reflectance and orientation. The key to recovery lies in exploiting constraints derived from assumptions about the nature of the scene and the physics of the imaging process.

For example, as surfaces are continuous except at boundaries, we can expect surface characteristics (reflection, orientation and range) to also be continuous. Similarly, incident illumination also varies smoothly over a scene except at shadow boundaries.

Barrow and Tenenbaum (1981, p. 589) propose the following four-step model for using interacting constraints in a relaxation type process for simultaneously recovering the primary intrinsic characteristics from a brightness image:

1) find the brightness discontinuities in the input image;
2) determine the physical nature of each discontinuity;
3) assign boundary values for intrinsic characteristics along the edges, based on the physical interpretation;
4) propagate from these boundary values into the interiors of regions, using continuity assumptions.

Many different approaches to recovering shape from image characteristics have been explored, as represented by the following sections.

B. Shape from Shading

Barrow and Tenenbaum (1978) describe a low level method of estimating relative distance and surface orientation from a single image. They use heuristics based on the rate of change of brightness across the image.

Ikeuchi and Horn (1981) have formulated a second order differential equation which Horn calls the "image irradiance equation." This equation relates the orientation of the local surface normal of a visible surface, its surface reflectance characteristics, and the lighting, to the intensity value recorded at the corresponding point in the image.

C. Stereoscopic Approach

Gennery et al., (1981, pp. 6-1 to 6-4) describe various stereoscopic approaches to finding range. They observe that the basic stereo approach uses triangulation between two or more views from different positions to determine distance. However, stereo techniques differ in the way in which matching is done between pictures, particularly in the kind of entities that are matched. The two major approaches are area correlation and matching lines of maximum intensity changes (edge-based stereo).

They report (p. 6-3) that, "Scenes of man-made objects often are not highly textured but contain sharp brightness edges at boundaries of objects and at intersections of planar faces. For such scenes, area correlation does not work very well. Instead, it is usually better to detect features in each image and to match these features."
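For the area-correlation alternative, a sketch of matching one pixel of a rectified stereo pair: slide a window along the corresponding scan line of the second image and keep the disparity with the smallest sum of squared differences (the window size and disparity range are illustrative; boundary handling is omitted):

    import numpy as np

    def disparity_at(left, right, r, c, window=5, max_disp=32):
        """Best horizontal disparity for interior pixel (r, c) by SSD."""
        h = window // 2
        patch = left[r-h:r+h+1, c-h:c+h+1].astype(float)
        best_d, best_ssd = 0, np.inf
        for d in range(0, max_disp + 1):
            if c - d - h < 0:
                break
            candidate = right[r-h:r+h+1, c-d-h:c-d+h+1].astype(float)
            ssd = np.sum((patch - candidate) ** 2)
            if ssd < best_ssd:
                best_d, best_ssd = d, ssd
        # range then follows by triangulation: depth proportional to
        # (baseline x focal length) / disparity
        return best_d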

D. Photometric Stereo

In this approach, the light source illuminating the scene is moved to different known locations, and the orientation of the surfaces is deduced from the resulting intensity variations (Woodham, 1979, 1981).
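Under a simplifying Lambertian assumption, three images taken under three known, non-coplanar light directions determine the surface normal at each pixel by solving a 3 x 3 linear system. A sketch (Woodham's formulation is in terms of reflectance maps, so this is a simplification):

    import numpy as np

    def photometric_stereo(intensities, light_dirs):
        """intensities: three images stacked as a sequence; light_dirs: 3x3,
        one unit light direction per row. Solves I = L (albedo * n) per pixel."""
        L = np.asarray(light_dirs, dtype=float)
        i_vec = np.stack([im.ravel().astype(float) for im in intensities])
        g = np.linalg.solve(L, i_vec)                 # (3, n_pixels): albedo * n
        albedo = np.linalg.norm(g, axis=0)
        normals = g / np.where(albedo > 0, albedo, 1)  # unit surface normals
        return albedo, normals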


E. Shape from Texture

Brady (1981A, p. 88) reports that, "Of the modules which seem to bridge the gap between the Primal Sketch and the Surface Orientation Map, none has received quite as much attention from psychologists as the computation of surface orientation and depth from texture gradients." Various methods for computing texture gradients are possible, and from these the surface orientation can be deduced.

F. Shape from Contour

Barrow and Tenenbaum (1980) have suggested a method for interpreting curved line drawings as three-dimensional surfaces. To interpret a two-dimensional curve, a three-dimensional curve projecting to it is computed that minimizes a combination of variation in curvature and departure from planarity. Other approaches to this problem are given by Draper (1981), Kanade (1981) and Stevens (1981).

G. Shape and Velocity from Motion

Brady (1981A, p. 96) provides a review of efforts to recover shape from motion for the case of rigid bodies. He reports that Ullman (1978) was the first to treat this issue. He considered the problem of establishing a correspondence between the Primal Sketches in two successive image frames. Ullman also studied the problem of computing the structure of a rigid body from the correspondences of a small number of points in a number of views, and found that remarkably few of each are required to compute rigid three-dimensional structure.

Brady (1981A, p. 70) defines "optical flow" as the distribution of velocities of apparent movement caused by smoothly changing brightness patterns. Horn and Schunck (1981) have proposed a method for computing optical flow by differentiating the brightness distribution in successive images with respect to time.
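A compact sketch of the Horn-Schunck iteration between two gray-level frames: estimate the brightness derivatives, then alternately smooth the flow field and correct it against the brightness-constancy constraint (the simple difference derivatives here are cruder than those in the original paper):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def horn_schunck(frame1, frame2, alpha=10.0, n_iter=100):
        """Estimate the optical flow field (u, v) between two frames."""
        f1 = frame1.astype(float)
        f2 = frame2.astype(float)
        Ix = np.gradient(f1, axis=1)      # spatial brightness derivatives
        Iy = np.gradient(f1, axis=0)
        It = f2 - f1                      # temporal brightness derivative
        u = np.zeros_like(f1)
        v = np.zeros_like(f1)
        for _ in range(n_iter):
            u_avg = uniform_filter(u, size=3)   # local mean: smoothness term
            v_avg = uniform_filter(v, size=3)
            correction = ((Ix * u_avg + Iy * v_avg + It) /
                          (alpha**2 + Ix**2 + Iy**2))
            u = u_avg - Ix * correction
            v = v_avg - Iy * correction
        return u, v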


APPENDIX F

HIGHER LEVELS OF REPRESENTATION

The basic form for the higher levels of representation is the 3-D model. This is an object-centered representation that describes the object in a convenient way, as in the following examples.

A. Volumetric Models

1. Generalized Cones

Agin and Binford (1973) introduced the concept of generalized cones (also called generalized cylinders). A generalized cone is defined by a space curve, called the spine or axis, and a planar cross section normal to the axis. A "sweeping rule" describes how the cross section changes along the axis. Complicated objects can often be represented by a concatenation of generalized cones.

2. Wire Frame Models

Various investigators have represented 3-D objects by means of wire frame models in which the wires correspond to edges or boundaries of cross sections. Stick figure models are a related representation.

3. Polyhedral Models

Wesley et al. (1980) report on a geometric modeling system developed at IBM to describe complicated mechanical parts. The object is represented by polyhedral primitives which are combined as required by the operations of union, difference and intersection. In the IBM system, objects and assemblies are represented in a graph structure that indicates part-whole relationships, attachment, constraint, and assembly. Also included are physical properties of objects and positional relationships between objects. The system can determine the appearance of an object for an arbitrary view. This information provides the potential for use by a computer vision recognition system to guide the search for features to match an image to the model.

4. Combining 1D, 2D, and 3D Primitives

Shapiro et al., (1980) describe objects in terms of the primitives: sticks, plates and blobs. Relations are given on how the parts connect, their size, and spatial relationships.

5. Planes and Ellipsoids

Gennery (1980) produced a method for describing 3-D outdoor scenes. The ground surface was approximated by one or more planes or paraboloids, and objects lying on the ground were approximated by ellipsoids.

6. Sets of Prototype Volumes

Efforts in Computer Aided Design and Computer Aided Manufacturing (CAD/CAM) often represent objects by combining a small set of prototype volumes such as spheres, blocks and triangular prisms.

B. Symbolic Descriptions

The various parts of an object in a scene may be represented by graphs in which the nodes are the objects and the arcs are the relations (such as above, to the right of, behind, surrounded by, part of, larger than, etc.) and intrinsic attributes (e.g., small, flat, etc.).

Barrow and Tenenbaum (1981, p. 576) observe that, "Symbolic models are appropriate for natural objects (e.g., trees) that are better defined in terms of generic characteristics (e.g., large, green, leafy) than their precise shape."

C . Procedural Models

Rosenfeld (1981, p. 604) defines a procedural model as any process that generates or recognizes images. An important class of such models is grammatical or syntactic models. Pratt (1978, pp. 574-578) discusses such syntactic processes. He observes that syntactic methods have proved feasible for simple models, but notes that it is not yet clear whether or not these techniques can be extended to general classes of images.

APPENDIX G

HIGHER LEVELS OF INTERPRETATION

Barrow and Tenenbaum (1981, pp. 591-593) outline how interpretation might proceed based on intrinsic images. They observe that intrinsic images provide scene information on a point by point basis in a viewer-centered coordinate frame.

Higher levels of interpretation, such as object recognition, require a more global representation in a viewpoint-independent coordinate frame. Surfaces and volumes are obvious candidates for representations following from intrinsic images.

An interpretation-guided segmentation approach based on structural prototypes is a possible mechanism for deriving 3-D surfaces and volumes from intrinsic images.

Once a scene description has been obtained in terms of surface and volume primitives, geometric models can be used to generate similar primitives, which can then be matched by a search process to obtain object recognition and location. It is often convenient to use graph structures for representing scene parts. As scene descriptions are typically fragmented and include many objects, some of which may be occluded, it is necessary to match parts of the scene graph with parts of object graphs. As such subgraph matching can be combinatorially explosive, much work has been done on algorithms to handle such matching in complex scenes.

Barrow and Tenenbaum suggest that perhaps the best way to defeat the combinatorics of search is to decompose object models hierarchically into components. These components can then be independently matched, and combined and checked for consistency afterward. Using this approach, the complexity of matching tends to increase additively rather than exponentially.


APPENDIX H

TRACKING

Gennery et al. (1981, pp. 5-1 to 5-3) survey the real-time tracking problem, observing:

The goal of object tracking is to process sequences of images in real time to describe the motion of one or more objects in a scene. Often real time implies processing every image from a TV camera operating at 30 Hz. In other words, an image is digitized, features are extracted from the image, the object or objects are located in the image, and position and velocity estimates are updated 30 times a second, although in practice slightly slower rates are sometimes used. At the present time, the approaches which achieve real-time operation rely on simplifying assumptions about the nature of the scene, track very few objects in a given scene, and incorporate varying levels of special-purpose hardware designed for the particular tracking algorithm...

Since successive images are only 1/30 second apart in time, the appearance of the object will change very little from image to image. The object can be modelled adaptively as it was last seen by the tracker, with the expectation that a good match between the object model and the features in the current image is available. Furthermore, the location of the object in the image can be predicted very accurately by using the latest available position and velocity estimates coupled with the short elapsed time between images. As a result, the search window need only be large enough to contain the object up to a few pixels uncertainty. This limits the required computation to a manageable level and, more importantly, greatly reduces the probability of a false match occurring...

Real-time implementations typically rely on features which can be computed directly from the image without resorting to actual 3-D measurements of object features.
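The predict-and-search loop described in the quotation might be sketched as follows (an assumed constant-velocity motion model with sum-of-squared-differences template matching; this is illustrative, not any particular surveyed system):

    import numpy as np

    # One step of a predict-and-search tracker: assume constant velocity,
    # search a small window around the predicted position, then re-model
    # the object adaptively from the best match.
    def track_step(frame, template, pos, vel, half_win=8):
        th, tw = template.shape
        pred = (pos[0] + vel[0], pos[1] + vel[1])      # predicted top-left
        best, best_ssd = pred, np.inf
        for dr in range(-half_win, half_win + 1):
            for dc in range(-half_win, half_win + 1):
                r, c = pred[0] + dr, pred[1] + dc
                if r < 0 or c < 0 or r + th > frame.shape[0] \
                        or c + tw > frame.shape[1]:
                    continue
                patch = frame[r:r + th, c:c + tw].astype(float)
                ssd = np.sum((patch - template) ** 2)  # match quality
                if ssd < best_ssd:
                    best, best_ssd = (r, c), ssd
        vel = (best[0] - pos[0], best[1] - pos[1])     # pixels per frame
        template = frame[best[0]:best[0] + th,
                         best[1]:best[1] + tw].astype(float)
        return best, vel, template

Run at frame rate, the small search window is what keeps the computation manageable and makes false matches unlikely.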

Table IX summarizes the various approaches surveyed by Gennery et al. It will be observed that a variety of approaches are possible using either area correlation or feature matching. However, no final optimum system has yet been devised.

Table IX. Visual Tracking Approaches, derived from Gennery et al. (1981, pp. 5-1 to 5-7). [The body of the table, which compares gray-level correlation and feature-based trackers on attributes such as search-window size and immunity to background changes, is not legible in the source scan.]

Real-time tracking is important for manipulation; for guidance for applications such as recovering satellites or free-flying payloads; for grasping moving objects (such as parts in an industrial environment); for assembling objects such as machinery or electrical appliances; and for building space structures. It is also important for target acquisition and tracking, or for locking onto a feature in situations such as planetary flybys and astronomical or earth observations. And, as is to be expected, it is also applicable to vehicle and missile guidance.

APPENDIX I

Additional Tables of

Model-Based Vision Systems

[Tables III-a through III-k, each detailing an individual model-based vision system, are largely illegible in the source scan. Recoverable entries include:

- Ballard, Brown and Feldman (1978), Univ. of Rochester: answering queries; locating ribs in chest x-rays and ships at docks in images.

- ARGOS: identifying and labelling objects (buildings) in images of a city scene.

- A system (developer illegible) for locating known objects in an image.

- Ohta (1980), Kyoto Univ.: semantic labelling of regions in color images of outdoor scenes.

- Herman, Kanade and Kuroe (1982), CMU, 3-D Mosaic Project: incrementally acquiring a 3-D model of a complex urban scene from images.]

APPENDIX J

Tables of

Commercially Available Vision Systems

[The tables of commercially available industrial vision systems are largely illegible in the source scan. Each system was rated for verification, inspection, recognition and manipulation capability, with remarks on its sensing and processing approach.]

APPENDIX K

GLOSSARY *

• Artificial Intelligence (AI) Approach: An approach that emphasizes symbolic processes for representing and manipulating knowledge in a problem-solving mode.

• Bar Operators: Masks to detect second derivatives of image brightness in particular directions.

• Binary Image: A black and white image represented as zeros and ones, in which objects appear as silhouettes.

• Blackboard Approach: A problem solving approach whereby the various system elements communicate with each other via a common working data storage called the blackboard.

• Blob: A connected region in a binary image.

• Blocks World: Scenes consisting of three dimensional polyhedral object configurations. A simple artificial world used to explore computer vision concepts.

• Bottom Up (Data Driven): Refers to the sequential processing by a vision system, beginning with the input image and terminating in an interpretation.

• CAD/CAM: Computer-aided design/computer-aided manufacture.

• Chain Code: A boundary representation which starts with an initial point and stores a chain of directions to successive points.
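For illustration, a small Python sketch of 8-direction chain coding (the direction numbering shown is one common convention):

    # 8-direction chain code: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE.
    DIRS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
            (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}   # (drow, dcol)

    def chain_code(points):
        """Encode successive boundary points as a chain of directions."""
        return [DIRS[(b[0] - a[0], b[1] - a[1])]
                for a, b in zip(points, points[1:])]

    # A 1 x 1 square traced counterclockwise from (5, 5):
    print(chain_code([(5, 5), (5, 6), (4, 6), (4, 5), (5, 5)]))  # [0, 2, 4, 6]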

• Computer Vision (Computational or Machine Vision): Perception by a computer, based on visual sensory input, in which a concise description is developed of a scene depicted in an image. It is a knowledge-based, expectation-guided process that uses models to interpret sensory data. Used somewhat synonymously with image understanding and scene analysis.

• Concurve: A boundary representation consisting of a chain of straight lines and arcs.

• Convolve: Superimposing an m x n operator over an m x n pixel area (window) in the image, multiplying corresponding points together and summing the result.
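A direct Python sketch of the operation (shown without the kernel reflection that distinguishes true convolution from correlation):

    import numpy as np

    def convolve(image, op):
        """Slide an m x n operator over the image; no padding, so the
        output shrinks by the operator size minus one in each direction."""
        m, n = op.shape
        rows = image.shape[0] - m + 1
        cols = image.shape[1] - n + 1
        out = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                out[r, c] = np.sum(image[r:r + m, c:c + n] * op)
        return out

    image = np.arange(16, dtype=float).reshape(4, 4)
    smooth = np.full((3, 3), 1.0 / 9.0)        # simple averaging operator
    print(convolve(image, smooth))             # 2 x 2 smoothed result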

• Corner: An abrupt change in direction of a curve.

• Correlation: A correspondence between attributes in an image and a reference image.

* As yet no standard definitions exist, so that the definitions listed here can be considered to be somewhat imprecise.

• Description: A symbolic representation of the relevant information, e.g., a list of statistical features of a region.

• Digitized Image: A representation of an image as an array of brightness values.

• Domain: The sphere of concern. The task world. A set of allowable inputs.

• Edge: A change in pixel values (exceeding some threshold) between two regions of relatively uniform values. Edges correspond to changes in brightness arising from a discontinuity in surface orientation, surface reflectance or illumination.

• Edge Operators: Templates for finding edges in images.

• Edge-Based Stereo: A stereographic technique based on matching edges in two or more views of the same scene taken from different positions.

• Features: Simple image data attributes such as pixel amplitudes, edge point locations and textural descriptors, or somewhat more elaborate image patterns such as boundaries and regions.

• Feature Vector: A set of features of an object (such as area, number of holes, etc.) that can be used for its identification.

• Feature Extraction: Determining image features by applying feature detectors.

• Gaussian Filtering: A convolution procedure in which the weighting of pixels in the template falls off with distance according to a Gaussian distribution.

• General Purpose Vision System: A vision system that is universally applicable. A system that is based on generic rather than specific knowledge (cf. Nevatia, 1982, p. 188). A system that can deal with unfamiliar or unexpected input.

• Generalized Cone (Generalized Cylinder): A volumetric model defined by a space curve, called the spine or axis, and a planar cross section normal to the axis. A "sweeping rule" describes how the cross section changes along the axis.

• Generalized Ribbon (See "Skeleton Representation"): A planar region approximated by a medial line (axis) and the perpendicular distances to the boundary. The 2-D version of a generalized cone.

• Global Method: A method based on non-local aspects, e.g., region splitting by thresholding based on an image histogram.

• Goal Driven: Top-down approach.

• Gradient Space: A coordinate system (p, q) in which p and q are the rates of change in depth (gray value) of the surface of an object in the scene along the x and y directions (the coordinates in the image plane). Thus (p, q, 1) has the direction of the surface normal.

• Gradient Vector: The orientation and magnitude of the rate of change in intensity at a point in the image.

• Graph (Also "Relational Graph"): An image representation in which nodes represent regions and arcs between nodes represent properties of and relations between these regions.

• Gray Level: A quantized measurement of image irradiance (brightness), or other pixel property.

• Heterarchical Approach: An image interpretation control structure in which no processing stage is in sole command, but in which each stage can call on other stages as required.

• Heuristics: Rules of thumb, knowledge or other techniques used to help guide a problem solution.

• Hierarchical Approach: An approach to vision based on a series of ordered processing levels in which the degree of abstraction increases as we proceed from the image level to the interpretation level.

• Higher Levels: The interpretive processing stages, such as those involving object recognition and scene description, as opposed to the lower levels corresponding to the image and descriptive stages.

• Histogram: Frequency counts of the occurrence of each intensity (gray level) in an image.

• Hough Transform: A global parallel method for finding straight or curved lines, in which all points on a particular curve map into a single location in the transform space.
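A minimal Python sketch of the straight-line case, using the (rho, theta) parameterization (accumulator grid sizes are arbitrary choices):

    import numpy as np

    def hough_lines(points, max_rho, n_theta=180, n_rho=200):
        """Accumulate votes for lines x*cos(theta) + y*sin(theta) = rho;
        collinear points pile up in a single accumulator cell."""
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((n_rho, n_theta), dtype=int)
        for x, y in points:
            for t, theta in enumerate(thetas):
                rho = x * np.cos(theta) + y * np.sin(theta)
                r = int(round((rho + max_rho) / (2 * max_rho) * (n_rho - 1)))
                acc[r, t] += 1
        r, t = np.unravel_index(acc.argmax(), acc.shape)
        return -max_rho + r * 2 * max_rho / (n_rho - 1), thetas[t]

    # Five collinear points on y = x vote for roughly (rho ~ 0, theta ~ 3*pi/4):
    print(hough_lines([(i, i) for i in range(5)], max_rho=10.0))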

• Hueckel Operator: A method for finding edges in an image by fitting an intensity surface to the neighborhood of each pixel and selecting surface gradients above a chosen threshold value.

• Iconic: Image-like.

• Image: A projection of a scene onto a plane. Usually represented as an array of brightness values.

• Image Processing: Transformation of an input image into an output image with more desirable properties, such as increased sharpness, less noise, and reduced geometric distortion. Signal processing is a 1-D analog.

• Image Understanding (IU): Employs geometric modeling and the AI techniques of knowledge representation and cognitive processing to develop scene interpretations from image data. IU has dealt extensively with 3-D objects. IU usually operates not on an image but on a symbolic representation of it. IU is somewhat synonymous with computer vision and scene analysis.

• Irradiance: The brightness of a point in the scene.

• Isomorphic Representation: A representation in which there is a one-to-one correspondence between the scene and its representation (e.g., an image or a map).

• Interpretation: Establishing a correspondence between the scene and a set of models. Assigning names to objects in a scene.

• Interpretation-Guided Segmentation: Using models to help guide image segmentation, by the process of extending partial matches.

• Intrinsic Characteristics: Properties inherent to the object, such as surface reflectance, orientation, incident illumination and range.

• Intrinsic Images: A set of arrays in registration with the image array. Each array corresponds to a particular intrinsic characteristic.

• Laplacian Operator: The sum of the second derivatives of the image intensity in the x and y directions is called the Laplacian. The Laplacian operator is used to find edge elements by finding points where the Laplacian is zero.

• Line: A thin connected set of points contrasting with neighbors on both sides. Line representations are extracted from edges.

• Line Detectors: Oriented operators for finding lines in an image.

• Line Followers: Techniques for extending lines currently being tracked.

• Low Level Features: Pixel-based features such as texture, regions, edges, lines, corners, etc.

• Model-Based Vision System: A system that utilizes a priori models to derive a desired description of the original scene from an image.

• Module: A processing unit in a vision system.

• Monocular: Pertaining to an image taken from a single viewpoint.

• Optical Flow: The distribution of velocities of apparent movement in an image caused by smoothly changing brightness patterns.

• Pattern Recognition: A technique that classifies images into predetermined categories, usually using statistical methods.

• Perception: An active process in which hypotheses are formed about the nature of the environment, or sensory information is sought to confirm or refute hypotheses.

• Photometric Stereo: An approach in which the light source illuminating the scene is moved to different known locations, and the orientation of the surfaces is deduced from the resulting intensity variations.

• Pixel (Picture Element): The individual elements in a digitized image array.

• Primal Sketch: A primitive description of the intensity changes in an image. It can be represented by a set of short line segments separating regions of different brightnesses.

• Pyramid: A hierarchical data structure that represents an image at several levels of resolution simultaneously.

• Quadtree: A representation obtained by recursively splitting an image into quadrants, until all pixels in a quadrant are uniform with respect to some feature (such as gray level).
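A recursive Python sketch (assumes a square image whose side is a power of two):

    import numpy as np

    def quadtree(img):
        """Return a leaf value for a uniform block, else four sub-trees
        (NW, NE, SW, SE)."""
        if (img == img[0, 0]).all():
            return int(img[0, 0])
        h, w = img.shape[0] // 2, img.shape[1] // 2
        return [quadtree(img[:h, :w]), quadtree(img[:h, w:]),
                quadtree(img[h:, :w]), quadtree(img[h:, w:])]

    img = np.zeros((4, 4), dtype=int)
    img[2:, 2:] = 1                     # one uniform bright quadrant
    print(quadtree(img))                # [0, 0, 0, 1]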

• Recognition: A match between a description derived from an image and a description obtained from a stored model.

• Reflectance (Albedo): The ratio of total reflected to total incident illumination at each point.

• Region: A set of connected pixels in an image that share a common property such as average gray level, color or texture.

• Region Growing: The process of initially partitioning an image into elementary regions with a common property (such as gray level) and then successively merging adjacent regions having sufficiently small differences in the selected property, until only regions with large differences between them remain.
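A seeded variant sketched in Python (growing from a single pixel rather than merging elementary regions; the tolerance value is illustrative):

    import numpy as np
    from collections import deque

    def grow(img, seed, tol=10):
        """Grow a region from a seed pixel, absorbing 4-connected
        neighbors whose gray level is within tol of the seed's."""
        region = np.zeros(img.shape, dtype=bool)
        region[seed] = True
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                        and not region[nr, nc]
                        and abs(int(img[nr, nc]) - int(img[seed])) <= tol):
                    region[nr, nc] = True
                    queue.append((nr, nc))
        return region

    img = np.array([[10, 12, 90],
                    [11, 13, 95],
                    [80, 85, 92]])
    print(grow(img, (0, 0)).astype(int))   # marks the dark upper-left region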

• Registration: Processing images to correct geometrical and intensity distortions, relative translational and rotational shifts, and magnification differences between one image and another or between an image and a reference map. When registered, there is a one-to-one correspondence between a set of points in the image and in the reference.

• Relaxation Approach: An iterative problem-solving approach in which initial conditions are propagated utilizing constraints until all goal conditions are adequately satisfied.

• Relational Graph: See "Graph."

• Representation: A symbolic description or model of objects in the image or scene domain.

• Run-Length Encoding: A data compression technique in which an image is raster-scanned and only the lengths of runs of consecutive pixels with the same property are stored.
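A one-row Python sketch:

    from itertools import groupby

    def run_length_encode(row):
        """Encode one raster row as (value, run length) pairs."""
        return [(value, len(list(run))) for value, run in groupby(row)]

    row = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
    print(run_length_encode(row))   # [(0, 3), (1, 4), (0, 1), (1, 2)]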

• Scene: The 3-D environment from which the image is generated.

• Scene Analysis: The process of seeking information about a 3-D scene from information derived from a 2-D image. It usually involves the transformation of simple features into abstract descriptions.

• Segmentation: The process of breaking up an image into regions (each with uniform attributes), usually corresponding to surfaces of objects or entities in the scene.

• Semantic Interpretation: Producing an application-dependent scene description from a feature set (representation) derived from the image.

• Semantic Network: A representation of objects and relationships between objects as a graph structure of nodes and labeled arcs. See "Graph."

• Skeleton Representation (See "Generalized Ribbon"): A representation of a 2-D region by the medial line and the perpendicular distance to the boundary at each point along it.

• Sketch Map: A rough line drawing of a scene.

• Sobel Operator: A popular convolution operator for detecting edges. Similar to other difference operators such as the Prewitt operator.
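A Python sketch of the operator applied at a single pixel:

    import numpy as np

    # The two 3 x 3 Sobel operators; gradient magnitude combines both.
    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    SOBEL_Y = SOBEL_X.T

    def sobel_magnitude(img, r, c):
        """Gradient magnitude at (r, c) from its 3 x 3 neighborhood."""
        window = img[r - 1:r + 2, c - 1:c + 2]
        gx = np.sum(window * SOBEL_X)
        gy = np.sum(window * SOBEL_Y)
        return np.hypot(gx, gy)

    img = np.zeros((5, 5)); img[:, 2:] = 100.0       # vertical step edge
    print(sobel_magnitude(img, 2, 1))   # 400.0 -- strong response at the edge
    print(sobel_magnitude(img, 2, 3))   # 0.0   -- uniform area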

• Spectral Analysis: Interpreting image points in terms of their response to various light frequencies (colors).

• Splines (B-Splines): Piecewise continuous polynomial curves used to approximate a curve.

• Stereoscopic Approach: Use of triangulation between two or more views, obtained from different positions, to determine range or depth.

• Structured Light: Sheets of light and other projected light configurations used to directly determine shape and/or range from the observed configuration that the projected line, circle, grid, etc., makes as it intersects the object.

• Symbolic Description: Non-iconic scene descriptions such as graph representations.

• Syntactic Analysis: Recognizing images by a "parsing" process as being built up of primitive elements.

• Template: A prototype iconic model that can be used directly to match to image characteristics for object recognition or inspection.

• Template Matching: Correlating an object template with an observed image field, usually performed at the pixel level.

• Texture: A local variation in pixel values that repeats in a regular or random way across a portion of an image or object.

• Thresholding: Separating regions of an image based on pixel values above or below a chosen (threshold) value.

• Top Down Approach (Goal Directed): An approach in which the interpretation stage is guided in its analysis by trial or test descriptions of a scene. Sometimes referred to as "Hypothesize and Test."

• Tracking: Processing sequences of images in real time to derive a description of the motion of one or more objects in a scene.

• Vertex: The point on a polyhedron common to three or more sides.

• Viewpoint: The position (or direction) from which the scene is observed.

• Vision: The process of understanding the environment based on image data.

• Wireframe Model: A 3-D model, similar to a wireframe, in which the object is defined in terms of edges and vertices.

• Window: A selected portion (usually square or rectangular) of an image.

• 2-D: Two-dimensional.

• 2.5-D Sketch: A scene representation proposed by Marr (1978), consisting of surface distances and orientations.

• 3-D: Three-dimensional.

APPENDIX L

SOME PUBLICATION SOURCES FOR FURTHER INFORMATION

A. Recent Books

Ballard, D.H., and Brown, C.M., Computer Vision, Englewood Cliffs: Prentice-Hall, 1982.

Brady, M. (Ed.), Computer Vision, Amsterdam: North-Holland, 1981.

Cohen, P.R., and Feigenbaum, E.A., "Vision," The Handbook of Artificial Intelligence, Vol. III, Los Altos, CA: Kaufman, 1982, pp. 125-321.

Haralick, R. (Ed.), Picture Data Analysis, Berlin: Springer-Verlag, 1982.

Marr, D.C., Vision, San Francisco: W.H. Freeman, 1982.

Nevatia, R., Machine Perception, Englewood Cliffs: Prentice-Hall, 1982.

Rosenfeld, A., and Kak, A.C., Digital Picture Processing, 2nd Ed., Vols. 1 and 2, New York: Acad. Pr., 1982.

Pavlidis, T., Algorithms for Graphics and Image Processing, Rockville, Md.: Computer Science Press, 1982.

Hord, R.M., Digital Image Processing of Remotely Sensed Data, New York: Acad. Pr., 1982.

B. Periodic Conference and Workshop Proceedings

DARPA Image Understanding Workshops - Sci. Applications Inc.

International Conferences on Pattern Recognition - IEEE Computer Society

National Conferences on Artificial Intelligence - AAAI

IEEE Workshops on Computer Vision - IEEE Computer Society

International Joint Conferences on Artificial Intelligence

International Conferences on Robot Vision - The Journal and Sensor Review

Workshops on Industrial Applications of Computer Vision - IEEE Computer Society

SPIE Technical Symposia - Society of Photo-Optical Instrumentation Engineers

NSF Workshops (Aperiodic workshops on various topics in computer vision).

C. Periodicals

Computer Graphics and Image Processing

Artificial Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence

Pattern Recognition

International Journal of Robotics Research

IEEE Transactions on Systems, Man and Cybernetics

D. Some Recent Bibliographies, Surveys and Syntheses

Ahuja, N. and Schachter, B., "Image Models," ACM Computing Surveys, Vol. 13, No. 4, Dec. 1981, pp. 373-398.

Barrow, H.G. and Tenenbaum, J.M., "Computational Vision," Proc. of the IEEE, Vol. 69, No. 5, May 1981, pp. 572-595.

Binford, T.O., "Survey of Model-Based Image Analysis Systems," Robotics Research, Vol. 1, No. 1, Spring 1982, pp. 18-64.

Brady, M., "Computational Approaches to Image Understanding," Computing Surveys, Vol. 14, No. 1, March 1982, pp. 3-71.

Chin, R.T., "Automated Visual Inspection Techniques and Applications: A Bibliography," Pattern Recognition, Vol. 5, No. 4, 1982, pp. 328-357.

Gennery, D., Cunningham, R., Saund, E., High, J., and Ruoff, C., Computer Vision, JPL 81-92, JPL, Pasadena, CA, Nov. 1, 1981.

Kruger, R.P. and Thompson, V.B., "A Technical and Economic Assessment of Computer Vision for Inspection and Robotic Assembly," Proc. of the IEEE, Vol. 69, No. 12, Dec. 1981, pp. 1524-1538.

Rosenfeld, A., Picture Processing, U. of Md. C.S. Center, College Park, Md. A yearly bibliography of computer processing of pictorial information.

Srihari, S.N., "Representation of 3-D Digital Images," ACM Computing Surveys, Vol. 13, No. 4, Dec. 1981, pp. 399-424.
