
Computer Vision
Thursday, August 30

Today
• Course overview
• Requirements, logistics
• Image formation

Introductions

• Instructor: Prof. Kristen Grauman, grauman @ cs, TAY 4.118, Thurs 2-4 pm
• TA: Sudheendra Vijayanarasimhan, svnaras @ cs, ENS 31 NQ, Mon/Wed 1-2 pm
• Class page: check for updates to schedule, assignments, etc.

http://www.cs.utexas.edu/~grauman/courses/378/main.htm

Computer vision
• Automatic understanding of images and video
• Computing properties of the 3D world from visual data
• Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities

Why vision?
• As image sources multiply, so do applications
– Relieve humans of boring, easy tasks
– Enhance human abilities
– Advance human-computer interaction
– Perception for robotics / autonomous agents
• Possible insights into human vision

Some applications
• Navigation, driver safety
• Autonomous robots
• Assistive technology
• Surveillance
• Factory inspection (Cognex)
• Monitoring for safety (Poseidon)
• Visualization (the Matrix)
• License plate reading
• Visualization and tracking
• Medical

Some applications
• Situated search
• Multi-modal interfaces
• Image and video databases - CBIR
• Tracking, activity recognition

Why is vision difficult?
• Ill-posed problem: real world much more complex than what we can measure in images
– 3D → 2D
• Impossible to literally "invert" the image formation process

Challenges: robustness
Illumination, object pose, clutter, occlusions, intra-class appearance, viewpoint

Challenges: context and human experience
Function, dynamics, context cues

Challenges: complexity
• Thousands to millions of pixels in an image
• 3,000-30,000 human recognizable object categories
• 30+ degrees of freedom in the pose of articulated objects (humans)
• Billions of images indexed by Google Image Search
• 18 billion+ prints produced from digital camera images in 2004
• 295.5 million camera phones sold in 2005
• About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]

Why is vision difficult?
• Ill-posed problem: real world much more complex than what we can measure in images
– 3D → 2D
• Not possible to "invert" the image formation process
• Generally requires assumptions, constraints; exploitation of domain-specific knowledge

Related disciplines
Vision overlaps with artificial intelligence, pattern recognition, image processing, cognitive science, algorithms, and physics.

Vision and graphics
Vision: images → model. Graphics: model → images.
Inverse problems: analysis and synthesis.

Research problems vs. application areas
Research problems: feature detection, contour representation, segmentation, stereo vision, shape modeling, color vision, motion analysis, invariants, uncalibrated and self-calibrating systems, object detection, object recognition.
Application areas: industrial inspection and quality control, reverse engineering, surveillance and security, face and gesture recognition, road monitoring, autonomous vehicles, military applications, medical image analysis, image databases, virtual reality.

Goals of this course
• Introduction to primary topics
• Hands-on experience with algorithms
• Views of vision as a research area

List from [Trucco & Verri 1998]

Topics overview
• Image formation, cameras
• Color
• Features
• Grouping
• Multiple views
• Recognition and learning
• Motion and tracking

We will not cover (extensively)
• Image processing
• Human visual system
• Particular machine vision systems or applications

Image formation
• Inverse process of vision: how does light in the 3d world project to form 2d images?

Features and filters
• Transforming and describing images; textures and colors

Grouping
• Clustering, segmentation, and fitting; what parts belong together?
[fig from Shi et al]

Multiple views
• Multi-view geometry, matching, stereo
Lowe; Hartley and Zisserman; Tomasi and Kanade

Recognition and learning
• Shape matching, recognizing objects and categories, learning techniques
Tomas Izo

Motion and tracking
• Tracking objects, video analysis, low level motion

Requirements

• Biweekly (approx) problem sets
– Concept questions
– Implementation problems
• Two exams: midterm and final
• Current events (optional)

In addition, for graduate students:
• Research paper summary and review
• Implementation extension

Grading policy
Final grade breakdown:
• Problem sets (50%)
• Midterm quiz (15%)
• Final exam (20%)
• Class participation (15%)

Due dates
• Assignments due before class starts on due date
• Lose half of possible remaining credit each day late
• Three free late days, total

Collaboration policy
You are welcome to discuss problem sets, but all responses and code must be written individually. Students submitting solutions found to be identical or substantially similar (due to inappropriate collaboration) risk failing the course.

Current events (optional)
• Any vision-related piece of news; may revolve around policy, editorial, technology, new product, …
• Brief overview to the class
• Must be current
• No ads
• Email relevant links or information to TA

Miscellaneous
• Check class website
• Make sure you get on class mailing list
• No laptops in class please
• Feedback welcome and useful

Paper review guidelines
• Thorough summary in your own words
• Main contribution
• Strengths? Weaknesses?
• How convincing are the experiments? Suggestions to improve them?
• Extensions?
• 4 pages max
• May require reading additional references

Image formation

• How are objects in the world captured in an image?

Physical parameters of image formation
• Photometric
– Type, direction, intensity of light reaching sensor
– Surfaces' reflectance properties
• Optical
– Sensor's lens type
– Focal length, field of view, aperture
• Geometric
– Type of projection
– Camera pose
– Perspective distortions

Radiometry
• Images formed depend on amount of light from light sources and on optical and surface reflectance properties (see F&P Ch 4)

Light source direction
[fig from Fleming, Torralba, & Adelson, 2004]

Surface reflectance properties
Specular vs. Lambertian
Image credit: Don Deering
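Not in the original slides: a minimal numerical sketch contrasting the two reflectance types named above, using the Lambertian cosine law plus a Phong-style specular lobe (the specular model is my choice; the slide does not commit to one).

    import numpy as np

    def normalize(v):
        return v / np.linalg.norm(v)

    # Unit vectors: surface normal n, direction to light l, direction to viewer v.
    n = normalize(np.array([0.0, 0.0, 1.0]))
    l = normalize(np.array([1.0, 0.0, 1.0]))
    v = normalize(np.array([-1.0, 0.0, 1.0]))

    # Lambertian: brightness depends only on the angle between n and l,
    # so the surface looks the same from every viewpoint.
    albedo = 0.8
    lambertian = albedo * max(0.0, float(n @ l))

    # Phong-style specular: brightness peaks when the mirror reflection
    # of l about n lines up with the view direction v.
    r = 2.0 * float(n @ l) * n - l        # mirror reflection of l
    specular = max(0.0, float(r @ v)) ** 50

    print(f"Lambertian term: {lambertian:.3f}")   # ~0.566
    print(f"Specular term:   {specular:.3f}")     # 1.000: viewer sits at the mirror angle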

Perspective projection
• Pinhole camera: simple model to approximate the imaging process
• If we treat the pinhole as a point, only one ray from any given point can enter the camera
Forsyth and Ponce

Camera obscura: in Latin, means 'dark room'
"Reinerus Gemma-Frisius, observed an eclipse of the sun at Louvain on January 24, 1544, and later he used this illustration of the event in his book De Radio Astronomica et Geometrica, 1545. It is thought to be the first published illustration of a camera obscura..."
Hammond, John H., The Camera Obscura, A Chronicle
http://www.acmi.net.au/AIC/CAMERA_OBSCURA.html

Camera obscura
An attraction in the late 19th century
Jetty at Margate England, 1898; around 1870s
http://brightbytes.com/cosite/collection2.html
Adapted from R. Duraiswami

Perspective effects
• Far away objects appear smaller
Forsyth and Ponce

Perspective effects
• Parallel lines in the scene intersect in the image
Forsyth and Ponce

Perspective projection equations
• 3d world mapped to 2d projection
• Setup: image plane at focal length f along the optical axis of the camera frame
(Equations derived on board)
Forsyth and Ponce

Perspective projection equations
Scene point (X, Y, Z) in the camera frame maps to image coordinates (x', y'), with the image plane at focal length f along the optical axis:
x' = f X / Z
y' = f Y / Z
Non-linear (division by scene depth Z)
Forsyth and Ponce

Projection properties
• Many-to-one: any points along same ray map to same point in image
• Points → points
• Lines → lines (collinearity preserved)
• Distances and angles are not preserved
• Degenerate cases:
– Line through focal point projects to a point
– Plane through focal point projects to a line
– Plane perpendicular to image plane projects to part of the image
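Not part of the slides: a small Python sketch of the equations above that checks two of the listed properties numerically, many-to-one projection along a ray and parallel lines meeting at a vanishing point.

    import numpy as np

    def pinhole_project(P, f=1.0):
        """Pinhole perspective: (X, Y, Z) -> (f X / Z, f Y / Z)."""
        X, Y, Z = P
        return np.array([f * X / Z, f * Y / Z])

    # Many-to-one: points along the same ray through the pinhole
    # project to the same image point.
    P = np.array([1.0, 2.0, 4.0])
    print(pinhole_project(P), pinhole_project(3 * P))   # identical

    # Parallel lines converge: walking out along two parallel 3d lines with
    # direction d, both images approach the vanishing point f * (dx/dz, dy/dz).
    d = np.array([1.0, 0.0, 1.0])
    a1 = np.array([0.0, 0.0, 2.0])   # first line's starting point
    a2 = np.array([0.0, 1.0, 2.0])   # parallel line, offset in Y
    for t in [1.0, 10.0, 1000.0]:
        print(pinhole_project(a1 + t * d), pinhole_project(a2 + t * d))
    # Both image tracks tend to the same point (1, 0).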

Perspective and art
• Use of correct perspective projection indicated in 1st century B.C. frescoes
• Skill resurfaces in Renaissance: artists develop systematic methods to determine perspective projection (around 1480-1515)
Raphael; Durer, 1525

Weak perspective
• Approximation: treat magnification as constant
• Assumes scene depth << average distance to camera
• Makes perspective equations linear: world point (X, Y, Z) maps to (m X, m Y), with magnification m = f / z0 fixed by the average depth z0
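Not from the slides: a numeric sketch of the weak perspective approximation, assuming an object centered at an average depth z0 so the magnification m = f / z0 can be held constant.

    import numpy as np

    f = 1.0
    z0 = 100.0                 # assumed average distance to the object
    m = f / z0                 # constant weak-perspective magnification

    def perspective(P):
        return f * P[:2] / P[2]

    def weak_perspective(P):
        return m * P[:2]       # linear: true depth replaced by average z0

    # The approximation is tight while depth variation stays << z0.
    for dz in [0.5, 5.0, 50.0]:
        P = np.array([10.0, -4.0, z0 + dz])
        err = np.linalg.norm(perspective(P) - weak_perspective(P))
        print(f"depth offset {dz:5.1f}: image error {err:.5f}")
    # Error grows with scene depth relative to z0, matching the
    # "scene depth << average distance" condition above.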

Orthographic projection
• Given camera at constant distance from scene
• World points projected along rays parallel to optical axis: x' = X, y' = Y
• Limit of perspective projection as f → ∞
Planar pinhole perspective vs. orthographic projection; from M. Pollefeys

Which projection model?
• Weak perspective:
– Accurate for small, distant objects; recognition
– Linear projection equations simplify the math
• Pinhole perspective:
– More accurate, but more complex
– Structure from motion

Pinhole size / aperture
• Larger pinhole: brighter, blurrier
• Smaller pinhole: dimmer, more focus
• Smaller still: dimmer, blur from diffraction

Pinhole vs. lens

Cameras with lenses
• Gather more light, while keeping focus; make pinhole perspective projection practical
• Thin lens (diameter d, focal length f): rays entering parallel on one side go through the focus on the other, and vice versa
• In the ideal case, all rays from P are imaged at P'

Field of view
• Field of view (portion of 3d space seen by camera) depends on d and f
• As f gets smaller, image becomes more wide angle (more world points project onto the finite image plane)
• As f gets larger, image becomes more telescopic (smaller part of the world projects onto the finite image plane)
from R. Duraiswami
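A sketch, not from the lecture, of the focal length vs. field of view trade-off, assuming a sensor of extent d at distance f behind the pinhole, which gives FOV = 2 arctan(d / 2f); the 36 mm value is just an example.

    import math

    def field_of_view_deg(d, f):
        """Angular field of view for sensor extent d and focal length f."""
        return math.degrees(2 * math.atan(d / (2 * f)))

    d = 36.0   # assumed sensor width in mm
    for f in [20, 50, 100, 300]:
        print(f"f = {f:3d} mm -> FOV = {field_of_view_deg(d, f):5.1f} deg")
    # Smaller f -> wider angle; larger f -> more telescopic.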

Focus
• Thin lens: scene points at distinct depths come in focus at different image planes (real systems have greater depth of field)
• Out-of-focus points image as "circles of confusion"
Shapiro and Stockman

Focus and depth of field
• Depth of field: distance between image planes where blur is tolerable
Image credit: cambridgeincolour.com
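Not in the slides: the thin lens relation behind these statements, written here in the common positive-distance form 1/z + 1/z' = 1/f (sign conventions differ across texts, e.g. F&P use a signed version).

    def image_distance(z, f):
        """Image distance z' where a scene point at depth z comes into focus."""
        assert z > f, "object must lie beyond the focal length"
        return 1.0 / (1.0 / f - 1.0 / z)

    f = 50.0   # focal length in mm
    for z in [1000.0, 2000.0, 10000.0]:
        print(f"object at {z:7.0f} mm -> sharp image at {image_distance(z, f):6.2f} mm")
    # Distinct depths focus at distinct image planes; with the sensor fixed at
    # one plane, points at other depths blur into circles of confusion.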

Depth from focus
• Images from same point of view, different camera parameters → 3d shape / depth estimates
[figs from H. Jin and P. Favaro, 2002]

Camera parameters
• How do points in the real world relate to positions in the image?
• Perspective equations so far in terms of camera's reference frame…

Camera parameters
• Extrinsic params: rotation matrix and translation vector
– Camera frame ↔ world frame
• Intrinsic params: focal length, pixel sizes (mm), image center point, radial distortion parameters
– Image coordinates relative to camera ↔ pixel coordinates

Camera calibration
• Need to estimate camera's intrinsic and extrinsic parameters to calibrate the geometry
• Knowing the relationship between real world and image coordinates is useful for estimating 3d shape
• More on this later
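A sketch of how the two parameter sets compose, not taken from the lecture: a pixel position comes from applying the extrinsics (R, t) and then the intrinsics K. All numeric values are made-up examples, and radial distortion is omitted.

    import numpy as np

    # Intrinsics K: focal lengths in pixels and the image center (example values).
    fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])

    # Extrinsics: rotation R and translation t, world frame -> camera frame.
    R = np.eye(3)                    # toy case: camera axes aligned with world
    t = np.array([0.0, 0.0, 5.0])    # world origin sits 5 units ahead of camera

    def world_to_pixel(X_world):
        X_cam = R @ X_world + t      # extrinsics: world -> camera coordinates
        x = K @ X_cam                # intrinsics: camera -> homogeneous pixels
        return x[:2] / x[2]          # perspective divide

    print(world_to_pixel(np.array([0.0, 0.0, 0.0])))   # image center (320, 240)
    print(world_to_pixel(np.array([1.0, 0.0, 0.0])))   # shifted right (480, 240)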

Articulated tracking
Demirdjian et al.

3d skeleton extraction
Brostow et al, 2004

Human eye
• Pupil/iris – control amount of light passing through lens
• Retina – contains sensor cells, where image is formed
• Fovea – highest concentration of cones

Sensors
• Often CCD camera: charge coupled device
• Record amount of light reaching a grid of photosensors, which convert light energy into voltage
• Read digital output row-by-row: camera → CCD array → frame grabber → computer
Shapiro and Stockman

Digital images
• Think of images as matrices taken from the CCD array
• Width 520 (column index j), height 500 (row index i); intensity in [0,255]
• im[176][201] has value 164; im[194][203] has value 37
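A small numpy illustration, not from the slides, of the matrix view above; the image is synthetic, so the value printed at [176, 201] will not match the slide's 164.

    import numpy as np

    # Synthetic 8-bit grayscale image: 500 rows (height) by 520 columns (width),
    # standing in for intensities read off the CCD array.
    rng = np.random.default_rng(0)
    im = rng.integers(0, 256, size=(500, 520), dtype=np.uint8)

    print(im.shape)       # (500, 520): row index i first, column index j second
    print(im[176, 201])   # one pixel's intensity in [0, 255], like im[176][201]
    print(im.min(), im.max())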

Color images, RGB color space
RGB [Mori et al]

Resolution
• sensor: size of the real world scene element that images to a single pixel
• image: number of pixels
• Influences what analysis is feasible, affects best representation choice

Resolution
…though not necessarily for the human visual system with familiar faces…
[Jim Gasperini] [Sinha et al]

Other sensors
• Stereo cameras
• MRI scans
• X-ray
• LIDAR devices…
geospatial-online.com

Summary
• Image formation affected by geometry, photometry, and optics
• Projection equations express how world points are mapped to the 2d image
• Lenses make the pinhole model practical
• Imaged points are related to real world coordinates via calibrated cameras

Next
Problem set 0 due Sept 6:
• Matlab warmup
• Image formation questions
• Read F&P Chapter 1
Reading for next lecture: F&P Chapter 6
