Computer Vision
Thursday, August 30

Today
• Course overview
• Requirements, logistics
• Image formation
Introductions
• Instructor: Prof. Kristen Grauman, grauman @ cs, TAY 4.118, Thurs 2-4 pm
• TA: Sudheendra Vijayanarasimhan, svnaras @ cs, ENS 31 NQ, Mon/Wed 1-2 pm
• Class page: check for updates to schedule, assignments, etc.
http://www.cs.utexas.edu/~grauman/courses/378/main.htm
Computer vision
• Automatic understanding of images and video
• Computing properties of the 3D world from visual data
• Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities.

Why vision?
• As image sources multiply, so do applications
  – Relieve humans of boring, easy tasks
  – Enhance human abilities
  – Advance human-computer interaction, visualization
  – Perception for robotics / autonomous agents
• Possible insights into human vision
Some applications
• Navigation, driver safety
• Autonomous robots
• Assistive technology
• Surveillance
• Factory inspection (Cognex)
• Monitoring for safety (Poseidon)
• Visualization
• License plate reading
• Visualization and tracking
• Medical imaging
• Visual effects (the Matrix)
Some applications
• Situated search
• Multi-modal interfaces
• Image and video databases – CBIR
• Tracking, activity recognition

Why is vision difficult?
• Ill-posed problem: real world much more complex than what we can measure in images
  – 3D → 2D
• Impossible to literally “invert” image formation process
Challenges: robustness
• Illumination
• Object pose
• Clutter
• Occlusions
• Intra-class appearance
• Viewpoint

Challenges: context and human experience
• Function
• Dynamics
• Context cues
Challenges: complexity
• Thousands to millions of pixels in an image
• 3,000-30,000 human recognizable object categories
• 30+ degrees of freedom in the pose of articulated objects (humans)
• Billions of images indexed by Google Image Search
• 18 billion+ prints produced from digital camera images in 2004
• 295.5 million camera phones sold in 2005
• About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]

Why is vision difficult?
• Ill-posed problem: real world much more complex than what we can measure in images
  – 3D → 2D
• Not possible to “invert” image formation process
• Generally requires assumptions, constraints; exploitation of domain-specific knowledge
Related disciplines
• Artificial intelligence
• Pattern recognition
• Image processing
• Cognitive science
• Graphics
• Algorithms

Vision and graphics
• Vision: Images → Model
• Graphics: Model → Images (geometry, physics)
• Inverse problems: analysis and synthesis.
Research problems vs. application areas

Research problems:
• Feature detection
• Contour representation
• Segmentation
• Stereo vision
• Shape modeling
• Color vision
• Motion analysis
• Invariants
• Uncalibrated, self-calibrating systems
• Object detection
• Object recognition

Application areas:
• Industrial inspection and quality control
• Reverse engineering
• Surveillance and security
• Face, gesture recognition
• Road monitoring
• Autonomous vehicles
• Military applications
• Medical image analysis
• Image databases
• Virtual reality

List from [Trucco & Verri 1998]

Goals of this course
• Introduction to primary topics
• Hands-on experience with algorithms
• Views of vision as a research area
Topics overview
• Image formation, cameras
• Color
• Features
• Grouping
• Multiple views
• Recognition and learning
• Motion and tracking

We will not cover (extensively)
• Image processing
• Human visual system
• Particular machine vision systems or applications
Image formation
• Inverse process of vision: how does light in the 3d world project to form 2d images?

Features and filters
• Transforming and describing images; textures and colors
Grouping
• Clustering, segmentation, and fitting; what parts belong together?
[fig from Shi et al]

Multiple views
• Multi-view geometry, matching, stereo
Lowe; Hartley and Zisserman; Tomasi and Kanade
Recognition and learning
• Shape matching, recognizing objects and categories, learning techniques

Motion and tracking
• Tracking objects, video analysis, low level motion
Tomas Izo
Requirements
• Biweekly (approx.) problem sets
  – Concept questions
  – Implementation problems
• Two exams: midterm and final
• Current events (optional)
In addition, for graduate students:
• Research paper summary and review
• Implementation extension
Grading policy

Final grade breakdown:
• Problem sets (50%)
• Midterm quiz (15%)
• Final exam (20%)
• Class participation (15%)

Due dates
• Assignments due before class starts on the due date
• Lose half of possible remaining credit each day late
• Three free late days, total
Collaboration policy
You are welcome to discuss problem sets, but all responses and code must be written individually. Students submitting solutions found to be identical or substantially similar (due to inappropriate collaboration) risk failing the course.

Current events (optional)
• Any vision-related piece of news; may revolve around policy, editorial, technology, new product, …
• Brief overview to the class
• Must be current
• No ads
• Email relevant links or information to TA
Miscellaneous
• Check class website
• Make sure you get on class mailing list
• No laptops in class please
• Feedback welcome and useful

Paper review guidelines
• Thorough summary in your own words
• Main contribution
• Strengths? Weaknesses?
• How convincing are the experiments? Suggestions to improve them?
• Extensions?
• May require reading additional references
• 4 pages max
Image formation
• How are objects in the world captured in an image?
Physical parameters of image formation
• Photometric
  – Type, direction, intensity of light reaching sensor
  – Surfaces’ reflectance properties
• Optical
  – Sensor’s lens type
  – Focal length, field of view, aperture
• Geometric
  – Type of projection
  – Camera pose
  – Perspective distortions

Radiometry
• Images formed depend on the amount of light from light sources and on optical and surface reflectance properties (see F&P Ch 4)
Light source direction
[fig from Fleming, Torralba, & Adelson, 2004]

Surface reflectance properties
• Specular
• Lambertian
Image credit: Don Deering
Perspective projection
• Pinhole camera: simple model to approximate the imaging process
• If we treat the pinhole as a point, only one ray from any given point can enter the camera
Forsyth and Ponce

Camera obscura
• In Latin, means ‘dark room’
"Reinerus Gemma-Frisius, observed an eclipse of the sun at Louvain on January 24, 1544, and later he used this illustration of the event in his book De Radio Astronomica et Geometrica, 1545. It is thought to be the first published illustration of a camera obscura..."
Hammond, John H., The Camera Obscura, A Chronicle
http://www.acmi.net.au/AIC/CAMERA_OBSCURA.html
Camera obscura
• An attraction in the late 19th century
Jetty at Margate England, 1898. Around 1870s.
http://brightbytes.com/cosite/collection2.html
Adapted from R. Duraiswami

Perspective effects
• Far away objects appear smaller
Forsyth and Ponce
Perspective effects
• Parallel lines in the scene intersect in the image
Forsyth and Ponce

Perspective projection equations
• 3d world mapped to 2d projection in the image plane
• Key terms: image plane, focal length, optical axis, camera frame
• Derivation on the board
Forsyth and Ponce
Perspective projection equations
• A scene point (x, y, z) in the camera frame projects to image coordinates
  x' = f x / z,  y' = f y / z
  where f is the focal length; the division by z makes the equations non-linear.

Projection properties
• Many-to-one: any points along the same ray map to the same point in the image
• Points → points
• Lines → lines (collinearity preserved)
• Distances and angles are not preserved
• Degenerate cases:
  – Line through focal point projects to a point.
  – Plane through focal point projects to a line.
  – Plane perpendicular to image plane projects to part of the image.
Forsyth and Ponce
Perspective and art
• Use of correct perspective projection indicated in 1st century B.C. frescoes
• Skill resurfaces in Renaissance: artists develop systematic methods to determine perspective projection (around 1480-1515)
Raphael; Durer, 1525

Weak perspective
• Approximation: treat magnification as constant
• Assumes scene depth << average distance to camera
• Makes the perspective equations linear: with magnification m = f / z0 for a reference depth z0, world points (x, y, z) map to image points (m x, m y)
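The constant-magnification approximation can be checked numerically. A hedged sketch (the point values and reference depth z0 are made up for illustration): the approximation is good when a point's depth is close to z0, and degrades as scene depth grows relative to the camera distance.

```python
import numpy as np

def perspective(pt, f):
    """Exact pinhole projection: (f*x/z, f*y/z)."""
    x, y, z = pt
    return np.array([f * x / z, f * y / z])

def weak_perspective(pt, f, z0):
    """Weak perspective: constant magnification m = f / z0, linear in (x, y)."""
    m = f / z0
    return m * np.array([pt[0], pt[1]])

f, z0 = 1.0, 100.0                     # z0 = average scene depth
shallow = np.array([5.0, 0.0, 101.0])  # depth close to z0: good approximation
deep = np.array([5.0, 0.0, 150.0])     # depth far from z0: poor approximation
for pt in (shallow, deep):
    err = np.linalg.norm(perspective(pt, f) - weak_perspective(pt, f, z0))
    print(f"depth {pt[2]:.0f}: approximation error {err:.4f}")
```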
Orthographic projection
• Given camera at constant distance from scene
• World points projected along rays parallel to the optical axis: x' = x, y' = y
• Limit of perspective projection as the focal length and camera distance go to infinity

Planar pinhole perspective vs. orthographic projection
From M. Pollefeys
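The "limit of perspective projection" can be illustrated by pulling the camera back while rescaling so the magnification stays fixed; the projected point then converges to the orthographic result (x, y). A sketch with made-up numbers:

```python
import numpy as np

def perspective_scaled(pt, f, d):
    """Pinhole projection of a point at depth d + z, rescaled by d/f so the
    magnification stays fixed as the camera distance d grows."""
    x, y, z = pt
    return (d / f) * np.array([f * x / (d + z), f * y / (d + z)])

pt = np.array([2.0, 1.0, 5.0])  # (x, y) = (2, 1); z is depth relative to scene center
for d in (10.0, 100.0, 10000.0):
    print(f"d = {d:7.0f}: {perspective_scaled(pt, 1.0, d)}")
# As d grows, the projection approaches the orthographic result (2, 1).
```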
Which projection model?
• Weak perspective:
  – Accurate for small, distant objects; used in recognition
  – Linear projection equations simplify the math
• Pinhole perspective:
  – More accurate but more complex
  – Used in structure from motion
Pinhole size / aperture
• Larger pinhole: brighter, blurrier
• Smaller pinhole: dimmer, more focus
• Very small pinhole: dimmer, blur from diffraction

Pinhole vs. lens
Cameras with lenses
• Gather more light, while keeping focus; make pinhole perspective projection practical
• Thin lens: rays entering parallel on one side go through the focus on the other, and vice versa (left focus, right focus)
• In the ideal case, all rays from P are imaged at P'.
Lens diameter d, focal length f
from R. Duraiswami

Field of view
• As f gets smaller, the image becomes more wide angle (more world points project onto the finite image plane)
• As f gets larger, the image becomes more telescopic (a smaller part of the world projects onto the finite image plane)
• Field of view (portion of 3d space seen by the camera) depends on d and f.
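The dependence of field of view on d and f can be made concrete with the usual relation FOV = 2·arctan(d / 2f); here d is taken as the sensor (image plane) extent, and the millimeter values below are illustrative only:

```python
import math

def field_of_view_deg(d, f):
    """Angular field of view for sensor extent d and focal length f."""
    return math.degrees(2 * math.atan(d / (2 * f)))

# Fixed sensor width d = 36 mm, varying focal length:
for f in (18, 50, 200):  # wide angle, "normal", telephoto
    print(f"f = {f:3d} mm -> FOV = {field_of_view_deg(36, f):.1f} deg")
# Smaller f -> wider angle; larger f -> more telescopic.
```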
Focus and depth of field
• Thin lens: scene points at distinct depths come into focus at different image planes. (Real camera lens systems have greater depth of field.)
• “Circles of confusion”
Shapiro and Stockman
• Depth of field: distance between image planes where blur is tolerable
Image credit: cambridgeincolour.com
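The "distinct depths focus at different image planes" behavior follows the thin-lens (Gaussian) formula 1/z' + 1/z = 1/f, with object and image distances measured on opposite sides of the lens. A sketch under that sign convention, with an illustrative focal length:

```python
def image_distance(z, f):
    """Thin lens: an object at distance z comes into focus at distance z',
    where 1/z' + 1/z = 1/f."""
    if z <= f:
        raise ValueError("object inside focal length: no real image forms")
    return 1.0 / (1.0 / f - 1.0 / z)

f = 50.0  # focal length in mm (illustrative)
for z in (100.0, 1000.0, 1e6):
    print(f"object at {z:9.0f} mm -> in focus at {image_distance(z, f):7.2f} mm")
# Distant objects come into focus near the focal plane (z' -> f).
```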
Depth from focus
• Images from the same point of view, different camera parameters
• 3d shape / depth estimates
[figs from H. Jin and P. Favaro, 2002]

Camera parameters
• How do points in the real world relate to positions in the image?
• Perspective equations so far are in terms of the camera’s reference frame…
Camera parameters
• Extrinsic params: rotation matrix and translation vector
  – Extrinsic: camera frame ↔ world frame
• Intrinsic params: focal length, pixel sizes (mm), image center point, radial distortion parameters
  – Intrinsic: image coordinates relative to camera ↔ pixel coordinates

Camera calibration
• Need to estimate the camera’s intrinsic and extrinsic parameters to calibrate the geometry.
• Knowing the relationship between real world and image coordinates is useful for estimating 3d shape
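The intrinsic/extrinsic split is commonly written in homogeneous coordinates as x ≃ K [R | t] X. A minimal sketch; the parameter values (focal length 500 pixels, image center (320, 240), identity rotation) are made up for illustration:

```python
import numpy as np

def project(X_world, K, R, t):
    """Map a 3d world point to pixel coordinates.

    Extrinsics (R, t): world frame -> camera frame.
    Intrinsics K: camera coordinates -> pixel coordinates.
    """
    X_cam = R @ X_world + t   # extrinsic step
    x = K @ X_cam             # intrinsic step (homogeneous)
    return x[:2] / x[2]       # perspective divide

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)       # camera aligned with the world frame
t = np.zeros(3)     # camera at the world origin
print(project(np.array([0.1, -0.2, 2.0]), K, R, t))  # -> [345. 190.]
```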
Articulated tracking
Demirdjian et al.

3d skeleton extraction
Brostow et al, 2004
Human eye
• Pupil/iris – control the amount of light passing through the lens
• Retina – contains sensor cells, where the image is formed
• Fovea – highest concentration of cones

Sensors
• Often CCD camera: charge-coupled device
• Records the amount of light reaching a grid of photosensors, which convert light energy into voltage
• Digital output is read row-by-row
Pipeline: optics → CCD array → camera → frame grabber → computer
Shapiro and Stockman
Digital images
• Think of images as matrices taken from the CCD array.
• Example: width 520 (index j), height 500 (index i), intensity values in [0,255]
• im[176][201] has value 164; im[194][203] has value 37
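The image-as-matrix view can be mimicked directly; the array below uses the slide's own dimensions and sample values:

```python
import numpy as np

# A grayscale "image" as a matrix: rows index i (height), columns index j (width),
# intensities in [0, 255].
im = np.zeros((500, 520), dtype=np.uint8)  # height 500, width 520
im[176, 201] = 164
im[194, 203] = 37
print(im.shape)      # (500, 520)
print(im[176, 201])  # 164
```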
Resolution
• Sensor: size of the real world scene element that maps to a single pixel
• Image: number of pixels
• Influences what analysis is feasible, affects choice of best representation.

Color images, RGB color space
[Mori et al]
Resolution
…though not necessarily for the human visual system with familiar faces…
[Jim Gasperini]
[Sinha et al]

Other sensors
• Stereo cameras
• MRI scans
• X-ray
• LIDAR devices…
geospatial-online.com
Summary
• Image formation affected by geometry, photometry, and optics.
• Projection equations express how world points are mapped to the 2d image.
• Lenses make the pinhole model practical.
• Imaged points are related to real world coordinates via calibrated cameras.

Next
Problem set 0 due Sept 6:
• Matlab warmup
• Image formation questions
• Read F&P Chapter 1
Reading for next lecture:
• F&P Chapter 6