Computer Vision
Thursday, August 30

Today
• Course overview
• Requirements, logistics
• Image formation

Introductions
• Instructor: Prof. Kristen Grauman, grauman @ cs, TAY 4.118, Thurs 2-4 pm
• TA: Sudheendra Vijayanarasimhan, svnaras @ cs, ENS 31 NQ, Mon/Wed 1-2 pm
• Class page: check for updates to schedule, assignments, etc.
  http://www.cs.utexas.edu/~grauman/courses/378/main.htm

Computer vision
• Automatic understanding of images and video
• Computing properties of the 3D world from visual data
• Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities

Why vision?
• As image sources multiply, so do applications
  – Relieve humans of boring, easy tasks
  – Enhance human abilities
  – Advance human-computer interaction, visualization
  – Perception for robotics / autonomous agents
• Possible insights into human vision

Some applications
• Navigation, driver safety
• Autonomous robots
• Assistive technology
• Surveillance
• Factory inspection (Cognex)
• Monitoring for safety (Poseidon)
• Visualization
• License plate reading
• Visualization and tracking
• Medical imaging
• Visual effects (The Matrix)
• Situated search
• Multi-modal interfaces
• Image and video databases (CBIR)
• Tracking, activity recognition

Why is vision difficult?
• Ill-posed problem: the real world is much more complex than what we can measure in images
  – 3D → 2D
• Impossible to literally "invert" the image formation process

Challenges: robustness
• Illumination, object pose, clutter, occlusions, intra-class appearance, viewpoint

Challenges: context and human experience
• Function, dynamics, context cues

Challenges: complexity
• Thousands to millions of pixels in an image
• 3,000-30,000 human-recognizable object categories
• 30+ degrees of freedom in the pose of articulated objects (humans)
• Billions of images indexed by Google Image Search
• 18 billion+ prints produced from digital camera images in 2004
• 295.5 million camera phones sold in 2005
• About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]

Why is vision difficult?
• Ill-posed problem: real world much more complex than what we can measure in images (3D → 2D)
• Not possible to "invert" the image formation process
• Generally requires assumptions, constraints; exploitation of domain-specific knowledge

Related disciplines
• Computer vision overlaps with artificial intelligence, pattern recognition, cognitive science, image processing, algorithms, and geometry/physics

Vision and graphics
• Vision: images → model; graphics: model → images
• Inverse problems: analysis and synthesis

Research problems vs. application areas
• Research problems: feature detection, contour representation, segmentation, stereo vision, shape modeling, color vision, motion analysis, invariants, uncalibrated/self-calibrating systems, object detection, object recognition
• Application areas: industrial inspection and quality control, reverse engineering, surveillance and security, face and gesture recognition, road monitoring, autonomous vehicles, military applications, medical image analysis, image databases, virtual reality
  [List from Trucco & Verri 1998]

Goals of this course
• Introduction to primary topics
• Hands-on experience with algorithms
• Views of vision as a research area

Topics overview
• Image formation, cameras
• Color
• Features
• Grouping
• Multiple views
• Recognition and learning
• Motion and tracking

We will not cover (extensively)
• Image processing
• Human visual system
• Particular machine vision systems or applications

Image formation
• Inverse process of vision: how does light in the 3D world project to form 2D images?

Features and filters
• Transforming and describing images; textures and colors [Lowe]

Grouping
• Clustering, segmentation, fitting; what parts belong together? [fig from Shi et al.]

Multiple views
• Multi-view geometry, matching, stereo [Hartley and Zisserman; Tomasi and Kanade]

Recognition and learning
• Shape matching, recognizing objects and categories, learning techniques

Motion and tracking
• Tracking objects, video analysis, low-level motion [Tomas Izo]

Requirements
• Biweekly (approx.) problem sets
  – Concept questions
  – Implementation problems
• Two exams, midterm and final
• Current events (optional)
In addition, for graduate students:
• Research paper summary and review
• Implementation extension

Grading policy
Final grade breakdown:
• Problem sets (50%)
• Midterm quiz (15%)
• Final exam (20%)
• Class participation (15%)

Due dates
• Assignments due before class starts on due date
• Lose half of possible remaining credit each day late
• Three free late days, total

Collaboration policy
You are welcome to discuss problem sets, but all responses and code must be written individually.
Students submitting solutions found to be identical or substantially similar (due to inappropriate collaboration) risk failing the course.

Current events (optional)
• Any vision-related piece of news; may revolve around policy, editorial, technology, new product, …
• Brief overview to the class
• Must be current
• No ads
• Email relevant links or information to TA

Miscellaneous
• Check class website
• Make sure you get on class mailing list
• No laptops in class please
• Feedback welcome and useful

Paper review guidelines
• Thorough summary in your own words
• Main contribution
• Strengths? Weaknesses?
• How convincing are the experiments? Suggestions to improve them?
• Extensions?
• 4 pages max
• May require reading additional references

Image formation
• How are objects in the world captured in an image?
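One concrete way to see how a world point lands in an image: under the pinhole model developed in the slides that follow, a scene point (x, y, z) in the camera frame projects to image coordinates (f·x/z, f·y/z). A minimal sketch, with an illustrative helper function (not course code):

```python
def project(point, f=1.0):
    """Pinhole perspective: scene point (x, y, z) in the camera
    frame maps to image coordinates (f*x/z, f*y/z)."""
    x, y, z = point
    return (f * x / z, f * y / z)

# Many-to-one: scene points along the same ray through the
# pinhole land on the same image point.
p_near = (1.0, 2.0, 4.0)
p_far = (2.0, 4.0, 8.0)   # p_near scaled by 2: same ray
print(project(p_near))                    # (0.25, 0.5)
print(project(p_near) == project(p_far))  # True
```

The division by z is what makes the mapping non-invertible: depth is lost, and any point on the ray can explain the same pixel.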
Physical parameters of image formation
• Photometric
  – Type, direction, intensity of light reaching sensor
  – Surfaces' reflectance properties
• Optical
  – Sensor's lens type
  – Focal length, field of view, aperture
• Geometric
  – Type of projection
  – Camera pose
  – Perspective distortions

Radiometry
• Images formed depend on amount of light from light sources and surface reflectance properties (see F&P Ch 4)

Light source direction / surface reflectance properties
• Specular; Lambertian [fig from Fleming, Torralba, & Adelson, 2004; image credit: Don Deering]

Perspective projection
• Pinhole camera: simple model to approximate the imaging process
• If we treat the pinhole as a point, only one ray from any given point can enter the camera [Forsyth and Ponce]

Camera obscura
• In Latin, means "dark room"
• "Reinerus Gemma-Frisius, observed an eclipse of the sun at Louvain on January 24, 1544, and later he used this illustration of the event in his book De Radio Astronomica et Geometrica, 1545. It is thought to be the first published illustration of a camera obscura..." [Hammond, John H., The Camera Obscura, A Chronicle; http://www.acmi.net.au/AIC/CAMERA_OBSCURA.html]
• An attraction in the late 19th century: jetty at Margate, England, 1898; camera obscura from around the 1870s [http://brightbytes.com/cosite/collection2.html; adapted from R. Duraiswami]

Perspective effects
• Far away objects appear smaller [Forsyth and Ponce]
• Parallel lines in the scene intersect in the image [Forsyth and Ponce]

Perspective projection equations
• 3D world mapped to 2D projection in the image plane
• [Diagram: image plane, focal length, optical axis, camera frame; Forsyth and Ponce]
• Equations are non-linear (derivation on board)

Projection properties
• Many-to-one: any points along the same ray map to the same point in the image
• Points → points
• Lines → lines (collinearity preserved)
• Distances and angles are not preserved
• Degenerate cases:
  – Line through focal point projects to a point.
  – Plane through focal point projects to a line.
  – Plane perpendicular to image plane projects to part of the image. [Forsyth and Ponce]

Perspective and art
• Use of correct perspective projection indicated in 1st-century B.C. frescoes
• Skill resurfaces in Renaissance: artists develop systematic methods to determine perspective projection (around 1480-1515) [Raphael; Dürer, 1525]

Weak perspective
• Approximation: treat magnification as constant
• Assumes scene depth << average distance to camera
• Makes perspective equations linear [diagram: image plane, world points]

Orthographic projection
• Given camera at constant distance from scene
• World points projected along rays parallel to optical axis
• Limit of perspective projection as the center of projection moves infinitely far from the scene [diagrams: planar pinhole perspective vs. orthographic projection; from M. Pollefeys]

Which projection model?
• Weak perspective:
  – Accurate for small, distant objects; recognition
  – Linear projection equations simplify the math
• Pinhole perspective:
  – More accurate but more complex
  – Structure from motion

Pinhole size / aperture
• Larger pinhole: brighter, but blurrier
• Smaller pinhole: dimmer, more in focus
• Smaller still: dimmer, with blur from diffraction

Pinhole vs. lens
[figure comparison]

Cameras with lenses
• Gather more light, while keeping focus; make pinhole perspective projection practical
• Thin lens: rays entering parallel on one side go through the focus on the other, and vice versa
• In the ideal case, all rays from P are imaged at P' [diagram: left focus, right focus, lens diameter d, focal length f]

Field of view
• As f gets smaller, the image becomes more wide angle (more world points project onto the finite image plane)
• As f gets larger, the image becomes more telescopic (a smaller part of the world projects onto the finite image plane)
• Field of view (portion of 3D space seen by the camera) depends on d and f [from R. Duraiswami]

Focus and depth of field
• Thin lens: scene points at distinct depths come in focus at different image planes. (Real camera lens systems have greater depth of field.)
• Depth of field: distance between image planes where blur is tolerable
"circles
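To make the weak-perspective slide concrete: it replaces the per-point division by z in the exact pinhole equations with a single magnification m = f/z0, where z0 is a reference (average) scene depth. A small numeric sketch, with illustrative values and function names that are not from the course materials, shows the error stays small when scene depth varies little relative to the distance to the camera:

```python
def perspective(point, f=1.0):
    """Exact pinhole projection: (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = point
    return (f * x / z, f * y / z)

def weak_perspective(point, z0, f=1.0):
    """Weak perspective: constant magnification m = f/z0,
    where z0 is a reference (average) scene depth."""
    x, y, z = point
    m = f / z0
    return (m * x, m * y)

# Scene depths 99..101 with reference z0 = 100: depth variation
# (2 units) << distance to camera (100 units).
for p in [(5.0, 0.0, 99.0), (5.0, 0.0, 100.0), (5.0, 0.0, 101.0)]:
    err = abs(perspective(p)[0] - weak_perspective(p, z0=100.0)[0])
    print(round(err, 6))  # errors on the order of 5e-4, vs. x' = 0.05
```

Because weak perspective is linear in (x, y), it simplifies the math for small, distant objects, as the "Which projection model?" slide notes; the exact pinhole model remains necessary when depth variation matters, e.g. for structure from motion.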