EE290T: 3D Reconstruction and Recognition
Acknowledgement
Courtesy of Prof. Silvio Savarese.
Introduction
“There was a table set out under a tree in front of the house, and the March Hare and the Hatter were having tea at it.”
“The table was a large one, but the three were all crowded together at one corner of it …”
From “A Mad Tea-Party”, Alice's Adventures in Wonderland by Lewis Carroll
Illustration by Arthur Rackham
Computer vision
Image/video
Object 1 … Object N (each with):
- semantics
- geometry
Spatial & temporal relations
Scene:
- semantics
- geometry
Computer vision
Sensing device and computational device:
1. Information extraction: features, 3D structure, motion flows, etc.
2. Interpretation: recognize objects, scenes, actions, events
Computer vision and Applications
Timeline (1990, 2000, 2010): EosSystems
Fingerprint biometrics
Augmentation with 3D computer graphics
3D object prototyping (EosSystems Photomodeler)
Computer vision and Applications
Timeline (1990, 2000, 2010): EosSystems, AutoStitch
Face detection
Web applications (Photometria)
Panoramic photography (kolor)
3D modeling of landmarks
Computer vision and Applications
• Efficient SLAM/SFM
• Large scale image repositories
• Deep learning (e.g. ImageNet)
Computer vision and Applications
Timeline (1990, 2000, 2010): EosSystems, AutoStitch, Kinect, A9, Google Goggles, Kooaba
Image search engines
Visual search and landmarks recognition (Google Goggles)
• Magic Leap
• Daqri
• Meta
• Etc.
Motion sensing and gesture recognition
Autonomous navigation and safety
Mobileye: vision systems in high-end BMW, GM, and Volvo models. Also Toyota, Google, Apple, Tesla, Nissan, Ford, etc.
Source: A. Shashua, S. Seitz
Personal robotics
Computer vision and Applications
• Assistive technologies
• Surveillance
• Factory inspection
• Vision for robotics, space exploration
• Security
Computer vision and Applications
3D: EosSystems
2D: Google Goggles
Current state of computer vision
3D Reconstruction
• 3D shape recovery
• 3D scene reconstruction
• Camera localization
• Pose estimation
2D Recognition
• Object detection
• Texture classification
• Target tracking
• Activity recognition
Current state of computer vision
3D Reconstruction
• 3D shape recovery
• 3D scene reconstruction
• Camera localization
• Pose estimation
[Figure: Snavely et al., 06-08]
Levoy et al., 00; Golparvar-Fard et al., JAEI 10; Lucas & Kanade, 81; Pandey et al., IFAC 2010; Chen & Medioni, 92; Hartley & Zisserman, 00; Dellaert et al., 00; Pandey et al., ICRA 2011; Debevec et al., 96; Savarese et al., IJCV 05; Levoy & Hanrahan, 96; Rusinkiewicz et al., 02; Nistér, 04; Savarese et al., IJCV 06; Fitzgibbon & Zisserman, 98; Microsoft’s PhotoSynth; Triggs et al., 99; Brown & Lowe, 04; Schindler et al., 04; Snavely et al., 06-08; Pollefeys et al., 99; Schindler et al., 08; Kutulakos & Seitz, 99; Lourakis & Argyros, 04; Colombo et al., 05; Agarwal et al., 09; Frahm et al., 10
Current state of computer vision
2D Recognition
• Object detection
• Texture classification
• Target tracking
• Activity recognition
[Figure: street scene labeled Building, Person, Car, Tree, Bike, Street]
Turk & Pentland, 91; Agarwal & Roth, 02; He et al., 06; Poggio et al., 93; Ramanan & Forsyth, 03; Gould et al., 08; Belhumeur et al., 97; Weber et al., 00; Maire et al., 08; LeCun et al., 98; Amit & Geman, 99; Vidal-Naquet & Ullman, 02; Felzenszwalb et al., 08; Shi & Malik, 00; Fergus et al., 03; Viola & Jones, 00; Kohli et al., 09; Torralba et al., 03; L.-J. Li et al., 09; Vogel & Schiele, 03; Felzenszwalb & Huttenlocher, 00; Ladicky et al., 10, 11; Barnard et al., 03; Belongie & Malik, 02; Gonfaus et al., 10; Fei-Fei et al., 04; Ullman et al., 02; Farhadi et al., 09; Kumar & Hebert, 04; Lampert et al., 09
Current state of computer vision
3D Reconstruction: 3D shape recovery, 3D scene reconstruction, camera localization, pose estimation
2D Recognition: object detection, texture classification, target tracking, activity recognition
Perceiving the World in 3D!
Course overview
1. Geometry
2. Semantics
Geometry:
- How to extract 3D information?
- Which cues are useful?
- What are the mathematical tools?
Camera systems
Establish a mapping from 3D to 2D
How to calibrate a camera
Estimate camera parameters such as pose or focal length
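The mapping from 3D to 2D can be illustrated with the pinhole camera model. The following is a minimal sketch, assuming an ideal pinhole camera with no lens distortion and a 3D point already expressed in the camera's coordinate frame (camera at the origin, looking along +Z). The function name `project_point` and the parameters `focal_length`, `cx`, and `cy` are illustrative choices, not taken from the slides.

```python
from typing import Tuple


def project_point(
    point_3d: Tuple[float, float, float],
    focal_length: float,
    cx: float = 0.0,
    cy: float = 0.0,
) -> Tuple[float, float]:
    """Project a 3D point (camera coordinates) onto the image plane.

    Assumed model: ideal pinhole, focal length in pixels,
    principal point at (cx, cy), no distortion.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # Perspective division: image coordinates shrink as depth z grows.
    u = focal_length * x / z + cx
    v = focal_length * y / z + cy
    return (u, v)


# The same (x, y) offset seen at two different depths.
near = project_point((1.0, 0.5, 2.0), focal_length=100.0, cx=320.0, cy=240.0)
far = project_point((1.0, 0.5, 4.0), focal_length=100.0, cx=320.0, cy=240.0)
print(near, far)
```

In this sketch, doubling the depth halves the point's offset from the principal point, so depth is lost in a single projection. Under the same assumptions, calibration would amount to estimating `focal_length`, `cx`, and `cy`, together with the camera pose that brings world points into the camera frame.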
Single view metrology
Estimate 3D properties of the world from a single image
Multiple view geometry
Estimate 3D properties of the world from multiple views
Mathematical tools
Epipolar geometry
Tomasi & Kanade (1993)
Photoconsistency
Structure from motion
Courtesy of Oxford Visual Geometry Group
Structured lighting and volumetric stereo
Scanning Michelangelo’s “The David”
• The Digital Michelangelo Project - http://graphics.stanford.edu/projects/mich/
• 2 BILLION polygons, accuracy to 0.29 mm
Course overview
1. Geometry
2. Semantics
Semantics:
- How to recognize objects?
- How to classify images or understand a scene?
- How to segment out critical semantics?
- How to estimate 3D properties (pose, size, shape…)?
Object recognition and categorization
Classification: Is this a forest? No!
Classification: Does this image contain a building? [yes/no] Yes!
Detection: Does this image contain a car? [where?] (label: car)
Detection: Which objects does this image contain? [where?] (labels: building, clock, person, car)
Detection: Accurate localization (segmentation) (label: clock)
Detection: Estimating 3D geometrical properties (labels: building, 45 degree; car, side view; person, back)
Challenges: viewpoint variation
slide credit: Fei-Fei, Fergus & Torralba
Challenges: illumination
image credit: J. Koenderink
Challenges: scale
slide credit: Fei-Fei, Fergus & Torralba
Challenges: deformation
Challenges: occlusion
Magritte, 1957
slide credit: Fei-Fei, Fergus & Torralba
Challenges: background clutter
Kilmeny Niland, 1995
Challenges: object intra-class variation
slide credit: Fei-Fei, Fergus & Torralba
Course overview
1. Geometry
2. Semantics
Joint recovery of geometry and semantics!
Visual processing in the brain
The ventral stream (also known as the "what pathway") is involved with object and visual identification and recognition.
The dorsal stream (or "where pathway") is involved with processing the object's spatial location relative to the viewer and with speech repetition.
[Figure: V1, where pathway (dorsal stream), what pathway (ventral stream), pre-frontal cortex]
Joint reconstruction and recognition
Input images …
[Figure: scene labeled Car, Person, Tree, Sky, Street, Building, Else]
“There was a table set out under a tree in front of the house, and the March Hare and the Hatter were having tea at it.”
“The table was a large one, but the three were all crowded together at one corner of it …”
From “A Mad Tea-Party”, Alice's Adventures in Wonderland by Lewis Carroll
[Table: course schedule, lectures 1-16, grouped into 3D geometry and Recognition, including project presentations]