EE290T : 3D Reconstruction and Recognition Acknowledgement

Courtesy of Prof. Silvio Savarese. Introduction “There was a table set out under a tree in front of the house, and the March Hare and the Hatter were having tea at it.”

“The table was a large one, but the three were all crowded together at one corner of it …”

From “A M a d Tea-Party” Alice's Adventures in W onderla n by Lewis Ca rroll “There was a table set out under a tree in front of the house, and the March Hare and the Hatter were having tea at it.”

“The table was a large one, but the three were all crowded together at one corner of it …”

From “A M a d Tea-Party” Alice's Adventures in W onderla n by Lewis Ca rroll

Illustra tion by Arthur Ra ck ha m

Image/ video

Object 1 Object N

- semantic -semantic … Computer vision

Image/ video

Object 1 Object N

- semantic … -semantic -geometry -geometry Computer vision

Image/ video

Object 1 Object N

- semantic … -semantic -geometry -geometry

spatial & temporal relations Computer vision

Image/ video

Object 1 Object N

- semantic … -semantic -geometry -geometry

spatial & temporal relations

Scene

-Sema ntic - geometry Computer vision

• Informatio n extraction Sensing device Computational • Interpretation device

1. Information extraction: features, 3D structure, motion flows, etc… 2. Interpretation: recognize objects, scenes, actions, events Computer vision and Applications

EosSystems

1990 2000 2010

21 Fingerprint biometrics Augmentation with 3D

23 3D object prototyping

EosSystems Photomodeler 24 Computer vision and Applications

EosSystems Autostich

1990 2000 2010

25 Face detection Face detection Web applications

Photometria 28 Panoramic Photography

kolor of landmarks

30 Computer vision and Applications

• Efficient SLAM/SFM • Large scale image repositories • Deep learning (e.g. ImageNet)

EosSystems Autostich

1990 2000 2010

31 Computer vision and Applications

Kinect

A9 Google Goggles Kooaba EosSystems Autostich

1990 2000 2010

32 Image search engines Visual search and landmarks recognition

Google Goggles 34 Visual search and landmarks recognition

35

• Magic leap • Daqri • Meta • Etc…

36 Motion sensing and

37 Autonomous navigation and safety

Mobileye: Vision systems in high-end BMW, GM, Volvo models But also, Toyota, Google, Apple, Tesla, Nissan, Ford, etc….

Source: A. Shashua, S. Seitz Personal robotics

39 Computer vision and Applications

Assistive technologies Surveillance Factory inspection

Vision for robotics, space exploration Security Computer vision and Applications

Kinect

A9 Google Goggles Kooaba EosSystems Autostich

2010 1990 2000 41 Computer vision and Applications

3D EosSystems

Google Goggles 2D

2000 2010 1990 42 Computer vision and Applications

3D EosSystems

Google Goggles 2D

2000 2010 1990 43 Current state of computer vision

3D Reconstruction 2D Recognition • 3D shape recovery • Object detection • 3D scene reconstruction • Texture classification • Camera localization • Target tracking • Pose estimation • Activity recognition

44

Current state of computer vision

S

n

av

e

l

y

e t a t

3D Reconstruction l.

, , 06 - • 3D shape recovery 08 • 3D scene reconstruction • Camera localization • Pose estimation

Levoy et al., 00 Golparvar-Fard, et al. JAEI 10 Lucas & Kanade, 81 Pandey et al. IFAC , 2010 Chen & Medioni, 92 Hartley & Zisserman, 00 Dellaert et al., 00 Pandey et al. ICRA 2011 Debevec et al., 96 Savarese et al. IJCV 05 Levoy & Hanrahan, 96 Rusinkiewic et al., 02 Nistér, 04 Savarese et al. IJCV 06 Fitzgibbon & Zisserman, 98 Microsoft’s PhotoSynth Triggs et al., 99 Brown & Lowe, 04 Schindler et al, 04 Snavely et al., 06-08 Pollefeys et al., 99 Schindler et al., 08 Kutulakos & Seitz, 99 Lourakis & Argyros, 04 Colombo et al. 05 Agarwal et al., 09 45 Frahm et al., 10

Current state of computer vision

S

n

av

e

l

y

e t a t

3D Reconstruction l.

, , 06 - • 3D shape recovery 08 • 3D scene reconstruction • Camera localization • Pose estimation

Levoy et al., 00 Golparvar-Fard, et al. JAEI 10 Lucas & Kanade, 81 Pandey et al. IFAC , 2010 Chen & Medioni, 92 Hartley & Zisserman, 00 Dellaert et al., 00 Pandey et al. ICRA 2011 Debevec et al., 96 Savarese et al. IJCV 05 Levoy & Hanrahan, 96 Rusinkiewic et al., 02 Nistér, 04 Savarese et al. IJCV 06 Fitzgibbon & Zisserman, 98 Microsoft’s PhotoSynth Triggs et al., 99 Brown & Lowe, 04 Schindler et al, 04 Snavely et al., 06-08 Pollefeys et al., 99 Schindler et al., 08 Kutulakos & Seitz, 99 Lourakis & Argyros, 04 Colombo et al. 05 Agarwal et al., 09 Frahm et al., 10 Current state of computer vision

2D Recognition • Object detection • Texture classification • Target tracking • Activity recognition

Turk & Pentland, 91 Argawal & Roth, 02 He et al. 06 Poggio et al., 93 Ramanan & Forsyth, 03 Gould et al. 08 Belhumeur et al., 97 Weber et al., 00 Maire et al. 08 LeCun et al. 98 Amit Vidal-Naquet & Ullman 02 Felzenszwalb et al., 08 and Geman, 99 Shi Fergus et al., 03 & Malik, 00 Viola & Kohli et al. 09 Torralba et al., 03 Jones, 00 L.-J. Li et al. 09 Vogel & Schiele, 03 Felzenszwalb & Huttenlocher 00 Ladicky et al. 10,11 Barnard et al., 03 Belongie & Malik, 02 Gonfaus et al. 10 Fei-Fei et al., 04 Ullman et al. 02 Farhadi et al., 09 Kumar & Hebert ‘04 Lampert et al., 09 47 Current state of computer vision

Building person car Tree car Car Person Car Bike

Street 2D Recognition • Object detection • Texture classification • Target tracking • Activity recognition

Turk & Pentland, 91 Argawal & Roth, 02 He et al. 06 Poggio et al., 93 Ramanan & Forsyth, 03 Gould et al. 08 Belhumeur et al., 97 Weber et al., 00 Maire et al. 08 LeCun et al. 98 Amit Vidal-Naquet & Ullman 02 Felzenszwalb et al., 08 and Geman, 99 Shi Fergus et al., 03 & Malik, 00 Viola & Kohli et al. 09 Torralba et al., 03 Jones, 00 L.-J. Li et al. 09 Vogel & Schiele, 03 Felzenszwalb & Huttenlocher 00 Ladicky et al. 10,11 Barnard et al., 03 Belongie & Malik, 02 Gonfaus et al. 10 Fei-Fei et al., 04 Ullman et al. 02 Farhadi et al., 09 Kumar & Hebert ‘04 Lampert et al., 09 Current state of computer vision

3D Reconstruction 2D Recognition • 3D shape recovery • Object detection • 3D scene reconstruction • Texture classification • Camera localization • Target tracking • Pose estimation • Activity recognition

Perceiving the World in 3D!

49 Course overview

1. Geometry 2. Semantics

Geometry: - How to extract 3d information? - Which cues are useful? - What are the mathematical tools? Camera systems Establish a mapping from 3D to 2D How to calibrate a camera Estimate camera parameters such as pose or focal length

? Single view metrology Estimate 3D properties of the world from a single image

? Single view metrology Estimate 3D properties of the world from a single image Multiple view geometry Estimate 3D properties of the world from multiple views Mathematical tools

Epipola r geometry

Toma si & Kana de (1993) Photoconsistency

Courtesy of Oxford Visua l Geometry Group Structure lighting and volumetric stereo

Scanning Michelangelo’s “The David” • The Digital Michelangelo Project - http://graphics.stanford.edu/projects/mich/ • 2 BILLION polygons, accuracy to .29mm Course overview

1. Geometry 2. Semantics

Semantics: - How to recognize objects? - How to classify images or understand a scene? - How to segment out critical semantics - How to estimate 3D properties (pose, size, shape…) Object recognition and categorization Classification: Is this an forest?

No! Classification: Does this image contain a building? [yes/no]

Yes! Detection: Does this image contain a car? [where?]

car Detection: Which objects do this image contain? [where?]

Building

clock

person car Detection: Accurate localization (segmentation)

clock Detection: Estimating 3D geometrical properties

Building 45 degree

Car, side view Person, back Challenges: viewpoint variation

slide credit: Fei-Fei, Fergus & Torralba Challenges: illumination

image credit: J. Koenderink Challenges: scale

slide credit: Fei-Fei, Fergus & Torralba Challenges: deformation Challenges: occlusion

Magritte, 1957 slide credit: Fei-Fei, Fergus & Torralba Challenges: background clutter

Kilmeny Niland. 1995 Challenges: object intra-class variation

slide credit: Fei-Fei, Fergus & Torralba A t.. . l , 11t'-:. • \J I .""' I • < Course overview

1. Geometry 2. Semantics

Joint recovery of geometry and semantics! Visua l processing in the bra in The ventral stream (also known as the "what pathway") is involved with object and visual where pathway identification and recognition. (dorsal stream)

V1

The dorsal stream (or, "where pathway") is involved what pathway with processing the object's (ventral stream) spatial location relative to the viewer and with speech78 repetition. Visua l processing in the bra in

where pathway (dorsal stream)

Pre-frontal V1 cortex

what pathway (ventral stream) 79 Joint reconstruction and recognition

Input images …

 Car  Person  Tree80  Sky  Street  Building  Else I n p u J t o im i a n g t e s

… r ec o ns t r uc ti on an d r ec o   Ca St gn r r e e t  P iti  e r B s o u n il on d  i n T g r e e  81 E  lse Sky “There was a table set out under a tree in front of the house, and the March Hare and the Hatter were having tea at it.”

“The table was a large one, but the three were all crowded together at one corner of it …”

From “A M a d Tea-Party” Alice's Adventures in W onderla nd by Lewis Ca rroll P r o L j e ec c 16 11 15 14 13 12 10 9 8 7 6 5 4 3 2 1 t t p u r r e ese n 3D 3D V S O I D Fi V Str M E Si C C I t n n c p a o am am is e b t n t t u en ti ti i l r r t u ual j g po u S O lti ect cla oduc o ec ons n c l metri e e cene e e b t g - t l r r t u vie o ar j a a vie R unde or and or ect a r ep R n e c m ti g w ali e d S w on unde c f ss eo ode r r c r e M g e y om st b ogni r met ifi s e st met c r e a e ll ogn a o D ca l a r t n s ti met r e e c m nd abu r s t t t on h s o ry o a t ion; ion o iti c a i t l i n ogy rip ng ti ion nd r on g y o II O t n i & T ng o L s / opi b r ea s s j S ect cla eg L r c A ning m M e n ss b t a y ifi t ion NNs ca t ion I

3D geometry Recognition