Zero-Shot Learning Problem Definition Semantic Labels Visual Feature Space
Total Page:16
File Type:pdf, Size:1020Kb
Semi-supervised Yanwei Fu Vocabulary- Leonid Sigal informed Learning Research Supervised Learning Problem Definition Supervised Learning Problem Definition Visual feature space x1 x2 Supervised Learning Problem Definition Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Supervised Learning Learning Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Supervised Learning Learning Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Supervised Learning Learning Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Supervised Learning Learning Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Supervised Learning Inference Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Supervised Learning Inference Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Supervised Learning Inference Semantic labels Visual feature space x1 airplane car unicycle tricycle x2 Zero-shot Learning Problem Definition Semantic labels Visual feature space x We do not have any visually labeled 1 instances of what these look like bicycle truck x2 Zero-shot Learning Learning Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 car v3 truck x2 Zero-shot Learning Learning f(x) Semantic label space Visual feature space x1 v1 Word vector-feature category mapping: airplane v = f(x) unicycle bicycle tricycle v2 car v3 truck x2 Zero-shot Learning Inference f(x) Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 Word vector-feature category mapping: v = f(x) car v3 truck x2 Zero-shot Learning Inference f(x) Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 Word vector-feature category mapping: v = f(x) car v3 truck f( ) x2 Zero-shot Learning Inference f(x) Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 Word vector-feature category mapping: v = f(x) car v3 truck f( ) x2 Zero-shot Learning Inference f(x) Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 Word vector-feature category mapping: v = f(x) car v3 truck f( ) x2 Zero-shot Learning Inference f(x) Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 Word vector-feature category mapping: v = f(x) car v3 truck f( ) x2 Key Question: How do we define semantic space? Semantic Label Vector Spaces Spaces Type Advantages Disadvantages Manual annotation Good interpretability of each Semantic Supervised dimension: Attributes Limited vocabulary airplane := fixed wing, propelled, has pilot Semantic Word Good vector representation for Limited interpretability of Vectors Unsupervised millions of vocabulary each dimension (e.g. word2vec) v (Berlin) v (Germany) = v (Paris) v (France) − − Semantic Label Vector Spaces Spaces Type Advantages Disadvantages Manual annotation Good interpretability of each Semantic Supervised dimension: Attributes Limited vocabulary airplane := fixed wing, propelled, has pilot Semantic Word Good vector representation for Limited interpretability of Vectors Unsupervised millions of vocabulary each dimension (e.g. word2vec) v (Berlin) v (Germany) = v (Paris) v (France) − − Zero-shot Learning Inference f(x) Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 car v3 truck f( ) x2 Open-set Recognition Problem Definition f(x) Semantic label space Visual feature space x1 v1 airplane unicycle bicycle tricycle v2 car v3 truck f( ) x2 Open-set Recognition Problem Definition f(x) Semantic label space Visual feature space copilot flight learjet x jet 1 front-landing- v1 crop_duster tailwhip dirigible gear pedaling wheelie airplane biplane pogo_stick roller-skate twin-engine- helicopter unicycle aircraft tri-motor jump_rope horse cock- velomobile segway bicyclecargo_bikes tricycle v2 penny-farthing motor-scooter mopped trike motorcycle recumbent- motorbike car bicycle suv mini-van v3 tow-truck flat- truck tyres f( ) x2 Summary: Supervised Learning: [Fei-Fei et al. TPAMI’06], [Deng et al. ECCV’14], [Torralba et al. TPAMI’08 ], [Weston et al. IJCAI’11] Pros: Very good quantitative performance Cons: Relatively small vocabulary (~1,000 classes) Requires manual labeling of all the data Summary: Supervised Learning: [Fei-Fei et al. TPAMI’06], [Deng et al. ECCV’14], [Torralba et al. TPAMI’08 ], [Weston et al. IJCAI’11] Pros: Very good quantitative performance Cons: Relatively small vocabulary (~1,000 classes) Requires manual labeling of all the data Zero-shot Learning: [Palatucci et al. NIPS’09],[Lampert et al. CVPR’09], [Farhadi et al. CVPR’09], [Rohrbach et al. CVPR’10] Pros: Does not require instance labeling for target classes Cons: Typically limited to recognition with target classes only Relatively small vocabulary (~50-200 classes typically) Summary: Supervised Learning: [Fei-Fei et al. TPAMI’06], [Deng et al. ECCV’14], [Torralba et al. TPAMI’08 ], [Weston et al. IJCAI’11] Pros: Very good quantitative performance Cons: Relatively small vocabulary (~1,000 classes) Requires manual labeling of all the data Zero-shot Learning: [Palatucci et al. NIPS’09],[Lampert et al. CVPR’09], [Farhadi et al. CVPR’09], [Rohrbach et al. CVPR’10] Pros: Does not require instance labeling for target classes Cons: Typically limited to recognition with target classes only Relatively small vocabulary (~50-200 classes typically) Open-set Learning: [Scheirer et al. TPAMI’13], [Sattar et al. CVPR’15], [Bendale et al. CVPR’15] [Guadarrama et al. RSS’14] Pros: Does not require instance labeling for target classes Large vocabulary (up to 310K classes) Summary: Supervised Learning: [Fei-Fei et al. TPAMI’06], [Deng et al. ECCV’14], [Torralba et al. TPAMI’08 ], [Weston et al. IJCAI’11] Pros: Very good quantitative performance Cons: Relatively small vocabulary (~1,000 classes) Requires manual labeling of all the data Zero-shot Learning: [Palatucci et al. NIPS’09],[Lampert et al. CVPR’09], [Farhadi et al. CVPR’09], [Rohrbach et al. CVPR’10] Pros: Does not require instance labeling for target classes Cons: Typically limited to recognition with target classes only Relatively small vocabulary (~50-200 classes typically) Open-set Learning: [Scheirer et al. TPAMI’13], [Sattar et al. CVPR’15], [Bendale et al. CVPR’15] [Guadarrama et al. RSS’14] Pros: Does not require instance labeling for target classes Large vocabulary (up to 310K classes) Key Question: Can this large vocabulary be actually useful for recognition? Vocabulary-Informed Recognition v1 unicycle tricycle v2 v3 Vocabulary-Informed Recognition v1 f( Image) unicycle tricycle v2 v3 Vocabulary-Informed Recognition v1 f( Image) unicycle tricycle v2 v3 Vocabulary-Informed Recognition v1 f( Image) unicycle segway bicycle recumbent- cargo_bikes tricycle bicycle v2 farthing penny- v3 Vocabulary-Informed Recognition v1 f( Image) unicycle segway bicycle recumbent- cargo_bikes tricycle bicycle v2 farthing penny- v3 Formulation Semi-Supervised Vocabulary-Informed Learning (SS-Voc) Regression term Semantic label space f(x)=fW(x) Visual feature space copilot flight learjet x jet 1 front-landing- v1 crop_duster tailwhip dirigible gear pedaling wheelie airplane biplane pogo_stick roller-skate twin-engine- helicopter unicycle aircraft tri-motor jump_rope horse cock- velomobile segway cargo_bikes tricycle v2 penny-farthing motor-scooter mopped trike motorcycle recumbent- motorbike car bicycle suv mini-van tow-truck v3 flat-tyres x2 Semi-Supervised Vocabulary-Informed Learning (SS-Voc) Regression term Semantic label space f(x)=fW(x) Visual feature space copilot flight learjet x jet 1 front-landing- v1 crop_duster tailwhip dirigible gear pedaling wheelie airplane biplane pogo_stick roller-skate twin-engine- helicopter unicycle aircraft tri-motor jump_rope horse cock- velomobile segway cargo_bikes tricycle v2 penny-farthing motor-scooter mopped trike motorcycle recumbent- motorbike car bicycle suv mini-van tow-truck v3 flat-tyres x2 ✏(f( ,W),uairplane ) L X Semi-Supervised Vocabulary-Informed Learning (SS-Voc) Regression term Semantic label space f(x)=fW(x) Visual feature space x1 v1 airplane unicycle tricycle v2 car v3 x2 ✏(f( ,W),uairplane ) L X Semi-Supervised Vocabulary-Informed Learning (SS-Voc) Regression term Semantic label space f(x)=fW(x) Visual feature space x1 v1 airplane unicycle tricycle v2 car v3 x2 ✏(f( ,W),uairplane ) L X Semi-Supervised Vocabulary-Informed Learning (SS-Voc) Max-margin term Semantic label space f(x)=fW(x) Visual feature space copilot flight learjet jet front-landing- crop_duster x1 v1 dirigible gear tailwhip pedaling wheelie airplane pogo_stick roller-skate biplane twin-engine- helicopter unicycle aircraft tri-motor jump_rope horse cock- velomobile segway cargo_bikes tricycle v2 penny-farthing motor-scooter mopped trike motorcycle recumbent- motorbike car bicycle suv mini-van tow-truck v3 flat-tyres x 1 1 2 [C (f( ),uairplane)+ ✏(f( ),uhelicopter)] − 2L✏ 2L X Semi-Supervised Vocabulary-Informed Learning (SS-Voc) Max-margin term Semantic label space f(x)=fW(x) Visual feature space copilot flight learjet jet front-landing- crop_duster x1 v1 dirigible gear tailwhip pedaling wheelie airplane pogo_stick roller-skate biplane twin-engine- helicopter unicycle aircraft tri-motor jump_rope horse cock- velomobile segway cargo_bikes tricycle v2 penny-farthing motor-scooter mopped trike motorcycle recumbent- motorbike car bicycle suv mini-van tow-truck v3 flat-tyres x 1 1 2 [C (f( ),uairplane)+ ✏(f( ),uhelicopter)]