Visual Learning with Weak Supervision

Safa Cicek

Committee: Prof. Stefano Soatto (Chair), Prof. Lieven Vandenberghe, Prof. Paulo Tabuada, Prof. Guy Van den Broeck

PhD Defense

January 2021

UCLA Electrical and Computer Engineering

Table of Contents

Introduction

SaaS: Speed as a Supervisor for Semi-supervised Learning [1]

Input and Weight Space Smoothing for Semi-supervised Learning [2]

Unsupervised Domain Adaptation via Regularized Conditional Alignment [3]

Disentangled Image Generation for Unsupervised Domain Adaptation [4]

Spatial Class Distribution Shift in Unsupervised Domain Adaptation [5]

Learning Topology from Synthetic Data for Unsupervised Depth Completion [6]

Targeted Adversarial Perturbations for Monocular Depth Prediction [7]

Concluding Remarks

[1] Cicek, Safa, Alhussein Fawzi, and Stefano Soatto. "SaaS: Speed as a Supervisor for Semi-supervised Learning." ECCV 2018.
[2] Cicek, Safa, and Stefano Soatto. "Input and Weight Space Smoothing for Semi-supervised Learning." ICCV Workshops 2019.
[3] Cicek, Safa, and Stefano Soatto. "Unsupervised Domain Adaptation via Regularized Conditional Alignment." ICCV 2019.
[4] Cicek, Safa, Zhaowen Wang, Hailin Jin, and Stefano Soatto. "Generative Feature Disentangling for Unsupervised Domain Adaptation." ECCV Workshops 2020.
[5] Cicek, Safa, Ning Xu, Zhaowen Wang, Hailin Jin, and Stefano Soatto. "Spatial Class Distribution Shift in Unsupervised Domain Adaptation." ACCV 2020.
[6] Wong, Alex, Safa Cicek, and Stefano Soatto. "Learning Topology from Synthetic Data for Unsupervised Depth Completion." IEEE Robotics and Automation Letters (RA-L) 2021.
[7] Wong, Alex, Safa Cicek, and Stefano Soatto. "Targeted Adversarial Perturbations for Monocular Depth Prediction." NeurIPS 2020.

Visual Perception

[1] He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." ICCV 2015.

Manual annotation is expensive.

Image Classification [1]: Siamese Cat vs. French Bulldog.

Semantic Segmentation [2]: image → segmentation map.

Sparse-to-Dense Depth Completion [3]: image and sparse depth → groundtruth dense depth.

[1] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." CVPR 2009.
[2] Cordts, Marius, et al. "The Cityscapes dataset for semantic urban scene understanding." CVPR 2016.
[3] Uhrig, J., N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger. "Sparsity invariant CNNs." 3DV 2017.

Unlabeled Real Data

Figure: unlabeled real images [1], and an unlabeled real image with sparse depth [2].

[1] Cordts, Marius, et al. "The Cityscapes dataset for semantic urban scene understanding." CVPR 2016.
[2] Uhrig, J., N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger. "Sparsity invariant CNNs." 3DV 2017.

Unlabeled Real Data + Labeled Virtual Data

Figure: labeled virtual data: an image with its segmentation map [1], and an image with sparse depth and groundtruth dense depth [2].

[1] Richter, Stephan R., et al. "Playing for data: Ground truth from computer games." ECCV 2016.
[2] Cabon, Y., N. Murray, and M. Humenberger. "Virtual KITTI 2." Preprint, 2020.

Dependency of Unlabeled Data Labels and Model Parameters

Graphical models (figure panels): discriminative supervised, discriminative unsupervised, regularized discriminative [1, 2].

● Shaded variables are fully observed.

[1] Chapelle, Olivier, Bernhard Schölkopf, and Alexander Zien, eds. Semi-Supervised Learning. MIT Press, 2006.
[2] Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

Max-margin (Cluster, Low-density) Assumption

Figure: data vs. learned decision boundaries.

● Large circles (4+4) are labeled samples.
● Small dots are unlabeled samples.
● Without regularization, the boundary is learned using only the labeled samples.
● With regularization (e.g. VAT [1]), the unlabeled samples are also used.

[1] Miyato, T., S. Maeda, M. Koyama, and S. Ishii. "Virtual adversarial training: a regularization method for supervised and semi-supervised learning." arXiv:1704.03976, 2017.

SaaS: Speed as a Supervisor for Semi-supervised Learning

[1] Cicek, Safa, Alhussein Fawzi, and Stefano Soatto. "SaaS: Speed as a Supervisor for Semi-supervised Learning." ECCV 2018.

Semi-supervised Learning

Figure: the supervised vs. semi-supervised setups.

SaaS

● Inner loop to measure ease of training for the current pseudo-labels.

● Outer loop to update the pseudo-labels.

Objective Function

● Cumulative loss: area under the loss curve up to a small number of epochs.
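To make the bi-level structure concrete, here is a minimal sketch of one outer SaaS iteration, assuming a PyTorch-style setup where `make_model` returns a freshly initialized classifier, `(x_l, y_l)` is a labeled batch, `x_u` an unlabeled batch, and `z_u` the pseudo-label logits. It illustrates the idea, not the paper's exact optimization.

```python
import torch
import torch.nn.functional as F

def saas_outer_step(make_model, x_l, y_l, x_u, z_u,
                    inner_epochs=3, lr_w=0.1, lr_z=1.0):
    """One outer SaaS iteration (sketch): re-initialize the weights, run a
    short inner loop, and move the pseudo-label logits z_u in the direction
    that shrinks the cumulative (area-under-the-curve) loss, i.e. the
    direction that makes training faster."""
    model = make_model()                       # fresh weights each outer step
    opt = torch.optim.SGD(model.parameters(), lr=lr_w)
    grad_z = torch.zeros_like(z_u)
    for _ in range(inner_epochs):
        z = z_u.detach().requires_grad_(True)
        p_u = F.softmax(z, dim=1)              # pseudo-labels on the simplex
        log_q = F.log_softmax(model(x_u), dim=1)
        loss = (F.cross_entropy(model(x_l), y_l)
                - (p_u * log_q).sum(dim=1).mean())  # soft cross-entropy
        loss.backward()                        # grads for weights and for z
        opt.step()
        opt.zero_grad()
        grad_z += z.grad                       # accumulate along the loss curve
    return z_u - lr_z * grad_z                 # updated pseudo-label logits
```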

Degenerate Solutions to Cumulative Loss

● Supervision quality correlates with learning speed in expectation, not in every realization.
○ The posterior over the label estimates should live in the probability simplex.
○ Entropy minimization [1, 2] (see the sketch below).
○ The cumulative loss should also be small for augmented unlabeled data.
○ A strong network can fit even completely random labels [3], so we measure the speed after only a few epochs of training. This is not equivalent to our optimization.

[1] Grandvalet, Yves, and Yoshua Bengio. "Semi-supervised learning by entropy minimization." NeurIPS 2005.
[2] Krause, Andreas, Pietro Perona, and Ryan G. Gomes. "Discriminative clustering by regularized information maximization." NeurIPS 2010.
[3] Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." arXiv:1611.03530, 2016.
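As a small illustration of the simplex and entropy-minimization regularizers above (a sketch under assumed shapes, not the paper's code): pseudo-label logits are softmaxed onto the simplex, and an entropy penalty pushes the posteriors toward confident labels.

```python
import torch
import torch.nn.functional as F

def entropy_penalty(z_u):
    """Mean entropy of the pseudo-label posteriors given logits z_u (B, K)."""
    p = F.softmax(z_u, dim=1)                  # each row lives on the simplex
    return -(p * F.log_softmax(z_u, dim=1)).sum(dim=1).mean()

z_u = torch.randn(8, 10, requires_grad=True)   # toy pseudo-label logits
loss = entropy_penalty(z_u)                    # added to the cumulative loss
loss.backward()                                # gradient favors confident labels
```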

SaaS

● Outer loop to update the pseudo-labels.


● Learn the model weights from the final pseudo-labels.

Empirical Evaluations

                                                 CIFAR10-4K     SVHN-1K
Error rate by supervised baseline (test data)    17.64 ± 0.58   11.04 ± 0.50
Error rate by SaaS (unlabeled data)              12.81 ± 0.08   6.22 ± 0.02
Error rate by SaaS (test data)                   10.94 ± 0.07   3.82 ± 0.09

● Comparison to the baseline.


● The more unlabeled data, the better the generalization.


● M is the number of pseudo-label updates.

● SaaS finds labels on which training is faster.


            Mean Teacher [1]   VAT [2]   SaaS
SVHN-1K     3.95               3.86      3.82 ± 0.09
CIFAR-4K    12.31              10.55     10.94 ± 0.07

● Comparison to the state of the art.

[1] Tarvainen, A., and H. Valpola. "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results." 2017.
[2] Miyato, T., S. Maeda, M. Koyama, and S. Ishii. "Virtual adversarial training: a regularization method for supervised and semi-supervised learning." arXiv:1704.03976, 2017.

Input and Weight Space Smoothing for Semi-supervised Learning

[1] Cicek, Safa, and Stefano Soatto. "Input and Weight Space Smoothing for Semi-supervised Learning." ICCV Workshops 2019.

Motivation for Input and Weight Space Smoothing

Figure captions: Moosavi-Dezfooli, Seyed-Mohsen, et al. "Universal adversarial perturbations." CVPR 2017; Keskar, N. S., et al. "On large-batch training for deep learning: Generalization gap and sharp minima." 2016.

● Small adversarial perturbations are nuisances for the tasks we are interested in.
● Converging to a flat minimum improves generalization [1, 2].

[1] Hochreiter, Sepp, and Jürgen Schmidhuber. "Flat minima." Neural Computation 9.1 (1997): 1-42.
[2] Chaudhari, Pratik, et al. "Entropy-SGD: Biasing gradient descent into wide valleys." Journal of Statistical Mechanics: Theory and Experiment, 2019.

Input Smoothing and Weight Smoothing do not Imply Each Other


● Over-parameterized networks are more robust to adversarial noise in the weight space even when they have the same decision boundary (i.e. the same input smoothness).
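To make the two notions concrete, here is a minimal sketch assuming a generic PyTorch classifier `model` and input batch `x` (illustrative, not the paper's code): input smoothness is probed with a one-step VAT-like perturbation, weight smoothness by jittering the weights.

```python
import copy
import torch
import torch.nn.functional as F

def input_smoothness_penalty(model, x, eps=1e-2):
    """KL between predictions at x and at x plus a one-step adversarial
    input perturbation (a VAT-like approximation)."""
    p = F.softmax(model(x), dim=1).detach()
    delta = torch.randn_like(x).requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + delta), dim=1), p,
                  reduction='batchmean')
    grad, = torch.autograd.grad(kl, delta)
    r_adv = eps * grad / (grad.norm() + 1e-12)   # worst-case input direction
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p,
                    reduction='batchmean')

@torch.no_grad()
def weight_smoothness(model, x, eps=1e-2):
    """How much predictions move when every weight is jittered by Gaussian
    noise of scale eps; a small value indicates a flat minimum."""
    p = F.softmax(model(x), dim=1)
    noisy = copy.deepcopy(model)                 # perturb a copy of the model
    for w in noisy.parameters():
        w.add_(eps * torch.randn_like(w))
    q = F.softmax(noisy(x), dim=1)
    return (p - q).abs().sum(dim=1).mean()
```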

Comparison to the State of the Art

        Mean Teacher [1]   VAT [2]   Ours
SVHN    3.95               3.86      3.53 ± 0.24
CIFAR   12.31              10.55     9.28 ± 0.21

[1] Tarvainen, A., and H. Valpola. "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results." 2017.
[2] Miyato, T., S. Maeda, M. Koyama, and S. Ishii. "Virtual adversarial training: a regularization method for supervised and semi-supervised learning." arXiv:1704.03976, 2017.

Hessians of the Converged Models

Figure panels: ABCD Trained (262 almost-zero eigenvalues), SGD Trained (226 almost-zero eigenvalues), Random Weights (185 almost-zero eigenvalues).

Unsupervised Domain Adaptation via Regularized Conditional Alignment

[1] Cicek, Safa, and Stefano Soatto. "Unsupervised Domain Adaptation via Regularized Conditional Alignment." ICCV 2019.

Unsupervised Domain Adaptation (UDA)

Figure: synthetic source vs. real target.

Shared-Feature Space for UDA

Figure: input space vs. shared-feature space (source/target cats and dogs).

● Moment matching between source and target features (e.g. MMD) [1, 2].

[1] Tzeng, Eric, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. "Deep domain confusion: Maximizing for domain invariance." arXiv:1412.3474, 2014.
[2] Long, Mingsheng, Yue Cao, Jianmin Wang, and Michael I. Jordan. "Learning transferable features with deep adaptation networks." arXiv:1502.02791, 2015.

Standard Approach to UDA

Diagram: an encoder feeds a class predictor (supervised loss) and a domain predictor (domain adversarial loss) [1].

● The source classification loss.
● The domain alignment loss. A sketch of this standard recipe follows.
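A minimal DANN-style sketch of that recipe (assumed architecture and names, not the exact model of [1]): a shared encoder feeds a class predictor trained on source labels and a domain predictor trained adversarially through gradient reversal.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -ctx.lam * g, None     # flip the gradient into the encoder

def dann_losses(encoder, cls_head, dom_head, x_s, y_s, x_t, lam=1.0):
    f_s, f_t = encoder(x_s), encoder(x_t)
    loss_cls = F.cross_entropy(cls_head(f_s), y_s)   # source supervision
    # Domain labels: 0 = source, 1 = target; the reversal makes the encoder
    # maximize domain confusion while the domain head minimizes it.
    logits = dom_head(GradReverse.apply(torch.cat([f_s, f_t]), lam))
    d = torch.cat([torch.zeros(f_s.size(0), dtype=torch.long, device=f_s.device),
                   torch.ones(f_t.size(0), dtype=torch.long, device=f_t.device)])
    loss_dom = F.cross_entropy(logits, d)
    return loss_cls + loss_dom
```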

[1] Ganin, Yaroslav, and Victor Lempitsky. "Unsupervised domain adaptation by backpropagation." arXiv:1409.7495, 2014.

DANN Aligns Marginal Distributions!

Figure: input space vs. shared-feature space. Adversarial domain alignment (e.g. DANN) [1] matches only the marginals, so clusters can mix: source dog with target cat, and source cat with target dog.

[1] Ganin, Yaroslav, and Victor Lempitsky. "Unsupervised domain adaptation by backpropagation." arXiv:1409.7495, 2014.

Conditional Alignment

Figure: with conditional alignment, classes match across domains: source dog with target dog, and source cat with target cat.

Proposed Method

Diagram (conditional alignment module): the encoder feeds a joint predictor over the 2K joint domain-class labels (source cat, ..., source dog, target cat, ..., target dog); a class predictor over the K class labels (cat, dog, ...) is tied to it with a consistency loss. For alignment, the encoder is trained so that the joint predictor maps a source dog to "target dog" and a target dog to "source dog", i.e. to the same class in the other domain.

● The joint source and target classification losses.

● Pseudo-labels for the unlabeled target samples.
● The joint source and target classification losses.
● The joint source and target alignment losses: the joint discriminator provides the feedback for feature alignment. A sketch follows.
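A hedged sketch of the joint 2K-way construction (assumed names and interfaces, not the paper's exact losses): classification uses the true joint label, alignment targets the same class in the other domain.

```python
import torch
import torch.nn.functional as F

def conditional_alignment_losses(encoder, joint_head, K,
                                 x_s, y_s, x_t, y_t_pseudo):
    """joint_head outputs 2K logits: class k of the source maps to joint
    label k, class k of the target to joint label K + k."""
    f_s, f_t = encoder(x_s), encoder(x_t)
    loss_cls = (F.cross_entropy(joint_head(f_s), y_s)
                + F.cross_entropy(joint_head(f_t), y_t_pseudo + K))
    # Alignment: push each feature toward the same class, other domain.
    loss_align = (F.cross_entropy(joint_head(f_s), y_s + K)
                  + F.cross_entropy(joint_head(f_t), y_t_pseudo))
    # In training, the joint head minimizes loss_cls while the encoder
    # minimizes loss_align (e.g. alternating or via gradient reversal).
    return loss_cls, loss_align
```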

Exploiting Unlabeled Data with SSL Regularizers

Figure: a domain classifier; adversarial feature matching, without and with input smoothing [1].

● Gray dots are the learned features of the unlabeled target samples.
● Purple/green circles are the learned features of the labeled source samples.

[1] Miyato, Takeru, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. "Virtual adversarial training: a regularization method for supervised and semi-supervised learning." arXiv:1704.03976, 2017.

Analysis

Comparison to State-of-the-Art UDA Methods

Source dataset   MNIST   SVHN    CIFAR   STL     SYN-DIGITS   MNIST
Target dataset   SVHN    MNIST   STL     CIFAR   SVHN         MNIST-M
DANN [1]         60.6    68.3    78.1    62.7    90.1         94.6
VADA + IN [2]    73.3    94.5    78.3    71.4    94.9         95.7
Ours             89.19   99.33   81.65   77.76   96.22        99.47
Source-only      44.21   70.58   79.41   65.44   85.83        70.28
Target-only      94.82   99.28   77.02   92.04   96.56        99.87

[1] Ganin, Yaroslav, and Victor Lempitsky. "Unsupervised domain adaptation by backpropagation." arXiv:1409.7495, 2014.
[2] Shu, Rui, Hung H. Bui, Hirokazu Narui, and Stefano Ermon. "A DIRT-T approach to unsupervised domain adaptation." arXiv:1802.08735, 2018.

Disentangled Image Generation for Unsupervised Domain Adaptation

[1] Cicek, Safa, Zhaowen Wang, Hailin Jin, and Stefano Soatto. "Generative Feature Disentangling for Unsupervised Domain Adaptation." ECCV Workshops 2020.

Image Translation Approach

Figure: segmentation map → generated source image and generated target image.

● We generate the images using GauGAN [1].

[1] Park, Taesung, et al. "Semantic image synthesis with spatially-adaptive normalization." CVPR 2019.

Figure: segmentation map → generated target image. In reality, Cityscapes (Germany) does not have palm trees.

StyleGAN

[1] Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." CVPR 2019.

Style Mixing [1]

Learned Styles for Multiple Domains (figure): mixing source A and source B, with coarse styles or fine styles taken from source B.

Explicit Regularization for UDA

Diagram (generator mapping network): the class label y passes through FC layers to produce the AdaIN parameters of the coarse layers; the domain label d passes through FC layers to produce the AdaIN parameters of the fine layers; a latent z enters both paths. A sketch of this mapping follows.

Diagram (UDA losses): generated and real source/target images feed a {fake, real} discriminator, a classifier over the K class labels {0, ..., K-1}, and a joint discriminator over the 2K joint labels {0, ..., 2K-1}.
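A sketch of the assumed mapping network (names and sizes are illustrative, not the paper's): the class label controls the coarse AdaIN parameters, the domain label the fine ones.

```python
import torch
import torch.nn as nn

class DisentangledMapping(nn.Module):
    def __init__(self, n_classes, n_domains, z_dim=64, style_dim=128):
        super().__init__()
        self.class_emb = nn.Embedding(n_classes, z_dim)
        self.domain_emb = nn.Embedding(n_domains, z_dim)
        # Two small MLPs: (label embedding, latent) -> AdaIN parameters.
        self.coarse = nn.Sequential(nn.Linear(2 * z_dim, style_dim),
                                    nn.ReLU(), nn.Linear(style_dim, style_dim))
        self.fine = nn.Sequential(nn.Linear(2 * z_dim, style_dim),
                                  nn.ReLU(), nn.Linear(style_dim, style_dim))

    def forward(self, y, d, z):
        coarse_style = self.coarse(torch.cat([self.class_emb(y), z], dim=1))
        fine_style = self.fine(torch.cat([self.domain_emb(d), z], dim=1))
        return coarse_style, fine_style   # fed to coarse/fine AdaIN layers

# Swapping d while fixing (y, z) changes the domain style but not the class.
```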

Colored Background and Colored Digit Datasets

Figure: colored-background and colored-digit domains (src/trg) [1].

[1] Gonzalez-Garcia, Abel, Joost Van De Weijer, and Yoshua Bengio. "Image-to-image translation for cross-domain disentanglement." NeurIPS 2018.

Interpolation of the Fine Layer Parameters

Figures (src → interpolated → trg): shown both where the generated source and target images have the same class label and where they have different class labels.

Interpolation of the Coarse Layer Parameters

Figures (src → interpolated → trg): shown both where the generated source and target images have the same class label and where they have different class labels.

Learned Shared Representations at the Intermediate Layers (figure: src/trg pairs at intermediate and final layers).

Results in MSDA Benchmarks

Tasks: {SYN-DIGITS, MNIST, USPS, SVHN} -> MNIST-M; {SYN-DIGITS, MNIST, USPS, MNIST-M} -> SVHN; {SVHN, MNIST, USPS, MNIST-M} -> SYN-DIGITS.

Target dataset   SVHN    SYN-DIGITS   MNIST   USPS    MNIST-M
DCTN [1]         77.5    NR           NR      NR      70.9
M3SDA [2]        81.32   89.58        98.58   96.14   72.82
Ours             90.71   98.91        99.65   97.20   98.45

[1] Xu, Ruijia, et al. "Deep cocktail network: Multi-source unsupervised domain adaptation with category shift." CVPR 2018.
[2] Peng, Xingchao, et al. "Moment matching for multi-source domain adaptation." ICCV 2019.

Spatial Class Distribution Shift in Unsupervised Domain Adaptation: Local Alignment Comes to Rescue

[1] Cicek, Safa, Ning Xu, Zhaowen Wang, Hailin Jin, and Stefano Soatto. "Spatial Class Distribution Shift in Unsupervised Domain Adaptation." ACCV 2020.

Standard Approach to UDA

Diagram: a segmentation network trained with a supervised (source classification) loss on the source and a domain adversarial loss on the predictions [1].

[1] Vu, Tuan-Hung, et al. "AdvEnt: Adversarial entropy minimization for domain adaptation in semantic segmentation." CVPR 2019.

Spatial-Class-Distribution Shift

● Domain-I (Cityscapes): images are captured from a dashcam view and the scenarios are realistic.
● Domain-II (GTA5): images are captured in unrealistic scenarios, e.g. a vehicle driving on the sidewalk.
● Domain-III (SYNTHIA): images are captured with random camera views.

Spatial-class-distribution shift correlates with the receptive field.

What is the domain: GTA5 or Cityscapes?

● Validation errors of a binary classifier trained to predict the domain label from segmentation maps.
● The domain is less identifiable for smaller receptive fields.
● Errors for SYNTHIA are slightly lower due to the larger spatial-class shift between SYNTHIA and Cityscapes.

Proposed Method

● Predictions are aligned locally by adding an operator g that extracts a random patch from the prediction f(x).

Diagram: extract patch → domain adversarial loss; supervised loss.

Objective Functions

● The adversarial domain alignment loss from [1], applied to randomly extracted patches of the predictions (a sketch follows).
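A sketch of the local alignment idea (assumed shapes and names, not the paper's exact objective): the domain discriminator sees random patches of the softmax prediction maps rather than whole maps, so only local class statistics are aligned.

```python
import torch
import torch.nn.functional as F

def random_patch(pred, patch_hw=(64, 64)):
    """pred: (B, C, H, W) softmax segmentation output; returns a random crop.
    Assumes H, W are at least the patch size."""
    _, _, h, w = pred.shape
    ph, pw = patch_hw
    i = torch.randint(0, h - ph + 1, (1,)).item()
    j = torch.randint(0, w - pw + 1, (1,)).item()
    return pred[:, :, i:i + ph, j:j + pw]

def local_alignment_loss(disc, pred_s, pred_t):
    """Discriminator classifies source(0)/target(1) patches; the segmenter
    is trained adversarially to make target patches look source-like."""
    d_s = disc(random_patch(pred_s.detach()))
    d_t = disc(random_patch(pred_t.detach()))
    loss_disc = (F.binary_cross_entropy_with_logits(d_s, torch.zeros_like(d_s))
                 + F.binary_cross_entropy_with_logits(d_t, torch.ones_like(d_t)))
    d_t_adv = disc(random_patch(pred_t))     # gradients flow to the segmenter
    loss_adv = F.binary_cross_entropy_with_logits(d_t_adv,
                                                  torch.zeros_like(d_t_adv))
    return loss_disc, loss_adv
```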

[1] Vu, Tuan-Hung, et al. "AdvEnt: Adversarial entropy minimization for domain adaptation in semantic segmentation." CVPR 2019.

Quantitative Results: Comparison to State-of-the-Art Methods

Method       Road   SW      Build   Wall*   Fence*   Pole*   TL      TS      Veg.
AdvEnt [1]   87.0   44.1    79.7    9.6     0.6      24.3    4.8     7.2     80.1
A+E [1]      85.6   42.2    79.7    8.7     0.4      25.9    5.4     8.1     80.4
MRKLD [2]    67.7   32.2    73.9    10.7    1.6      37.4    22.2    31.2    80.8
Ours         90.6   51.34   81.96   11.77   0.32     29.51   11.72   12.38   82.69

Method       Sky    PR      Rider   Car     Bus      Motor   Bike    mIoU    mIoU-13
AdvEnt [1]   83.6   56.4    23.7    72.7    32.6     12.8    33.7    40.8    47.6
A+E [1]      84.1   57.9    23.8    73.3    36.4     14.2    33.0    41.2    48.0
MRKLD [2]    80.5   60.8    29.1    82.8    25.0     19.4    45.3    43.8    50.1
Ours         84.7   58.57   24.73   81.94   36.37    17.11   41.75   44.84   51.99

[1] Vu, Tuan-Hung, et al. "AdvEnt: Adversarial entropy minimization for domain adaptation in semantic segmentation." CVPR 2019.
[2] Zou, Yang, et al. "Confidence regularized self-training." ICCV 2019.

SYNTHIA -> Cityscapes

Figure: qualitative results (Baseline, Ours, Truth).

Entropy of Predictions

Figure: entropy maps for SYNTHIA -> Cityscapes and GTA5 -> Cityscapes, source-only vs. ours.

Failure Cases

Figure: failure cases (SYNTHIA -> Cityscapes).

Learning Topology from Synthetic Data for Unsupervised Depth Completion

[1] Wong, Alex, Safa Cicek, and Stefano Soatto. "Learning Topology from Synthetic Data for Unsupervised Depth Completion." IEEE Robotics and Automation Letters (RA-L) 2021.

Sparse-to-Dense Depth Completion

Figure: image and sparse depth; sparse points from VIO*; sparse points from LIDAR; predicted dense depth.

*VIO: visual-inertial odometry.

Unsupervised Domain Adaptation (UDA)

Figure: synthetic source [2] vs. real target [1].

[1] Uhrig, J., et al. "Sparsity invariant CNNs." 3DV 2017.
[2] Cabon, Y., N. Murray, and M. Humenberger. "Virtual KITTI 2." Preprint, 2020.

Bypassing the Photometric Domain Gap

Figure: sparse depth → dense depth [1]. Can we learn to infer the dense topology of the scene given only sparse points?

[1] McCormac, J., A. Handa, S. Leutenegger, and A. J. Davison. "SceneNet RGB-D: 5M photorealistic images of synthetic indoor trajectories with groundtruth." Preprint, 2016.

The Sparsity Problem

Figure: points tracked by VIO (~0.5% density); points from LIDAR (~5% density); features after the 1st VGG block.

● Feature maps are still sparse after the first block.

Spatial Pyramid Pooling (SPP)

Diagram: sparse depth → max pooling at multiple scales → 1×1 convolutions → concatenation and weighted sum. A sketch follows.
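A sketch of such an SPP block for sparse depth (assumed kernel sizes and channel counts, not the paper's exact block): max-pooling at several scales densifies the input, a 1×1 convolution mixes each pooled map, and the results are concatenated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseSPP(nn.Module):
    def __init__(self, scales=(3, 5, 7), out_ch=8):
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList(
            nn.Conv2d(1, out_ch, kernel_size=1) for _ in scales)

    def forward(self, sparse_depth):            # (B, 1, H, W), zeros = missing
        outs = []
        for k, conv in zip(self.scales, self.convs):
            # Max pooling spreads valid (positive) depths over missing pixels.
            pooled = F.max_pool2d(sparse_depth, kernel_size=k,
                                  stride=1, padding=k // 2)
            outs.append(conv(pooled))
        return torch.cat(outs, dim=1)           # multi-scale dense features
```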

ScaffNet

Figure: sparse depth → ScaffNet output without SPP vs. with SPP; features without SPP vs. features with SPP.

Bringing the Image Back

Figure: predicted depth and its error map.

FusionNet

Qualitative Results

Figure: KITTI image [1] and VOID image [2].

Figure: sparse depth and predicted dense depth for each.

[1] Uhrig, J., et al. "Sparsity invariant CNNs." 3DV 2017.
[2] Fei, X., A. Wong, and S. Soatto. "Geo-Supervised Depth Prediction." RA-L and ICRA 2019.

Quantitative Results

Method      Parameters   MAE      RMSE      iMAE   iRMSE
ScaffNet    ~1.4M        318.41   1425.53   1.39   5.01
[1]         ~27.8M       358.92   1384.85   1.60   4.32
[2]         ~18.8M       347.17   1310.03   n/a    n/a
[3]         ~9.7M        305.06   1239.06   1.21   3.71
FusionNet   ~7.8M        286.35   1182.81   1.18   3.55

[1] Ma, F., G. V. Cavalheiro, and S. Karaman. "Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera." ICRA 2019.
[2] Yang, Y., A. Wong, and S. Soatto. "Dense Depth Posterior (DDP) from Single Image and Sparse Range." CVPR 2019.
[3] Wong, A., X. Fei, S. Tsuei, and S. Soatto. "Unsupervised Depth Completion from Visual Inertial Odometry." RA-L and ICRA 2020.

Quantitative Results -- Indoor

Method      Parameters   MAE      RMSE     iMAE    iRMSE
[1]         ~27.8M       198.76   260.67   88.07   114.96
[2]         ~18.8M       151.86   222.36   74.59   112.36
[3]         ~9.7M        85.05    169.79   48.92   104.02
ScaffNet    ~1.4M        70.16    156.99   42.78   91.48
FusionNet   ~7.8M        59.53    119.14   35.72   68.36


● MAE for various density levels, compared to [1] (plot).

[1] Wong, A., X. Fei, S. Tsuei, and S. Soatto. "Unsupervised Depth Completion from Visual Inertial Odometry." RA-L and ICRA 2020.

Targeted Adversarial Perturbations for Monocular Depth Prediction

[1] Wong, Alex, Safa Cicek, and Stefano Soatto. "Targeted Adversarial Perturbations for Monocular Depth Prediction." NeurIPS 2020.

Adversarial Perturbations

Figure [1]: image + 0.07 × perturbation = adversarial image; "panda" is classified as "gibbon". Figure [2]: adversarial examples for semantic segmentation.

[1] Goodfellow, I., J. Shlens, and C. Szegedy. "Explaining and Harnessing Adversarial Examples." ICLR 2015.
[2] Xie, C., J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille. "Adversarial Examples for Semantic Segmentation and Object Detection." ICCV 2017.

Targeted Attacks on Monocular Depth Prediction Networks

Figure: image + perturbations → original disparity vs. disparity after perturbations (+10%).

Attacking the Entire Scene

Figure: perturbations; original disparity; 10% closer overall; horizontally flipped; preset scene.

(i) scaling the entire scene by a factor; (ii) symmetrically flipping the entire scene; (iii) altering the entire scene to a preset scene. A sketch of attack (i) follows.
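A sketch of attack (i) under assumed interfaces (`model` maps an image batch to disparity; illustrative, not the paper's exact optimization): the perturbation is optimized so the predicted disparity matches the original scaled by 1.1, i.e. the scene appears 10% closer.

```python
import torch

def targeted_depth_attack(model, x, scale=1.1, eps=0.02, steps=40, lr=5e-3):
    """Optimize a small additive perturbation so model(x + delta) matches a
    target disparity (here the original scaled by `scale`)."""
    with torch.no_grad():
        target = scale * model(x)            # (i) scaling; flips work similarly
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = (model(x + delta) - target).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)          # keep the perturbation small
    return delta.detach()
```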

Strong Bias on Scene Orientation

Figure (vertical flip and horizontal flip): image; perturbations; original disparity; disparity after perturbations; flipped image; disparity of the flipped image.

Adversarial Attacks in Indoor Scenes

Figure [1]: image; perturbations; original disparity; 10% closer overall; 10% farther overall; vertical flip; horizontal flip.

[1] Yin, W., Y. Liu, C. Shen, and Y. Yan. "Enforcing geometric constraints of virtual normal for depth prediction." ICCV 2019.

Linear Operations

Figure: original disparity vs. disparity after summing the "10% closer overall" and "10% farther overall" perturbations.

Quantitative Results: Symmetrically Flipping the Scene

Plots: Outdoor Driving (KITTI); Indoor (NYU-V2).

Quantitative Results: Category-Conditioned Scaling

Localized Attacks on the Scene

Figure: remove cyclist; move vehicle left.

(i) removing specific instances from the scene; (ii) moving specific instances to different regions of the scene.

Instance-Conditioned Removing


Transferability

● Fool Monodepth2 [1] with perturbations from Monodepth [2]

[1] Godard, C., O. Mac Aodha, M. Firman, and G. J. Brostow. "Digging into self-supervised monocular depth estimation." ICCV 2019.
[2] Godard, C., O. Mac Aodha, and G. J. Brostow. "Unsupervised Monocular Depth Estimation with Left-Right Consistency." CVPR 2017.

Concluding Remarks

● SSL-semantic:
○ The proposed speed-of-training criterion shows promising results.
○ We merge two branches of the literature by smoothing in both the input and the weight space.
○ But requiring real, labeled training samples for each class is not scalable.

● UDA-semantic:
○ The proposed conditional domain alignment method works well for classification but does not perform as well for segmentation.

● UDA-geometry:
○ It is possible to learn dense topology from sparse point clouds alone.
○ But it is sensitive to the density level of the input, so we reconcile it with the image.

● Adversarial Robustness of Unsupervised Models:
○ We show that networks are vulnerable to targeted adversarial perturbations, even non-local ones.
○ These perturbations may not cause harm in a practical transportation application.
○ The existence of adversaries is an opportunity.

Acknowledgments

● My advisor, Prof. Stefano Soatto.

● Committee: Prof. Lieven Vandenberghe, Prof. Paulo Tabuada, Prof. Guy Van den Broeck.

● Collaborators:
○ Alex Wong from the UCLA Vision Lab.
○ Alhussein Fawzi from DeepMind.
○ Ning Xu, Zhaowen Wang, and Hailin Jin from Adobe Research.
