Learning Methods: Supervised, Semi-supervised, Weakly-supervised, and Unsupervised Learning

Hao Dong

2019, Peking University

Learning Methods

• Supervised, Semi-supervised, Weakly-supervised, and Unsupervised Learning

• Unsupervised Learning

• Semi-supervised Learning

• Weakly-supervised Learning

• Summary

From the Data Point of View: Supervised, Unsupervised, Semi-supervised, and Weakly-supervised Learning

From the Data Point of View

Supervised Learning — data in both the input x and the output y, with a known mapping (learn the mapping f):

y = f(x)

• Image classification
• Object detection
• …

Unsupervised Learning — data in the input x only (learn the self-mapping f):

x = f(x)

• Autoencoder (when the output is the features)
• GANs
• …

From the Data Point of View

Semi-supervised Learning — data in both the input x and the output y, with a known partial mapping (learn the mapping f):

y = f(x)

• …

Weakly-supervised Learning — data in both the input x and the output y, with a known mapping for y (learn the mapping f for another output y′):

y′ = f(x)

• Learn segmentation via classification
• …

From the Data Point of View

(Figure: an overview of weakly-supervised learning, from https://niug1984.github.io/paper/niu_tdlw2018.pdf)

Unsupervised Learning

Unsupervised Learning

• Unsupervised learning is about problems where we don't have labeled answers, such as clustering and dimension reduction.

• Clustering: EM

• Dimension Reduction: PCA

• …

Unsupervised Learning • Autoencoder

• In practice, it is difficult to obtain a large amount of labeled data, but it is easy to get a large amount of unlabeled data.

• Learning a good feature extractor using unlabeled data and then learning the classifier using labeled data can improve the performance.

Unsupervised Learning • Autoencoder

(Diagram: input layer → hidden layer → output layer; the Encoder maps the input x to the hidden features a, the Decoder maps a back to the reconstruction x̂.)

• The number of hidden units is usually smaller than the number of inputs (dimension reduction).

• Given N data samples, the reconstruction loss is

ℒ_recon = (1/N) Σ_i ‖x_i − x̂_i‖²

• The autoencoder tries to learn an approximation to the identity function, so that the input is "compressed" into features, discovering interesting structure in the data.
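To make the reconstruction loss concrete, here is a minimal autoencoder sketch in PyTorch (my own illustration, not the lecture's code; the 784-dimensional input and the 64 hidden units are assumptions standing in for MNIST):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """A minimal autoencoder: the hidden layer is smaller than the input."""
    def __init__(self, n_in=784, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        a = self.encoder(x)          # compressed features
        x_hat = self.decoder(a)      # reconstruction
        return x_hat, a

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)              # dummy batch standing in for MNIST images

x_hat, _ = model(x)
loss = ((x - x_hat) ** 2).sum(dim=1).mean()   # (1/N) sum_i ||x_i - x_hat_i||^2
loss.backward()
optimizer.step()
```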

Unsupervised Learning • Autoencoder

• The autoencoder is an unsupervised learning method if we consider the extracted features as the "output".

• The autoencoder is also a self-taught learning method, a type of supervised learning where the training labels are determined by the input data itself.

• … is another unsupervised, self-taught learning example.

(Figure: an autoencoder for the MNIST dataset, 28×28×1 = 784 pixels.)

Unsupervised Learning • Sparse Autoencoder

• Even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure by imposing other constraints on the network.

• In particular, if we impose a "sparsity" constraint on the hidden units, the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.

(Diagram: sigmoid hidden activations — values near 0 such as 0.02 and 0.01 are "inactive", values near 1 such as 0.97 and 0.98 are "active".)

Unsupervised Learning • Sparse Autoencoder

Given N data samples and a sigmoid activation, the active ratio of hidden neuron j is

ρ̂_j = (1/N) Σ_i a_j(x_i)

To make the hidden output "sparse", we would like to enforce the constraint ρ̂_j = ρ, where ρ is a "sparsity parameter", e.g. 0.2 (on average 20% of the neurons are active).

The penalty term is as follows, where s is the number of hidden neurons:

ℒ_sparse = Σ_{j=1..s} KL(ρ ‖ ρ̂_j) = Σ_{j=1..s} [ ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)) ]

The total loss:

ℒ = ℒ_recon + ℒ_sparse
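A small NumPy sketch of the sparsity penalty above (the activation values and ρ = 0.2 are made-up illustrative numbers):

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.2, eps=1e-8):
    """KL sparsity penalty for sigmoid hidden activations.

    activations: array of shape (N, s) with values in (0, 1).
    rho: desired average activation (the sparsity parameter).
    """
    rho_hat = activations.mean(axis=0)            # active ratio of each of the s neurons
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)    # avoid log(0)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return kl.sum()                               # sum over the s hidden neurons

# Example: 5 samples, 3 hidden neurons with sigmoid outputs
a = np.array([[0.02, 0.97, 0.01],
              [0.05, 0.95, 0.02],
              [0.01, 0.99, 0.03],
              [0.03, 0.96, 0.02],
              [0.04, 0.98, 0.01]])
print(kl_sparsity_penalty(a, rho=0.2))   # the total loss would be L_recon + this penalty
```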

Unsupervised Learning • Sparse Autoencoder

• A smaller ρ means a more sparse hidden representation.

(Figure: autoencoders for the MNIST dataset — the input x, the autoencoder reconstruction x̂, and the sparse autoencoder reconstruction x̂.)

Unsupervised Learning • Sparse Autoencoder

• Visualizing the learned features: one neuron == one feature extractor.

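One common recipe for this visualization (from the UFLDL sparse-autoencoder tutorial) is to show, for each hidden neuron, the norm-constrained input that maximally activates it, which for a linear-plus-sigmoid unit is its normalized weight vector. A sketch, assuming a trained weight matrix W of shape (784, n_hidden):

```python
import numpy as np

def neuron_feature_images(W, img_shape=(28, 28)):
    """Each column of W holds the input weights of one hidden neuron.
    The image that maximally activates neuron j (under ||x|| = 1) is W[:, j] / ||W[:, j]||."""
    features = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-8)
    return [features[:, j].reshape(img_shape) for j in range(W.shape[1])]

# Example with a random (untrained) weight matrix, standing in for a trained encoder
W = np.random.randn(784, 16)
images = neuron_feature_images(W)
print(len(images), images[0].shape)   # 16 feature images of shape (28, 28)
```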

Unsupervised Learning • Sparse Autoencoder

Method   | Hidden activation | Reconstruction activation | Loss
Method 1 | Sigmoid           | Sigmoid                   | ℒ = ℒ_recon + ℒ_sparse (KL penalty)
Method 2 | ReLU              | Softplus                  | ℒ = ℒ_recon + ℒ_1 (L1 penalty on the hidden activation output)

Unsupervised Learning • Denoising Autoencoder

• Apply dropout between the input and the first hidden layer.

• This improves the robustness of the learned features.

(Diagram: input layer → hidden layer → output layer; the corrupted input is encoded and decoded to reconstruct the clean input x.)
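A minimal sketch of the corruption step (the sizes and dropout rate are illustrative assumptions): dropout corrupts the input, but the loss still compares the reconstruction against the clean input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 64), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

x = torch.rand(32, 784)                      # clean input (dummy MNIST batch)
x_noisy = F.dropout(x, p=0.5, training=True) # corrupt the input with dropout
x_hat = decoder(encoder(x_noisy))            # encode/decode the *corrupted* input
loss = ((x - x_hat) ** 2).sum(dim=1).mean()  # but reconstruct the *clean* input
loss.backward()
```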

Unsupervised Learning • Denoising Autoencoder

(Figure: the learned features of a sparse autoencoder vs. a denoising autoencoder.)

Unsupervised Learning • Denoising Autoencoder

(Figure: autoencoders for the MNIST dataset — the input x and the reconstructions of the autoencoder, the sparse autoencoder, and the denoising autoencoder.)

Unsupervised Learning • Stacked Autoencoder

Step 1 (unsupervised): train the first hidden layer as an autoencoder on the input data — it becomes the feature extractor for the input data.

Unsupervised Learning • Stacked Autoencoder

Step 2 (unsupervised): keep hidden 1 fixed and train hidden 2 as an autoencoder on its output — the feature extractor for the first feature extractor.

Unsupervised Learning • Stacked Autoencoder

Step 3 (unsupervised): keep hidden 1–2 fixed and train hidden 3 — the feature extractor for the second feature extractor.

Unsupervised Learning • Stacked Autoencoder

Step 4 (unsupervised): keep hidden 1–3 fixed and train hidden 4 — the feature extractor for the third feature extractor.

Unsupervised Learning • Stacked Autoencoder

Step 5 (supervised): learn to classify the input data by using the labels and the high-level features — train the output layer on top of the fixed hidden layers.

Unsupervised Learning • Stacked Autoencoder

Step 6 (supervised): fine-tune the entire model for classification.

(In each diagram: red lines indicate the trainable weights, black lines indicate the fixed/non-trainable weights.)
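A compact sketch of this greedy layer-wise procedure in PyTorch (the layer sizes, epoch counts, and dummy data are illustrative assumptions): each new layer is trained as a small autoencoder on the output of the previously trained, frozen layers; then a classifier is trained on top, and finally the whole stack is fine-tuned with labels.

```python
import torch
import torch.nn as nn

sizes = [784, 256, 128, 64, 32]              # input -> hidden 1 .. hidden 4
encoders = []                                # the already-trained (fixed) feature extractors

x = torch.rand(256, 784)                     # dummy unlabeled data
y = torch.randint(0, 10, (256,))             # labels, used only in the supervised steps

# Steps 1-4 (unsupervised): train one hidden layer at a time, keeping earlier layers fixed
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    enc = nn.Sequential(nn.Linear(n_in, n_out), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(n_out, n_in), nn.Sigmoid())
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    with torch.no_grad():                    # features produced by the fixed earlier layers
        h = x
        for frozen in encoders:
            h = frozen(h)
    for _ in range(100):                     # train this layer as a small autoencoder
        opt.zero_grad()
        loss = ((h - dec(enc(h))) ** 2).sum(dim=1).mean()
        loss.backward()
        opt.step()
    encoders.append(enc)

# Step 5 (supervised): train only a classifier on top of the fixed high-level features
classifier = nn.Linear(sizes[-1], 10)
full_model = nn.Sequential(*encoders, classifier)
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)   # only the classifier is updated
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(full_model(x), y)
    loss.backward()
    opt.step()

# Step 6 (supervised): fine-tune the entire model for classification
opt = torch.optim.Adam(full_model.parameters(), lr=1e-4)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(full_model(x), y)
    loss.backward()
    opt.step()
```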

Unsupervised Learning • Variational Autoencoder, VAE

• Can the hidden output be made to follow a prior distribution, e.g., the Normal distribution N(0, 1)?

• If yes, we can have a Generator (a standalone Decoder) that maps z ~ N(0, 1) to the data space.

(Diagram: input layer → hidden layer → output layer with Encoder and Decoder; a separate Decoder maps samples from N(0, 1) to data.)

Unsupervised Learning • Variational Autoencoder, VAE

(Figure: a bidirectional mapping between the data space x and the latent space z ~ N(0, 1).)

The penalty term is as follows, encouraging the latent code to match the prior:

ℒ_KL = KL( q(z|x) ‖ N(0, 1) )
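A minimal VAE sketch in PyTorch (the layer sizes and the diagonal-Gaussian encoder are standard choices assumed here, not taken from the slides): the encoder outputs the mean and log-variance of q(z|x), the reparameterization trick draws z, and the loss adds the KL penalty above to the reconstruction loss.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_in=784, n_latent=16):
        super().__init__()
        self.enc = nn.Linear(n_in, 256)
        self.mu = nn.Linear(256, n_latent)       # mean of q(z|x)
        self.logvar = nn.Linear(256, n_latent)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_in), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

model = VAE()
x = torch.rand(32, 784)
x_hat, mu, logvar = model(x)
recon = ((x - x_hat) ** 2).sum(dim=1).mean()
# KL( q(z|x) || N(0, 1) ), averaged over the batch
kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()
loss = recon + kl
loss.backward()

# After training, the decoder alone acts as a generator: z ~ N(0, 1) -> data space
samples = model.dec(torch.randn(8, 16))
```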

Unsupervised Learning • Generative Adversarial Network, GAN

(Diagram: z ~ N(0, 1), a Normal/uniform distribution, is mapped by the Generator G to a fake sample; the Discriminator D receives real and fake samples and outputs real/fake. This is a unidirectional mapping from the latent space to the data space.)

min_G max_D V(D, G) = E_{x~p_data}[ log D(x) ] + E_{z~p_z}[ log(1 − D(G(z))) ]

ℒ_D = − E_{x~p_data}[ log D(x) ] − E_{z~p_z}[ log(1 − D(G(z))) ]

ℒ_G = − E_{z~p_z}[ log D(G(z)) ]
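The two losses above can be written with binary cross-entropy; a sketch in PyTorch, where the generator and discriminator architectures are placeholders of my own:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())    # generator
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())  # discriminator
bce = nn.BCELoss()

x_real = torch.rand(32, 784)                 # dummy real data
z = torch.randn(32, 16)                      # z ~ N(0, 1)
x_fake = G(z)

# Discriminator loss: L_D = -E[log D(x)] - E[log(1 - D(G(z)))]
d_loss = bce(D(x_real), torch.ones(32, 1)) + bce(D(x_fake.detach()), torch.zeros(32, 1))

# Generator loss (non-saturating form): L_G = -E[log D(G(z))]
g_loss = bce(D(x_fake), torch.ones(32, 1))
```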

Semi-supervised Learning

Semi-supervised Learning

• Motivation:
  • Unlabeled data is easy to obtain
  • Labeled data can be hard to get

• Goal:
  • Semi-supervised learning mixes labeled and unlabeled data to produce better models.

• vs. Transductive Learning:
  • Semi-supervised learning is eventually applied to the testing data
  • Transductive learning is only concerned with the given unlabelled data

Semi-supervised Learning • Unlabeled data can help

(Figure: a decision boundary fitted with labeled data only vs. with labeled plus unlabeled data — the unlabeled data can help to find a better boundary.)

Semi-supervised Learning • Pseudo-Labelling
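Pseudo-labelling trains on the labeled set, assigns the model's confident predictions on the unlabeled set as "pseudo-labels", and retrains on both. A sketch with scikit-learn (the classifier, the synthetic data, and the 0.95 confidence threshold are my own illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])   # small labeled set
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])  # large unlabeled set

clf = LogisticRegression().fit(X_lab, y_lab)     # 1. train on the labeled data

proba = clf.predict_proba(X_unlab)               # 2. predict the unlabeled data and keep
confident = proba.max(axis=1) > 0.95             #    only the confident predictions
pseudo_y = clf.classes_[proba.argmax(axis=1)][confident]

X_all = np.vstack([X_lab, X_unlab[confident]])   # 3. retrain on labeled + pseudo-labeled data
y_all = np.concatenate([y_lab, pseudo_y])
clf = LogisticRegression().fit(X_all, y_all)
```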

Semi-supervised Learning • Generative Methods

• EM with some labelled data: cluster the data, then label the clusters.
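A sketch of "cluster and then label", using scikit-learn's GaussianMixture as the EM model (the two-blob synthetic data and the component count are illustrative assumptions): fit the mixture on all the data, then give each cluster the majority label of its labeled members.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])  # all data (dummy)
y = np.full(200, -1)                         # -1 marks "unlabeled"
y[:5], y[100:105] = 0, 1                     # only a few points carry labels

gm = GaussianMixture(n_components=2, random_state=0).fit(X)   # EM over labeled + unlabeled data
cluster = gm.predict(X)

y_pred = np.empty(200, dtype=int)
for c in range(2):                           # give each cluster the majority label of its
    members = cluster == c                   # labeled members ("cluster and then label")
    y_pred[members] = np.bincount(y[members & (y != -1)]).argmax()
```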

Semi-supervised Learning • Generative Methods

• Unlabeled data may hurt the learning when the assumed generative model (e.g., a multivariate Gaussian model) is wrong.

(Figure: a wrong vs. a correct multivariate Gaussian model fitted to the same data.)

Semi-supervised Learning • Graph-based Methods

1. Define the similarity s(x_i, x_j)
2. Add edges:
   a. KNN
   b. ε-Neighborhood
3. The edge weight is proportional to s(x_i, x_j)
4. Propagate the labels through the graph
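A sketch of steps 1–4 with scikit-learn's LabelSpreading (the KNN graph with 7 neighbors and the synthetic data are illustrative choices); unlabeled points are marked with -1 and their labels are propagated through the graph:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])  # two clusters (dummy)
y = np.full(200, -1)            # -1 marks the unlabeled points
y[0], y[100] = 0, 1             # one labeled point per cluster

# Build a KNN similarity graph and propagate the labels through it
model = LabelSpreading(kernel='knn', n_neighbors=7).fit(X, y)
print(model.transduction_[:5], model.transduction_[100:105])   # propagated labels
```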

Semi-supervised Learning • Low-density Separation

• Semi-supervised SVM (S3VM) == Transductive SVM (TSVM)

Weakly-supervised Learning

Weakly-supervised Learning

• Weakly-supervised learning is a framework where the model is trained using examples that are only partially annotated or labeled.

Weakly-supervised Learning • Attention CycleGAN: learn the segmentation via synthesis

Reference: Unsupervised Attention-guided Image-to-Image Translation. Mejjati, Y. A., Richardt, C., & Cosker, D. NIPS, 2018.

Weakly-supervised Learning • Semantic Image Synthesis: learn the segmentation via synthesis

x̂ = G(x, φ(t̄)), where x is the input image and φ(t̄) is the embedding of the text description t̄.

(Figure: the Generator consists of an Encoder, a Residual Transformation Unit, and a Decoder, followed by a Discriminator; given an image x and a text embedding φ(t̄), e.g. "A bird with red head and breast.", it synthesizes x̂ = G(x, φ(t̄)). Results are shown at 64×64 with VGG and at 256×256, together with the difference maps, for captions such as "A black bird with a red head.", "This small yellow bird has grey wings, and a black bill.", "The petals are purple with no visible stamens.", and "The petals are white and the stamens are yellow.")

Reference: Semantic Image Synthesis via Adversarial Learning. Dong, H., Yu, S., Wu, C., Guo, Y. ICCV, 2017.

Weakly-supervised Learning • More and more …

Summary

Learning Methods

• Supervised, Semi-supervised, Weakly-supervised, and Unsupervised Learning

• Unsupervised Learning

• Semi-supervised Learning

• Weakly-supervised Learning

Learning Methods

• Exercise 1:
  • Implement a Sparse Autoencoder on MNIST and visualize the learned features.

• Exercise 2:
  • Explain the Variational Autoencoder in a mathematical way
  • Implement it on MNIST (Optional)

• Exercise 3: (Optional)
  • Choose an application and implement it

Link: https://github.com/zsdonghao/deep-learning-note/


Link: https://github.com/tensorlayer/tensorlayer

Questions?
