Learning Methods: Supervised, Semi-supervised, Weakly-supervised, and Unsupervised Learning

Hao Dong

2019, Peking University

Learning Methods

• Supervised, Semi-supervised, Weakly-supervised, and Unsupervised Learning

• Unsupervised Learning

• Semi-supervised Learning

• Weakly-supervised Learning

• Summary

From the Data Point of View: Supervised, Unsupervised, Semi-supervised, and Weakly-supervised Learning

From the Data Point of View

Supervised Learning — data in both the input x and the output y, with a known mapping (learn the mapping f):

y = f(x)

• Image classification
• Object detection
• …

Unsupervised Learning — data in the input x only (learn the self-mapping f):

x = f(x)

• Autoencoder (when the output is the features)
• GANs
• …

From the Data Point of View

Semi-supervised Learning — data in both the input x and the output y, with a known partial mapping (learn the mapping f):

y = f(x)

• …

Weakly-supervised Learning — data in both the input x and the output y, with a known mapping for y (learn the mapping f for another output y′):

y′ = f(x)

• Learn segmentation via classification
• …

From the Data Point of View

(Figure: an overview of weakly-supervised learning, from https://niug1984.github.io/paper/niu_tdlw2018.pdf)

Unsupervised Learning

Unsupervised Learning

• Unsupervised learning is about problems where we don't have labeled answers, such as clustering and dimension reduction.

• Clustering: EM

• Dimension Reduction: PCA

• …

Unsupervised Learning • Autoencoder

• In practice, it is difficult to obtain a large amount of labeled data, but it is easy to get a large amount of unlabeled data.

• Learning a good feature extractor using unlabeled data and then learning the classifier using labeled data can improve the performance.

Unsupervised Learning • Autoencoder

(Diagram: input layer → hidden layer → output layer; the Encoder maps the input x to the hidden features a, the Decoder maps a back to the reconstruction x̂.)

• The number of hidden units is usually smaller than the number of inputs (dimension reduction).

• Given N data samples, the reconstruction loss is

ℒ_recon = (1/N) Σ_i ‖x_i − x̂_i‖²

• The autoencoder tries to learn an approximation to the identity function, so that the input is "compressed" into features, discovering interesting structure in the data.
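To make the reconstruction loss concrete, here is a minimal autoencoder sketch in PyTorch (my own illustration, not the lecture's code; the 784-dimensional input and the 64 hidden units are assumptions standing in for MNIST):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """A minimal autoencoder: the hidden layer is smaller than the input."""
    def __init__(self, n_in=784, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        a = self.encoder(x)          # compressed features
        x_hat = self.decoder(a)      # reconstruction
        return x_hat, a

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)              # dummy batch standing in for MNIST images

x_hat, _ = model(x)
loss = ((x - x_hat) ** 2).sum(dim=1).mean()   # (1/N) sum_i ||x_i - x_hat_i||^2
loss.backward()
optimizer.step()
```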

Unsupervised Learning • Autoencoder

• The autoencoder is an unsupervised learning method if we consider the extracted features as the "output".

• The autoencoder is also a self-taught learning method, a type of supervised learning where the training labels are determined by the input data itself.

• … is another unsupervised, self-taught learning example.

(Figure: an autoencoder for the MNIST dataset, 28×28×1 = 784 pixels.)

Unsupervised Learning • Sparse Autoencoder

• Even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure by imposing other constraints on the network.

• In particular, if we impose a "sparsity" constraint on the hidden units, the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.

(Diagram: sigmoid hidden activations — values near 0 such as 0.02 and 0.01 are "inactive", values near 1 such as 0.97 and 0.98 are "active".)

Unsupervised Learning • Sparse Autoencoder

Given N data samples and a sigmoid activation, the active ratio of hidden neuron j is

ρ̂_j = (1/N) Σ_i a_j(x_i)

To make the hidden output "sparse", we would like to enforce the constraint ρ̂_j = ρ, where ρ is a "sparsity parameter", e.g. 0.2 (on average 20% of the neurons are active).

The penalty term is as follows, where s is the number of hidden neurons:

ℒ_sparse = Σ_{j=1..s} KL(ρ ‖ ρ̂_j) = Σ_{j=1..s} [ ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)) ]

The total loss:

ℒ = ℒ_recon + ℒ_sparse
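A small NumPy sketch of the sparsity penalty above (the activation values and ρ = 0.2 are made-up illustrative numbers):

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.2, eps=1e-8):
    """KL sparsity penalty for sigmoid hidden activations.

    activations: array of shape (N, s) with values in (0, 1).
    rho: desired average activation (the sparsity parameter).
    """
    rho_hat = activations.mean(axis=0)            # active ratio of each of the s neurons
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)    # avoid log(0)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return kl.sum()                               # sum over the s hidden neurons

# Example: 5 samples, 3 hidden neurons with sigmoid outputs
a = np.array([[0.02, 0.97, 0.01],
              [0.05, 0.95, 0.02],
              [0.01, 0.99, 0.03],
              [0.03, 0.96, 0.02],
              [0.04, 0.98, 0.01]])
print(kl_sparsity_penalty(a, rho=0.2))   # the total loss would be L_recon + this penalty
```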

Unsupervised Learning • Sparse Autoencoder

• A smaller ρ means a more sparse hidden representation.

(Figure: autoencoders for the MNIST dataset — the input x, the autoencoder reconstruction x̂, and the sparse autoencoder reconstruction x̂.)

Unsupervised Learning • Sparse Autoencoder

• Visualizing the learned features: one neuron == one feature extractor.

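One common recipe for this visualization (from the UFLDL sparse-autoencoder tutorial) is to show, for each hidden neuron, the norm-constrained input that maximally activates it, which for a linear-plus-sigmoid unit is its normalized weight vector. A sketch, assuming a trained weight matrix W of shape (784, n_hidden):

```python
import numpy as np

def neuron_feature_images(W, img_shape=(28, 28)):
    """Each column of W holds the input weights of one hidden neuron.
    The image that maximally activates neuron j (under ||x|| = 1) is W[:, j] / ||W[:, j]||."""
    features = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-8)
    return [features[:, j].reshape(img_shape) for j in range(W.shape[1])]

# Example with a random (untrained) weight matrix, standing in for a trained encoder
W = np.random.randn(784, 16)
images = neuron_feature_images(W)
print(len(images), images[0].shape)   # 16 feature images of shape (28, 28)
```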

Unsupervised Learning • Sparse Autoencoder

Method   | Hidden activation | Reconstruction activation | Loss
Method 1 | Sigmoid           | Sigmoid                   | ℒ = ℒ_recon + ℒ_sparse (KL penalty)
Method 2 | ReLU              | Softplus                  | ℒ = ℒ_recon + ℒ_1 (L1 penalty on the hidden activation output)

Unsupervised Learning • Denoising Autoencoder

• Apply dropout between the input and the first hidden layer.

• This improves the robustness of the learned features.

(Diagram: input layer → hidden layer → output layer; the corrupted input is encoded and decoded to reconstruct the clean input x.)
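A minimal sketch of the corruption step (the sizes and dropout rate are illustrative assumptions): dropout corrupts the input, but the loss still compares the reconstruction against the clean input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 64), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

x = torch.rand(32, 784)                      # clean input (dummy MNIST batch)
x_noisy = F.dropout(x, p=0.5, training=True) # corrupt the input with dropout
x_hat = decoder(encoder(x_noisy))            # encode/decode the *corrupted* input
loss = ((x - x_hat) ** 2).sum(dim=1).mean()  # but reconstruct the *clean* input
loss.backward()
```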

Unsupervised Learning • Denoising Autoencoder

(Figure: the learned features of a sparse autoencoder vs. a denoising autoencoder.)

Unsupervised Learning • Denoising Autoencoder

(Figure: autoencoders for the MNIST dataset — the input x and the reconstructions of the autoencoder, the sparse autoencoder, and the denoising autoencoder.)

Unsupervised Learning • Stacked Autoencoder

Step 1 (unsupervised): train the first hidden layer as an autoencoder on the input data — it becomes the feature extractor for the input data.

Unsupervised Learning • Stacked Autoencoder

Step 2 (unsupervised): keep hidden 1 fixed and train hidden 2 as an autoencoder on its output — the feature extractor for the first feature extractor.

Unsupervised Learning • Stacked Autoencoder

Step 3 (unsupervised): keep hidden 1–2 fixed and train hidden 3 — the feature extractor for the second feature extractor.

Unsupervised Learning • Stacked Autoencoder

Step 4 (unsupervised): keep hidden 1–3 fixed and train hidden 4 — the feature extractor for the third feature extractor.

Unsupervised Learning • Stacked Autoencoder

Step 5 (supervised): learn to classify the input data by using the labels and the high-level features — train the output layer on top of the fixed hidden layers.

Unsupervised Learning • Stacked Autoencoder

Step 6 (supervised): fine-tune the entire model for classification.

(In each diagram: red lines indicate the trainable weights, black lines indicate the fixed/non-trainable weights.)
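A compact sketch of this greedy layer-wise procedure in PyTorch (the layer sizes, epoch counts, and dummy data are illustrative assumptions): each new layer is trained as a small autoencoder on the output of the previously trained, frozen layers; then a classifier is trained on top, and finally the whole stack is fine-tuned with labels.

```python
import torch
import torch.nn as nn

sizes = [784, 256, 128, 64, 32]              # input -> hidden 1 .. hidden 4
encoders = []                                # the already-trained (fixed) feature extractors

x = torch.rand(256, 784)                     # dummy unlabeled data
y = torch.randint(0, 10, (256,))             # labels, used only in the supervised steps

# Steps 1-4 (unsupervised): train one hidden layer at a time, keeping earlier layers fixed
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    enc = nn.Sequential(nn.Linear(n_in, n_out), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(n_out, n_in), nn.Sigmoid())
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    with torch.no_grad():                    # features produced by the fixed earlier layers
        h = x
        for frozen in encoders:
            h = frozen(h)
    for _ in range(100):                     # train this layer as a small autoencoder
        opt.zero_grad()
        loss = ((h - dec(enc(h))) ** 2).sum(dim=1).mean()
        loss.backward()
        opt.step()
    encoders.append(enc)

# Step 5 (supervised): train only a classifier on top of the fixed high-level features
classifier = nn.Linear(sizes[-1], 10)
full_model = nn.Sequential(*encoders, classifier)
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)   # only the classifier is updated
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(full_model(x), y)
    loss.backward()
    opt.step()

# Step 6 (supervised): fine-tune the entire model for classification
opt = torch.optim.Adam(full_model.parameters(), lr=1e-4)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(full_model(x), y)
    loss.backward()
    opt.step()
```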

Unsupervised Learning • Variational Autoencoder, VAE

• Can the hidden output be made to follow a prior distribution, e.g., the Normal distribution N(0, 1)?

• If yes, we can have a Generator (a standalone Decoder) that maps z ~ N(0, 1) to the data space.

(Diagram: input layer → hidden layer → output layer with Encoder and Decoder; a separate Decoder maps samples from N(0, 1) to data.)

Unsupervised Learning • Variational Autoencoder, VAE

(Figure: a bidirectional mapping between the data space x and the latent space z ~ N(0, 1).)

The penalty term is as follows, encouraging the latent code to match the prior:

ℒ_KL = KL( q(z|x) ‖ N(0, 1) )
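A minimal VAE sketch in PyTorch (the layer sizes and the diagonal-Gaussian encoder are standard choices assumed here, not taken from the slides): the encoder outputs the mean and log-variance of q(z|x), the reparameterization trick draws z, and the loss adds the KL penalty above to the reconstruction loss.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_in=784, n_latent=16):
        super().__init__()
        self.enc = nn.Linear(n_in, 256)
        self.mu = nn.Linear(256, n_latent)       # mean of q(z|x)
        self.logvar = nn.Linear(256, n_latent)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_in), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

model = VAE()
x = torch.rand(32, 784)
x_hat, mu, logvar = model(x)
recon = ((x - x_hat) ** 2).sum(dim=1).mean()
# KL( q(z|x) || N(0, 1) ), averaged over the batch
kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()
loss = recon + kl
loss.backward()

# After training, the decoder alone acts as a generator: z ~ N(0, 1) -> data space
samples = model.dec(torch.randn(8, 16))
```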

Unsupervised Learning • Generative Adversarial Network, GAN

(Diagram: z ~ N(0, 1), a Normal/uniform distribution, is mapped by the Generator G to a fake sample; the Discriminator D receives real and fake samples and outputs real/fake. This is a unidirectional mapping from the latent space to the data space.)

min_G max_D V(D, G) = E_{x~p_data}[ log D(x) ] + E_{z~p_z}[ log(1 − D(G(z))) ]

ℒ_D = − E_{x~p_data}[ log D(x) ] − E_{z~p_z}[ log(1 − D(G(z))) ]

ℒ_G = − E_{z~p_z}[ log D(G(z)) ]
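The two losses above can be written with binary cross-entropy; a sketch in PyTorch, where the generator and discriminator architectures are placeholders of my own:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())    # generator
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())  # discriminator
bce = nn.BCELoss()

x_real = torch.rand(32, 784)                 # dummy real data
z = torch.randn(32, 16)                      # z ~ N(0, 1)
x_fake = G(z)

# Discriminator loss: L_D = -E[log D(x)] - E[log(1 - D(G(z)))]
d_loss = bce(D(x_real), torch.ones(32, 1)) + bce(D(x_fake.detach()), torch.zeros(32, 1))

# Generator loss (non-saturating form): L_G = -E[log D(G(z))]
g_loss = bce(D(x_fake), torch.ones(32, 1))
```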

Semi-supervised Learning

Semi-supervised Learning

• Motivation:
  • Unlabeled data is easy to obtain
  • Labeled data can be hard to get

• Goal:
  • Semi-supervised learning mixes labeled and unlabeled data to produce better models.

• vs. Transductive Learning:
  • Semi-supervised learning is eventually applied to the testing data
  • Transductive learning is only concerned with the given unlabelled data

Semi-supervised Learning • Unlabeled data can help

(Figure: a decision boundary fitted with labeled data only vs. with labeled plus unlabeled data — the unlabeled data can help to find a better boundary.)

Semi-supervised Learning • Pseudo-Labelling
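Pseudo-labelling trains on the labeled set, assigns the model's confident predictions on the unlabeled set as "pseudo-labels", and retrains on both. A sketch with scikit-learn (the classifier, the synthetic data, and the 0.95 confidence threshold are my own illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])   # small labeled set
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])  # large unlabeled set

clf = LogisticRegression().fit(X_lab, y_lab)     # 1. train on the labeled data

proba = clf.predict_proba(X_unlab)               # 2. predict the unlabeled data and keep
confident = proba.max(axis=1) > 0.95             #    only the confident predictions
pseudo_y = clf.classes_[proba.argmax(axis=1)][confident]

X_all = np.vstack([X_lab, X_unlab[confident]])   # 3. retrain on labeled + pseudo-labeled data
y_all = np.concatenate([y_lab, pseudo_y])
clf = LogisticRegression().fit(X_all, y_all)
```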

Semi-supervised Learning • Generative Methods

• EM with some labelled data: cluster the data, then label the clusters.
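A sketch of "cluster and then label", using scikit-learn's GaussianMixture as the EM model (the two-blob synthetic data and the component count are illustrative assumptions): fit the mixture on all the data, then give each cluster the majority label of its labeled members.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])  # all data (dummy)
y = np.full(200, -1)                         # -1 marks "unlabeled"
y[:5], y[100:105] = 0, 1                     # only a few points carry labels

gm = GaussianMixture(n_components=2, random_state=0).fit(X)   # EM over labeled + unlabeled data
cluster = gm.predict(X)

y_pred = np.empty(200, dtype=int)
for c in range(2):                           # give each cluster the majority label of its
    members = cluster == c                   # labeled members ("cluster and then label")
    y_pred[members] = np.bincount(y[members & (y != -1)]).argmax()
```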

Semi-supervised Learning • Generative Methods

• Unlabeled data may hurt the learning when the assumed generative model (e.g., a multivariate Gaussian model) is wrong.

(Figure: a wrong vs. a correct multivariate Gaussian model fitted to the same data.)

Semi-supervised Learning • Graph-based Methods

1. Define the similarity s(x_i, x_j)
2. Add edges:
   a. KNN
   b. ε-Neighborhood
3. The edge weight is proportional to s(x_i, x_j)
4. Propagate the labels through the graph
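A sketch of steps 1–4 with scikit-learn's LabelSpreading (the KNN graph with 7 neighbors and the synthetic data are illustrative choices); unlabeled points are marked with -1 and their labels are propagated through the graph:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])  # two clusters (dummy)
y = np.full(200, -1)            # -1 marks the unlabeled points
y[0], y[100] = 0, 1             # one labeled point per cluster

# Build a KNN similarity graph and propagate the labels through it
model = LabelSpreading(kernel='knn', n_neighbors=7).fit(X, y)
print(model.transduction_[:5], model.transduction_[100:105])   # propagated labels
```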

Semi-supervised Learning • Low-density Separation

• Semi-supervised SVM (S3VM) == Transductive SVM (TSVM)

Weakly-supervised Learning

Weakly-supervised Learning

• Weakly-supervised learning is a framework where the model is trained using examples that are only partially annotated or labeled.

Weakly-supervised Learning • Attention CycleGAN: learn the segmentation via synthesis

Reference: Unsupervised Attention-guided Image-to-Image Translation. Mejjati, Y. A., Richardt, C., & Cosker, D. NIPS, 2018.

Weakly-supervised Learning • Semantic Image Synthesis: learn the segmentation via synthesis

x̂ = G(x, φ(t̄)), where x is the input image and φ(t̄) is the embedding of the text description t̄.

(Figure: the Generator consists of an Encoder, a Residual Transformation Unit, and a Decoder, followed by a Discriminator; given an image x and a text embedding φ(t̄), e.g. "A bird with red head and breast.", it synthesizes x̂ = G(x, φ(t̄)). Results are shown at 64×64 with VGG and at 256×256, together with the difference maps, for captions such as "A black bird with a red head.", "This small yellow bird has grey wings, and a black bill.", "The petals are purple with no visible stamens.", and "The petals are white and the stamens are yellow.")

Reference: Semantic Image Synthesis via Adversarial Learning. Dong, H., Yu, S., Wu, C., Guo, Y. ICCV, 2017.

Weakly-supervised Learning • More and more …

Summary

Learning Methods

• Supervised, Semi-supervised, Weakly-supervised, and Unsupervised Learning

• Unsupervised Learning

• Semi-supervised Learning

• Weakly-supervised Learning

Learning Methods

• Exercise 1:
  • Implement a Sparse Autoencoder on MNIST and visualize the learned features.

• Exercise 2:
  • Explain the Variational Autoencoder in a mathematical way
  • Implement it on MNIST (Optional)

• Exercise 3: (Optional)
  • Choose an application and implement it

Link: https://github.com/zsdonghao/deep-learning-note/


Link: https://github.com/tensorlayer/tensorlayer

Questions?
