Learning Deep Representation with Limited Labels: Application to Transductive Semi-Supervised Learning

Dino Ienco, Research Scientist, IRSTEA – UMR TETIS – Montpellier, [email protected]

Outline:
- Introduction, Motivation & Settings
- A recap on Supervised/Unsupervised Deep Learning
- Knowledge-aware data embedding (KADE)
- Learning with KADE: Semi-Supervised Clustering, Semi-Supervised Classification
- Conclusions and Trends

Introduction

In many real-world domains we need to make decisions:
- Categorize an item into a category
- Organize or structure information according to background knowledge

Categorize: decide whether a mail is SPAM or not, decide the category of an image, decide whether a document is about politics or sport, decide whether a patient is sick or not.

Organize: group retrieved web queries, organize omics data to recover gene/protein families, re-organize textual or image archives given user preferences.

Each decision requires time to be taken, and each decision can also be expensive to take. Cost and time can become prohibitive when huge volumes of data are considered.

Machine learning can do it for us: we can build supervised predictive models that categorize/organize automatically and quickly. But there is no free lunch: supervised machine learning requires labeled data to train models.

Settings

General machine learning scenario:
- Training data, in one of three forms:
  - Unlabeled data: $\{(X_i, ?)\}_{i=1}^{n}$
  - Labeled data: $\{(X_i, Y_i)\}_{i=1}^{n}$
  - Both labeled and unlabeled data: $\{(X_i, ?)\}_{i=1}^{m} \cup \{(X_i, Y_i)\}_{i=m+1}^{n}$
- Test data: unlabeled new data

The training data are employed to learn the predictive model; the test data are new examples that need to be classified.
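As a concrete illustration of the three training-data regimes above, here is a minimal sketch assuming NumPy arrays X (features) and y (labels); the array names and sizes are placeholders of mine, not from the talk.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)              # n = 1000 samples, 20 features each
y = rng.randint(0, 3, size=1000)     # 3 classes

# Unsupervised: {(X_i, ?)} -- features only, no labels
X_unsupervised = X

# Supervised: {(X_i, Y_i)} -- every sample carries a label
X_supervised, y_supervised = X, y

# Semi-supervised: {(X_i, ?)}_{i=1..m} U {(X_i, Y_i)}_{i=m+1..n}
m = 950                              # only n - m = 50 samples are labeled
X_unlabeled = X[:m]
X_labeled, y_labeled = X[m:], y[m:]
```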
Settings

Supervised learning only exploits labeled data to build its model. Semi-supervised learning (SSL) exploits both labeled and unlabeled data to build the model.

If we consider label information and data availability at training time, we can have the following scenarios:

- Labeled data at training time, test data available later: Inductive Supervised Setting
- Labeled and unlabeled data available at training time, test data available later: Inductive Semi-Supervised Setting
- Labeled and test data available at training time, no new data available later: Transductive Semi-Supervised Setting

Within the transductive setting we distinguish Semi-Supervised Clustering (constraints-guided) and Semi-Supervised Classification (label-guided). In transductive semi-supervised learning, the result of the process is the decision and not the model.
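To make the transductive setting concrete, here is a minimal sketch using scikit-learn's LabelSpreading (my choice for illustration; the talk does not prescribe this method). Unlabeled points are marked with -1 and receive labels directly: the output is the decision for those very points, not a model for future data.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
y_partial = np.copy(y)
rng = np.random.RandomState(0)
hidden = rng.rand(len(y)) < 0.95        # hide 95% of the labels
y_partial[hidden] = -1                  # -1 marks "unlabeled" for scikit-learn

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)                 # the "test" points are present at training time

# The transductive output is the decision itself: labels for those very points
accuracy = (model.transduction_[hidden] == y[hidden]).mean()
print(f"accuracy on the unlabeled points: {accuracy:.2f}")
```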
Recap on DL

Today, when we talk about (supervised) machine learning, we cannot avoid talking about deep learning. Deep learning is nowadays used in many domains: computer vision, natural language processing (NLP) and speech, robotics and AI, music and the arts.

Traditional machine learning systems leverage feature engineering to represent the data:
- Text analysis: bag of words
- Image analysis: HOG (Histogram of Oriented Gradients), SIFT (Scale-Invariant Feature Transform)

The resulting pipeline is: hand-crafted features followed by a simple trainable classifier (SVM, RF, NB, ...). Deep learning approaches instead learn internal representations (new features) without the need for hand-crafted features: a trainable feature extractor followed by a trainable classifier.

Deep learning involves neural network architectures:
- Multi-Layer Perceptron
- Convolutional and Recurrent Neural Networks
- Autoencoder
- Transformers
- ...

... mainly oriented to supervised tasks:
- Image Classification & Semantic Segmentation
- Document Classification & Machine Translation
- Speech Recognition

... while few approaches exist for unsupervised tasks (i.e. clustering), and really few methods deal with semi-supervised tasks (i.e. semi-supervised clustering or semi-supervised classification).

In the supervised setting, the data go through an encoder that produces features (a representation), followed by a classification output layer. In the unsupervised setting, the task usually modelled is signal reconstruction: an encoder maps the data to a bottleneck representation and a decoder reconstructs the signal from it. In both cases, the encoder (and, for reconstruction, the decoder) can be a Multi-Layer Perceptron, a convolutional NN, or a recurrent NN. Both settings are sketched below.
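A minimal PyTorch sketch of the two settings, assuming flat input vectors; the layer dimensions (784, 128, 32, 10) are placeholders of mine.

```python
import torch.nn as nn

# Shared encoder mapping raw inputs to a 32-dimensional representation
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))

# Supervised setting: the representation feeds a classification output layer
classifier = nn.Sequential(encoder, nn.Linear(32, 10))

# Unsupervised setting: a symmetric decoder reconstructs the input signal
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

def autoencode(x):
    z = encoder(x)      # bottleneck representation
    return decoder(z)   # trained with a reconstruction loss, e.g. MSE
```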
Knowledge-Aware Data Embedding

In the (transductive) semi-supervised setting we have at our disposal:
- a big amount of unlabeled data, and
- a really small amount of labeled information: $\{(X_i, ?)\}_{i=1}^{m} \cup \{(X_i, Y_i)\}_{i=m+1}^{n}$

The goal is to exploit the small amount of knowledge (the labeled information) to guide the learning process (classification or clustering).

The main (semi-supervised) assumptions are:
- Continuity: samples close to each other are more likely to share the same category
- Smoothness: give a preference to decision boundaries in low-density regions
- Manifold: the data lie approximately on a manifold of much lower dimension than the input space (avoiding the curse of dimensionality)

Idea: learn new representations that not only allow reconstructing the data but also incorporate the available knowledge. The architecture combines the two settings seen above: a shared encoder produces the features (representation), a decoder reconstructs the signal from them, and a classification output layer predicts the labels.

Proposed model [IJCNN18, TNNLS19]: an (ensemble of) semi-supervised autoencoder(s) that
- deals with the reconstruction task on the whole set of data,
- deals with the prediction task only on the labeled data,
- and whose autoencoder structure is symmetric.

A schematic sketch of this joint objective follows the reference below.

[IJCNN18] D. Ienco, R. G. Pensa: Semi-Supervised Clustering With Multiresolution Autoencoders. IJCNN 2018
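A minimal PyTorch sketch of that joint objective, under assumptions of my own about layer shapes and the weighting term alpha; the papers' exact architectures and the ensemble construction are not reproduced here.

```python
import torch
import torch.nn as nn

class SemiSupervisedAE(nn.Module):
    """Reconstruction on all samples, classification on the labeled ones only."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        # Symmetric encoder/decoder around a low-dimensional bottleneck
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.decoder = nn.Linear(hid_dim, in_dim)
        self.classifier = nn.Linear(hid_dim, n_classes)

    def forward(self, x):
        z = self.encoder(x)                        # shared representation
        return self.decoder(z), self.classifier(z)

def kade_like_loss(model, x, y, labeled_mask, alpha=1.0):
    # x: all samples; y: class indices (entries where the mask is False are ignored)
    x_rec, logits = model(x)
    rec = nn.functional.mse_loss(x_rec, x)         # reconstruction on ALL data
    if labeled_mask.any():                         # prediction on labeled data only
        ce = nn.functional.cross_entropy(logits[labeled_mask], y[labeled_mask])
    else:
        ce = torch.zeros((), device=x.device)
    return rec + alpha * ce                        # alpha is an assumed trade-off weight
```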