Convolutional Neural Networks for Language
CS 6956: Deep Learning for NLP

Features from text

Example: sentiment classification. The goal: is the sentiment of a sentence positive, negative, or neutral?

"The film is fun and is host to some truly excellent sequences"

Approach: train a multiclass classifier. What features? Some words and ngrams are informative, while others are not. We need to:
1. Identify informative local information
2. Aggregate it into a fixed-size vector representation

Convolutional Neural Networks

Designed to:
1. Identify local predictors in a larger input
2. Pool them together to create a feature representation
3.
And possibly repeat this in a hierarchical fashion

In the NLP context, this helps identify predictive ngrams for a task.

Overview
• Convolutional Neural Networks: A brief history
• The two operations in a CNN
  – Convolution
  – Pooling
• Convolution + Pooling as a building block
• CNNs in NLP
• Recurrent networks vs. convolutional networks

Convolutional Neural Networks: A brief history

CNNs first arose in the context of vision.
• Hubel and Wiesel, 1950s/60s: the mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
• Fukushima 1980, Neocognitron: directly inspired by Hubel and Wiesel
  – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers
  – Two operations: a convolutional layer that reacts to specific patterns, and a down-sampling layer that aggregates information
• Le Cun 1989-today, Convolutional Neural Network: a supervised version
  – Related to convolution kernels in computer vision
  – Very successful on handwriting recognition and other computer vision tasks
• Has become better over recent years with more data and computation
  – Krizhevsky et al. 2012: object detection with ImageNet
  – The de facto feature extractor for computer vision

For this work, Hubel and Wiesel shared the 1981 Nobel Prize in Physiology or Medicine: David H.
Hubel and Torsten Wiesel.

Convolutional Neural Networks: A brief history (continued)
• Introduced to NLP by Collobert et al., 2011
  – Used as a feature extraction system for semantic role labeling
• Since then, several other applications, such as sentiment analysis and question classification
  – Kalchbrenner et al. 2014, Kim 2014

CNN terminology

The terminology shows the field's computer vision and signal processing origins.
• Filter
  – A function that transforms an input matrix/vector into a scalar feature
  – A filter is a learned feature
detector (also called a feature map)
• Channel
  – In computer vision, color images have red, green, and blue channels
  – In general, a channel represents a "view of the input" that captures information about the input independently of other channels
    • For example, different kinds of word embeddings could be different channels
    • Channels could themselves be produced by previous convolutional layers
• Receptive field
  – The region of the input that a filter currently focuses on

What is a convolution?

Let's see this using an example with vectors. We will generalize to matrices and beyond, but the general idea remains the same.

A vector x: [2, 3, 1, 3, 2, 1]
A filter u of size k: [1, 2, 1] (here, the filter size is k = 3)

The output is also a vector, with entries

  output_i = Σ_{j=1..k} u_j · x_{i+j−1}

The filter moves across the vector. At each position, the output is the dot product of the filter with the slice of the vector currently under it.

Padding at the beginning prepends a zero, so the vector becomes [0, 2, 3, 1, 3, 2, 1]. The first two outputs are then

  output_1 = 1·0 + 2·2 + 1·3 = 7
  output_2 = 1·2 + 2·3 + 1·1 = 9
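The sliding dot product above can be sketched in a few lines of plain Python. This is a minimal sketch, not library code: the function name conv1d and the pad parameter are illustrative, and padding is applied only at the beginning, as in the example.

```python
def conv1d(x, u, pad=0):
    """Convolve filter u over vector x: each output entry is the
    dot product of u with the window of x starting at that position."""
    x = [0] * pad + list(x)   # zero-padding at the beginning only
    k = len(u)
    return [sum(u[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

x = [2, 3, 1, 3, 2, 1]
u = [1, 2, 1]

print(conv1d(x, u))          # [9, 8, 9, 8]
print(conv1d(x, u, pad=1))   # [7, 9, 8, 9, 8] -- starts with the 7, 9 from the example
```

Note that without padding the output is shorter than the input, since a filter of size k fits in only n − k + 1 positions; padding adds positions at the edge so that edge elements also appear in filter windows.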