Learning Graph Representations with Recurrent Neural Network Autoencoders

Total Page:16

File Type:pdf, Size:1020Kb

Learning Graph Representations with Recurrent Neural Network Autoencoders Learning Graph Representations with Recurrent Neural Network Autoencoders Aynaz Taheri Kevin Gimpel Tanya Berger-Wolf University of Illinois at Chicago Toyota Technological Institute at University of Illinois at Chicago [email protected] Chicago [email protected] [email protected] ABSTRACT Representing and comparing graphs is a central problem in many fields. We present an approach to learn representations of graphs using recurrent neural network autoencoders. Recurrent neural networks require sequential data, so we begin with several meth- ods to generate sequences from graphs, including random walks, breadth-first search, and shortest paths. We train long short-term memory (LSTM) autoencoders to embed these graph sequences into a continuous vector space. We then represent a graph by averaging its graph sequence representations. The graph representations are then used for graph classification and comparison tasks. We demon- strate the effectiveness of our approach by showing improvements over the existing state-of-the-art on several graph classification tasks, including both labeled and unlabeled graphs. Figure 1: Learned graph representations in Proteins [5] KEYWORDS dataset, which has two classes: enzyme (red) and non- enzyme (blue). The plot is generated by t-SNE [29] and the Representation learning, Deep Learning, Recurrent Neural Net- two highlighted graphs are generated by Gephi [2]. works, Graph Classification 1 INTRODUCTION three main (not necessarily disjoint) classes: feature extraction, We address the problem of comparing and classifying graphs by graph kernels, and graph matching. learning their representations. This problem arises in many domain Feature extraction methods compare graphs across a set of fea- areas, including bioinformatics, social network analysis, chemistry, tures, such as specific subgraphs or numerical properties that cap- neuroscience, and computer vision. For instance, in neuroscience, ture the topology of the graphs [4, 30]. The efficiency and per- comparing brain networks represented by graphs helps to identify formance of such methods is highly dependent on the feature se- brains with neurological disorders [43]. In social network analy- lection process. Most graph kernels [44] are based on the idea of sis, we may need to compare egonetworks to detect anomalies [1] R-convolutional kernels [19], a way of defining kernels on struc- or to identify corresponding egonetworks across multiple social tured objects by decomposing the objects into substructures and networks. Cutting across domains, we may be interested in under- comparing pairs in the decompositions. For graphs, the substruc- standing how to distinguish the structure of a social network from tures include graphlets [39], shortest paths [6], random walks [16], a biological network, an authorship network, a computer network, and subtrees [38]. While graph kernel methods are effective, their or a citation network [33]. In some tasks, we need to quantify the time complexity is quadratic in the number of graphs. Graph match- similarity between two graphs. For example, in pattern recognition ing algorithms use the topology of the graphs, their nodes and tasks such as handwriting recognition, we need to measure the edges directly, counting matches and mismatches [8, 36]. These similarity between graphs derived from handwriting images [14]. approaches do not consider the global structure of the graphs and We consider the setting in which we want to compare graphs are sensitive to noise. without necessarily finding any node correspondence between In this paper, we propose an unsupervised approach for learning them. Existing graph comparison methods can be categorized into graph representations using long short-term memory (LSTM) recur- rent neural networks [22]. An unsupervised graph representation Permission to make digital or hard copies of part or all of this work for personal or approach can be used not only in processing labeled data, such as classroom use is granted without fee provided that copies are not made or distributed in graph classification in bioinformatics, but can be also applied for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. in many practical applications, such as anomaly detection in so- For all other uses, contact the owner/author(s). cial networks or streaming data, as well as in exploratory analysis KDD’18 Deep Learning Day, August 2018, London, UK and scientific hypothesis generation. An unsupervised method for © 2018 Copyright held by the owner/author(s). ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. learning graph representations provides a fundamental capability https://doi.org/10.1145/nnnnnnn.nnnnnnn to analyze graphs based on their intrinsic properties. KDD’18 Deep Learning Day, August 2018, London, UK B. Trovato et al. We leverage the LSTM sequence-to-sequence learning frame- used to predict new features for the graphs. Bruna et al. [7] proposed work of [40], which uses one LSTM to encode the input sequence spectral networks, generalizations of CNNs on low-dimensional into a vector and another LSTM to generate the output sequence graphs via graph Laplacians. Henaff et al. [20] and Defferrard et from that vector. We use the same sequence as both input and out- al. [11] extended spectral networks to high-dimensional graphs. put, making this a sequence-to-sequence LSTM autoencoder [27]. Scarselli et al. [37] proposed graph neural networks (GNN) and find We consider several ways to generate sequences from the graphs, node representations using random walks. Li et al. [28] extended including random walks, breadth-first search, and shortest paths. GNNs with gating recurrent neural networks to predict sequences After training the autoencoder, we represent a graph by averag- from graphs. In general, neural message passing approaches can suf- ing the encodings of several of its graph sequences. The graph fer from high computational and memory costs, since they perform representations are then used for graph classification. multiple iterations of updating hidden node states in graph repre- Figure 1 shows the result of using our approach to embed the sentations. However, our unsupervised approach obtains strong graphs in the Proteins dataset [5] into a 100-dimensional space, performance without the requirement of passing messages between visualized with t-SNE [29]. Each point is one of the 1113 graphs in vertices for multiple iterations. the dataset. The two class labels were not used when learning the Graph2vec [32] is an unsupervised method inspired by document graph representations, but we still generally find that graphs of the embedding models [25]. This approach finds a representation for same class are clustered in the learned space. a graph by maximizing the likelihood of graph subtrees given the We demonstrate the efficacy of our approach by using our learned graph embedding. Our approach outperforms this method by a large graph representations in classification tasks for both labeled and margin. Graph2vec suffers from its lack of capturing global infor- unlabeled graphs. We outperform the state-of-the-art on nearly all mation in the graph structure by only considering subtrees as graph datasets considered, showing performance gains over prior graph representatives. Other methods have been developed to learn rep- kernel methods while also being asymptotically more efficient. In- resentations for individual nodes in graphs, such as DeepWalk [35], deed, the time complexity of our approach is linear in the number LINE [41], node2vec [18], and many others. These methods are not of graphs, while kernel methods are quadratic. directly related to graph comparison because they require aggre- gating node representations to represent entire graphs. However, 2 RELATED WORK we compared to one representative method in our experiments to show that the aggregation of node embeddings is not informative While we briefly discussed prior work in Section 1, we now provide enough to represent the structure of a graph. some additional remarks to provide more perspective on other work in this area. 3 BACKGROUND Recently, some new graph kernels such as Deep Graph Ker- We briefly discuss the required background about graphs and LSTM nel (DGK) [47] and optimal-assignment Weisfeiler-Lehman (WL- recurrent neural networks. OA) [23] have been proposed and evaluated for graph classification Graphs. For a graph G = ¹V ; Eº, V denotes its vertex set and E ⊆ tasks. The DGK [47] uses methods from unsupervised learning of V × V denotes its edge set. G is called a labeled graph if there is a word embeddings [31] to augment the kernel with substructure labeling function label : V ! L that assigns a label from a set of similarity. WL-OA [23] is an assignment kernel that finds an op- labels L to each vertex. The graph G is called an unlabeled graph if timal bijection between different parts of the graph. While graph no label has been assigned to each vertex in the graph. kernel methods are effective, their time complexity is quadratic in Long short-term memory (LSTM). An LSTM [22] is a recurrent the number of graphs, and there is no opportunity to customize neural network (RNN) designed to model long-distance dependen- their representations for supervised tasks. cies in sequential data. We denote the input vector at time t by x There has been other work in using neural methods to learn t and we denote the hidden vector computed at time t by h . At each graph representations. The representations obtained by these ap- t time step, an LSTM computes a memory cell vector c , an input proaches are tailored for a supervised task and are not determined t gate vector i , a forget gate vector f , and an output gate vector o : solely based on the graph structure. Niepert et al. [34] developed a t t t framework (PSCN) to learn graph representations using convolu- it = σ¹Wixt + Uiht−1 + Kict−1 + bi º tional neural networks (CNNs). Deep Graph Convolutional Neural ft = σ¹Wf xt + Uf ht−1 + Kf ct−1 + bf º Network (DGCNN) [48] is another model that applies CNNs for ot = σ¹Woxt + Uoht−1 + Koct−1 + boº (1) graph classification.
Recommended publications
  • Backpropagation and Deep Learning in the Brain
    Backpropagation and Deep Learning in the Brain Simons Institute -- Computational Theories of the Brain 2018 Timothy Lillicrap DeepMind, UCL With: Sergey Bartunov, Adam Santoro, Jordan Guerguiev, Blake Richards, Luke Marris, Daniel Cownden, Colin Akerman, Douglas Tweed, Geoffrey Hinton The “credit assignment” problem The solution in artificial networks: backprop Credit assignment by backprop works well in practice and shows up in virtually all of the state-of-the-art supervised, unsupervised, and reinforcement learning algorithms. Why Isn’t Backprop “Biologically Plausible”? Why Isn’t Backprop “Biologically Plausible”? Neuroscience Evidence for Backprop in the Brain? A spectrum of credit assignment algorithms: A spectrum of credit assignment algorithms: A spectrum of credit assignment algorithms: How to convince a neuroscientist that the cortex is learning via [something like] backprop - To convince a machine learning researcher, an appeal to variance in gradient estimates might be enough. - But this is rarely enough to convince a neuroscientist. - So what lines of argument help? How to convince a neuroscientist that the cortex is learning via [something like] backprop - What do I mean by “something like backprop”?: - That learning is achieved across multiple layers by sending information from neurons closer to the output back to “earlier” layers to help compute their synaptic updates. How to convince a neuroscientist that the cortex is learning via [something like] backprop 1. Feedback connections in cortex are ubiquitous and modify the
    [Show full text]
  • Chombining Recurrent Neural Networks and Adversarial Training
    Combining Recurrent Neural Networks and Adversarial Training for Human Motion Modelling, Synthesis and Control † ‡ Zhiyong Wang∗ Jinxiang Chai Shihong Xia Institute of Computing Texas A&M University Institute of Computing Technology CAS Technology CAS University of Chinese Academy of Sciences ABSTRACT motions from the generator using recurrent neural networks (RNNs) This paper introduces a new generative deep learning network for and refines the generated motion using an adversarial neural network human motion synthesis and control. Our key idea is to combine re- which we call the “refiner network”. Fig 2 gives an overview of current neural networks (RNNs) and adversarial training for human our method: a motion sequence XRNN is generated with the gener- motion modeling. We first describe an efficient method for training ator G and is refined using the refiner network R. To add realism, a RNNs model from prerecorded motion data. We implement recur- we train our refiner network using an adversarial loss, similar to rent neural networks with long short-term memory (LSTM) cells Generative Adversarial Networks (GANs) [9] such that the refined because they are capable of handling nonlinear dynamics and long motion sequences Xre fine are indistinguishable from real motion cap- term temporal dependencies present in human motions. Next, we ture sequences Xreal using a discriminative network D. In addition, train a refiner network using an adversarial loss, similar to Gener- we embed contact information into the generative model to further ative Adversarial Networks (GANs), such that the refined motion improve the quality of the generated motions. sequences are indistinguishable from real motion capture data using We construct the generator G based on recurrent neural network- a discriminative network.
    [Show full text]
  • Automatic Feature Learning Using Recurrent Neural Networks
    Predicting purchasing intent: Automatic Feature Learning using Recurrent Neural Networks Humphrey Sheil Omer Rana Ronan Reilly Cardiff University Cardiff University Maynooth University Cardiff, Wales Cardiff, Wales Maynooth, Ireland [email protected] [email protected] [email protected] ABSTRACT influence three out of four major variables that affect profit. Inaddi- We present a neural network for predicting purchasing intent in an tion, merchants increasingly rely on (and pay advertising to) much Ecommerce setting. Our main contribution is to address the signifi- larger third-party portals (for example eBay, Google, Bing, Taobao, cant investment in feature engineering that is usually associated Amazon) to achieve their distribution, so any direct measures the with state-of-the-art methods such as Gradient Boosted Machines. merchant group can use to increase their profit is sorely needed. We use trainable vector spaces to model varied, semi-structured input data comprising categoricals, quantities and unique instances. McKinsey A.T. Kearney Affected by Multi-layer recurrent neural networks capture both session-local shopping intent and dataset-global event dependencies and relationships for user Price management 11.1% 8.2% Yes sessions of any length. An exploration of model design decisions Variable cost 7.8% 5.1% Yes including parameter sharing and skip connections further increase Sales volume 3.3% 3.0% Yes model accuracy. Results on benchmark datasets deliver classifica- Fixed cost 2.3% 2.0% No tion accuracy within 98% of state-of-the-art on one and exceed state-of-the-art on the second without the need for any domain / Table 1: Effect of improving different variables on operating dataset-specific feature engineering on both short and long event profit, from [22].
    [Show full text]
  • Exploring the Potential of Sparse Coding for Machine Learning
    Portland State University PDXScholar Dissertations and Theses Dissertations and Theses 10-19-2020 Exploring the Potential of Sparse Coding for Machine Learning Sheng Yang Lundquist Portland State University Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds Part of the Applied Mathematics Commons, and the Computer Sciences Commons Let us know how access to this document benefits ou.y Recommended Citation Lundquist, Sheng Yang, "Exploring the Potential of Sparse Coding for Machine Learning" (2020). Dissertations and Theses. Paper 5612. https://doi.org/10.15760/etd.7484 This Dissertation is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Exploring the Potential of Sparse Coding for Machine Learning by Sheng Y. Lundquist A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science Dissertation Committee: Melanie Mitchell, Chair Feng Liu Bart Massey Garrett Kenyon Bruno Jedynak Portland State University 2020 © 2020 Sheng Y. Lundquist Abstract While deep learning has proven to be successful for various tasks in the field of computer vision, there are several limitations of deep-learning models when com- pared to human performance. Specifically, human vision is largely robust to noise and distortions, whereas deep learning performance tends to be brittle to modifi- cations of test images, including being susceptible to adversarial examples. Addi- tionally, deep-learning methods typically require very large collections of training examples for good performance on a task, whereas humans can learn to perform the same task with a much smaller number of training examples.
    [Show full text]
  • Recurrent Neural Network for Text Classification with Multi-Task
    Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Recurrent Neural Network for Text Classification with Multi-Task Learning Pengfei Liu Xipeng Qiu⇤ Xuanjing Huang Shanghai Key Laboratory of Intelligent Information Processing, Fudan University School of Computer Science, Fudan University 825 Zhangheng Road, Shanghai, China pfliu14,xpqiu,xjhuang @fudan.edu.cn { } Abstract are based on unsupervised objectives such as word predic- tion for training [Collobert et al., 2011; Turian et al., 2010; Neural network based methods have obtained great Mikolov et al., 2013]. This unsupervised pre-training is effec- progress on a variety of natural language process- tive to improve the final performance, but it does not directly ing tasks. However, in most previous works, the optimize the desired task. models are learned based on single-task super- vised objectives, which often suffer from insuffi- Multi-task learning utilizes the correlation between related cient training data. In this paper, we use the multi- tasks to improve classification by learning tasks in parallel. [ task learning framework to jointly learn across mul- Motivated by the success of multi-task learning Caruana, ] tiple related tasks. Based on recurrent neural net- 1997 , there are several neural network based NLP models [ ] work, we propose three different mechanisms of Collobert and Weston, 2008; Liu et al., 2015b utilize multi- sharing information to model text with task-specific task learning to jointly learn several tasks with the aim of and shared layers. The entire network is trained mutual benefit. The basic multi-task architectures of these jointly on all these tasks. Experiments on four models are to share some lower layers to determine common benchmark text classification tasks show that our features.
    [Show full text]
  • A Survey on Data Collection for Machine Learning a Big Data - AI Integration Perspective
    1 A Survey on Data Collection for Machine Learning A Big Data - AI Integration Perspective Yuji Roh, Geon Heo, Steven Euijong Whang, Senior Member, IEEE Abstract—Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely-used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs, but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research. Index Terms—data collection, data acquisition, data labeling, machine learning F 1 INTRODUCTION E are living in exciting times where machine learning expertise. This problem applies to any novel application that W is having a profound influence on a wide range of benefits from machine learning.
    [Show full text]
  • Text-To-Speech Synthesis
    Generative Model-Based Text-to-Speech Synthesis Heiga Zen (Google London) February rd, @MIT Outline Generative TTS Generative acoustic models for parametric TTS Hidden Markov models (HMMs) Neural networks Beyond parametric TTS Learned features WaveNet End-to-end Conclusion & future topics Outline Generative TTS Generative acoustic models for parametric TTS Hidden Markov models (HMMs) Neural networks Beyond parametric TTS Learned features WaveNet End-to-end Conclusion & future topics Text-to-speech as sequence-to-sequence mapping Automatic speech recognition (ASR) “Hello my name is Heiga Zen” ! Machine translation (MT) “Hello my name is Heiga Zen” “Ich heiße Heiga Zen” ! Text-to-speech synthesis (TTS) “Hello my name is Heiga Zen” ! Heiga Zen Generative Model-Based Text-to-Speech Synthesis February rd, of Speech production process text (concept) fundamental freq voiced/unvoiced char freq transfer frequency speech transfer characteristics magnitude start--end Sound source fundamental voiced: pulse frequency unvoiced: noise modulation of carrier wave by speech information air flow Heiga Zen Generative Model-Based Text-to-Speech Synthesis February rd, of Typical ow of TTS system TEXT Sentence segmentation Word segmentation Text normalization Text analysis Part-of-speech tagging Pronunciation Speech synthesis Prosody prediction discrete discrete Waveform generation ) NLP Frontend discrete continuous SYNTHESIZED ) SEECH Speech Backend Heiga Zen Generative Model-Based Text-to-Speech Synthesis February rd, of Rule-based, formant synthesis []
    [Show full text]
  • Deep Learning Architectures for Sequence Processing
    Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2021. All rights reserved. Draft of September 21, 2021. CHAPTER Deep Learning Architectures 9 for Sequence Processing Time will explain. Jane Austen, Persuasion Language is an inherently temporal phenomenon. Spoken language is a sequence of acoustic events over time, and we comprehend and produce both spoken and written language as a continuous input stream. The temporal nature of language is reflected in the metaphors we use; we talk of the flow of conversations, news feeds, and twitter streams, all of which emphasize that language is a sequence that unfolds in time. This temporal nature is reflected in some of the algorithms we use to process lan- guage. For example, the Viterbi algorithm applied to HMM part-of-speech tagging, proceeds through the input a word at a time, carrying forward information gleaned along the way. Yet other machine learning approaches, like those we’ve studied for sentiment analysis or other text classification tasks don’t have this temporal nature – they assume simultaneous access to all aspects of their input. The feedforward networks of Chapter 7 also assumed simultaneous access, al- though they also had a simple model for time. Recall that we applied feedforward networks to language modeling by having them look only at a fixed-size window of words, and then sliding this window over the input, making independent predictions along the way. Fig. 9.1, reproduced from Chapter 7, shows a neural language model with window size 3 predicting what word follows the input for all the. Subsequent words are predicted by sliding the window forward a word at a time.
    [Show full text]
  • Unsupervised Speech Representation Learning Using Wavenet Autoencoders Jan Chorowski, Ron J
    1 Unsupervised speech representation learning using WaveNet autoencoders Jan Chorowski, Ron J. Weiss, Samy Bengio, Aaron¨ van den Oord Abstract—We consider the task of unsupervised extraction speaker gender and identity, from phonetic content, properties of meaningful latent representations of speech by applying which are consistent with internal representations learned autoencoding neural networks to speech waveforms. The goal by speech recognizers [13], [14]. Such representations are is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being desired in several tasks, such as low resource automatic speech invariant to confounding low level details in the signal such as recognition (ASR), where only a small amount of labeled the underlying pitch contour or background noise. Since the training data is available. In such scenario, limited amounts learned representation is tuned to contain only phonetic content, of data may be sufficient to learn an acoustic model on the we resort to using a high capacity WaveNet decoder to infer representation discovered without supervision, but insufficient information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the to learn the acoustic model and a data representation in a fully kind of constraint that is applied to the latent representation. supervised manner [15], [16]. We compare three variants: a simple dimensionality reduction We focus on representations learned with autoencoders bottleneck, a Gaussian Variational Autoencoder (VAE), and a applied to raw waveforms and spectrogram features and discrete Vector Quantized VAE (VQ-VAE). We analyze the quality investigate the quality of learned representations on LibriSpeech of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately re- [17].
    [Show full text]
  • Sparse Feature Learning for Deep Belief Networks
    Sparse Feature Learning for Deep Belief Networks Marc'Aurelio Ranzato1 Y-Lan Boureau2,1 Yann LeCun1 1 Courant Institute of Mathematical Sciences, New York University 2 INRIA Rocquencourt {ranzato,ylan,[email protected]} Abstract Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have cer- tain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the repre- sentation. We describe a novel and efficient algorithm to learn sparse represen- tations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured. 1 Introduction One of the main purposes of unsupervised learning is to produce good representations for data, that can be used for detection, recognition, prediction, or visualization. Good representations eliminate irrelevant variabilities of the input data, while preserving the information that is useful for the ul- timate task. One cause for the recent resurgence of interest in unsupervised learning is the ability to produce deep feature hierarchies by stacking unsupervised modules on top of each other, as pro- posed by Hinton et al.
    [Show full text]
  • Machine Learning V/S Deep Learning
    International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072 Machine Learning v/s Deep Learning Sachin Krishan Khanna Department of Computer Science Engineering, Students of Computer Science Engineering, Chandigarh Group of Colleges, PinCode: 140307, Mohali, India ---------------------------------------------------------------------------***--------------------------------------------------------------------------- ABSTRACT - This is research paper on a brief comparison and summary about the machine learning and deep learning. This comparison on these two learning techniques was done as there was lot of confusion about these two learning techniques. Nowadays these techniques are used widely in IT industry to make some projects or to solve problems or to maintain large amount of data. This paper includes the comparison of two techniques and will also tell about the future aspects of the learning techniques. Keywords: ML, DL, AI, Neural Networks, Supervised & Unsupervised learning, Algorithms. INTRODUCTION As the technology is getting advanced day by day, now we are trying to make a machine to work like a human so that we don’t have to make any effort to solve any problem or to do any heavy stuff. To make a machine work like a human, the machine need to learn how to do work, for this machine learning technique is used and deep learning is used to help a machine to solve a real-time problem. They both have algorithms which work on these issues. With the rapid growth of this IT sector, this industry needs speed, accuracy to meet their targets. With these learning algorithms industry can meet their requirements and these new techniques will provide industry a different way to solve problems.
    [Show full text]
  • Lecture 11 Recurrent Neural Networks I CMSC 35246: Deep Learning
    Lecture 11 Recurrent Neural Networks I CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 01, 2017 Lecture 11 Recurrent Neural Networks I CMSC 35246 Introduction Sequence Learning with Neural Networks Lecture 11 Recurrent Neural Networks I CMSC 35246 Some Sequence Tasks Figure credit: Andrej Karpathy Lecture 11 Recurrent Neural Networks I CMSC 35246 MLPs only accept an input of fixed dimensionality and map it to an output of fixed dimensionality Great e.g.: Inputs - Images, Output - Categories Bad e.g.: Inputs - Text in one language, Output - Text in another language MLPs treat every example independently. How is this problematic? Need to re-learn the rules of language from scratch each time Another example: Classify events after a fixed number of frames in a movie Need to resuse knowledge about the previous events to help in classifying the current. Problems with MLPs for Sequence Tasks The "API" is too limited. Lecture 11 Recurrent Neural Networks I CMSC 35246 Great e.g.: Inputs - Images, Output - Categories Bad e.g.: Inputs - Text in one language, Output - Text in another language MLPs treat every example independently. How is this problematic? Need to re-learn the rules of language from scratch each time Another example: Classify events after a fixed number of frames in a movie Need to resuse knowledge about the previous events to help in classifying the current. Problems with MLPs for Sequence Tasks The "API" is too limited. MLPs only accept an input of fixed dimensionality and map it to an output of fixed dimensionality Lecture 11 Recurrent Neural Networks I CMSC 35246 Bad e.g.: Inputs - Text in one language, Output - Text in another language MLPs treat every example independently.
    [Show full text]