Deep Neural Network Models for Sequence Labeling and Coreference Tasks

Total Page:16

File Type:pdf, Size:1020Kb

Deep Neural Network Models for Sequence Labeling and Coreference Tasks Federal state autonomous educational institution for higher education ¾Moscow institute of physics and technology (national research university)¿ On the rights of a manuscript Le The Anh DEEP NEURAL NETWORK MODELS FOR SEQUENCE LABELING AND COREFERENCE TASKS Specialty 05.13.01 - ¾System analysis, control theory, and information processing (information and technical systems)¿ A dissertation submitted in requirements for the degree of candidate of technical sciences Supervisor: PhD of physical and mathematical sciences Burtsev Mikhail Sergeevich Dolgoprudny - 2020 Федеральное государственное автономное образовательное учреждение высшего образования ¾Московский физико-технический институт (национальный исследовательский университет)¿ На правах рукописи Ле Тхе Ань ГЛУБОКИЕ НЕЙРОСЕТЕВЫЕ МОДЕЛИ ДЛЯ ЗАДАЧ РАЗМЕТКИ ПОСЛЕДОВАТЕЛЬНОСТИ И РАЗРЕШЕНИЯ КОРЕФЕРЕНЦИИ Специальность 05.13.01 – ¾Системный анализ, управление и обработка информации (информационные и технические системы)¿ Диссертация на соискание учёной степени кандидата технических наук Научный руководитель: кандидат физико-математических наук Бурцев Михаил Сергеевич Долгопрудный - 2020 Contents Abstract 4 Acknowledgments 6 Abbreviations 7 List of Figures 11 List of Tables 13 1 Introduction 14 1.1 Overview of Deep Learning . 14 1.1.1 Artificial Intelligence, Machine Learning, and Deep Learning . 14 1.1.2 Milestones in Deep Learning History . 16 1.1.3 Types of Machine Learning Models . 16 1.2 Brief Overview of Natural Language Processing . 18 1.3 Dissertation Overview . 20 1.3.1 Scientific Actuality of the Research . 20 1.3.2 The Goal and Task of the Dissertation . 20 1.3.3 Scientific Novelty . 21 1.3.4 Theoretical and Practical Value of the Work in the Dissertation . 21 1.3.5 Statements to be Defended . 22 1.3.6 Presentations and Validation of the Research Results . 23 1.3.7 Publications . 23 1.3.8 Dissertation Structure . 24 2 Deep Neural Network Models for NLP Tasks 26 2.1 Word Representation Models . 26 2.1.1 Word Representation . 26 2.1.2 Prediction-based Models . 28 2.1.3 Count-based Models . 34 2.2 Deep Neural Network Models . 38 1 2.2.1 Convolutional Neural Network . 38 2.2.2 Recurrent Neural Network . 42 2.2.3 Long Short-Term Memory Cells . 46 2.2.4 LSTM Networks . 49 2.3 Pre-trained Language Models . 50 2.3.1 ELMo . 50 2.3.2 Transformer . 52 2.3.3 OpenAI’s GPT . 56 2.3.4 Google’s BERT . 58 2.4 Summary . 60 3 Sequence Labeling with Character-aware Deep Neural Networks and Lan- guage Models 62 3.1 Introduction to the Sequence Labeling Tasks . 62 3.2 Related Work . 63 3.2.1 Rule-based Models . 64 3.2.2 Feature-based Models . 64 3.2.3 Deep Learning-based Models . 65 3.2.4 Related Work on Vietnamese Named Entity Recognition . 67 3.2.5 Related Work on Russian Named Entity Recognition . 68 3.3 Tagging Schemes . 69 3.4 Evaluation Metrics . 70 3.5 WCC-NN-CRF Models for Sequence Labeling Tasks . 71 3.5.1 Backbone WCC-NN-CRF Architecture . 71 3.5.2 Language Model-based Architecture . 77 3.6 Application of WCC-NN-CRF Models for Named Entity Recognition . 78 3.6.1 Overview of Named Entity Recognition Task . 78 3.6.2 Datasets and Pre-trained Word Embeddings . 78 3.6.3 Evaluation of backbone WCC-NN-CRF Model . 80 3.6.4 Evaluation of ELMo-based WCC-NN-CRF model . 85 3.6.5 Evaluation of BERT-based Multilingual WCC-NN-CRF Model . 85 3.7 Application of WCC-NN-CRF Model for Sentence Boundary Detection . 87 3.7.1 Introduction to the Sentence Boundary Detection Task . 87 3.7.2 Sentence Boundary Detection as a Sequence Labeling Task . 89 3.7.3 Evaluation of WCC-NN-CRF SBD Model . 91 3.8 Conclusions . 93 2 4 Coreference Resolution with Sentence-level Coreferential Scoring 95 4.1 The Coreference Resolution Task . 95 4.2 Related Work . 96 4.2.1 Rule-based Models . 96 4.2.2 Deep Learning Models . 97 4.3 Coreference Resolution Evaluation Metrics . 100 4.4 Baseline Model Description . 102 4.5 Sentence-level Coreferential Relation-based Model . 104 4.6 BERT-based Coreference Model . 108 4.7 Experiments and Results . 108 4.7.1 Datasets . 108 4.7.2 Evaluation of Proposed Models . 110 4.8 Conclusions . 115 5 Conclusions 117 5.1 Conclusions for Sequence Labeling Task . 117 5.2 Conclusions for Coreference Resolution Task . 120 5.3 Summary of the Main Contributions of the Dissertation . 121 Bibliography 123 3 Abstract Deep neural network models have recently received tremendous attentions from both academy and industry, and of course, garnered amazing results in a variety of domains ranging from Computer Vision, Speech Recognition to Natural Language Processing (NLP). They sig- nificantly lifted the performance of machine learning-based systems to a whole new level, close to the human-level performance. As a matter of course, the number of deep learning projects has also increased year by year. The IPavlov project1, based at the Neural Networks and Deep Learning Lab of Moscow Institute of Physics and Technology (MIPT), is one of them, aiming at building a set of pre-trained network models, predefined dialogue system components and pipeline templates. This thesis is based on the work carried out as a part of this project, focusing on studying deep neural network models to address Sequence Labeling and Coreference Resolution tasks. This thesis consists of three main parts. Firstly, we systematically synthesize three key concepts in the field of Deep Learning for NLP closely related to the work carried out in this thesis, including (1) two approaches to word representation learning, (2) deep neural network models often used to address machine learning tasks in general and NLP tasks in particular, and (3) cutting-edge pre-trained language models and their applications in downstream tasks. Secondly, we propose three deep neural network models for Sequence Labeling tasks, including (1) the hybrid model consisting of three sub-networks to fully capture character- level and capitalization features as well as word context features, (2) language modeling- based model, and (3) the multilingual model. These proposed models were evaluated on the task of Named Entity Recognition. Conducted experiments on six datasets covering four languages Russian, Vietnamese, English, and Chinese datasets showed that our models achieved state-of-the-art performance. Besides that, we reformulated the task of Sentence Boundary Detection as Sequence Labeling task and used the proposed model to address this task. The obtained results on two conversational datasets pointed out that the proposed model achieved an impressive accuracy. Thirdly, we propose two models for Coreference Resolution task, including (1) Sentence-level Coreferential Relation-based model that can take as input a paragraph, a discourse, or even a document with hundreds of sentences and predict the coreference relations between sentences, and (2) the language modeling-based 1https://ipavlov.ai/ 4 model that leverages the power of modern language models to boost the model performance. The experiment results on two Russian datasets and the comparisons with the other existing models pointed out that our models obtained the cutting-edge results on both Anaphora and Coreference Resolution tasks. 5 Acknowledgments The pursuit of a Ph.D. is a long and challenging journey through unknown lands. I would like to express sincere thanks to those who have accompanied and encouraged me on this journey. Without their supports, this thesis would not have been done. First and foremost, I would like to thank my supervisor Burtsev Mikhail Sergeevich, the Head of Neural Networks and Deep Learning Lab2 - Moscow Institute of Physics and Technology, who provided me a good research environment and gave me the right directions during my Ph.D. study. I also would like to thank my colleagues Yura Kuratov, Mikhail Arkhipov, Valentin Malykh, Maxim Petrov, Alexey Sorokin, and the others in our lab for the highly fruitful and enlightening discussions we shared. Besides, I would like to express my thanks to the reviewers Artem Shelmanov, Natalia Lukashevich, and Aleksandr Panov - the chair of the pre-defense seminar, for their valuable comments and suggestions on my dissertation. Finally, this thesis is dedicated to my family for their love, mental encouragement, and for always being by my side. This work was supported by National Technology Initiative and PAO Sberbank project ID 0000000007417F630002. Dolgoprudny - 2020 2https://mipt.ru/science/labs/laboratoriya-neyronnykh-sistem-i-glubokogo-obucheniya/ 6 Abbreviations General notation a.k.a. also known as e.g. exempli gratia, meaning “for example” et al. et alia, meaning “and others” etc. et cetera, meaning “and the other things” i.e. id est, meaning “in other words” w.r.t. with respect to Natural language processing BERT Bidirectional Encoder Representations from Transformers model [44] Bi-LM Bidirectional Language Model BPE BytePair Encoding CBOW Continuous Bag-of-Words model [123] CEAF Constrained Entity Alignment F-measure CR Coreference Resolution CRF Conditional Random Field ELMo Embeddings from Language Models [69] GPT Generative Pretraining [14] IOB Inside-Outside-Beginning tagging scheme IOBES Inside-Outside-Beginning-End-Single tagging scheme IOE Inside-Outside-End tagging scheme 7 LM Language Modeling LR-MVL Low-Rank Multi-View Learning model [87] LSA Laten Semantic Analysis Masked LM Masked Language Modeling MT Machine Translation NER Named Entity Recognition NLP Natural Language Processing NPs Noun Phrases OOV Out-Of-Vocabulary POS Part-Of-Speech QA Question Answering SRL Semantic Role Labeling WER Word Error Rate Neural networks AGI Artificial General Intelligence AI Artificial Intelligence ANI Artificial Narrow Intelligence ANN Artificial Neural Network Bi-LSTM Bi-directional Long Short-Term Memory BPTT Back-Propagation Through Time CNN Convolutional Neural Network DL Deep Learning FFNN Feed-Forward Neural Network GRU Gated Recurrent Network HMM Hidden Markov Model 8 LSTM Long Short-Term Memory ML Machine Learning ReLU Rectified Linear Unit RNN Recurrent Neural Network Probability and statistics CCA Canonical Correlation Analysis PCA Principle Component Analysis PPL Perplexity SVD Singular Value Decomposition T-SNE T-distributed Stochastic Neighbor Embedding 9 List of Figures 1.1 The Artificial Intelligence taxonomy.
Recommended publications
  • CS855 Pattern Recognition and Machine Learning Homework 3 A.Aziz Altowayan
    CS855 Pattern Recognition and Machine Learning Homework 3 A.Aziz Altowayan Problem Find three recent (2010 or newer) journal articles of conference papers on pattern recognition applications using feed-forward neural networks with backpropagation learning that clearly describe the design of the neural network { number of layers and number of units in each layer { and the rationale for the design. For each paper, describe the neural network, the reasoning behind the design, and include images of the neural network when available. Answer 1 The theme of this ansewr is Deep Neural Network 2 (Deep Learning or multi-layer deep architecture). The reason is that in recent years, \Deep learning technology and related algorithms have dramatically broken landmark records for a broad range of learning problems in vision, speech, audio, and text processing." [1] Deep learning models are a class of machines that can learn a hierarchy of features by building high-level features from low-level ones, thereby automating the process of feature construction [2]. Following are three paper in this topic. Paper1: D. C. Ciresan, U. Meier, J. Schmidhuber. \Multi-column Deep Neural Networks for Image Classifica- tion". IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012. Feb 2012. arxiv \Work from Swiss AI Lab IDSIA" This method is the first to achieve near-human performance on MNIST handwriting dataset. It, also, outperforms humans by a factor of two on the traffic sign recognition benchmark. In this paper, the network model is Deep Convolutional Neural Networks. The layers in their NNs are comparable to the number of layers found between retina and visual cortex of \macaque monkeys".
    [Show full text]
  • Multiscale Computation and Dynamic Attention in Biological and Artificial
    brain sciences Review Multiscale Computation and Dynamic Attention in Biological and Artificial Intelligence Ryan Paul Badman 1,* , Thomas Trenholm Hills 2 and Rei Akaishi 1,* 1 Center for Brain Science, RIKEN, Saitama 351-0198, Japan 2 Department of Psychology, University of Warwick, Coventry CV4 7AL, UK; [email protected] * Correspondence: [email protected] (R.P.B.); [email protected] (R.A.) Received: 4 March 2020; Accepted: 17 June 2020; Published: 20 June 2020 Abstract: Biological and artificial intelligence (AI) are often defined by their capacity to achieve a hierarchy of short-term and long-term goals that require incorporating information over time and space at both local and global scales. More advanced forms of this capacity involve the adaptive modulation of integration across scales, which resolve computational inefficiency and explore-exploit dilemmas at the same time. Research in neuroscience and AI have both made progress towards understanding architectures that achieve this. Insight into biological computations come from phenomena such as decision inertia, habit formation, information search, risky choices and foraging. Across these domains, the brain is equipped with mechanisms (such as the dorsal anterior cingulate and dorsolateral prefrontal cortex) that can represent and modulate across scales, both with top-down control processes and by local to global consolidation as information progresses from sensory to prefrontal areas. Paralleling these biological architectures, progress in AI is marked by innovations in dynamic multiscale modulation, moving from recurrent and convolutional neural networks—with fixed scalings—to attention, transformers, dynamic convolutions, and consciousness priors—which modulate scale to input and increase scale breadth.
    [Show full text]
  • Convolutional Neural Network Model Layers Improvement for Segmentation and Classification on Kidney Stone Images Using Keras and Tensorflow
    Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 2458-9403 Vol. 8 Issue 6, June - 2021 Convolutional Neural Network Model Layers Improvement For Segmentation And Classification On Kidney Stone Images Using Keras And Tensorflow Orobosa Libert Joseph Waliu Olalekan Apena Department of Computer / Electrical and (1) Department of Computer / Electrical and Electronics Engineering, The Federal University of Electronics Engineering, The Federal University of Technology, Akure. Nigeria. Technology, Akure. Nigeria. [email protected] (2) Biomedical Computing and Engineering Technology, Applied Research Group, Coventry University, Coventry, United Kingdom [email protected], [email protected] Abstract—Convolutional neural network (CNN) and analysis, by automatic or semiautomatic means, of models are beneficial to image classification large quantities of data in order to discover meaningful algorithms training for highly abstract features patterns [1,2]. The domains of data mining include: and work with less parameter. Over-fitting, image mining, opinion mining, web mining, text mining, exploding gradient, and class imbalance are CNN and graph mining and so on. Some of its applications major challenges during training; with appropriate include anomaly detection, financial data analysis, management training, these issues can be medical data analysis, social network analysis, market diminished and enhance model performance. The analysis [3]. Recent progress in deep learning using models are 128 by 128 CNN-ML and 256 by 256 CNN machine learning (CNN-ML) has been helpful in CNN-ML training (or learning) and classification. decision support and contributed to positive outcome, The results were compared for each model significantly. The application of CNN-ML to diverse classifier. The study of 128 by 128 CNN-ML model areas of soft computing is adapted diagnosis has the following evaluation results consideration procedure to enhance time and accuracy [4].
    [Show full text]
  • Arxiv:2103.13076V1 [Cs.CL] 24 Mar 2021
    Finetuning Pretrained Transformers into RNNs Jungo Kasai♡∗ Hao Peng♡ Yizhe Zhang♣ Dani Yogatama♠ Gabriel Ilharco♡ Nikolaos Pappas♡ Yi Mao♣ Weizhu Chen♣ Noah A. Smith♡♢ ♡Paul G. Allen School of Computer Science & Engineering, University of Washington ♣Microsoft ♠DeepMind ♢Allen Institute for AI {jkasai,hapeng,gamaga,npappas,nasmith}@cs.washington.edu {Yizhe.Zhang, maoyi, wzchen}@microsoft.com [email protected] Abstract widely used in autoregressive modeling such as lan- guage modeling (Baevski and Auli, 2019) and ma- Transformers have outperformed recurrent chine translation (Vaswani et al., 2017). The trans- neural networks (RNNs) in natural language former makes crucial use of interactions between generation. But this comes with a signifi- feature vectors over the input sequence through cant computational cost, as the attention mech- the attention mechanism (Bahdanau et al., 2015). anism’s complexity scales quadratically with sequence length. Efficient transformer vari- However, this comes with significant computation ants have received increasing interest in recent and memory footprint during generation. Since the works. Among them, a linear-complexity re- output is incrementally predicted conditioned on current variant has proven well suited for au- the prefix, generation steps cannot be parallelized toregressive generation. It approximates the over time steps and require quadratic time complex- softmax attention with randomized or heuris- ity in sequence length. The memory consumption tic feature maps, but can be difficult to train in every generation step also grows linearly as the and may yield suboptimal accuracy. This work aims to convert a pretrained transformer into sequence becomes longer. This bottleneck for long its efficient recurrent counterpart, improving sequence generation limits the use of large-scale efficiency while maintaining accuracy.
    [Show full text]
  • The History Began from Alexnet: a Comprehensive Survey on Deep Learning Approaches
    > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 1 The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches Md Zahangir Alom1, Tarek M. Taha1, Chris Yakopcic1, Stefan Westberg1, Paheding Sidike2, Mst Shamima Nasrin1, Brian C Van Essen3, Abdul A S. Awwal3, and Vijayan K. Asari1 Abstract—In recent years, deep learning has garnered I. INTRODUCTION tremendous success in a variety of application domains. This new ince the 1950s, a small subset of Artificial Intelligence (AI), field of machine learning has been growing rapidly, and has been applied to most traditional application domains, as well as some S often called Machine Learning (ML), has revolutionized new areas that present more opportunities. Different methods several fields in the last few decades. Neural Networks have been proposed based on different categories of learning, (NN) are a subfield of ML, and it was this subfield that spawned including supervised, semi-supervised, and un-supervised Deep Learning (DL). Since its inception DL has been creating learning. Experimental results show state-of-the-art performance ever larger disruptions, showing outstanding success in almost using deep learning when compared to traditional machine every application domain. Fig. 1 shows, the taxonomy of AI. learning approaches in the fields of image processing, computer DL (using either deep architecture of learning or hierarchical vision, speech recognition, machine translation, art, medical learning approaches) is a class of ML developed largely from imaging, medical information processing, robotics and control, 2006 onward. Learning is a procedure consisting of estimating bio-informatics, natural language processing (NLP), cybersecurity, and many others.
    [Show full text]
  • The Understanding of Convolutional Neuron Network Family
    2017 2nd International Conference on Computer Engineering, Information Science and Internet Technology (CII 2017) ISBN: 978-1-60595-504-9 The Understanding of Convolutional Neuron Network Family XINGQUN QI ABSTRACT Along with the development of the computing speed and Artificial Intelligence, more and more jobs have been done by computers. Convolutional Neural Network (CNN) is one of the most popular algorithms in the field of Deep Learning. It is mainly used in computer identification, especially in voice, text recognition and other aspects of the application. CNNs have been developed for decades. In theory, Convolutional Neural Network has gradually become more mature. However, in practical applications, it is limited by various conditions. This paper will introduce the course of Convolutional Neural Network’s development today, as well as the current more mature and popular architecture and related applications of it, to help readers have more macro and comprehensive understanding of Convolutional Neural Networks. KEYWORDS Convolutional Neuron Network, Architecture, CNN family, Application INTRODUCTION With the continuous improvements of technology, there are more and more multi- disciplinary applications achieved with computer technology. And Machine Learning, especially Deep Learning is one of the most indispensable and important parts of that technology. Deep learning has the great advantage of acquiring features and modeling from data [1]. The Convolutional Neural Network (CNN) algorithm is one of the best and foremost algorithms in the Deep Learning algorithms. It is mainly used for image recognition and computer vision. In today’s living conditions, the impact and significance of CNN is becoming more and more important. In computer vision, CNN has an important position.
    [Show full text]
  • A Survey Paper on Convolutional Neural Network
    A Survey Paper on Convolutional Neural Network 1Ekta Upadhyay, 2Ranjeet Singh, 3Pallavi Upadhyay 1.1Department of Information Technology, 2.1Department of Computer Science Engineering 3.1Department of Information Technology, Buddha Institute of Technology, Gida, Gorakhpur, India Abstract In this era use of machines are growing promptly in every field such as pattern recognition, image, video processing project that can mimic like human cerebral network function and to achieve this Convolutional Neural Network of deep learning algorithm helps to train large datasets with millions of parameters of 2d image to provide desirable output using filters. Going through the convolutional layer then pooling layer and in last fully connected layer, Images becomes more effective how many times it filters become better than other. In this article we are going through the basic of Convolution Neural Network and its working process. Keywords-Deep Learning, Convolutional Neural Network, Handwritten digit recognition, MNIST, Pooling. 1- Introduction In the world of technology deep learning has become one of the most aspect in the field of machine. Deep learning which is sub-field of Artificial Learning that focuses on creating large Neural Network Model which provides accuracy in the field of data processing decision. social media apps like Instagram, twitter Google, Microsoft and many other apps with million users having multiple features like face recognition, some apps for handwriting recognition. Which is machine learning problem to recognize clearly[3], as we know machines are man-made and machines does not have minds or visual cortex to understands or see the real word entity, so to understand the real world in 1960 human create a theorem named Convolutional neural network of deep learning algorithm that make machine much more understandable.
    [Show full text]
  • Countering Terrorism Online with Artificial Intelligence an Overview for Law Enforcement and Counter-Terrorism Agencies in South Asia and South-East Asia
    COUNTERING TERRORISM ONLINE WITH ARTIFICIAL INTELLIGENCE AN OVERVIEW FOR LAW ENFORCEMENT AND COUNTER-TERRORISM AGENCIES IN SOUTH ASIA AND SOUTH-EAST ASIA COUNTERING TERRORISM ONLINE WITH ARTIFICIAL INTELLIGENCE An Overview for Law Enforcement and Counter-Terrorism Agencies in South Asia and South-East Asia A Joint Report by UNICRI and UNCCT 3 Disclaimer The opinions, findings, conclusions and recommendations expressed herein do not necessarily reflect the views of the Unit- ed Nations, the Government of Japan or any other national, regional or global entities involved. Moreover, reference to any specific tool or application in this report should not be considered an endorsement by UNOCT-UNCCT, UNICRI or by the United Nations itself. The designation employed and material presented in this publication does not imply the expression of any opinion whatsoev- er on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area of its authorities, or concerning the delimitation of its frontiers or boundaries. Contents of this publication may be quoted or reproduced, provided that the source of information is acknowledged. The au- thors would like to receive a copy of the document in which this publication is used or quoted. Acknowledgements This report is the product of a joint research initiative on counter-terrorism in the age of artificial intelligence of the Cyber Security and New Technologies Unit of the United Nations Counter-Terrorism Centre (UNCCT) in the United Nations Office of Counter-Terrorism (UNOCT) and the United Nations Interregional Crime and Justice Research Institute (UNICRI) through its Centre for Artificial Intelligence and Robotics.
    [Show full text]
  • Time Dependent Plasticity in Vanilla Backpro
    Under review as a conference paper at ICLR 2018 ITERATIVE TEMPORAL DIFFERENCING WITH FIXED RANDOM FEEDBACK ALIGNMENT SUPPORT SPIKE- TIME DEPENDENT PLASTICITY IN VANILLA BACKPROP- AGATION FOR DEEP LEARNING Anonymous authors Paper under double-blind review ABSTRACT In vanilla backpropagation (VBP), activation function matters considerably in terms of non-linearity and differentiability. Vanishing gradient has been an im- portant problem related to the bad choice of activation function in deep learning (DL). This work shows that a differentiable activation function is not necessary any more for error backpropagation. The derivative of the activation function can be replaced by an iterative temporal differencing (ITD) using fixed random feed- back weight alignment (FBA). Using FBA with ITD, we can transform the VBP into a more biologically plausible approach for learning deep neural network ar- chitectures. We don’t claim that ITD works completely the same as the spike-time dependent plasticity (STDP) in our brain but this work can be a step toward the integration of STDP-based error backpropagation in deep learning. 1 INTRODUCTION VBP was proposed around 1987 Rumelhart et al. (1985). Almost at the same time, biologically- inspired convolutional networks was also introduced as well using VBP LeCun et al. (1989). Deep learning (DL) was introduced as an approach to learn deep neural network architecture using VBP LeCun et al. (1989; 2015); Krizhevsky et al. (2012). Extremely deep networks learning reached 152 layers of representation with residual and highway networks He et al. (2016); Srivastava et al. (2015). Deep reinforcement learning was successfully implemented and applied which was mimicking the dopamine effect in our brain for self-supervised and unsupervised learning Silver et al.
    [Show full text]
  • Methods and Trends in Natural Language Processing Applications in Big Data
    International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6S5, April 2019 Methods and Trends in Natural Language Processing Applications in Big Data Joseph M. De Guia, Madhavi Devaraj translation that can be processed and accessible to computer Abstract: Understanding the natural language of humans by applications. Some of the tasks available to NLP can be the processing them in the computer makes sense for applications in following: machine translation, generation and understanding solving practical problems. These applications are present in our of natural language, morphological separation, part of speech systems that automates some and even most of the human tasks performed by the computer algorithms.The “big data” deals with tagging, recognition of speech, entities, optical characters, NLP techniques and applications that sifts through each word, analysis of discourse, sentiment analysis, etc. Through these phrase, sentence, paragraphs, symbols, images, speech, tasks, NLPs achieved the goal of analyzing, understanding, utterances with its meanings, relationships, and translations that and generating the natural language of human using a can be processed and accessible to computer applications by computer application and with the help of learning humans. Through these automated tasks, NLPs achieved the goal techniques. of analyzing, understanding, and generating the natural This paper is a survey of the published literature in NLP and language of human using a computer application and with the its uses to big data. This also presents a review of the NLP help of classic and advanced machine learning techniques. This paper is a survey of the published literature in NLP and its uses to applications and learning techniques used in some of the big data.
    [Show full text]
  • The Neocognitron As a System for Handavritten Character Recognition: Limitations and Improvements
    The Neocognitron as a System for HandAvritten Character Recognition: Limitations and Improvements David R. Lovell A thesis submitted for the degree of Doctor of Philosophy Department of Electrical and Computer Engineering University of Queensland March 14, 1994 THEUliW^^ This document was prepared using T^X and WT^^. Figures were prepared using tgif which is copyright © 1992 William Chia-Wei Cheng (william(Dcs .UCLA. edu). Graphs were produced with gnuplot which is copyright © 1991 Thomas Williams and Colin Kelley. T^ is a trademark of the American Mathematical Society. Statement of Originality The work presented in this thesis is, to the best of my knowledge and belief, original, except as acknowledged in the text, and the material has not been subnaitted, either in whole or in part, for a degree at this or any other university. David R. Lovell, March 14, 1994 Abstract This thesis is about the neocognitron, a neural network that was proposed by Fuku- shima in 1979. Inspired by Hubel and Wiesel's serial model of processing in the visual cortex, the neocognitron was initially intended as a self-organizing model of vision, however, we are concerned with the supervised version of the network, put forward by Fukushima in 1983. Through "training with a teacher", Fukushima hoped to obtain a character recognition system that was tolerant of shifts and deformations in input images. Until now though, it has not been clear whether Fukushima's ap- proach has resulted in a network that can rival the performance of other recognition systems. In the first three chapters of this thesis, the biological basis, operational principles and mathematical implementation of the supervised neocognitron are presented in detail.
    [Show full text]
  • Memory Based Machine Intelligence Techniques in VLSI Hardware
    Memory Based Machine Intelligence Techniques in VLSI hardware Alex Pappachen James, Machine Intelligence Research Group, Indian Institute of Information Technology and Management-Kerala Email: [email protected] Abstract: We briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high level intelligence problems such as sparse coding and contextual processing. Neocortex in human brain that accounts for 76% of the brain's volume can be seen as a control unit that is involved in the processing of intelligence functions such as sensory perception, generation of motor commands, spatial reasoning, conscious thought and language. The realisation of neocortex like functions through algorithms and hardware implementation has been the inspiration to majority of state of the art artificial intelligence systems and theory. Intelligent functions are often required to be performed in environments having high levels of natural variability, which makes individual functions of the brain highly complex. On the other hand, the huge number of neurons in neocortex and their interconnections with each other makes the brain structurally very complex. As a result, the mimicking of neocortex is exceptionally challenging and difficult. Human brain is highly modular and hierarchical in structure and functionality. Each module in a human brain has evolved in structure and is trained to deal with very specific task.
    [Show full text]