CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++

Xiaolin Wang, Masao Utiyama, Eiichiro Sumita
Advanced Translation Research and Development Promotion Center
National Institute of Information and Communications Technology, Japan
{xiaolin.wang,mutiyama,eiichiro.sumita}@nict.go.jp

Abstract

This paper presents an open-source neural machine translation toolkit named CytonMT [1]. The toolkit is built from scratch using only C++ and NVIDIA's GPU-accelerated libraries. The toolkit features training efficiency, code simplicity and translation quality. Benchmarks show that CytonMT accelerates the training speed by 64.5% to 110.8% on neural networks of various sizes, and achieves competitive translation quality.

1 Introduction

Neural Machine Translation (NMT) has made remarkable progress over the past few years (Sutskever et al., 2014; Bahdanau et al., 2014; Wu et al., 2016). Just as Moses (Koehn et al., 2007) does for statistical machine translation (SMT), open-source NMT toolkits contribute greatly to this progress, including but not limited to,

• RNNsearch-LV (Jean et al., 2015) [2]
• Luong-NMT (Luong et al., 2015a) [3]
• DL4MT by Kyunghyun Cho et al. [4]
• BPE-char (Chung et al., 2016) [5]
• Nematus (Sennrich et al., 2017) [6]
• OpenNMT (Klein et al., 2017) [7]
• Seq2seq (Britz et al., 2017) [8]
• ByteNet (Kalchbrenner et al., 2016) [9]
• ConvS2S (Gehring et al., 2017) [10]
• Tensor2Tensor (Vaswani et al., 2017) [11]
• Marian (Junczys-Dowmunt et al., 2018) [12]

These open-source NMT toolkits are undoubtedly excellent software. However, they share a common issue: all of them are written in scripting languages with dependencies on third-party GPU platforms (see Table 1), except Marian, which was developed concurrently with our toolkit.

Using scripting languages and third-party GPU platforms is a double-edged sword. On the one hand, it greatly reduces the workload of coding neural networks. On the other hand, it causes two problems,

• Running efficiency drops, and profiling and optimization also become difficult, because direct access to GPUs is blocked by the language interpreters or the platforms. NMT systems typically require days or weeks to train, so training efficiency is a paramount concern. Slightly faster training can make the difference between plausible and impossible experiments (Klein et al., 2017).
• Researchers using these toolkits may be constrained by the platforms. Unexplored computations or operations may become disallowed or unnecessarily inefficient on a third-party platform, which lowers the chances of developing novel neural network techniques.

[1] https://github.com/arthurxlw/cytonMt
[2] https://github.com/sebastien-j/LV_groundhog
[3] https://github.com/lmthang/nmt.hybrid
[4] https://github.com/nyu-dl/dl4mt-tutorial
[5] https://github.com/nyu-dl/dl4mt-cdec
[6] https://github.com/EdinburghNLP/nematus
[7] https://github.com/OpenNMT/OpenNMT-py
[8] https://github.com/google/seq2seq
[9] https://github.com/paarthneekhara/byteNet-tensorflow (unofficial) and others.
[10] https://github.com/facebookresearch/fairseq
[11] https://github.com/tensorflow/tensor2tensor
[12] https://github.com/marian-nmt/marian

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (System Demonstrations), pages 133–138, Brussels, Belgium, October 31–November 4, 2018.
© 2018 Association for Computational Linguistics

Toolkit          Language   Platform
RNNsearch-LV     Python     Theano, GroundHog
Luong-NMT        Matlab     Matlab
DL4MT            Python     Theano
BPE-char         Python     Theano
Nematus          Python     Theano
OpenNMT          Lua        Torch
Seq2seq          Python     Tensorflow
ByteNet          Python     Tensorflow
ConvS2S          Lua        Torch
Tensor2Tensor    Python     Tensorflow
Marian           C++        –
CytonMT          C++        –

Table 1: Languages and Platforms of Open-source NMT toolkits.

[Figure 1: Architecture of CytonMT.]

CytonMT is developed to address this issue, in hopes of providing the community with an attractive alternative. The toolkit is written in C++, which is the official language of NVIDIA, the manufacturer of the most widely used GPU hardware. This gives the toolkit an advantage in efficiency when compared with other toolkits.

Implementing in C++ also gives CytonMT great flexibility and freedom in coding. Researchers who are interested in the actual calculations inside neural networks can trace the source code down to kernel functions, matrix operations or NVIDIA's APIs, and then modify them freely to test their novel ideas.

The code simplicity of CytonMT is comparable to that of NMT toolkits implemented in scripting languages. This owes to an open-source general-purpose neural network library in C++, named CytonLib, which is shipped as part of the source code. The library defines a simple and friendly pattern for users to build arbitrary network architectures at the cost of two lines of genuine C++ code per layer.

CytonMT achieves competitive translation quality, which is the main purpose of NMT toolkits. It implements the popular framework of the attention-based RNN encoder-decoder. Among the reported systems of the same architecture, it ranks at top positions on the benchmarks of both the WMT14 and WMT17 English-to-German tasks.

The remainder of this paper presents the details of CytonMT from the aspects of method, implementation, benchmark, and future work.

2 Method

The toolkit approaches the problem of machine translation using the attention-based RNN encoder-decoder proposed by Bahdanau et al. (2014) and Luong et al. (2015a). Figure 1 illustrates the architecture. The conditional probability of a translation given a source sentence is formulated as,

\[ \log p(y \mid x) = \sum_{j=1}^{m} \log p(y_j \mid H_o^{\langle j \rangle}) = \sum_{j=1}^{m} \log\big(\mathrm{softmax}_{y_j}(\tanh(W_o H_o^{\langle j \rangle} + B_o))\big) \tag{1} \]
\[ H_o^{\langle j \rangle} = F_{att}(H_s, H_t^{\langle j \rangle}), \tag{2} \]

where $x$ is a source sentence; $y = (y_1, \ldots, y_m)$ is a translation; $H_s$ is a source-side top-layer hidden state; $H_t^{\langle j \rangle}$ is a target-side top-layer hidden state; $H_o^{\langle j \rangle}$ is a state generated by an attention model $F_{att}$; $W_o$ and $B_o$ are the weight and bias of an output embedding.

The toolkit adopts the multiplicative attention model proposed by Luong et al. (2015a), because it is slightly more efficient than the additive variant proposed by Bahdanau et al. (2014). This issue is addressed in Britz et al. (2017) and Vaswani et al. (2017). Figure 2 illustrates the model, formulated as,

\[ a_{st}^{\langle ij \rangle} = \mathrm{softmax}\big(F_a(H_s^{\langle i \rangle}, H_t^{\langle j \rangle})\big) = \frac{e^{F_a(H_s^{\langle i \rangle}, H_t^{\langle j \rangle})}}{\sum_{i'=1}^{n} e^{F_a(H_s^{\langle i' \rangle}, H_t^{\langle j \rangle})}}, \tag{3} \]
\[ F_a(H_s^{\langle i \rangle}, H_t^{\langle j \rangle}) = H_s^{\langle i \rangle \top} W_a H_t^{\langle j \rangle}, \tag{4} \]
\[ C_s^{\langle j \rangle} = \sum_{i=1}^{n} a_{st}^{\langle ij \rangle} H_s^{\langle i \rangle}, \tag{5} \]
\[ C_{st}^{\langle j \rangle} = [C_s^{\langle j \rangle}; H_t^{\langle j \rangle}], \tag{6} \]
\[ H_o^{\langle j \rangle} = \tanh(W_c C_{st}^{\langle j \rangle}), \tag{7} \]

where $F_a$ is a scoring function for alignment; $W_a$ is a matrix for linearly mapping target-side hidden states into a space comparable to the source side; $a_{st}^{\langle ij \rangle}$ is an alignment coefficient; $C_s^{\langle j \rangle}$ is a source-side context; $C_{st}^{\langle j \rangle}$ is a context derived from both sides.

[Figure 2: Architecture of Attention Model.]

3 Implementation

The toolkit consists of a general-purpose neural network library, and a neural machine translation system built upon the library. The neural network library defines a class named Network to facilitate the construction of arbitrary neural networks. Users only need to inherit the class, declare components as data members, and write two lines of code per component in an initialization function. For example, the complete code of the attention network formulated by Equations 3 to 7 is presented in Figure 3.

    class Attention: public Network
    {
      DuplicateLayer dupHt;       // declare components
      LinearLayer linearHt;
      MultiplyHsHt multiplyHsHt;
      SoftmaxLayer softmax;
      WeightedHs weightedHs;
      Concatenate concateCsHt;
      LinearLayer linearCst;
      ActivationLayer actCst;

      Variable* init(LinearLayer* linHt, LinearLayer* linCst,
                     Variable* hs, Variable* ht)
      {
        Variable* tx;
        tx=dupHt.init(ht);                              // make two copies
        layers.push_back(&dupHt);
        tx=linearHt.init(linHt, tx);                    // WaHt
        layers.push_back(&linearHt);
        tx=multiplyHsHt.init(hs, tx);                   // Fa
        layers.push_back(&multiplyHsHt);
        tx=softmax.init(tx);                            // ast
        layers.push_back(&softmax);
        tx=weightedHs.init(hs, tx);                     // Cs
        layers.push_back(&weightedHs);
        tx=concateCsHt.init(tx, &dupHt.y1);             // Cst
        layers.push_back(&concateCsHt);
        tx=linearCst.init(linCst, tx);                  // WcCst
        layers.push_back(&linearCst);
        tx=actCst.init(tx, CUDNN_ACTIVATION_TANH);      // Ho
        layers.push_back(&actCst);
        return tx;                                      // pointer to result
      }
    };

Figure 3: Complete Code of Attention Model Formulated by Equations 3 to 7

This piece of code fulfills the task of building a neural network as follows,

• The class of Variable stores numeric values and gradients. By passing pointers to Variable around, components are connected together.
• The data member of layers collects all the

    void LinearLayer::forward()
    {
      cublasXgemm(cublasH, CUBLAS_OP_T, CUBLAS_OP_N,
        dimOutput, num, dimInput,
        &one, w.data, w.ni, x.data, dimInput,
        &zero, y.data, dimOutput);
    }

    void LinearLayer::backward()
    {
      cublasXgemm(cublasH, CUBLAS_OP_N, CUBLAS_OP_N,
        dimInput, num, dimOutput,
        &one, w.data, w.ni, y.grad.data, dimOutput,
        &beta, x.grad.data, dimInput);
    }

    void LinearLayer::calculateGradient()
    {
      cublasXgemm(cublasH, CUBLAS_OP_N, CUBLAS_OP_T,
        dimInput, dimOutput, num,
        &one, x.data,
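To make the arithmetic of Equations 3 to 7 concrete, the following is a minimal CPU-only sketch of one attention step for a single target position j. It is not the CytonLib implementation, which runs on the GPU through cuBLAS and cuDNN; the names Vec, Mat, matVec, alignment and attentionStep are hypothetical and were chosen only for this illustration. It assumes that the source-side hidden states are stored one per row of Hs, that Wa has one row per source-side dimension, and that Wc has as many columns as the concatenated context.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch; not part of CytonLib.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Mat[row][col]

// Dense matrix-vector product M * v.
Vec matVec(const Mat& M, const Vec& v) {
  Vec out(M.size(), 0.0);
  for (std::size_t r = 0; r < M.size(); ++r) {
    assert(M[r].size() == v.size());
    for (std::size_t c = 0; c < v.size(); ++c) out[r] += M[r][c] * v[c];
  }
  return out;
}

// Equations 3 and 4: alignment coefficients over all source positions i.
// Hs holds one source-side hidden state per row; ht is the target-side
// hidden state at the current step j.
Vec alignment(const Mat& Hs, const Mat& Wa, const Vec& ht) {
  assert(!Hs.empty());
  const Vec mapped = matVec(Wa, ht);  // Wa * Ht, mapped to the source-side space
  Vec score(Hs.size(), 0.0);
  for (std::size_t i = 0; i < Hs.size(); ++i) {
    assert(Hs[i].size() == mapped.size());
    for (std::size_t k = 0; k < mapped.size(); ++k)
      score[i] += Hs[i][k] * mapped[k];  // Eq. 4: Hs^T Wa Ht
  }
  // Eq. 3: softmax over i. Subtracting the maximum score is a common
  // numerical-stability step and does not change the result.
  const double maxScore = *std::max_element(score.begin(), score.end());
  double sum = 0.0;
  for (double& s : score) {
    s = std::exp(s - maxScore);
    sum += s;
  }
  for (double& s : score) s /= sum;
  return score;
}

// Equations 5 to 7: source-side context, concatenation, and output state.
Vec attentionStep(const Mat& Hs, const Mat& Wa, const Mat& Wc, const Vec& ht) {
  const Vec a = alignment(Hs, Wa, ht);
  Vec cst(Hs[0].size(), 0.0);  // first holds Cs (Eq. 5)
  for (std::size_t i = 0; i < Hs.size(); ++i)
    for (std::size_t k = 0; k < cst.size(); ++k) cst[k] += a[i] * Hs[i][k];
  cst.insert(cst.end(), ht.begin(), ht.end());  // Eq. 6: Cst = [Cs; Ht]
  Vec ho = matVec(Wc, cst);                     // Wc * Cst
  for (double& h : ho) h = std::tanh(h);        // Eq. 7
  return ho;
}
```

Calling attentionStep(Hs, Wa, Wc, ht) once per target position j yields the state that Equation 1 feeds into the output layer. The sketch processes one position at a time for readability; a GPU implementation would presumably batch these products into matrix-matrix calls.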