Automatic Differentiation in Machine Learning: a Survey
16 pages · PDF · 1,020 KB
Recommended publications
Understanding Binomial Sequential Testing
Selected Topics in Assurance Related Technologies (START), Volume 12, Number 2.

Table of Contents
• Introduction
• Double Sampling (Two Stage) Testing Procedures
• The Sequential Probability Ratio Test (SPRT)
• The Average Sample Number (ASN)
• Conclusions
• For More Information
• About the Author
• Other START Sheets Available

…of two START sheets. In this first START Sheet, we start by exploring double sampling plans. From there, we proceed to higher-dimension sampling plans, namely sequential tests. We illustrate their discussion via numerical and practical examples of sequential tests for attributes (pass/fail) data that follow the Binomial distribution. Such plans can be used for Quality Control as well as for Life Testing problems. We conclude with a discussion of the ASN or "expected sample number," a performance measure widely used to assess such multi-stage sampling plans. In the second START Sheet, we will discuss sequential testing plans for continuous data (variables), following the same scheme used herein.

Introduction
Determining the sample size "n" required in testing and confidence interval (CI) derivation is of importance for practitioners, as evidenced by the many related queries that we receive at the RAC in this area. Therefore, we have addressed this topic in a number of START sheets. For example, we discussed the problem of calculating the sample size…

Double Sampling (Two Stage) Testing Procedures
In hypothesis testing, we often define a Null hypothesis (H0) that expresses the desired value for the parameter under test (e.g., acceptable quality). Then, we define the Alternative (H1) as the unacceptable value for such parameter.
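To make the SPRT named in the table of contents concrete, here is a minimal sketch, not taken from the START sheet itself, of Wald's sequential test for Binomial (pass/fail) data; the values of p0, p1, alpha, and beta are illustrative assumptions.

```python
import math

def sprt_binomial(observations, p0=0.05, p1=0.20, alpha=0.05, beta=0.10):
    """Wald's SPRT for Bernoulli data: H0: p = p0 vs. H1: p = p1."""
    upper = math.log((1 - beta) / alpha)   # crossing above rejects H0
    lower = math.log(beta / (1 - alpha))   # crossing below accepts H0
    llr = 0.0                              # cumulative log-likelihood ratio
    for n, x in enumerate(observations, start=1):  # x = 1 marks a failure
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return f"reject H0 after {n} trials"
        if llr <= lower:
            return f"accept H0 after {n} trials"
    return "no decision yet: keep sampling"

print(sprt_binomial([0, 0, 1, 0, 1, 1, 1]))  # -> reject H0 after 6 trials
```

Unlike a fixed-n plan, the test stops as soon as the evidence crosses either boundary, which is what drives the ASN savings the sheet discusses.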
Reading Inventory Technical Guide
Technical Guide: Using a Valid and Reliable Assessment of College and Career Readiness Across Grades K–12.

Excepting those parts intended for classroom use, no part of this publication may be reproduced in whole or in part, or stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission of the publisher. For information regarding permission, write to Scholastic Inc., 557 Broadway, New York, NY 10012. Scholastic Inc. grants teachers who have purchased SRI College & Career permission to reproduce from this book those pages intended for use in their classrooms. Notice of copyright must appear on all copies of copyrighted materials.

Portions previously published in: Scholastic Reading Inventory Target Success With the Lexile Framework for Reading (© 2005, 2003, 1999); Scholastic Reading Inventory Using the Lexile Framework, Technical Manual Forms A and B (© 1999); Scholastic Reading Inventory Technical Guide (© 2007, 2001, 1999); Lexiles: A System for Measuring Reader Ability and Text Difficulty, A Guide for Educators (© 2008); and iRead Screener Technical Guide by Richard K. Wagner (© 2014); Scholastic Inc.

Copyright © 2014, 2008, 2007, 1999 by Scholastic Inc. All rights reserved. Published by Scholastic Inc. ISBN-13: 978-0-545-79638-5; ISBN-10: 0-545-79638-5. SCHOLASTIC, READ 180, SCHOLASTIC READING COUNTS!, and associated logos are trademarks and/or registered trademarks of Scholastic Inc. LEXILE and LEXILE FRAMEWORK are registered trademarks of MetaMetrics, Inc. Other company names, brand names, and product names are the property and/or trademarks of their respective owners.
University of North Carolina at Charlotte Belk College of Business
BPHD 8220 Financial Bayesian Analysis, Fall 2019. Course time: Tuesday 12:20–3:05 pm. Location: Friday Building 207. Professor: Dr. Yufeng Han. Office location: Friday Building 340A. Telephone: (704) 687-8773. E-mail: [email protected]. Office hours: by appointment.

Textbooks:
• Introduction to Bayesian Econometrics, 2nd Edition, by Edward Greenberg (required)
• An Introduction to Bayesian Inference in Econometrics, 1st Edition, by Arnold Zellner
• Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, 2nd Edition, by Joseph M. Hilbe, Rafael S. de Souza, and Emille E. O. Ishida
• Bayesian Analysis with Python: Introduction to Statistical Modeling and Probabilistic Programming Using PyMC3 and ArviZ, 2nd Edition, by Osvaldo Martin
• The Oxford Handbook of Bayesian Econometrics, 1st Edition, by John Geweke

Topics:
1. Principles of Bayesian Analysis
2. Simulation
3. Linear Regression Models
4. Multivariate Regression Models
5. Time-Series Models
6. State-Space Models
7. Volatility Models
8. Endogeneity Models

Software:
1. Stan
2. Edward
3. JAGS
4. BUGS & MultiBUGS
5. Python modules: PyMC & ArviZ
6. R, STATA, SAS, Matlab, etc.

Course assessment: homework assignments, paper presentation, and replication (or project). Grading: homework 30%, paper presentation 40%, replication (project) 30%.

Selected papers:
• Vasicek, O. A. (1973). A note on using cross-sectional information in Bayesian estimation of security betas. Journal of Finance, 28, 1233–1239. doi:10.1111/j.1540-6261.1973.tb01452.x
• Shanken, J. (1987). A Bayesian approach to testing portfolio efficiency. Journal of Financial Economics, 19(2), 195–215. https://doi.org/10.1016/0304-405X(87)90002-X
• Guofu, C., & Harvey, R. …
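Since the syllabus lists the Python modules PyMC and ArviZ, a minimal sketch of topic 3, a Bayesian linear regression, might look as follows in PyMC3; the simulated data and priors are our own illustrative assumptions, not course material.

```python
import numpy as np
import pymc3 as pm
import arviz as az

# Simulated data: y = 1.0 + 2.0 * x + Gaussian noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

with pm.Model():
    alpha = pm.Normal("alpha", mu=0.0, sigma=10.0)   # intercept prior
    beta = pm.Normal("beta", mu=0.0, sigma=10.0)     # slope prior
    sigma = pm.HalfNormal("sigma", sigma=1.0)        # noise-scale prior
    pm.Normal("y_obs", mu=alpha + beta * x, sigma=sigma, observed=y)
    trace = pm.sample(1000, tune=1000, return_inferencedata=True)

print(az.summary(trace, var_names=["alpha", "beta", "sigma"]))
```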
Lecture 6: Learned Feedforward Visual Processing (Neural Networks, Deep Learning, ConvNets)
William T. Freeman and Antonio Torralba, 2017. Some slides modified from R. Fergus.

We need translation invariance: lots of useful linear filters (Laplacian, Gaussian derivative, Gaussian, Gabor, high-order Gaussian derivatives, and many more). We need translation and scale invariance: lots of image pyramids (Gaussian pyramid, Laplacian pyramid, and many more: QMF, steerable, ...). We need... what is the best representation? All the previous representations are manually constructed. Could they be learnt from data?

A brief history of neural networks (the slides sketch a plot of enthusiasm over time):
• Perceptrons, 1958 (Rosenblatt). Photo by George Nagy; see http://www.ecse.rpi.edu/homepages/nagy/PDF_chrono/2011_Nagy_Pace_FR.pdf and http://www.manhattanrarebooks-science.com/rosenblatt.htm
• Minsky and Papert, Perceptrons, 1972.
• Parallel Distributed Processing (PDP), 1986.
• LeCun convolutional nets, 1998. Demos: http://yann.lecun.com/exdb/lenet/index.html
• NIPS 2000. NIPS, Neural Information Processing Systems, is the premier conference on machine learning.

The XOR problem:

Inputs   Output
0 0   →   0
0 1   →   1
1 0   →   1
1 1   →   0

The PDP authors pointed to the backpropagation algorithm as a breakthrough, allowing multi-layer neural networks to be trained. Among the functions that a multi-layer network can represent but a single-layer network cannot: the XOR function.

Neural networks to recognize handwritten digits? Yes. Neural networks for tougher problems? Not really (http://pub.clement.farabet.net/ecvw09.pdf).
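The XOR point lends itself to a worked example: a minimal NumPy sketch, ours rather than the lecture's, that trains a one-hidden-layer network by backpropagation on the truth table above. A single-layer perceptron cannot represent this function; one hidden layer suffices.

```python
import numpy as np

# XOR truth table: inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule on a squared-error loss.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # converges toward [[0], [1], [1], [0]]
```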
Comparative Study of Deep Learning Software Frameworks
Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah. Research and Technology Center, Robert Bosch LLC. {Soheil.Bahrampour, Naveen.Ramakrishnan, fixed-term.Lukas.Schott, Mohak.Shah}@us.bosch.com

Abstract
Deep learning methods have resulted in significant performance improvements in several application domains, and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of five deep learning frameworks, namely Caffe, Neon, TensorFlow, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is performed on several types of deep learning architectures and we evaluate the performance of the above frameworks when employed on a single machine for both (multi-threaded) CPU and GPU (Nvidia Titan X) settings. The speed performance metrics used here include the gradient computation time, which is important during the training phase of deep networks, and the forward time, which is important from the deployment perspective of trained networks.

…such as dropout and weight decay [2]. As the popularity of deep learning methods has increased over the last few years, several deep learning software frameworks have appeared to enable efficient development and implementation of these methods. The list of available frameworks includes, but is not limited to, Caffe, DeepLearning4J, deepmat, Eblearn, Neon, PyLearn, TensorFlow, Theano, Torch, etc. Different frameworks try to optimize different aspects of training or deployment of a deep learning algorithm. For instance, Caffe emphasises ease of use, where standard layers can be easily configured without hard-coding, while Theano provides automatic differentiation capabilities that facilitate flexibility to modify the architecture for research and development. Several of these frameworks have received wide attention from the research community and are well-developed, allowing efficient training of deep networks with…
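The paper's two speed metrics, gradient computation time and forward time, can be illustrated with a framework-agnostic harness such as the sketch below; `forward_fn` and `grad_fn` are hypothetical stand-ins for whichever framework calls are being benchmarked, not the paper's actual benchmark code.

```python
import time

def benchmark(fn, n_warmup=10, n_runs=100):
    """Average wall-clock seconds per call of fn(), after a warm-up."""
    for _ in range(n_warmup):          # warm-up: JIT, caches, GPU init
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical stand-ins: in a real benchmark these would call the
# framework's forward and gradient routines on a fixed batch.
forward_fn = lambda: sum(i * i for i in range(10_000))
grad_fn = lambda: sum(2 * i for i in range(10_000))

print(f"forward:  {benchmark(forward_fn) * 1e3:.3f} ms/batch")
print(f"gradient: {benchmark(grad_fn) * 1e3:.3f} ms/batch")
```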
Comparative Study of Caffe, Neon, Theano, and Torch
Workshop track, ICLR 2016. Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning. Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah. Bosch Research and Technology Center. {Soheil.Bahrampour, Naveen.Ramakrishnan, fixed-term.Lukas.Schott, Mohak.Shah}@us.bosch.com

Abstract
Deep learning methods have resulted in significant performance improvements in several application domains, and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of four deep learning frameworks, namely Caffe, Neon, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is performed on several types of deep learning architectures and we evaluate the performance of the above frameworks when employed on a single machine for both (multi-threaded) CPU and GPU (Nvidia Titan X) settings. The speed performance metrics used here include the gradient computation time, which is important during the training phase of deep networks, and the forward time, which is important from the deployment perspective of trained networks. For convolutional networks, we also report how each of these frameworks supports various convolutional algorithms and their corresponding performance. From our experiments, we observe that Theano and Torch are the most easily extensible frameworks. We observe that Torch is best suited for any deep architecture on CPU, followed by Theano. It also achieves the best performance on the GPU for large convolutional and fully connected networks, followed closely by Neon. Theano achieves the best performance on GPU for training and deployment of LSTM networks. Finally, Caffe is the easiest for evaluating the performance of standard deep architectures.
TensorFlow, Theano, Keras, Torch, Caffe
Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov

Introduction
• TensorFlow: Google Brain, 2015 (a rewrite of DistBelief)
• Theano: University of Montréal, 2009
• Keras: François Chollet, 2015 (now at Google)
• Torch: Facebook AI Research, Twitter, Google DeepMind
• Caffe: Berkeley Vision and Learning Center (BVLC), 2013

Outline
1. Introduction of each framework: TensorFlow, Theano, Keras, Torch, Caffe
2. Further comparison: code + models; community and documentation; performance; model deployment; extra features
3. Which framework to choose when...?

TensorFlow architecture: 1) a low-level core (C++/CUDA); 2) a simple Python API to define the computational graph; 3) high-level APIs (TF-Learn, TF-Slim, soon Keras...).

The TensorFlow computational graph offers auto-differentiation, easy multi-GPU/multi-node execution, native C++ multithreading, device-efficient implementations for most ops, and the whole pipeline in the graph: data loading, preprocessing, prefetching... Visualization is provided by TensorBoard.

TensorFlow development, pros: bleeding edge (GitHub, yay!); the division into core and contrib means very quick merging of new hotness; a lot of new related APIs (CRF, BayesFlow, SparseTensor, audio IO, CTC, seq2seq), so it can easily handle images, videos, audio, text...; if you really need a new native op, you can load a dynamic lib. Cons: sometimes contrib stuff disappears or moves; recently introduced bells and whistles are barely documented.

Presentation of Theano: maintained by the Montréal University group; pioneered the use of a computational graph; a general machine-learning tool, hence the use of Lasagne and Keras on top; very popular in the research community, but not elsewhere, and falling behind. What is it like to start using Theano? Read tutorials until you no longer can, then keep going.
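A minimal sketch of the workflow the slides describe, defining a computational graph in the Python API and letting auto-differentiation add gradient nodes, written against the TensorFlow 1.x graph API of that era (the toy function is our own illustration):

```python
import tensorflow as tf  # assumes the TensorFlow 1.x graph API

# Build the computational graph symbolically.
x = tf.placeholder(tf.float32, shape=(), name="x")
w = tf.Variable(3.0, name="w")
y = w * x * x  # y = w * x^2

# Auto-differentiation adds gradient nodes to the same graph.
dy_dx, dy_dw = tf.gradients(y, [x, w])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([y, dy_dx, dy_dw], feed_dict={x: 2.0}))
    # -> [12.0, 12.0, 4.0]: y = 3*4, dy/dx = 2*w*x, dy/dw = x^2
```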
DIY Deep Learning for Vision: the Caffe Framework
caffe.berkeleyvision.org, github.com/BVLC/caffe. Evan Shelhamer, adapted from the Caffe tutorial with Jeff Donahue, Yangqing Jia, and Ross Girshick.

Why deep learning? The unreasonable effectiveness of deep features:
• Classes separate in the deep representations and transfer to many tasks [DeCAF] [Zeiler-Fergus].
• Maximal activations of pool5 units [R-CNN]; conv5 DeConv visualization shows the rich visual structure of features deep in the hierarchy [Zeiler-Fergus].
• 1st-layer filters and the image patches that strongly activate them [Zeiler-Fergus].

What is deep learning? Compositional models learned end-to-end, with a hierarchy of representations: in vision, pixel, motif, part, object; in text, character, word, clause, sentence; in speech, audio, band, phone, word, running from concrete to abstract as learning proceeds (figure credit Yann LeCun, ICML '13 tutorial). Back-propagation: take the gradient of the model layer by layer by the chain rule to yield the gradient of all the parameters (see the equations below). A vast space of models! Caffe models are loss-driven: supervised or unsupervised (slide credit Marc'Aurelio Ranzato, CVPR '14 tutorial).

Convolutional neural nets (CNNs), 1989. LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits [LeNet]. Convolutional nets, 2012. AlexNet: a layered model composed of convolution, subsampling, and further operations followed by a holistic representation; all in all a landmark classifier on ILSVRC12 (+ data, + GPU, + non-saturating nonlinearity).
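The back-propagation slide compresses a lot; written out for a layered model \(x_l = f_l(x_{l-1}, \theta_l)\) with loss \(\ell(x_L)\), the chain rule gives (our notation, not the slides'):

```latex
% Backward pass. With \delta_l = \partial \ell / \partial x_l,
% starting from \delta_L at the loss and sweeping l = L, \dots, 1:
\delta_{l-1} = \Big(\frac{\partial f_l}{\partial x_{l-1}}\Big)^{\top} \delta_l ,
\qquad
\frac{\partial \ell}{\partial \theta_l} = \Big(\frac{\partial f_l}{\partial \theta_l}\Big)^{\top} \delta_l .
```

One backward sweep thus yields the gradient of every parameter, which is exactly what makes end-to-end training of deep compositions practical.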
Reconciling Situation Calculus and Fluent Calculus
Stephan Schiffel and Michael Thielscher, Department of Computer Science, Dresden University of Technology, {stephan.schiffel,mit}@inf.tu-dresden.de

Abstract
The Situation Calculus and the Fluent Calculus are successful action formalisms that share many concepts. But until now there is no formal relation between the two calculi that would allow translating current or future extensions made only for one of the calculi to the other.

Golog is a programming language for intelligent agents that combines elements from classical programming (conditionals, loops, etc.) with reasoning about actions. Primitive statements in Golog programs are actions to be performed by the agent. Conditional statements in Golog are composed of fluents. The execution of a Golog program requires reasoning about the effects of the actions the agent performs, in order to determine the values of fluents when evaluating…

…between domain descriptions in both calculi. Another benefit of a formal relation between the SC and the FC is the possibility to compare extensions made separately for the two calculi, such as concurrency. Furthermore, it might even allow a formal analysis of the relationship between the two approaches, as well as between the programming languages based on them, Golog and FLUX. Such a formal relation would also allow combining Golog and FLUX and analyzing which of the underlying computation principles is better suited for different classes of programs. We develop a formal translation between domain axiomatizations of the Situation Calculus and the Fluent Calculus and present a Fluent Calculus semantics for Golog programs. For domains with deterministic actions, our approach allows automatic translation of Golog domain descriptions and execution of Golog programs with FLUX.
Caffe in Practice
This Business of Brewing: Caffe in Practice. caffe.berkeleyvision.org, github.com/BVLC/caffe. Evan Shelhamer, from the tutorial by Evan Shelhamer, Jeff Donahue, Yangqing Jia, and Ross Girshick.

Deep learning, as it is executed... what should a framework handle? Compositional models: decompose the problem and code! End-to-end learning: solve and check! A vast space of architectures and tasks: define, experiment, and extend!

Frameworks:
• Torch7: NYU; a scientific computing framework in Lua; supported by Facebook.
• Theano/Pylearn2: U. Montreal; a scientific computing framework in Python; symbolic computation and automatic differentiation.
• Cuda-Convnet2: Alex Krizhevsky; very fast on state-of-the-art GPUs with multi-GPU parallelism; a C++/CUDA library.

Framework comparison: more alike than different. All express deep models, all are nicely open-source, and all include scripting for hacking and prototyping. No strict winners: experiment and choose the framework that best fits your work. We like to brew our deep networks with Caffe.

Why Caffe? In one sip (see the sketch after this list):
• Expression: models + optimizations are plaintext schemas, not code.
• Speed: for state-of-the-art models and massive data.
• Modularity: to extend to new tasks and architectures.
• Openness: common code and reference models for reproducibility.
• Community: joint discussion, development, and modeling.

Caffe examples + applications. Share a sip of brewed models: demo.caffe.berkeleyvision.org (demo code open-source and bundled). Scene recognition with the MIT Places CNN (B. Zhou et al., NIPS '14). Object detection with R-CNN, Regions with Convolutional Neural Networks (Ross Girshick et al.): http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb; full R-CNN scripts available at https://github.com/rbgirshick/rcnn.
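The "Expression" point above means a Caffe model lives in a plaintext schema (a .prototxt file) rather than in code; a minimal pycaffe sketch of loading such a schema and running a forward pass might look as follows. The file names and the "prob" output blob are placeholders that assume a standard image-classification deploy schema.

```python
import numpy as np
import caffe  # Caffe's Python interface (pycaffe)

caffe.set_mode_cpu()

# The architecture is a plaintext schema; the weights are a binary blob.
# Both file names below are placeholders for your own model files.
net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

# Feed one image-shaped input and run the layered model end to end.
net.blobs["data"].reshape(1, 3, 227, 227)
net.blobs["data"].data[...] = np.random.rand(1, 3, 227, 227)
out = net.forward()

# Assumes the deploy schema ends in a softmax layer named "prob".
print(out["prob"].argmax())
```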
An Introduction to Data Analysis Using the PyMC3 Probabilistic Programming Framework: A Case Study with Gaussian Mixture Modeling
Shi Xian Liew, Mohsen Afrasiabi, and Joseph L. Austerweil, Department of Psychology, University of Wisconsin-Madison, Madison, WI, USA. August 28, 2019.

Author note: Correspondence concerning this article should be addressed to Shi Xian Liew, 1202 West Johnson Street, Madison, WI 53706. E-mail: [email protected]. This work was funded by a Vilas Life Cycle Professorship and the VCRGE at University of Wisconsin-Madison with funding from the WARF.

Abstract
Recent developments in modern probabilistic programming have offered users many practical tools for Bayesian data analysis. However, the adoption of such techniques by the general psychology community is still fairly limited. This tutorial aims to provide non-technicians with an accessible guide to PyMC3, a robust probabilistic programming language that allows for straightforward Bayesian data analysis. We focus on a series of increasingly complex Gaussian mixture models, building up from fitting basic univariate models to more complex multivariate models fit to real-world data. We also explore how PyMC3 can be configured to obtain significant increases in computational speed by taking advantage of a machine's GPU, in addition to the conditions under which such acceleration can be expected. All example analyses are detailed with step-by-step instructions and corresponding Python code.

Keywords: probabilistic programming; Bayesian data analysis; Markov chain Monte Carlo; computational modeling; Gaussian mixture modeling

1 Introduction
Over the last decade, there has been a shift in the norms of what counts as rigorous research in psychological science.
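In the spirit of the tutorial's univariate starting point, a minimal PyMC3 Gaussian mixture sketch, ours rather than the paper's code, could look like this:

```python
import numpy as np
import pymc3 as pm

# Simulated data from two Gaussian components.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(3.0, 1.0, 700)])

K = 2
with pm.Model():
    w = pm.Dirichlet("w", a=np.ones(K))                # mixture weights
    mu = pm.Normal("mu", mu=0.0, sigma=10.0, shape=K)  # component means
    sigma = pm.HalfNormal("sigma", sigma=2.0, shape=K) # component scales
    pm.NormalMixture("obs", w=w, mu=mu, sigma=sigma, observed=data)
    trace = pm.sample(1000, tune=1000, return_inferencedata=True)

print(trace.posterior["mu"].mean(dim=("chain", "draw")).values)
```

A real analysis would also have to deal with label switching, for example by placing an ordering constraint on the component means.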
The Features-And-Fluents Semantics for the Fluent Calculus
Michael Thielscher and Thomas Witkowski, Department of Computer Science, Dresden University of Technology, Germany

Abstract
Based on an elaborate ontological taxonomy, the Features-and-Fluents framework provides an independent action semantics for assessing the range of applicability of action calculi. In this paper, we show how the fluent calculus can be used to capture the full range of phenomena in K-IA, the broadest ontological class that has been fully formalized in (Sandewall 1994). To this end, we develop a significant extension of the fluent calculus for modeling actions with durations and with specific trajectories of changes. We present a provably correct translation of scenario descriptions from the Features-and-Fluents semantics into fluent calculus axiomatizations.

Introduction
…fluent calculus, is expressive enough to be applicable to the full test suite of example reasoning problems in (Sandewall 1994), let alone to be provably sound and complete wrt. one of the more expressive ontological classes in the taxonomy. In this paper, we present a version of the fluent calculus that is sufficiently expressive to capture K-IA, which is the broadest class that has been rigorously formalized and intensively studied in (Sandewall 1994) and in which correct knowledge and a fully inertial world is assumed. To this end, we develop a significant extension of the basic fluent calculus by introducing an explicit model for the duration of actions and for trajectories of changes. On this basis, we present a translation function that maps any K-IA scenario into a fluent calculus axiomatization, and we prove that the intended models of the former coincide with the classical models of the latter.