Anima Anandkumar
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  Building Machine Learning Inference Pipelines at ScaleBuilding Machine Learning inference pipelines at scale Julien Simon Global Evangelist, AI & Machine Learning @julsimon © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Problem statement • Real-life Machine Learning applications require more than a single model. • Data may need pre-processing: normalization, feature engineering, dimensionality reduction, etc. • Predictions may need post-processing: filtering, sorting, combining, etc. Our goal: build scalable ML pipelines with open source (Spark, Scikit-learn, XGBoost) and managed services (Amazon EMR, AWS Glue, Amazon SageMaker) © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Apache Spark https://spark.apache.org/ • Open-source, distributed processing system • In-memory caching and optimized execution for fast performance (typically 100x faster than Hadoop) • Batch processing, streaming analytics, machine learning, graph databases and ad hoc queries • API for Java, Scala, Python, R, and SQL • Available in Amazon EMR and AWS Glue © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. MLlib – Machine learning library https://spark.apache.org/docs/latest/ml-guide.html • Algorithms: classification, regression, clustering, collaborative filtering. • Featurization: feature extraction, transformation, dimensionality reduction. • Tools for constructing, evaluating and tuning pipelines • Transformer – a transform function that maps a DataFrame into a new
- 
												  Anima AnandkumarAnima Anandkumar Director of Machine Learning Research, NVIDIA "Anima is at the forefront of AI, as both a theorist and a praconer" Anima Anandkumar holds dual posions in academia and industry. She is a Bren professor at Caltech CMS department and a director of machine learning research at NVIDIA. At NVIDIA, she is leading the research group that develops next-generaon AI algorithms. At Caltech, she is the co-director of Dolcit and co-leads the AI4science iniave. TOPICS: IN DETAIL: AI Security Anima is a former Principal Scienst at Amazon Web Services (AWS) where she Artificial Intelligence was head of research and development providing enterprises with arficial Big Data intelligence tools. Anima is also a champion for enabling AI methods and tools in Empowering Women every part of the cloud and for the kind of progressive, incremental Futurism Technology improvements in performance that can make a big difference over me - the Diversity in Technology same kind of processes with which machines learn. Anima holds the endowed Machine Learning Bren chair at CalTech. She has received numerous awards and honors, including a Can Artificial Intelligence be Conscious Microso Faculty Fellowship, a Google faculty award, and a number of other too? research and best paper awards. She has been featured in several documentaries Bridging the Gap Between Artificial and and media arcles. Human Intelligence WHAT SHE OFFERS YOU: LANGUAGES: Anima Anandkumar offers that rare combinaon of leading-edge academic She presents in English. research with equally deep experience in business and praccal applicaons. She specialises in using AI in cloud compung to solve real world problems.
- 
												  Open Source in the EnterpriseOpen Source in the Enterprise Andy Oram and Zaheda Bhorat Beijing Boston Farnham Sebastopol Tokyo Open Source in the Enterprise by Andy Oram and Zaheda Bhorat Copyright © 2018 O’Reilly Media. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online edi‐ tions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Editor: Michele Cronin Interior Designer: David Futato Production Editor: Kristen Brown Cover Designer: Karen Montgomery Copyeditor: Octal Publishing Services, Inc. July 2018: First Edition Revision History for the First Edition 2018-06-18: First Release The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Open Source in the Enterprise, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the informa‐ tion and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
- 
												  Serverless Predictions at ScaleServerless Predictions at Scale Thomas Reske Global Solutions Architect, Amazon Web Services © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless computing allows you to build and run applications and services without thinking about servers © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What are the benefits of serverless computing? No server management Flexible scaling $ Automated high availability No excess capacity © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Machine Learning Process Fetch data Operations, Clean monitoring, and and optimization format data Integration Prepare and and deployment transform data Evaluation and Model Training verificaton © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Machine Learning Process Fetch data Operations, Clean How can I use pre-trained monitoring, and and models for serving optimization format data predictions? How to scale effectively? Integration Prepare and and deployment transform data Evaluation and Model Training verificaton © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What exactly are we deploying? © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. A Pre-trained Model... • ... is simply a model which has been trained previously • Model artifacts (structure, helper files, parameters, weights, ...) are files, e.g. in JSON or binary format • Can be serialized (saved) and de-serialized (loaded) by common deep learning frameworks • Allows for fine-tuning of models („head-start“ or „freezing“), but also to apply them to similar problems and different data („transfer learning“) • Models can be made publicly available, e.g. in the MXNet Model Zoo © 2018, Amazon Web Services, Inc.
- 
												  Analytics Lens AWS Well-Architected Framework Analytics Lens AWS Well-Architected FrameworkAnalytics Lens AWS Well-Architected Framework Analytics Lens AWS Well-Architected Framework Analytics Lens: AWS Well-Architected Framework Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon. Analytics Lens AWS Well-Architected Framework Table of Contents Abstract ............................................................................................................................................ 1 Abstract .................................................................................................................................... 1 Introduction ...................................................................................................................................... 2 Definitions ................................................................................................................................. 2 Data Ingestion Layer ........................................................................................................... 2 Data Access and Security Layer ............................................................................................ 3 Catalog and Search Layer ...................................................................................................
- 
												  Tensor Regression NetworksJournal of Machine Learning Research 21 (2020) 1-21 Submitted 7/18; Published 7/20 Tensor Regression Networks Jean Kossaifi [email protected] NVIDIA & Imperial College London Zachary C. Lipton [email protected] Carnegie Mellon University Arinbj¨ornKolbeinsson [email protected] Imperial College London Aran Khanna [email protected] Amazon AI Tommaso Furlanello [email protected] University of Southern California Anima Anandkumar [email protected] NVIDIA & California Institute of Technology Editor: Aapo Hyvarinen Abstract Convolutional neural networks typically consist of many convolutional layers followed by one or more fully connected layers. While convolutional layers map between high-order activation tensors, the fully connected layers operate on flattened activation vectors. Despite empirical success, this approach has notable drawbacks. Flattening followed by fully connected layers discards multilinear structure in the activations and requires many parameters. We address these problems by incorporating tensor algebraic operations that preserve multilinear structure at every layer. First, we introduce Tensor Contraction Layers (TCLs) that reduce the dimensionality of their input while preserving their multilinear structure using tensor contraction. Next, we introduce Tensor Regression Layers (TRLs), which express outputs through a low-rank multilinear mapping from a high-order activation tensor to an output tensor of arbitrary order. We learn the contraction and regression factors end-to-end, and produce accurate nets with fewer parameters. Additionally, our layers regularize networks by imposing low-rank constraints on the activations (TCL) and regression weights (TRL). Experiments on ImageNet show that, applied to VGG and ResNet architectures, TCLs and TRLs reduce the number of parameters compared to fully connected layers by more than 65% while maintaining or increasing accuracy.
- 
												  Ezgi Korkmaz OutlineKTH ROYAL INSTITUTE OF TECHNOLOGY BigDL: A Distributed Deep Learning Framework for Big Data ACM Symposium on Cloud Computing 2019 [Acceptance Rate: 24%] Jason Jinquan Dai, Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Cherry Li Zhang, Yan Wan, Zhichao Li, Jiao Wang, Sheng- sheng Huang, Zhongyuan Wu, Yang Wang, Yuhao Yang, Bowen She, Dongjie Shi, Qi Lu, Kai Huang, Guoqiong Song. [Intel Corporation] Presenter: Ezgi Korkmaz Outline I Deep learning frameworks I BigDL applications I Motivation for end-to-end framework I Drawbacks of prior approaches I BigDL framework I Experimental Setup and Results of BigDL framework I Critique of the paper 2/16 Deep Learning Frameworks I Big demand from organizations to apply deep learning to big data I Deep learning frameworks: I Torch [2002 Collobert et al.] [ C, Lua] I Caffe [2014 Berkeley BAIR] [C++] I TensorFlow [2015 Google Brain] [C++, Python, CUDA] I Apache MXNet [2015 Apache Software Foundation] [C++] I Chainer [2015 Preferred Networks] [ Python] I Keras [2016 Francois Chollet] [Python] I PyTorch [2016 Facebook] [Python, C++, CUDA] I Apache Spark is an open-source distributed general-purpose cluster-computing framework. I Provides interface for programming clusters with data parallelism 3/16 BigDL I A library on top of Apache Spark I Provides integrated data-analytics within a unifed data analysis pipeline I Allows users to write their own deep learning applications I Running directly on big data clusters I Supports similar API to Torch and Keras I Supports both large scale distributed training and inference I Able to run across hundreds or thousands servers efficiently by uses underlying Spark framework 4/16 BigDL I Developed as an open source project I Used by I Mastercard I WorldBank I Cray I Talroo I UCSF I JD I UnionPay I GigaSpaces I Wide range of applications: transfer learning based image classification, object detection, feature extraction, sequence-to-sequence prediction, neural collaborative filtering for reccomendation etc.
- 
												  Deep Learning in Unconventional DomainsDeep Learning in Unconventional Domains Thesis by Milan Cvitkovic In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy CALIFORNIA INSTITUTE OF TECHNOLOGY Pasadena, California 2020 Defended March 10, 2020 ii © 2020 Milan Cvitkovic ORCID: 0000-0003-4188-452X All rights reserved iii ACKNOWLEDGEMENTS To my parents, Kate Turpin and Michael Cvitkovic, and my sister, Adriana Cvitkovic: thank you for your boundless love and support. And all the teasing. To my thesis committee, Prof. Anima Anandkumar, Prof. Yisong Yue, Prof. Adam Wierman, and Prof. Thomas Vidick: thank you for your invaluable guidance and encouragement throughout my time at Caltech, both in your roles as committee members and beyond. To Prof. Anima Anandkumar, Prof. Pietro Perona, and Prof. Stefano Soatto: thank you for facilitating my affiliation with Amazon Web Services, without which this dissertation would never have been possible. To all my coauthors who helped with the work in this thesis — Badal Singh, Prof. Anima Anandkumar, Dr. Günther Koliander, and Dr. Zohar Karnin — and to all my other collaborators during my PhD — Joe Marino, Dr. Grant Van Horn, Dr. Hamed Alemohammad, Prof. Yisong Yue, Keng Wah Loon, Laura Graesser, Jiahao Su, and Prof. Furong Huang: thank you for your generosity and your brilliance. To Lee, Carmen, Kirsten, Leslie, James, Alex, Josh, Claudia, and Patricia: thank you for being cheerful stewards of daily rituals. And most of all to my advisor, Prof. Thomas Vidick: thank you for your unwavering support of a non-traditional advisee. iv ABSTRACT Machine learning methods have dramatically improved in recent years thanks to advances in deep learning (LeCun, Bengio, and Hinton, 2015), a set of methods for training high-dimensional, highly-parameterized, nonlinear functions.
- 
												  Apache Mxnet on AWS Developer Guide Apache Mxnet on AWS Developer GuideApache MXNet on AWS Developer Guide Apache MXNet on AWS Developer Guide Apache MXNet on AWS: Developer Guide Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon. Apache MXNet on AWS Developer Guide Table of Contents What Is Apache MXNet? ...................................................................................................................... 1 iii Apache MXNet on AWS Developer Guide What Is Apache MXNet? Apache MXNet (MXNet) is an open source deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of platforms, from cloud infrastructure to mobile devices. It is highly scalable, which allows for fast model training, and it supports a flexible programming model and multiple languages. The MXNet library is portable and lightweight. It scales seamlessly on multiple GPUs on multiple machines. MXNet supports programming in various languages including Python, R, Scala, Julia, and Perl. This user guide has been deprecated and is no longer available. For more information on MXNet and related material, see the topics below. MXNet MXNet is an Apache open source project. For more information about MXNet, see the following open source documentation: • Getting Started – Provides details for setting up MXNet on various platforms, such as macOS, Windows, and Linux.
- 
												  Open Source in the Enterpriseopensource.amazon.com 企业中的开放源代码 Andy Oram 和 Zaheda Bhorat 企业中的开放源代码 作者:Andy Oram 和 Zaheda Bhorat 版权 © 2018 O'Reilly Media 所有。保留所有权利。 本书采用知识共享组织署名-非商业性使用 4.0 国际许可协议进行授权。要查看该许可, 请访问 http://creativecommons.org/licenses/by-nc/4.0/ 或致函以下地址:Creative Commons, PO Box 1866, Mountain View, CA 94042, USA。 本书中表达的观点是作者的观点,并不代表出版商的看法。虽然出版商和作者都真诚地付出 努力来确保本书中所含的信息和说明准确无误,但双方对错误或遗漏之处(包括但不限于因 使用或依据本书而导致的任何损失)概不负责。使用本书中所含的信息和说明带来的风险将 由您自行承担。如果本书中包含或介绍的任何代码示例或其他技术需遵循开源许可证或涉及 他人知识产权,您应负责确保对相应代码示例和技术的使用遵循此类许可证和/或知识产权 规定。 本书由 O'Reilly 与 Amazon 协作完成。 目录 鸣谢 ................................................................................................................................... vii 企业中的开放源代码 ..................................................................................................... 1 为什么公司和政府均改为使用开放源代码? ..................................................................2 不仅仅需要许可证或代码 .....................................................................................................4 了解开放源代码的准备工作 ................................................................................................5 采用开放源代码 ......................................................................................................................7 参与项目社区 ........................................................................................................................ 13 为开源项目做贡献 ............................................................................................................... 17 启动开源项目 .......................................................................................................................
- 
												![Arxiv:1907.04433V2 [Cs.LG] 13 Feb 2020 by Gluoncv and Gluonnlp to Allow for Software Distribution, Modification, and Usage](https://docslib.b-cdn.net/cover/8288/arxiv-1907-04433v2-cs-lg-13-feb-2020-by-gluoncv-and-gluonnlp-to-allow-for-software-distribution-modi-cation-and-usage-2258288.webp)  Arxiv:1907.04433V2 [Cs.LG] 13 Feb 2020 by Gluoncv and Gluonnlp to Allow for Software Distribution, Modification, and UsageJournal of Machine Learning Research 21 (2020) 1-7 Submitted 5/19; Revised 1/20; Published 2/20 GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing Jian Guo [email protected] University of Michigan MI, USA He He [email protected] Tong He [email protected] Leonard Lausen [email protected] ∗Mu Li [email protected] Haibin Lin [email protected] Xingjian Shi [email protected] Chenguang Wang [email protected] Junyuan Xie [email protected] Sheng Zha [email protected] Aston Zhang [email protected] Hang Zhang [email protected] Zhi Zhang [email protected] Zhongyue Zhang [email protected] Shuai Zheng [email protected] Yi Zhu [email protected] Amazon Web Services, CA, USA Editor: Antti Honkela Abstract We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flex- ible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted arXiv:1907.04433v2 [cs.LG] 13 Feb 2020 by GluonCV and GluonNLP to allow for software distribution, modification, and usage. Keywords: Machine Learning, Deep Learning, Apache MXNet, Computer Vision, Nat- ural Language Processing 1. Introduction Deep learning, a sub-field of machine learning research, has driven the rapid progress in artificial intelligence research, leading to astonishing breakthroughs on long-standing prob- lems in a plethora of fields such as computer vision and natural language processing.
- 
												  Communication-Efficient Distributed SGD with SketchingCommunication-efficient Distributed SGD with Sketching Nikita Ivkin ∗y Daniel Rothchild ∗ Enayat Ullah ∗ Amazon UC Berkeley Johns Hopkins University [email protected] [email protected] [email protected] Vladimir Braverman z Ion Stoica Raman Arora Johns Hopkins University UC Berkeley Johns Hopkins University [email protected] [email protected] [email protected] Abstract Large-scale distributed training of neural networks is often limited by network band- width, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming algorithms, we introduce SKETCHED-SGD4, an algorithm for carrying out distributed SGD by communicating sketches instead of full gradients. We show that SKETCHED-SGD has favorable convergence rates on several classes of functions. When considering all communication – both of gradients and of updated model weights – SKETCHED- SGD reduces the amount of communication required compared to other gradient compression methods from O(d) or O(W ) to O(log d), where d is the number of model parameters and W is the number of workers participating in training. We run experiments on a transformer model, an LSTM, and a residual network, demonstrating up to a 40x reduction in total communication cost with no loss in final model performance. We also show experimentally that SKETCHED-SGD scales to at least 256 workers without increasing communication cost or degrading model performance. 1 Introduction Modern machine learning training workloads are commonly distributed across many machines using data-parallel synchronous stochastic gradient descent. At each iteration, W worker nodes split a mini-batch of size B; each worker computes the gradient of the loss on its portion of the data, and then a parameter server sums each worker’s gradient to yield the full mini-batch gradient.