Scalable Distributed DNN Training Using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

—Submitted to IEEE IPDPS 2019 (Main Track) for Peer Review—
arXiv:1810.11112v1 [cs.DC] 25 Oct 2018

Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, and Dhabaleswar K. Panda
Department of Computer Science and Engineering, The Ohio State University
{awan.10, chu.368, subramoni.1, [email protected]

Jeroen Bédorf
Minds.ai, Santa Cruz, United States
Leiden Observatory, Leiden University, Leiden, the Netherlands
[email protected]

Abstract—The current wave of advances in Machine Learning (ML) and Deep Learning (DL) has been triggered by the availability of large-scale datasets, efficient CPU and GPU hardware, and the development of easy-to-use software frameworks like TensorFlow (TF), Caffe, and Torch. TensorFlow has been, by far, the most widely adopted ML/DL framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. The most commonly used distributed training approaches for TF can be categorized as follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+‘X’: X = (InfiniBand Verbs, Message Passing Interface (MPI), and GPUDirect RDMA), and 3) No-gRPC: Baidu Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this paper, we provide an in-depth performance characterization and analysis of these distributed training approaches on various GPU clusters, including the Piz Daint system (#6 on Top500). We perform experiments to gain novel insights along the following vectors: 1) Application-level scalability of DNN training, 2) Effect of batch size on scaling efficiency, 3) Impact of the MPI library used for no-gRPC approaches, and 4) Type and size of DNN architectures (e.g., ResNet vs. MobileNet). Based on these experiments, we present two key insights: 1) Overall, No-gRPC designs achieve better performance compared to gRPC-based approaches for most configurations, and 2) The performance of No-gRPC is heavily influenced by the gradient aggregation using the Allreduce communication pattern. Finally, we propose a truly CUDA-Aware MPI Allreduce design that exploits 1) CUDA kernels to perform large reductions on the GPU and 2) a pointer cache to avoid overheads involved in queries to the CUDA driver. Our proposed designs have been implemented in MVAPICH2-GDR and offer 5-17× better performance than NCCL2 for small and medium messages, and reduce latency by 29% for large messages on 16 GPUs (nodes). The proposed optimizations help Horovod-MPI to achieve approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs. Further, Horovod-MPI achieves 1.8× and 3.2× higher throughput than the native gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint cluster.

I. INTRODUCTION

Deep Learning (DL) has been a significant contributor to the recent achievements in the Artificial Intelligence (AI) realm. Novel approaches like back-propagation in Deep Neural Networks (DNNs) were investigated around the 1980 time frame [36]. However, the potential of these approaches was marred by slow hardware and a lack of sufficient training data. To this end, the current resurgence and renewed interest in DL and DNN-based solutions to classical as well as new Machine Learning (ML) problems can be attributed to the widespread availability of 1) versatile and large-scale datasets like ImageNet [1], and 2) efficient computation capabilities in modern hardware architectures like Graphics Processing Units (GPUs) and multi-/many-core CPUs. These two trends have positively influenced the development of several high-level DL toolkits like Caffe [9], [25], Microsoft Cognitive Toolkit (CNTK), Facebook PyTorch [5], and Google TensorFlow [11]. Implementing DNNs and back-propagation techniques in an efficient manner has been a challenging problem. However, these toolkits have played a crucial role in making ML and DL more accessible to both academic and industry-based researchers in various fields. In the context of DL frameworks, it is pertinent to mention that TensorFlow is the most popular DL framework and has seen widespread adoption. Today, more than 1,600 people have contributed to the TensorFlow GitHub repository [6], and several hundred research papers have utilized TensorFlow for both research and commercial applications.

However, TensorFlow in its early days was criticized for its slower performance [39] as well as a lack of support for efficient distributed training and limited support for High Performance Computing (HPC) systems. To this end, recent efforts by the TF developers as well as the open-source community are commendable, and performance has significantly improved for both single-GPU/single-node and multi-GPU/multi-node (distributed) training. The gRPC [19] library, which is the official distributed training infrastructure for TF, has been optimized for tensor transfers (fewer memory operations), but still uses relatively slow standard Ethernet networks. However, gRPC can take advantage of InfiniBand (IB) using the IP over IB (IPoIB) protocol, which offers significantly better performance. At the same time, the community has been actively exploring Message Passing Interface (MPI) – a de facto standard for the HPC community – based designs to improve distributed training on HPC clusters. However, the active interest and contributions from the community have led to several disjoint efforts and fragmentation in the way users can take advantage of the advanced distributed training designs in TF.
The two broad challenges that we investigate in this paper are: 1) What is the most efficient tensor communication (for gradient aggregation) framework for TensorFlow? and 2) How can we improve the performance of this communication using CUDA-Aware MPI? Several detailed questions follow these broad challenges. It is pertinent to note that little in the existing literature can offer insights into the following key questions, which are of significant interest if large-scale distributed training is employed:
• What are the available choices for distributed training using TensorFlow for a given execution platform?
• What are the key features and performance characteristics of the various distributed training approaches for TensorFlow?
• How can we optimize the performance of distributed training using TensorFlow on modern HPC systems?

A. Contributions

To the best of our knowledge, this is the first paper that offers a comprehensive landscape that highlights, evaluates, and optimizes a diverse set of approaches that deal with distributed DNN training using TensorFlow at scale. In this paper, we make the following key contributions:
• We provide an in-depth categorization, design analysis, and performance characterization of distributed training approaches for TensorFlow using state-of-the-art DNNs like ResNet-50, MobileNet, and NASNet.
• We propose a truly CUDA-Aware MPI Allreduce design that exploits 1) CUDA kernels to perform large reductions on the GPU and 2) a pointer cache to avoid overheads involved in queries to the CUDA driver (the user-facing call pattern is sketched after this list).
• We illustrate the benefits of the proposed MPI Allreduce optimizations using micro-benchmarks as well as application workloads (tf_cnn_benchmarks) using TensorFlow and Horovod.
• We present a comprehensive and large-scale performance […]
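From the application's side, CUDA-Aware gradient aggregation means GPU buffers are handed to MPI_Allreduce directly, with no staging through host memory. Below is a minimal sketch of that call path, assuming mpi4py (3.1 or later, for CUDA array support) built against a CUDA-Aware MPI such as MVAPICH2-GDR, plus CuPy for GPU arrays; it illustrates the pattern the paper optimizes, not the internal MVAPICH2-GDR implementation.

    from mpi4py import MPI
    import cupy as cp

    comm = MPI.COMM_WORLD
    # One GPU per MPI rank, a common HPC mapping.
    cp.cuda.Device(comm.Get_rank() % cp.cuda.runtime.getDeviceCount()).use()

    # Stand-in for a flattened gradient tensor residing on the GPU.
    grad = cp.random.rand(1 << 20, dtype=cp.float32)
    summed = cp.empty_like(grad)

    # A CUDA-Aware MPI accepts the device pointers directly; the
    # proposed pointer cache and GPU-kernel reductions sit beneath
    # this single call.
    comm.Allreduce(grad, summed, op=MPI.SUM)
    grad_avg = summed / comm.Get_size()

Launched as, e.g., mpirun -np 16 python allreduce_sketch.py, this mirrors the gradient-aggregation step that the paper identifies as dominating No-gRPC training.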
[…] speed has forced the ML/DL framework designers to rethink their strategy and to start utilizing existing communication schemes or design their own libraries from scratch. Microsoft Cognitive Toolkit (CNTK) [29] is based on an MPI design, whereas Caffe2 [9] uses the Gloo [10] collective communication library developed by Facebook, which is similar to NVIDIA's NCCL [32] library. Gloo exploits the InfiniBand verbs interface and Remote Direct Memory Access (RDMA) technology to offer various reduction algorithms. Apart from the communication libraries that come integrated with the frameworks, there are external (commercial) efforts to enable efficient distributed training. For example, IBM has developed the PowerAI Distributed Deep-learning Library (DDL), which uses a multi-ring reduction algorithm. The library is built on top of IBM's Spectrum MPI (a derivative of OpenMPI) and as such supports all the network interfaces that are supported by the OpenMPI library. According to IBM, the library can be integrated with TensorFlow, Caffe, and Torch. Intel has developed the Intel Machine Learning Scaling Library (MLSL) [40]. This library is built on top of Intel MPI and therefore supports various interconnects such as InfiniBand, Omni-Path, and Ethernet. The library offers a set of communication primitives that neural network frameworks can take advantage of when performing distributed communication. According to the research paper, it is integrated with Intel-Caffe, TensorFlow, and Intel's own neural-net compiler called nGraph. To bring support to TensorFlow, the library modifies the Horovod [37] library by inserting the MLSL communication primitives.

B. Communication Libraries for TensorFlow

Our focus in this paper is on distributed training using TensorFlow, which, by default, can be realized using the official Google RPC (gRPC) library. gRPC is a generic point-to-point communication library and has no collective communication support. It works in a simple client/server-style fashion and has been tightly integrated with TF. However, […]
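Because gRPC itself offers no collectives, the No-gRPC approaches evaluated in this paper layer gradient Allreduce on top of TF through Horovod instead. As context for that comparison, a minimal sketch of the canonical Horovod TF 1.x pattern follows; the toy model is a placeholder, and the Allreduce backend (MPI or NCCL) is chosen when Horovod is built and the job is launched with mpirun.

    import tensorflow as tf          # TF 1.x API
    import horovod.tensorflow as hvd

    hvd.init()  # one process per GPU, launched via mpirun
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    # Placeholder model; any loss tensor works here.
    x = tf.random_normal([32, 1024])
    w = tf.get_variable("w", shape=[1024, 1])
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

    global_step = tf.train.get_or_create_global_step()
    opt = tf.train.MomentumOptimizer(0.01, 0.9)
    # The wrapper averages gradients across ranks with Allreduce
    # before they are applied: the "No-gRPC" data path.
    opt = hvd.DistributedOptimizer(opt)
    train_op = opt.minimize(loss, global_step=global_step)

    # Rank 0 broadcasts initial weights so all ranks start in sync.
    hooks = [hvd.BroadcastGlobalVariablesHook(0),
             tf.train.StopAtStepHook(100)]
    with tf.train.MonitoredTrainingSession(config=config,
                                           hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)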