Artificial Intelligence for Understanding Large and Complex Datacenters


Artificial Intelligence for Understanding Large and Complex Datacenters

by Pengfei Zheng

Department of Computer Science, Duke University

Approved: Benjamin C. Lee, Advisor; Bruce M. Maggs; Jeffrey S. Chase; Jun Yang

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University, 2020

Copyright © 2020 by Pengfei Zheng. All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence.

Abstract

With the democratization of global-scale web applications and cloud computing, understanding the performance of a live production datacenter has become a prerequisite for making strategic decisions about datacenter design and optimization. Advances in monitoring, tracing, and profiling large, complex systems provide rich datasets and establish a rigorous foundation for performance understanding and reasoning. But the sheer volume and complexity of the collected data challenge existing techniques, which rely heavily on human intervention, expert knowledge, and simple statistics. In this dissertation, we address this challenge using artificial intelligence and make the case for two important problems: datacenter performance diagnosis and datacenter workload characterization.

The first thrust of this dissertation is the use of statistical causal inference and Bayesian probabilistic modeling for datacenter straggler diagnosis. Stragglers are exceptionally slow tasks in parallel execution that delay overall job completion. Stragglers, which are uncommon within a single job, are pervasive in datacenters with many jobs. A large body of research has focused on mitigating stragglers, but relatively little research has focused on systematically identifying their causes. We present Hound, a statistical machine learning framework that infers the causes of stragglers from traces of datacenter-scale jobs.

The second thrust of this dissertation is the use of graph theory and statistical semantic learning for datacenter workload understanding, which has significant impact on datacenter hardware architecture, capacity planning, software re-optimization, etc. Datacenter engineers understand datacenter workloads with continuous, distributed profiling that produces snapshots of call stacks across datacenter machines. Unlike stack traces profiled for isolated micro-benchmarks or small applications, those for hyperscale datacenters are enormous and complex, reflect the scale and diversity of their production codes, and pose great challenges for efficient and effective interpretation. We present Limelight+, an algorithmic framework based on graph theory and statistical semantic learning, to extract workload insights from datacenter-scale stack traces and to gain design insights for datacenter architecture.
For my teachers, family and friends, people who helped me to come this far

Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Datacenter-scale Performance Diagnosis
  1.2 Understanding Datacenter Workload Structure
  1.3 Key Contributions
2 Causal Inference and Bayesian Probabilistic Model for Straggler Diagnosis at Datacenter Scale
  2.1 System Objectives
    2.1.1 Datacenter-scale Diagnosis
    2.1.2 Interpretable Models
    2.1.3 Unbiased Inference
    2.1.4 Computational Efficiency
  2.2 The Hound Framework
    2.2.1 Base Learning
    2.2.2 Meta Learning
    2.2.3 Ensemble Learning
  2.3 Experimental Methods
    2.3.1 Google Trace
    2.3.2 Spark Traces
  2.4 Evaluation with Google Trace
    2.4.1 Mixtures of Causes
    2.4.2 Validation with Case Studies
    2.4.3 Validation with Mutual Information
    2.4.4 Comparisons with Expert Diagnosis
    2.4.5 Comparison with Simpler Base Learners
  2.5 Evaluation with Spark Traces
  2.6 Complexity and Overheads
  2.7 Related Work
    2.7.1 Straggler Mitigation
    2.7.2 Performance Analysis
  2.8 Conclusions
3 Graph Theory and Semantic Learning for Understanding Datacenter Workload
  3.1 Challenges of Existing Stack Trace Analysis Methods
    3.1.1 Counting Exclusive Cycles
    3.1.2 Counting Inclusive Cycles
    3.1.3 Call Path Analysis
    3.1.4 Call Graph Analysis
  3.2 Limelight+ Overview
  3.3 Limelight+ Layerization
    3.3.1 Foundational Degree
    3.3.2 Regularized Foundational Degree
    3.3.3 Maximizing Foundational Degree
  3.4 Limelight+ Function Clustering
  3.5 Limelight+ Cycle Attribution
  3.6 Experimental Methods
    3.6.1 SERVICES
    3.6.2 FLEET
  3.7 Evaluation of Limelight+
    3.7.1 Discovering Layers
    3.7.2 Discovering Accelerators
    3.7.3 Evaluating Layer Quality
    3.7.4 Evaluating Semantic Embeddings
    3.7.5 Analyzing Production Datacenter
  3.8 Complexity and Overhead
  3.9 Related Work
  3.10 Conclusions
4 Conclusions
  4.1 Conclusions
  4.2 Future Work
    4.2.1 Performance Diagnosis and Optimization in the Era of Serverless, Microservices and Privacy
    4.2.2 Machine Learning - Efficiency and Economics
Bibliography
Biography

List of Tables

2.1 Comparison of statistical dependence measures
2.2 Task metrics in the Google datacenter trace
2.3 Task metrics in the Spark traces for BDBench and TPC-DS
2.4 Hound's causal topics for the Google dataset, derived from an ensemble of predictive, dependent, and causal models
2.5 Hound's causal topics on the Google trace, with predictive model
2.6 Hound's causal topics on the Google trace, with dependence model
2.7 Hound's causal topics on the Google trace, with causal model
2.8 Example - Mixtures of causes
2.9 Comparison of inferred causes from varied modeling strategies for job 6308689702
2.10 Comparison of inferred causes from varied modeling strategies for job 6343946350
2.11 Number of causes per job
2.12 Coverage statistics for causes
2.13 Examples of stragglers' causes from related studies that produce expert diagnoses
2.14 Comparison of causes diagnosed by Hound for the Google system against causes diagnosed by experts for related systems
2.15 Causal topics inferred with linear regression as base learner
2.16 Causal topics inferred with Pearson's correlation as base learner
2.17 Causal topics inferred with logistic regression based Rubin Causal Model as base learner
2.18 Hound's causal topics for the Spark BDBench dataset
2.19 Hound's estimate of stragglers (percentage) explained by each cause for the Spark BDBench dataset
2.20 Hound's causal topics for the Spark TPC-DS dataset
2.21 Hound's estimate of stragglers (percentage) explained by each cause for the Spark TPC-DS dataset
2.22 Computational Complexity of Hound
3.1 Stack Samples
3.2 Notations. Let S denote a stack trace, s a stack sample, and f or g a function
3.3 Examples of conjugate functions
3.4 An example stack trace. Suppose we target at a layer Lu and only show in the trace the five functions
3.5 Experimental trace SERVICES and FLEET
3.6 Comparison between hot function clusters revealed by Limelight+ and expert-designed ASICs/accelerators
3.7 Comparison between hot function clusters revealed by Limelight+ and expert-designed software re-optimizations
3.8 Comparison of Directed Acyclic Graph (DAG) layerization algorithms
3.9 Evaluation of Limelight+'s EE (Equilibrium Embedding) and Word2Vec
Recommended publications
  • 16 Inspiring Women Engineers to Watch
    Hackbright Academy is the leading software engineering school for women, founded in San Francisco in 2012. The academy graduates more female engineers than UC Berkeley and Stanford each year. https://hackbrightacademy.com
    16 Inspiring Women Engineers To Watch
    Women's engineering school Hackbright Academy is excited to share some updates from graduates of the software engineering fellowship. Check out what these 16 women are doing now at their companies, and what languages, frameworks, databases and other technologies these engineers use on the job!
    Software Engineer, Aclima: Tiffany Williams is a software engineer at Aclima, where she builds software tools to ingest, process and manage city-scale environmental data sets enabled by Aclima’s sensor networks. Follow her on Twitter at @twilliamsphd. Technologies: Python, SQL, Cassandra, MariaDB, Docker, Kubernetes, Google Cloud
    Software Engineer, Eventbrite: Maggie Shine works on backend and frontend application development to make buying a ticket on Eventbrite a great experience. In 2014, she helped build a WiFi-enabled basal body temperature fertility tracking device at a hardware hackathon. Follow her on Twitter at @magksh. Technologies: Python, Django, Celery, MySQL, Redis, Backbone, Marionette, React, Sass
    User Experience Engineer, GoDaddy
  • NUMA-Aware Thread Migration for High Performance NVMM File Systems
    NUMA-Aware Thread Migration for High Performance NVMM File Systems. Ying Wang, Dejun Jiang and Jin Xiong. SKL Computer Architecture, ICT, CAS; University of Chinese Academy of Sciences.
    Abstract: Emerging Non-Volatile Main Memories (NVMMs) provide persistent storage and can be directly attached to the memory bus, which allows building file systems on non-volatile main memory (NVMM file systems). Since file systems are built on memory, NUMA architecture has a large impact on their performance due to the presence of remote memory access and imbalanced resource usage. Existing works migrate thread and thread data on DRAM to solve these problems. Unlike DRAM, NVMM introduces extra latency and lifetime limitations. This results in expensive data migration for NVMM file systems on NUMA architecture. In this paper, we argue that NUMA-aware thread migration without migrating data is desirable for NVMM file systems. We propose NThread, a NUMA-aware thread migration module for NVMM file system.
    ... out considering the NVMM usage on NUMA nodes. Besides, application threads accessing file system rely on the default operating system thread scheduler, which migrates thread only considering CPU utilization. These bring remote memory access and resource contentions to application threads when reading and writing files, and thus reduce the performance of NVMM file systems. We observe that when performing file reads/writes from 4 KB to 256 KB on a NVMM file system (NOVA [47] on NVMM), the average latency of accessing remote node increases by 65.5 % compared to accessing local node. The average bandwidth is reduced by
  • Learning React Functional Web Development with React and Redux
    Learning React Functional Web Development with React and Redux Alex Banks and Eve Porcello Beijing Boston Farnham Sebastopol Tokyo Learning React by Alex Banks and Eve Porcello Copyright © 2017 Alex Banks and Eve Porcello. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Editor: Allyson MacDonald Indexer: WordCo Indexing Services Production Editor: Melanie Yarbrough Interior Designer: David Futato Copyeditor: Colleen Toporek Cover Designer: Karen Montgomery Proofreader: Rachel Head Illustrator: Rebecca Demarest May 2017: First Edition Revision History for the First Edition 2017-04-26: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781491954621 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Learning React, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
  • Accordion: Better Memory Organization for LSM Key-Value Stores
    Accordion: Better Memory Organization for LSM Key-Value Stores. Edward Bortnikov (Yahoo Research), Anastasia Braginsky (Yahoo Research), Eshcar Hillel (Yahoo Research), Idit Keidar (Technion and Yahoo Research), Gali Sheffi (Yahoo Research, gsheffi@oath.com).
    ABSTRACT: Log-structured merge (LSM) stores have emerged as the technology of choice for building scalable write-intensive key-value storage systems. An LSM store replaces random I/O with sequential I/O by accumulating large batches of writes in a memory store prior to flushing them to log-structured disk storage; the latter is continuously re-organized in the background through a compaction process for efficiency of reads. Though inherent to the LSM design, frequent compactions are a major pain point because they slow down data store operations, primarily writes, and also increase disk wear. Another performance bottleneck in today's state-
    ... of applications for which they are used continuously increases. A small sample of recently published use cases includes massive-scale online analytics (Airbnb/Airstream [2], Yahoo/Flurry [7]), product search and recommendation (Alibaba [13]), graph storage (Facebook/Dragon [5], Pinterest/Zen [19]), and many more. The leading approach for implementing write-intensive key-value storage is log-structured merge (LSM) stores [31]. This technology is ubiquitously used by popular key-value storage platforms [9, 14, 16, 22, 4, 1, 10, 11]. The premise for using LSM stores is the major disk access bottleneck, exhibited even with today's SSD hardware [14, 33, 34].
  • Magento on HHVM Speeding up Your Webshop with a Drop-In PHP Replacement
    Magento on HHVM: Speeding up your webshop with a drop-in PHP replacement. Daniel Sloof
    What is HHVM?
    ● HipHop Virtual Machine
    ● Created by engineers at Facebook
    ● Essentially a reimplementation of PHP
    ● Originally translated PHP to C++, now translates PHP to bytecode
    ● Just-in-time compiler, turning generated bytecode into machine code
    ● In some cases 5 to 10 times faster than regular PHP
    So what’s the problem?
    ● HHVM not entirely compatible with PHP
    ● Magento’s PHP triggering many of these incompatibilities
    ● Choosing between
      ○ Forking Magento to work around HHVM
      ○ Fixing issues within the extensive HHVM C++ codebase
    Resulted in... fixing HHVM
    ● Already over 100 commits fixing Magento related HHVM bugs;
      ○ SimpleXML (majority of bugfixes)
      ○ sessions
      ○ number_format
      ○ __get and __set
      ○ many more...
    ● Most of these fixes already merged back into the official (github) repository
    ● Community Edition running (relatively) stable!
    Benchmarks
    Before we go to the results...
    ● Magento 1.8 with sample data
    ● Standard Apache2 / php-fpm / MySQL stack (with APC opcode cache).
    ● Standard HHVM configuration (repo-authoritative mode disabled, JIT enabled)
    ● Repo-authoritative mode has potential to increase performance by a large margin
    ● Tool of choice: siege
    Benchmarks: Response time. Average across 50 requests
    Benchmarks: Transaction rate. While increasing siege concurrency until avg. response time ~2 seconds
    What about <insert caching mechanism here>?
    ● HHVM does not get in the way
    ● Dynamic content still needs to be generated
    ● Replaces PHP - not Varnish, Redis, FPC, Block Cache, etc.
    ● As long as you are burning CPU cycles (always), you will benefit from HHVM
    ● Think about speeding up indexing, order placement, routing, etc.
  • Automated Program Transformation for Improving Software Quality
    Automated Program Transformation for Improving Software Quality Rijnard van Tonder CMU-ISR-19-101 October 2019 Institute for Software Research School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Thesis Committee: Claire Le Goues, Chair Christian Kästner Jan Hoffmann Manuel Fähndrich, Facebook, Inc. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Software Engineering. Copyright 2019 Rijnard van Tonder This work is partially supported under National Science Foundation grant numbers CCF-1750116 and CCF-1563797, and a Facebook Testing and Verification research award. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring corporation, institution, the U.S. government, or any other entity. Keywords: syntax, transformation, parsers, rewriting, crash bucketing, fuzzing, bug triage, program transformation, automated bug fixing, automated program repair, separation logic, static analysis, program analysis Abstract Software bugs are not going away. Millions of dollars and thousands of developer-hours are spent finding bugs, debugging the root cause, writing a patch, and reviewing fixes. Automated techniques like static analysis and dynamic fuzz testing have a proven track record for cutting costs and improving software quality. More recently, advances in automated program repair have matured and see nascent adoption in industry. Despite the value of these approaches, automated techniques do not come for free: they must approximate, both theoretically and in the interest of practicality. For example, static analyzers suffer false positives, and automatically produced patches may be insufficiently precise to fix a bug.
  • Unravel Data Systems Version 4.5
    UNRAVEL DATA SYSTEMS VERSION 4.5
    Component name | Component version name | License names
    jQuery | 1.8.2 | MIT License
    Apache Tomcat | 5.5.23 | Apache License 2.0
    Tachyon Project POM | 0.8.2 | Apache License 2.0
    Apache Directory LDAP API Model | 1.0.0-M20 | Apache License 2.0
    apache/incubator-heron | 0.16.5.1 | Apache License 2.0
    Maven Plugin API | 3.0.4 | Apache License 2.0
    ApacheDS Authentication Interceptor | 2.0.0-M15 | Apache License 2.0
    Apache Directory LDAP API Extras ACI | 1.0.0-M20 | Apache License 2.0
    Apache HttpComponents Core | 4.3.3 | Apache License 2.0
    Spark Project Tags | 2.0.0-preview | Apache License 2.0
    Curator Testing | 3.3.0 | Apache License 2.0
    Apache HttpComponents Core | 4.4.5 | Apache License 2.0
    Apache Commons Daemon | 1.0.15 | Apache License 2.0
    classworlds | 2.4 | Apache License 2.0
    abego TreeLayout Core | 1.0.1 | BSD 3-clause "New" or "Revised" License
    jackson-core | 2.8.6 | Apache License 2.0
    Lucene Join | 6.6.1 | Apache License 2.0
    Apache Commons CLI | 1.3-cloudera-pre-r1439998 | Apache License 2.0
    hive-apache | 0.5 | Apache License 2.0
    scala-parser-combinators | 1.0.4 | BSD 3-clause "New" or "Revised" License
    com.springsource.javax.xml.bind | 2.1.7 | Common Development and Distribution License 1.0
    SnakeYAML | 1.15 | Apache License 2.0
    JUnit | 4.12 | Common Public License 1.0
    ApacheDS Protocol Kerberos | 2.0.0-M12 | Apache License 2.0
    Apache Groovy | 2.4.6 | Apache License 2.0
    JGraphT - Core | 1.2.0 | (GNU Lesser General Public License v2.1 or later AND Eclipse Public License 1.0)
    chill-java | 0.5.0 | Apache License 2.0
    Apache Commons Logging | 1.2 | Apache License 2.0
    OpenCensus | 0.12.3 | Apache License 2.0
    ApacheDS Protocol
  • UNIVERSITY of CALIFORNIA SAN DIEGO Simplifying Datacenter Fault
    UNIVERSITY OF CALIFORNIA SAN DIEGO
    Simplifying datacenter fault detection and localization
    A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Arjun Roy
    Committee in charge: Alex C. Snoeren, Co-Chair; Ken Yocum, Co-Chair; George Papen; George Porter; Stefan Savage; Geoff Voelker
    2018
    Copyright Arjun Roy, 2018. All rights reserved.
    The Dissertation of Arjun Roy is approved and is acceptable in quality and form for publication on microfilm and electronically. University of California San Diego, 2018
    DEDICATION
    Dedicated to my grandmother, Bela Sarkar.
    TABLE OF CONTENTS
    Signature Page
    Dedication
    Table of Contents
    List of Figures
    List of Tables
    Acknowledgements
    Vita
    Abstract of the Dissertation
    Chapter 1 Introduction
    Chapter 2 Datacenters, applications, and failures
      2.1 Datacenter applications, networks and faults
        2.1.1 Datacenter application patterns
        2.1.2 Datacenter networks
        2.1.3 Datacenter partial faults
      2.2 Partial faults require passive impact monitoring
        2.2.1 Multipath hampers server-centric monitoring
        2.2.2 Partial faults confuse network-centric monitoring
      2.3 Unifying network and server centric monitoring
        2.3.1 Load-balanced links mean outliers correspond with partial faults
        2.3.2 Centralized network control enables collating viewpoints
    Chapter 3 Related work, challenges and a solution
      3.1 Fault localization effectiveness criteria
      3.2 Existing fault management techniques
        3.2.1 Server-centric fault detection
        3.2.2 Network-centric fault detection
  • Crawling Code Review Data from Phabricator
    Friedrich-Alexander-Universität Erlangen-Nürnberg, Technische Fakultät, Department Informatik
    DUMITRU COTET, MASTER THESIS: CRAWLING CODE REVIEW DATA FROM PHABRICATOR
    Submitted on 4 June 2019. Supervisors: Michael Dorner, M. Sc.; Prof. Dr. Dirk Riehle, M.B.A. Professur für Open-Source-Software, Department Informatik, Technische Fakultät, Friedrich-Alexander-Universität Erlangen-Nürnberg
    Declaration: I declare that I produced this thesis without outside help and without using sources other than those stated, and that the thesis has not been submitted in the same or a similar form to any other examination authority and accepted by it as part of an examination. All passages adopted verbatim or in substance are marked as such. Nuremberg, 4 June 2019
    License: This work is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0), see https://creativecommons.org/licenses/by/4.0/ Nuremberg, 4 June 2019
    Abstract: Modern code review is typically supported by software tools. Researchers use data tracked by these tools to study code review practices. A popular tool in open-source and closed-source projects is Phabricator. However, there is no tool to crawl all the available code review data from Phabricator hosts. In this thesis, we develop a Python crawler named Phabry, for crawling code review data from Phabricator instances using its REST API. The tool produces minimal server and client load, reproducible crawling runs, and stores complete and genuine review data. The new tool is used to crawl the Phabricator instances of the open source projects FreeBSD, KDE and LLVM. The resulting data sets can be used by researchers.
  • Nástroje Pro Sjednocení Datových Zdrojů Projektu Gloffer Tools for Unification of Data Sources Project Gloffer
    VŠB – Technická univerzita Ostrava, Fakulta elektrotechniky a informatiky, Katedra informatiky
    Nástroje pro sjednocení datových zdrojů projektu Gloffer (Tools for unification of data sources project Gloffer), 2018, Bc. Jakub Malchárek
    I would like to thank Ing. Radoslav Fasuga, Ph.D., for his expert help and consultation during the preparation of this diploma thesis and for his valuable advice during the implementation.
    Abstract: In this diploma thesis I analyze the technologies available for implementing the Gloffer web portal. The thesis describes databases (MySQL, Redis, MongoDB, Aerospike, Apache HBase, Apache Cassandra, Google Bigtable, Memcached), search engines (Solr, Lucene, Elastic Search), web servers (Apache HTTP server, Apache Tomcat), message brokers (Rabbit MQ), distributed computing technologies (Apache Hadoop) and development technologies (PHP 7, Nette Framework, Java, Spring Framework). The goal is not only to describe these technologies but also to design and implement an interface for unifying the data sources of the Gloffer project in the Java programming language using the Spring Framework. The output of the thesis is an intelligent tool that provides access to data from multiple data sources. The thesis concludes with performance testing of the developed tool.
    Keywords: Aerospike, Apache Cassandra, Apache Hadoop, Apache HBase, Apache HTTP server, Apache Tomcat, application interface, data sources, Elastic Search, full text, Google Bigtable, index, Java, Lucene, Memcached, MongoDB, MySQL, Nette Framework, PHP, Rabbit MQ, Redis, REST, Solr, Spring Framework
  • Learning Key-Value Store Design
    Learning Key-Value Store Design. Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, Zichen Zhu. Harvard University.
    ABSTRACT: We introduce the concept of design continuums for the data layout of key-value stores. A design continuum unifies major distinct data structure designs under the same model. The critical insight and potential long-term impact is that such unifying models 1) render what we consider up to now as fundamentally different data structures to be seen as "views" of the very same overall design space, and 2) allow "seeing" new data structure designs with performance properties that are not feasible by existing designs. The core intuition behind the construction of design continuums is that all data structures arise from the very same set of fundamental design principles, i.e., a small set of data layout design concepts out of which we can synthesize any design that exists in the literature as well as new ones. We show how to construct, evaluate, and expand, design continuums and we also present the first continuum that unifies major data structure designs, i.e., B+tree, Btree, LSM-tree, and LSH-table.
    Figure 1: From performance trade-offs to data structures, key-value stores and rich applications.
    The practical benefit of a design continuum is that it cre-
  • Facebook Messenger Engineering
    SED 1037 Transcript
    EPISODE 1037
    [INTRODUCTION]
    [00:00:00] JM: Facebook Messenger is a chat application that millions of people use every day to talk to each other. Over time, Messenger has grown to include group chats, video chats, animations, facial filters, stories and many more features. Messenger is a tool for utility as well as for entertainment. Messenger is used on both mobile and desktop, but the size of the mobile application is particularly important. There are many users who are on devices that do not have much storage space. As Messenger has accumulated features, the iOS codebase has grown larger and larger. Several generations of Facebook engineers have rotated through the company with the responsibility of working on Facebook Messenger, and that has led to different ways of managing information within the same codebase. The iOS codebase had room for improvement, and Project LightSpeed was a project within Facebook that had the goal of making Messenger on iOS much smaller. Mohsen Agsen is an engineer with Facebook and he joins the show to talk about the process of rewriting the Messenger app. This is a great deep dive into how to rewrite a mission-critical iOS application, and this team became very large at a certain point within Facebook. It's a great story and I hope you enjoy it as well.
    [SPONSOR MESSAGE]
    [00:01:27] JM: When I’m building a new product, G2i is the company that I call on to help me find a developer who can build the first version of my product. G2i is a hiring platform run by engineers that matches you with React, React Native, GraphQL and mobile engineers who you can trust.