Livegraph: a Transactional Graph Storage System with Purely Sequential Adjacency List Scans
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Accordion: Better Memory Organization for LSM Key-Value Stores
Accordion: Better Memory Organization for LSM Key-Value Stores Edward Bortnikov Anastasia Braginsky Eshcar Hillel Yahoo Research Yahoo Research Yahoo Research [email protected] [email protected] [email protected] Idit Keidar Gali Sheffi Technion and Yahoo Research Yahoo Research [email protected] gsheffi@oath.com ABSTRACT of applications for which they are used continuously in- Log-structured merge (LSM) stores have emerged as the tech- creases. A small sample of recently published use cases in- nology of choice for building scalable write-intensive key- cludes massive-scale online analytics (Airbnb/ Airstream [2], value storage systems. An LSM store replaces random I/O Yahoo/Flurry [7]), product search and recommendation (Al- with sequential I/O by accumulating large batches of writes ibaba [13]), graph storage (Facebook/Dragon [5], Pinter- in a memory store prior to flushing them to log-structured est/Zen [19]), and many more. disk storage; the latter is continuously re-organized in the The leading approach for implementing write-intensive background through a compaction process for efficiency of key-value storage is log-structured merge (LSM) stores [31]. reads. Though inherent to the LSM design, frequent com- This technology is ubiquitously used by popular key-value pactions are a major pain point because they slow down storage platforms [9, 14, 16, 22,4,1, 10, 11]. The premise data store operations, primarily writes, and also increase for using LSM stores is the major disk access bottleneck, disk wear. Another performance bottleneck in today's state- exhibited even with today's SSD hardware [14, 33, 34]. -
D2.2: Research Data Exchange Solution
H2020-ICT-2018-2 /ICT-28-2018-CSA SOMA: Social Observatory for Disinformation and Social Media Analysis D2.2: Research data exchange solution Project Reference No SOMA [825469] Deliverable D2.2: Research Data exchange (and transparency) solution with platforms Work package WP2: Methods and Analysis for disinformation modeling Type Report Dissemination Level Public Date 30/08/2019 Status Final Authors Lynge Asbjørn Møller, DATALAB, Aarhus University Anja Bechmann, DATALAB, Aarhus University Contributor(s) See fact-checking interviews and meetings in appendix 7.2 Reviewers Noemi Trino, LUISS Datalab, LUISS University Stefano Guarino, LUISS Datalab, LUISS University Document description This deliverable compiles the findings and recommended solutions and actions needed in order to construct a sustainable data exchange model for stakeholders, focusing on a differentiated perspective, one for journalists and the broader community, and one for university-based academic researchers. SOMA-825469 D2.2: Research data exchange solution Document Revision History Version Date Modifications Introduced Modification Reason Modified by v0.1 28/08/2019 Consolidation of first DATALAB, Aarhus draft University v0.2 29/08/2019 Review LUISS Datalab, LUISS University v0.3 30/08/2019 Proofread DATALAB, Aarhus University v1.0 30/08/2019 Final version DATALAB, Aarhus University 30/08/2019 Page | 1 SOMA-825469 D2.2: Research data exchange solution Executive Summary This report provides an evaluation of current solutions for data transparency and exchange with social media platforms, an account of the historic obstacles and developments within the subject and a prioritized list of future scenarios and solutions for data access with social media platforms. The evaluation of current solutions and the historic accounts are based primarily on a systematic review of academic literature on the subject, expanded by an account on the most recent developments and solutions. -
Letter, If Not the Spirit, of One Or the Other Definition
Producing Open Source Software How to Run a Successful Free Software Project Karl Fogel Producing Open Source Software: How to Run a Successful Free Software Project by Karl Fogel Copyright © 2005-2021 Karl Fogel, under the CreativeCommons Attribution-ShareAlike (4.0) license. Version: 2.3214 Home site: https://producingoss.com/ Dedication This book is dedicated to two dear friends without whom it would not have been possible: Karen Under- hill and Jim Blandy. i Table of Contents Preface ............................................................................................................................. vi Why Write This Book? ............................................................................................... vi Who Should Read This Book? ..................................................................................... vi Sources ................................................................................................................... vii Acknowledgements ................................................................................................... viii For the first edition (2005) ................................................................................ viii For the second edition (2021) .............................................................................. ix Disclaimer .............................................................................................................. xiii 1. Introduction ................................................................................................................... -
Artificial Intelligence for Understanding Large and Complex
Artificial Intelligence for Understanding Large and Complex Datacenters by Pengfei Zheng Department of Computer Science Duke University Date: Approved: Benjamin C. Lee, Advisor Bruce M. Maggs Jeffrey S. Chase Jun Yang Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University 2020 Abstract Artificial Intelligence for Understanding Large and Complex Datacenters by Pengfei Zheng Department of Computer Science Duke University Date: Approved: Benjamin C. Lee, Advisor Bruce M. Maggs Jeffrey S. Chase Jun Yang An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University 2020 Copyright © 2020 by Pengfei Zheng All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence Abstract As the democratization of global-scale web applications and cloud computing, under- standing the performance of a live production datacenter becomes a prerequisite for making strategic decisions related to datacenter design and optimization. Advances in monitoring, tracing, and profiling large, complex systems provide rich datasets and establish a rigorous foundation for performance understanding and reasoning. But the sheer volume and complexity of collected data challenges existing techniques, which rely heavily on human intervention, expert knowledge, and simple statistics. In this dissertation, we address this challenge using artificial intelligence and make the case for two important problems, datacenter performance diagnosis and datacenter workload characterization. The first thrust of this dissertation is the use of statistical causal inference and Bayesian probabilistic model for datacenter straggler diagnosis. -
Learning Key-Value Store Design
Learning Key-Value Store Design Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, Zichen Zhu Harvard University ABSTRACT We introduce the concept of design continuums for the data Key-Value Stores layout of key-value stores. A design continuum unifies major Machine Databases K V K V … K V distinct data structure designs under the same model. The Table critical insight and potential long-term impact is that such unifying models 1) render what we consider up to now as Learning Data Structures fundamentally different data structures to be seen as \views" B-Tree Table of the very same overall design space, and 2) allow \seeing" Graph LSM new data structure designs with performance properties that Store Hash are not feasible by existing designs. The core intuition be- hind the construction of design continuums is that all data Performance structures arise from the very same set of fundamental de- Update sign principles, i.e., a small set of data layout design con- Data Trade-offs cepts out of which we can synthesize any design that exists Access Patterns in the literature as well as new ones. We show how to con- Hardware struct, evaluate, and expand, design continuums and we also Cloud costs present the first continuum that unifies major data structure Read Memory designs, i.e., B+tree, Btree, LSM-tree, and LSH-table. Figure 1: From performance trade-offs to data structures, The practical benefit of a design continuum is that it cre- key-value stores and rich applications. -
Myrocks in Mariadb
MyRocks in MariaDB Sergei Petrunia <[email protected]> MariaDB Shenzhen Meetup November 2017 2 What is MyRocks ● #include <Yoshinori’s talk> ● This talk is about MyRocks in MariaDB 3 MyRocks lives in Facebook’s MySQL branch ● github.com/facebook/mysql-5.6 – Will call this “FB/MySQL” ● MyRocks lives there in storage/rocksdb ● FB/MySQL is easy to use if you are Facebook ● Not so easy if you are not :-) 4 FB/mysql-5.6 – user perspective ● No binaries, no packages – Compile yourself from source ● Dependencies, etc. ● No releases – (Is the latest git revision ok?) ● Has extra features – e.g. extra counters “confuse” monitoring tools. 5 FB/mysql-5.6 – dev perspective ● Targets a CentOS-type OS – Compiler, cmake version, etc. – Others may or may not [periodically] work ● MariaDB/Percona file pull requests to fix ● Special command to compile – https://github.com/facebook/mysql-5.6/wiki/Build-Steps ● Special command to run tests – Test suite assumes a big machine ● Some tests even a release build 6 Putting MyRocks in MariaDB ● Goals – Wider adoption – Ease of use – Ease of development – Have MyRocks in MariaDB ● Use it with MariaDB features ● Means – Port MyRocks into MariaDB – Provide binaries and packages 7 Status of MyRocks in MariaDB 8 Status of MyRocks in MariaDB ● MariaDB 10.2 is GA (as of May, 2017) ● It includes an ALPHA version of MyRocks plugin – Working to improve maturity ● It’s a loadable plugin (ha_rocksdb.so) ● Packages – Bintar, deb, rpm, win64 zip + MSI – deb/rpm have MyRocks .so and tools in a separate package. 9 Packaging for MyRocks in MariaDB 10 MyRocks and RocksDB library ● MyRocks is tied RocksDB@revno MariaDB – RocksDB is a github submodule – No compatibility with other versions MyRocks ● RocksDB is always compiled with RocksDB MyRocks S Z n ● l i And linked-in statically a b p ● p Distros have a RocksDB package y – Not using it. -
Information Retrieval and Mining in Distributed Environments Studies in Computational Intelligence,Volume 324 Editor-In-Chief Prof
Alessandro Soro, Eloisa Vargiu, Giuliano Armano, and Gavino Paddeu (Eds.) Information Retrieval and Mining in Distributed Environments Studies in Computational Intelligence,Volume 324 Editor-in-Chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw Poland E-mail: [email protected] Further volumes of this series can be found on our homepage: springer.com Vol. 313. Imre J. Rudas, J´anos Fodor, and Janusz Kacprzyk (Eds.) Vol. 301. Giuliano Armano, Marco de Gemmis, Computational Intelligence in Engineering, 2010 Giovanni Semeraro, and Eloisa Vargiu (Eds.) ISBN 978-3-642-15219-1 Intelligent Information Access, 2010 Vol. 314. Lorenzo Magnani,Walter Carnielli, and ISBN 978-3-642-13999-4 Claudio Pizzi (Eds.) Vol. 302. Bijaya Ketan Panigrahi,Ajith Abraham, Model-Based Reasoning in Science and Technology, 2010 and Swagatam Das (Eds.) ISBN 978-3-642-15222-1 Computational Intelligence in Power Engineering, 2010 Vol. 315. Mohammad Essaaidi, Michele Malgeri, and ISBN 978-3-642-14012-9 Costin Badica (Eds.) Vol. 303. Joachim Diederich, Cengiz Gunay, and Intelligent Distributed Computing IV, 2010 James M. Hogan ISBN 978-3-642-15210-8 Recruitment Learning, 2010 Vol. 316. Philipp Wolfrum ISBN 978-3-642-14027-3 Information Routing, Correspondence Finding, and Object Vol. 304.Anthony Finn and Lakhmi C. Jain (Eds.) Recognition in the Brain, 2010 Innovations in Defence Support Systems, 2010 ISBN 978-3-642-15253-5 ISBN 978-3-642-14083-9 Vol. 317. Roger Lee (Ed.) Vol. 305. Stefania Montani and Lakhmi C. Jain (Eds.) Computer and Information Science 2010 Successful Case-Based Reasoning Applications-1, 2010 ISBN 978-3-642-15404-1 ISBN 978-3-642-14077-8 Vol. -
Myrocks Deployment at Facebook and Roadmaps
MyRocks deployment at Facebook and Roadmaps Yoshinori Matsunobu Production Engineer / MySQL Tech Lead, Facebook Feb/2018, #FOSDEM #mysqldevroom Agenda ▪ MySQL at Facebook ▪ MyRocks overview ▪ Production Deployment ▪ Future Plans MySQL “User Database (UDB)” at Facebook▪ Storing Social Graph ▪ Massively Sharded ▪ Low latency ▪ Automated Operations ▪ Pure Flash Storage (Constrained by space, not by CPU/IOPS) What is MyRocks ▪ MySQL on top of RocksDB (RocksDB storage engine) ▪ Open Source, distributed from MariaDB and Percona as well MySQL Clients SQL/Connector Parser Optimizer Replication etc InnoDB RocksDB MySQL http://myrocks.io/ MyRocks Initial Goal at Facebook InnoDB in main database MyRocks in main database CPU IO Space CPU IO Space Machine limit Machine limit 45% 90% 21% 15% 45% 20% 15% 21% 15% MyRocks features ▪ Clustered Index (same as InnoDB) ▪ Bloom Filter and Column Family ▪ Transactions, including consistency between binlog and RocksDB ▪ Faster data loading, deletes and replication ▪ Dynamic Options ▪ TTL ▪ Online logical and binary backup MyRocks vs InnoDB ▪ MyRocks pros ▪ Much smaller space (half compared to compressed InnoDB) ▪ Gives better cache hit rate ▪ Writes are faster = Faster Replication ▪ Much smaller bytes written (can use more afordable fash storage) ▪ MyRocks cons (improvements in progress) ▪ Lack of several features ▪ No SBR, Gap Lock, Foreign Key, Fulltext Index, Spatial Index support. Need to use case sensitive collation for perf ▪ Reads are slower, especially if your data fts in memory ▪ More dependent on flesystem -
Etriks Analytical Environment: a Practical Platform for Medical Big Data Analysis
Imperial College of Science, Technology and Medicine Department of Computing eTRIKS Analytical Environment: A Practical Platform for Medical Big Data Analysis Axel Oehmichen Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy in Computing of Imperial College London December 2018 Abstract Personalised medicine and translational research have become sciences driven by Big Data. Healthcare and medical research are generating more and more complex data, encompassing clinical investigations, 'omics, imaging, pharmacokinetics, Next Generation Sequencing and beyond. In addition to traditional collection methods, economical and numerous information sensing IoT devices such as mobile devices, smart sensors, cameras or connected medical de- vices have created a deluge of data that research institutes and hospitals have difficulties to deal with. While the collection of data is greatly accelerating, improving patient care by devel- oping personalised therapies and new drugs depends increasingly on an organization's ability to rapidly and intelligently leverage complex molecular and clinical data from that variety of large-scale heterogeneous data sources. As a result, the analysis of these datasets has become increasingly computationally expensive and has laid bare the limitations of current systems. From the patient perspective, the advent of electronic medical records coupled with so much personal data being collected have raised concerns about privacy. Many countries have intro- duced laws to protect people's privacy, however, many of these laws have proven to be less effective in practice. Therefore, along with the capacity to process the humongous amount of medical data, the addition of privacy preserving features to protect patients' privacy has become a necessity. -
2014 URCA Abstracts for Oral and Poster Presentations
Oral & Poster Presentation Abstracts Katheryn Adam, Chemistry Faculty Mentor: Marco Bonizzoni, Chemistry An off-the-shelf sensing system for physiologically relevant phosphates We have developed a chemical sensing system that can differentiate biologically relevant phosphates (nucleoside diphosphates, pyrophosphate) in neutral water solution using only commercially available components. Our approach uses a common fluorescent indicator and a poly(amidoamine) (PAMAM) polycationic receptor to construct an indicator displacement assay (IDA). The system crucially relies on multivariate data collection and analysis. In fact, using different phosphates in the dye-displacement assay results in subtle differences in the optical signals; however, it is not possible to capture this information using classical univariate data presentation techniques. Instead, we rely on principal component analysis, a multivariate data analysis technique, to evaluate these differences and thus distinguish between the biologically relevant phosphates. We will also present supporting data reporting on the anion binding capabilities of the PAMAM system acquired using optical spectroscopy methods. Alison Adams, Biological Sciences Faculty Mentor: Laura Reed, Biological Sciences QTL affecting genotype-by-diet interactions of larval triglyceride levels Metabolic Syndrome (MetS) is a complex disease that is becoming increasingly prevalent in the world today. It is identified by an assortment of symptoms such as obesity, insulin resistance, and elevated blood lipids. This disease and its various phenotypes can be modeled in Drosophila melanogaster. In a previous study of MetS, our lab implemented a round-robin crossing scheme on approximately 800 isogenic lines from a recombinant inbred line population, and a linear regression was used to determine genotype, diet, and genotype-by-diet interactions. -
Social Media Evaluation for Non-Profit Organizations
Social media evaluation for non-profit organizations The case of Oxfam Italia Master Thesis Author: Giuseppe Antonio Coco Thesis Supervisor: Jessica Gustafsson Department of Informatics and Media Specialization in Digital Media & Society Uppsala University, Sweden Autumn 2014 Abstract The thesis presents an evaluation of the Facebook page of the Italian non-profit organization Oxfam Italia from November 2013 to March 2014. The research’s aim is to analyze the community which follows the organization, how this community interacts with it and how the moderators of the page communicate with its followers. The research aims also to find ways to increase Oxfam Italia’s performance on Facebook. The theoretical framework focuses on non-profit marketing and its peculiarities, Social Media Marketing and notions such as engagement and brand community. The methods used in the research consist in data mining and content analysis. Data have been gathered from Facebook Insights and through the issuing of FQL queries from the Facebook Graph API. The research found out that Oxfam has more female followers than male (62% vs 36,5%), the age range of them is 25-44 years. Oxfam’s presence, in particular, is very rooted in the region of Tuscany (where its headquarter is). Facebook followers showed a very good attitude toward the organization, even though criticisms are common, and Oxfam used its social media presence mainly to update the followers concerning ongoing activities and to urge to on-line activism. The users’ favourite engagement method was “liking” photographic contents. Keywords Social media evaluation – Non-profits organizations – Non-profits marketing – Facebook – Oxfam Italia – Facebook Graph API – Facebook Insights – content analysis II Table of Contents Abstract .............................................................................................................................. -
Developing Facebook Applications by M.ALI and Amir Latif
Developing Facebook Applications By M.ALI and Aamir Latif Application Development Objective • A very basic introduction to developing applications with Facebook using PHP. • Keep it simple. • Goal is learning. • Level:Basic Application Development Agenda • Presentation [M.ALI] 20 mins • Demonstration [Amir Latif] 15 mins • Question/Answer 10 mins Application Development What is Facebook? • Facebook is a web application that provides a kind of social networking with people near you. • Facebook was founded by Mark Zuckerberg in 2004. • Originally designed to connect different colleges and universities. • Facebook name has origin in magazine that was distributed to new students Application Development Facebook facts • World largest social network with over 350 million users. • An average teen spends about 20 mins daily. • More than 65 million active users at a time. • Consists of more than 500,000 active applications. • 6.5 millions users access FB from mobile • More facts on http://www.facebook.com/press/info.php?statist ics Application Development Facebook application • Developing facebook application mean it is accessible from FB. • Application is exposed to millions of users worldwide. • There is a chance for the application to become a popular one in FB. • Applications can be social applications like super wall,games like crazy taxi and quizzes. Application Development Facebook Applications Some Popular Facebook applications Application Development Farm Ville Application Development Facebook Platform • Facebook platform is a framework for developers to interact with the core features of the facebook site. • Anyone can developed an application in Facebook by pointing their URL to developers.facebook.com • The idea was to enable everyone to provide content to the facebook site.