Enhancing the Programmability of Cloud Object Storage
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
Respecting the Block Interface – Computational Storage Using Virtual Objects
Respecting the block interface – computational storage using virtual objects Ian F. Adams, John Keys, Michael P. Mesnier Intel Labs Abstract immediately obvious, and trade-offs need to be made. Unfor- tunately, this often means that the parallel bandwidth of all Computational storage has remained an elusive goal. Though drives within a storage server will not be available. The same minimizing data movement by placing computation close to problem exists within a single SSD, as the internal NAND storage has quantifiable benefits, many of the previous at- flash bandwidth often exceeds that of the storage controller. tempts failed to take root in industry. They either require a Enter computational storage, an oft-attempted approach to departure from the widespread block protocol to one that address the data movement problem. By keeping computa- is more computationally-friendly (e.g., file, object, or key- tion physically close to its data, we can avoid costly I/O. Var- value), or they introduce significant complexity (state) on top ious designs have been pursued, but the benefits have never of the block protocol. been enough to justify a new storage protocol. In looking We participated in many of these attempts and have since back at the decades of computational storage research, there concluded that neither a departure from nor a significant ad- is a common requirement that the block protocol be replaced dition to the block protocol is needed. Here we introduce a (or extended) with files, objects, or key-value pairs – all of block-compatible design based on virtual objects. Like a real which provide a convenient handle for performing computa- object (e.g., a file), a virtual object contains the metadata that tion. -
Distributing an SQL Query Over a Cluster of Containers
2019 IEEE 12th International Conference on Cloud Computing (CLOUD) Distributing an SQL Query Over a Cluster of Containers David Holland∗ and Weining Zhang† Department of Computer Science, University of Texas at San Antonio Email: ∗[email protected], †[email protected] Abstract—Emergent software container technology is now across a cluster of containers in a cloud. A feasibility study available on any cloud and opens up new opportunities to execute of this with performance analysis is reported in this paper. and scale data intensive applications wherever data is located. A containerized query (henceforth CQ) uses a deployment However, many traditional relational databases hosted on clouds have not scaled well. In this paper, a framework and deployment methodology that is unique to each query with respect to the methodology to containerize relational SQL queries is presented, number of containers and their networked topology effecting so that, a single SQL query can be scaled and executed by data flows. Furthermore a CQ can be scaled at run-time a network of cooperating containers, achieving intra-operator by adding more intra-operators. In contrast, the traditional parallelism and other significant performance gains. Results of distributed database query deployment configurations do not container prototype experiments are reported and compared to a real-world RDBMS baseline. Preliminary result on a research change at run-time, i.e., they are static and applied to all cloud shows up to 3-orders of magnitude performance gain for queries. Additionally, traditional distributed databases often some queries when compared to running the same query on a need to rewrite an SQL query to optimize performance. -
Hypervisors Vs. Lightweight Virtualization: a Performance Comparison
2015 IEEE International Conference on Cloud Engineering Hypervisors vs. Lightweight Virtualization: a Performance Comparison Roberto Morabito, Jimmy Kjällman, and Miika Komu Ericsson Research, NomadicLab Jorvas, Finland [email protected], [email protected], [email protected] Abstract — Virtualization of operating systems provides a container and alternative solutions. The idea is to quantify the common way to run different services in the cloud. Recently, the level of overhead introduced by these platforms and the lightweight virtualization technologies claim to offer superior existing gap compared to a non-virtualized environment. performance. In this paper, we present a detailed performance The remainder of this paper is structured as follows: in comparison of traditional hypervisor based virtualization and Section II, literature review and a brief description of all the new lightweight solutions. In our measurements, we use several technologies and platforms evaluated is provided. The benchmarks tools in order to understand the strengths, methodology used to realize our performance comparison is weaknesses, and anomalies introduced by these different platforms in terms of processing, storage, memory and network. introduced in Section III. The benchmark results are presented Our results show that containers achieve generally better in Section IV. Finally, some concluding remarks and future performance when compared with traditional virtual machines work are provided in Section V. and other recent solutions. Albeit containers offer clearly more dense deployment of virtual machines, the performance II. BACKGROUND AND RELATED WORK difference with other technologies is in many cases relatively small. In this section, we provide an overview of the different technologies included in the performance comparison. -
Erlang on Physical Machine
on $ whoami Name: Zvi Avraham E-mail: [email protected] /ˈkɒm. pɑː(ɹ)t. mɛntl̩. aɪˌzeɪ. ʃən/ Physicalization • The opposite of Virtualization • dedicated machines • no virtualization overhead • no noisy neighbors – nobody stealing your CPU cycles, IOPS or bandwidth – your EC2 instance may have a Netflix “roommate” ;) • Mostly used by ARM-based public clouds • also called Bare Metal or HPC clouds Sandbox – a virtual container in which untrusted code can be safely run Sandbox examples: ZeroVM & AWS Lambda based on Google Native Client: A Sandbox for Portable, Untrusted x86 Native Code Compartmentalization in terms of Virtualization Physicalization No Virtualization Virtualization HW-level Virtualization Containerization OS-level Virtualization Sandboxing Userspace-level Virtualization* Cloud runs on virtual HW HARDWARE Does the OS on your Cloud instance still supports floppy drive? $ ls /dev on Ubuntu 14.04 AWS EC2 instance • 64 teletype devices? • Sound? • 32 serial ports? • VGA? “It’s DUPLICATED on so many LAYERS” Application + Configuration process* OS Middleware (Spring/OTP) Container Managed Runtime (JVM/BEAM) VM Guest Container OS Container Guest OS Hypervisor Hardware We run Single App per VM APPS We run in Single User mode USERS Minimalistic Linux OSes • Embedded Linux versions • DamnSmall Linux • Linux with BusyBox Min. Linux OSes for Containers JeOS – “Just Enough OS” • CoreOS • RancherOS • RedHat Project Atomic • VMware Photon • Intel Clear Linux • Hyper # of Processes and Threads per OS OSv + CLI RancherOS processes CoreOS threads -
IDC Marketscape IDC Marketscape: Worldwide Object-Based Storage 2019 Vendor Assessment
IDC MarketScape IDC MarketScape: Worldwide Object-Based Storage 2019 Vendor Assessment Amita Potnis THIS IDC MARKETSCAPE EXCERPT FEATURES SCALITY IDC MARKETSCAPE FIGURE FIGURE 1 IDC MarketScape Worldwide Object-Based Storage Vendor Assessment Source: IDC, 2019 December 2019, IDC #US45354219e Please see the Appendix for detailed methodology, market definition, and scoring criteria. IN THIS EXCERPT The content for this excerpt was taken directly from IDC MarketScape: Worldwide Object-Based Storage 2019 Vendor Assessment (Doc # US45354219). All or parts of the following sections are included in this excerpt: IDC Opinion, IDC MarketScape Vendor Inclusion Criteria, Essential Guidance, Vendor Summary Profile, Appendix and Learn More. Also included is Figure 1. IDC OPINION The storage market has come a long way in terms of understanding object-based storage (OBS) technology and actively adopting it. It is a common practice for OBS to be adopted for secondary and cold storage needs at scale. Over the recent years, OBS has proven its ability to scale to tens and hundreds of petabytes and is now maturing to support newer workloads such as unstructured data analytics, IoT, AI/ML/DL, and so forth. As the price of flash declines and the data sets continue to grow, the need for analyzing the data is on the rise. Moving data sets from an object store to a high- performance tier for analysis is a thing of the past. Many vendors are enhancing their object offerings to include a flash tier or are bringing all-flash array object storage offerings to the market today. In this IDC MarketScape, IDC assesses the present commercial OBS supplier (suppliers that deliver software-defined OBS solutions as software or appliances much like other storage platforms) landscape. -
An Object Storage Solution for Data Archive Using Supermicro SSG-5018D8-AR12L and Openio SDS
CONTENTS WHITE PAPER 2 NEW CHALLENGES AN OBJECT STORAGE 2 DATA ARCHIVE WITHOUT COMPROMISE SOLUTION FOR DATA 3 OBJECT STORAGE FOR DATA ARCHIVE 4 SSG-5018D8-AR12L AND OPENIO ARCHIVE USING Power Efficiency at its Core Extreme Storage Density SUPERMICRO SSG-5018D8- Robust Networking 5 OPENIO, AN OPEN SOURCE OBJECT AR12L AND OPENIO SDS STORAGE SOLUTION 6 SUPERMICRO SSG-5018D8-AR12L AND OPENIO COMBINED SOLUTION 7 TEST RESULTS 11 CONCLUSIONS White Paper October 2016 Rodolfo Campos Sr. Product Marketing Specialist Super Micro Computer, Inc. 980 Rock Avenue San Jose, CA 95131 USA www.supermicro.com Intel Inside®. Powerful Data Center Outside. WHITE PAPER An Object Storage Solution For Data Archive using Supermicro SSG-5018D8-AR12L and OpenIO SDS NEW CHALLENGES Every minute - 2.5 million messages on Facebook are sent, nearly 430K tweets are posted, 67K photos on Instagram and over 5 million YouTube videos are uploaded. ~50% These are a few examples of how the Big Data market is growing 9 times faster than Digital Storage the traditional IT market. According to IDC reports, in 2012 the World created 4.4 Annual Growth zettabytes of digital data and is estimated to create 44 zettabytes of data by 2020. Cold Storage These large volumes of digital data are being created, shared, and stored on the cloud. As a result, data storage demands are reaching new limits and are in need Capacity of new requirements. Thus, data storage, data movement, and data analytics applications need a new storage platform to keep up with the greater capacity and scaling demands it brings. -
Presto: the Definitive Guide
Presto The Definitive Guide SQL at Any Scale, on Any Storage, in Any Environment Compliments of Matt Fuller, Manfred Moser & Martin Traverso Virtual Book Tour Starburst presents Presto: The Definitive Guide Register Now! Starburst is hosting a virtual book tour series where attendees will: Meet the authors: • Meet the authors from the comfort of your own home Matt Fuller • Meet the Presto creators and participate in an Ask Me Anything (AMA) session with the book Manfred Moser authors + Presto creators • Meet special guest speakers from Martin your favorite podcasts who will Traverso moderate the AMA Register here to save your spot. Praise for Presto: The Definitive Guide This book provides a great introduction to Presto and teaches you everything you need to know to start your successful usage of Presto. —Dain Sundstrom and David Phillips, Creators of the Presto Projects and Founders of the Presto Software Foundation Presto plays a key role in enabling analysis at Pinterest. This book covers the Presto essentials, from use cases through how to run Presto at massive scale. —Ashish Kumar Singh, Tech Lead, Bigdata Query Processing Platform, Pinterest Presto has set the bar in both community-building and technical excellence for lightning- fast analytical processing on stored data in modern cloud architectures. This book is a must-read for companies looking to modernize their analytics stack. —Jay Kreps, Cocreator of Apache Kafka, Cofounder and CEO of Confluent Presto has saved us all—both in academia and industry—countless hours of work, allowing us all to avoid having to write code to manage distributed query processing. -
Unlock Modern Backup and Recovery with Object Storage
Unlock Modern Backup and Recovery With Object Storage E-BOOK It’s a New Era for Data Protection In this age of digital transformation, data is growing at an astounding rate, posing challenges for data protection, long-term data retention and adherence Annual Size of the Global Datasphere 175 ZB to business and regulatory compliance. In just the 180 last two years, 90% of the world’s data has been 160 created by computers, mobile phones, email, social 140 media, IoT smart devices, connected cars and other 120 100 devices constantly generating streams of data1. 80 Zettabytes 60 IDC predicts that the data we generate, collect, and consume will 40 continue to increase significantly—from about 50 zettabytes in 2020 to 175 zettabytes in 2025. With massive collections of data 20 0 amounting, trying to keep pace with the performance required to 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 protect and recover it quickly, while scaling cost effectively, has Years become a major pain point for the enterprise. 1 Jeremy Waite, IBM ‘10 Key Marketing Trends for 2017’ Global Data Growth Prediction 50 zettabytes in 2020 175 zettabytes in 2025 1 zettabyte (ZB) = 175 ZB of data stored on DVDs would circle the 1 trillion gigabytes earth 222 times. 1 Redefine Data Protection Requirements How will your current backup, recovery and long-term retention strategy scale as data reaches uncharted volumes? Consider that for every terabyte of primary data stored, it is not uncommon to store three or more terabytes of secondary data copies for protection, high availability and compliance purposes. -
Retrospect Backup for Windows Data Protection for Businesses
DATA SHEET | RETROSPECT FOR WINDOWS Retrospect Backup for Windows Data Protection for Businesses NEW Retrospect Backup 17: Automatic Onboarding, Nexsan E-Series/Unity Certification, 10x Faster ProactiveAI, Restore Preflight Retrospect Backup protects over 100 Petabytes in over 500,000 homes and businesses in over 100 countries. With broad platform and application support, Retrospect protects every part of your computer environment, on-site and in the cloud. Start your first backup with one click. Complete Data Protection With cross platform support for Windows, Mac, and Linux, Retrospect offers business backup with system recovery, local backup, long-term retention, along with centralized management, end-to-end security, email protection, and extensive customization–all at an affordable price for a small business. Cloud Backup Theft and disaster have always been important reasons for offsite backups, but now, ransomware is the most powerful. Ransomware will encrypt the file share just like any other file. Retrospect integrates with over a dozen cloud storage providers for offsite backups, connects securely to prevent access from malware and ransomware, and lets you transfer local backups to it in a couple clicks. Full System Recovery Systems need the same level of protection as individual files. It takes days to recreate an operating system from scratch, with specific operating system versions, system state, application installations and settings, and user preferences. Retrospect does full system backup and 100 recovery for your entire environment. Petabytes File and System Migration Retrospect offers built-in migration for files, folders, or entire bootable systems, including extended attributes and ACLs, with extensive options for which files to replace if source and 500,000 destination overlap. -
Openio SDS on ARM a Practical and Cost-Effective Object Storage Infrastructure Based on Soyoustart Dedicated ARM Servers
OpenIO SDS on ARM A practical and cost-effective object storage infrastructure based on SoYouStart dedicated ARM servers. Copyright © 2017 OpenIO SAS All Rights Reserved. Restriction on Disclosure and Use of Data 30 September 2017 "1 OpenIO OpenIO SDS on ARM 30/09/2017 Table of Contents Introduction 3 Benchmark Description 4 1. Architecture 4 2. Methodology 5 3. Benchmark Tool 5 Results 7 1. 128KB objects 7 Disk and CPU metrics (on 48 nodes) 8 2. 1MB objects 9 Disk and CPU metrics (on 48 nodes) 10 5. 10MB objects 11 Disk and CPU metrics (on 48 nodes) 12 Cluster Scalability 13 Total disks IOps 13 Conclusion 15 2 OpenIO OpenIO SDS on ARM 30/09/2017 Introduction In this white paper, OpenIO will demonstrate how to use its SDS Object Storage platform with dedicated SoYouStart ARM servers to build a flexible private cloud. This S3-compatible storage infrastructure is ideal for a wide range of uses, offering full control over data, but without the complexity found in other solutions. OpenIO SDS is a next-generation object storage solution with a modern, lightweight design that associates flexibility, efficiency, and ease of use. It is open source software, and it can be installed on ARM and x86 servers, making it possible to build a hyper scalable storage and compute platform without the risk of lock-in. It offers excellent TCO for the highest and fastest ROI. Object storage is generally associated with large capacities, and its benefts are usually only visible in large installations. But thanks to characteristics of OpenIO SDS, this next-generation object storage solution can be cost effective even with the smallest of installations. -
Next-Gen Object Storage and Serverless Computing Agenda
Next-Gen Object Storage and Serverless Computing Agenda 1 About OpenIO 2 SDS: Next-gen Object storage 3 Grid for Apps: Serverless Computing OpenIO About OpenIO OpenIO An experienced team, a robust and mature technology 2017 2016 2006 2007 2009 2012 2014 2015 Global launch OpenIO SDS Idea & 1st Design Production ready! Open 15 PBs OpenIO First release concept dev starts sourced customer company 1st massive prod success over 1PB Lille (FR) | San Francisco | Tokyo OpenIO Quickly Growing Across Geographies And Vertical Markets 35 3 25+ 2 Employees Continents Customers Solutions Mostly engineers, Hem (Lille, France), Paris, Installations ranging OpenIO SDS, next-gen support and pre-sales Tokyo, San Francisco from 3 nodes up to 60 object storage Growing fast Petabytes and billions of objects Teams across EMEA, Grid for Apps, serverless Japan and, soon, US computing framework OpenIO Customers Large Telco Provider Email storage Small objects, high scalability 65 Mln 15 PB mail boxes of storage 650 10 Bln nodes objects 20K services online OpenIO DailyMotion Media & Entertainment High throughput and capacity, 60 PB 80 Mln fat x86 nodes of OpenioIO SDS videos 30% 3 Bln growing/year views per month OpenIO Japanese ISP High Speed Object Storage High number of transactions 6000 10+10 on SSDs Emails per second All-flash nodes 2-sites Indexing Async replication With Grid for Apps OpenIO Teezily E-commerce Website On-premises S3, migrated from Amazon AWS Private Cloud Storage on Public Infrastructure, 350TB 10Bln Cost effective Very small files Objects -
Containerized SQL Query Evaluation in a Cloud
Containerized SQL Query Evaluation in a Cloud Dr. Weining Zhang and David Holland Department of Computer Science The University of Texas at San Antonio {Weining.Zhang, david.holland}@utsa.edu Abstract—Recent advance in cloud computing and light- management system (DBMS) inside a virtual ma- weight software container technology opens up opportu- chine (VM). The user must administer most sys- nities to execute data intensive applications where data tem management activities, including software li- is located. Noticeably, current database services offered censing, installation, configuration, management, on cloud platforms have not fully utilized container tech- nologies. In this paper we present an architecture of a backup, and recovery. This option is difficult to cloud-based, relational database as a service (DBaaS) that scale. Two) Use a NoSQL database, e.g. Apache can package and deploy a query evaluation plan using HBase[1], Google BigQuery[2], and Cassandra[3]. light-weight container technology and the underlying cloud These services are highly efficient and scalable for storage system. We then focus on an example of how a some types of large data, but the user is forced to select-join-project-order query can be containerized and deal with the lack of structured data and strong deployed in ZeroCloud. Our preliminary experimental results confirm that a containerized query can achieve data integrity that is present in relational models. a high degree of elasticity and scalability, but effective Three) Use a managed SQL database service, e.g. mechanisms are needed to deal with data skew. Amazon RDS[4], Google Cloud SQL[5], and MS Index Terms—Database, DBaaS, query evaluation, soft- Azure SQL[6].