


The Illusion of Space and the Science of Data Compression and Deduplication

Bruce Yellin
Advisory Technology Consultant
EMC Corporation
EMC Proven Professional Knowledge Sharing 2010

Table of Contents
What Was Old Is New Again
The Business Benefits of Saving Space
Data Compression Strategies
Data Compression Basics
Compression Bakeoff
Data Deduplication Strategies
Deduplication - Theory of Operation
File Level Deduplication - Single Instance Storage
Fixed-Block Deduplication
Variable-Block Deduplication
Content-Aware Deduplication
Delta Block Optimization
Primary and Secondary Storage Optimization
In-Line versus Post-processing
Primary Storage Optimization - In-line versus Post-Processing
Secondary Storage Optimization - In-line versus Post-Processing
Source versus Target
Secondary Storage Optimization - Who Makes What?
Optimization Software versus Hardware
Data Communications Optimization
The Law and Storage Optimization
Storage Optimization “Gotchas” and Misadventures
Before You Purchase an Optimization System
Conclusion
Appendix I – MD5 and SHA-1 Algorithms
Appendix II – SNIA Deduplication and VTL Features
Footnotes

Disclaimer: The views, processes or methodologies published in this compilation are those of the authors.
They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.

2009 was a turbulent year for the IT world. Faced with severe macroeconomic pressures and uncertainty, we have all participated in cost containment and reduction plans, trying to do more with less. In spite of all the grim financial news, the information age continues to expand¹ at an incredible rate, forcing data storage to grow 40-60% a year. For example, Americans sent 110 billion text messages in a single month², with a projected annual rate of 2 trillion messages. The growth is attributed in part to the adoption of business intelligence suites, enterprise and web applications³, and additional regulations. In our own everyday lives, we create and share multimedia documents, depend on email, tweet, text, and send instant messages. Responding to this growth, companies add (virtual) servers, network devices, and more storage.

Like an iceberg, where the bulk of the mass lies beneath the surface, each additional terabyte of primary data might need 5-30 times that amount in additional tape or disk backup capacity. More is spent on floor space, power, and cooling. Networking costs have risen substantially, especially as greater emphasis is placed on disaster recovery (DR). And of course, personnel are placed under greater stress, especially when backups do not complete on time. This is clearly not the “green” way to go!

The outlook for 2010 and beyond does not provide any relief, according to a recent study⁴. IDC says information is increasing by a factor of 5 while budgets are increasing by only 20% and IT staff by only 10%. They also found that administrative and overhead storage costs will be 4-7X the capital expense over the next four years!

[Figure: “5-FOLD Growth in 4 YEARS”: digital information in exabytes, 2008-2012, fed by sources such as DVDs, RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, data center applications, sensors, email, instant messaging, and videoconferencing.]

Improving storage operational efficiency is imperative for organizations with flat or slightly increasing budgets that are also trying to improve performance and reliability. IT leaders have begun to tier data, use thin provisioning, set hard quotas, and even archive inactive data. Most managers are also placing big bets on compression and the hottest concept, deduplication.

[Cartoon: “I Love Data Sprawl!” and “Not while I’m here!”]

Compression and deduplication algorithms are powerful weapons against “evil” data sprawl. Both reduce physical storage requirements by spending CPU cycles. The secret is knowing how, when, and where to use them. Over the millennia, the common goal has been to save space. Less space saves money and time, and even fosters new concepts that might otherwise be considered impractical.

What Was Old Is New Again

During World War II, “blivet” was a slang expression conveying the idea that it was impossible to get “ten pounds of manure in a five pound bag”. A blivet, also synonymous with a seemingly intractable problem, might also have been an apt description of trying to store 4TB of data on a 2TB disk drive, were it not for the science of compression. We live in a miraculous world where, every so often, technology changes the rules of the game. Space compression is today’s game changer, allowing us to reduce data by 30%, 60%, and even 90%!
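Reduction percentages and compression ratios describe the same savings from two angles. As a minimal sketch (the function name `reduction_to_ratio` is hypothetical, chosen for illustration), a reduction of r percent leaves (100 - r) percent of the original on disk, so the original:compressed ratio is 100 / (100 - r):

```python
def reduction_to_ratio(percent_saved: float) -> float:
    """Convert a space reduction percentage (90 means 90% smaller) into an
    original:compressed ratio (10.0 means 10:1)."""
    if not 0 <= percent_saved < 100:
        raise ValueError("percent_saved must be in the range [0, 100)")
    # Share of the original data that still has to be stored.
    percent_remaining = 100.0 - percent_saved
    return 100.0 / percent_remaining


# The reductions quoted in the text.
for pct in (30, 60, 90):
    print(f"{pct}% reduction -> {reduction_to_ratio(pct):.2f}:1")
```

Running it prints 1.43:1, 2.50:1, and 10.00:1. The relationship is far from linear: going from a 60% to a 90% reduction quadruples the ratio, so small differences in quoted percentages near the top of the range become large differences in quoted ratios.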
Compression, a gem of the IT world, is one of those rare technologies with a lineage going back thousands of years. The ancient Greeks wrote their documents on papyrus, a thick paper-like material made from the pith of the papyrus plant⁵. Papyrus was scarce, and scriptio continua (“continuous script” in Latin) allowed people to squeeze more words into smaller areas⁶ by omitting the spaces between words. For example, here is part of a 181 C.E. “letter from Apollonius and Herminus to Herodes and other managers of the public bank, authorizing them to receive the tax on the sale of a slave.”⁷ Look closely and you will not see a single space!

The Romans made extensive use of Latin abbreviations and practiced scriptio continua, leaving out spaces on coins and other mementos to save space. For example, the coin to the right, dating back to the Roman Empire, uses these 31 letters: “NEROCLAVDCAESARAVGGERPMTRPIMPPP”

Inserting spaces gives us these individual words:
NERO (his name)
CLAVD (part of his name; this stands for Claudius)
CAESAR (an Imperial title with its roots in the family name of Julius Caesar)
AVG (AVGVSTVS, or Augustus, the highest authority in Rome)
GER (GERMANICVS, Ruler or Conqueror of Germania)
PM (PONTIFEX MAXIMVS, or supreme priest)
TR P (TRIBVNICIA POTESTAS, Power or Potency of the Tribunate)
IMP (IMPERATOR, meaning the ruler is the Commander-in-Chief of the armed forces)
PP (PATER PATRIAE, or father of the country)

The translation has 198 letters, spaces, and punctuation marks: “Caesar Augustus Nero Claudius, High Priest and Ruler of Rome and Germania, Supreme Commander of the armies of Rome, the father of his country, leader of the Triumvirate for as long as he shall live.”⁸ This produced a compression ratio of 198/31 ≈ 6.4, or roughly 6:1.

COMPRESSION OR DEDUPLICATION RATIO - The reduction in space expressed as a ratio. If we shrink 30GB down to 10GB, it has a 3:1 ratio.
$$\text{Ratio} = \frac{\text{original\_size}}{\text{compressed\_size}} = \frac{30\ \text{GB}}{10\ \text{GB}} = 3{:}1$$

I am not sure who invented the shot glass, but next time you visit a pub, notice how they benefit from compression
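The ratio definition maps directly onto a one-line function. As a minimal sketch (the helper name `compression_ratio` is hypothetical, chosen for illustration), the code below applies it to the 30GB example and to the Nero coin, counting the characters of the legend and its translation exactly as quoted in the text:

```python
def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Ratio = original_size / compressed_size; 3.0 means 3:1."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


# The 30GB -> 10GB example from the ratio definition.
print(f"{compression_ratio(30, 10):.0f}:1")

# The coin legend (written without spaces) versus its English translation.
legend = "NEROCLAVDCAESARAVGGERPMTRPIMPPP"
translation = ("Caesar Augustus Nero Claudius, High Priest and Ruler of Rome "
               "and Germania, Supreme Commander of the armies of Rome, the "
               "father of his country, leader of the Triumvirate for as long "
               "as he shall live.")
print(len(legend), len(translation))
print(f"{compression_ratio(len(translation), len(legend)):.1f}:1")
```

The quoted legend contains 31 characters and the translation 198, giving about 6.4:1, which rounds to roughly 6:1. The function accepts raw sizes in any unit (bytes, gigabytes, or characters) as long as both arguments use the same one.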