Data Governance Defined
										Recommended publications
									
								- 
  An Analysis of Data Corruption in the Storage Stack
  Lakshmi N. Bairavasundaram∗, Garth R. Goodson†, Bianca Schroeder‡, Andrea C. Arpaci-Dusseau∗, Remzi H. Arpaci-Dusseau∗
  ∗University of Wisconsin-Madison, †Network Appliance, Inc., ‡University of Toronto
  {laksh, dusseau, remzi}@cs.wisc.edu, [email protected], [email protected]
  Abstract: An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this paper, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur the most. We find more than 400,000 instances of checksum mismatches over the 41-month period.
  latent sector errors, within disk drives [18]. Latent sector errors are detected by a drive’s internal error-correcting codes (ECC) and are reported to the storage system. Less well-known, however, is that current hard drives and controllers consist of hundreds-of-thousands of lines of low-level firmware code. This firmware code, along with higher-level system software, has the potential for harboring bugs that can cause a more insidious type of disk error – silent data corruption, where the data is silently corrupted with no indication from the drive that an error has occurred. Silent data corruptions could lead to data loss more often than latent sector errors, since, unlike latent sector errors,
- 
  Active Storage Activeraid ES Data Sheet
  RAID Storage Evolved
  Welcome to RAID Storage Evolved. Introducing the ActiveRAID ES, the affordable high performance RAID storage solution for Apple® users. The ActiveRAID ES was designed using the same principles of the original ActiveRAID, but represents a previously unobtainable level of value by providing the capabilities of the ActiveRAID in a lower cost configuration that does not sacrifice the performance, reliability or ease of management of the original ActiveRAID. The ActiveRAID ES was carefully designed for use in business critical applications where organizations require a high level of data integrity with ease of installation and management, but not the 24/7 365 day duty cycles of a large enterprise or Post-Production or Broadcast facility.
  The ActiveRAID ES provides redundant power, redundant cooling and redundant Fibre Channel connections, and the same powerful, yet easy to use management suite of the original ActiveRAID. The ES differs from the original ActiveRAID in that it has a single high performance RAID controller and uses business class hard drives with a smaller cache size. This results in a very robust RAID solution ideal for business class applications. Performance of the ActiveRAID ES is impressive at up to 774 MB/s*† throughput and will easily handle most Mac OS X Server applications. The ActiveRAID ES is the ideal storage for business class applications such as File serving, database services, web hosting, calendar, Wiki, Mail, iChat and much more.
  Key Features
    Optimized RAID performance in Apple Mac OS X® Applications
    Modern Linux RAID kernel and RAID controller design
    Proprietary self-optimizing architecture
    Native Mac OS X Storage management suite
    Dashboard Widget monitoring
    Enterprise performance and reliability without Enterprise complexity
    No long Quick Start guides to memorize
    Performance profiling and statistical planning tools
- 
  Algorithms for Data Cleaning in Knowledge Bases
  VFAST Transactions on Software Engineering, http://vfast.org/journals/index.php/VTSE@ 2018, ISSN(e): 2309-6519; ISSN(p): 2411-6327, Volume 15, Number 2, May-August, 2018, pp. 78-83
  ALGORITHMS FOR DATA CLEANING IN KNOWLEDGE BASES
  ADEEL ASHRAF¹, SARAH ILYAS¹, KHAWAJA UBAID UR REHMAN¹ AND SHAKEEL AHMAD²
  ¹Department of Computer Science, University of Management and Technology, Punjab, Pakistan
  ²Department of Computer Sciences, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah
  Email: [email protected]
  ABSTRACT. Data cleaning is the process of identifying and correcting inconsistencies and errors in a data warehouse. Different terms are used in these papers; data cleaning is also called data scrubbing. Data scrubbing is used to obtain high-quality data and is one of the data ETL (extraction, transformation and loading) tools. Nowadays there is a need for authentic information for better decision-making, so we conduct a review in which six papers related to data cleaning are reviewed. The papers discuss different algorithms, methods, problems, solutions and approaches. Each paper has its own methods for solving a problem efficiently, but all the papers share the common problem of data cleaning and inconsistencies. In these papers, data inconsistencies, error identification, conflicts, duplicate records and similar problems are discussed in detail and solutions are provided. These algorithms increase the quality of data.
- 
  A Study over Importance of Data Cleansing in Data Warehouse
  Volume 6, Issue 4, April 2016, ISSN: 2277 128X. International Journal of Advanced Research in Computer Science and Software Engineering. Research Paper. Available online at: www.ijarcsse.com
  Shivangi Rana, Research Scholar, CSE Department, Abhilashi Group of Institutions (School of Engg.), Chail Chowk, Mandi, India
  Er. Gagan Prakesh Negi, Assistant Professor, CSE Department, Abhilashi Group of Institutions (School of Engg.), Chail Chowk, Mandi, India
  Kapil Kapoor, Associate Professor, ECE Department, Abhilashi Group of Institutions (School of Engg.), Chail Chowk, Mandi, India
  Abstract: Cleansing data from impurities is an integral part of data processing and maintenance. This has led to the development of a broad range of methods intended to enhance the accuracy and thereby the usability of existing data. These days, many organizations use a Data Warehouse to support their decision-making processes, achieve their goals and satisfy their customers. It enables executives to access the information they need in a timely manner for making the right decision for any work. A Decision Support System (DSS) is one of the means applied in data mining. Its robust and better decisions depend on an important and conclusive factor called Data Quality (DQ); high data quality is obtained using Data Scrubbing (DS), which is one of the data Extraction, Transformation and Loading (ETL) tools. Data Scrubbing is very important and necessary in the Data Warehouse (DW). This paper presents a survey of sources of error in data, data quality challenges and approaches, data cleaning types and techniques, and an overview of the ETL process.
- 
  Design Tradeoffs for SSD Reliability
  Bryan S. Kim, Seoul National University; Jongmoo Choi, Dankook University; Sang Lyul Min, Seoul National University
  https://www.usenix.org/conference/fast19/presentation/kim-bryan
  This paper is included in the Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST ’19). February 25–28, 2019 • Boston, MA, USA. 978-1-939133-09-0
  Abstract: Flash memory-based SSDs are popular across a wide range of data storage markets, while the underlying storage medium, flash memory, is becoming increasingly unreliable. As a result, modern SSDs employ a number of in-device reliability enhancement techniques, but none of them offers a one size fits all solution when considering the multi-dimensional requirements for SSDs: performance, reliability, and lifetime. In this paper, we examine the design tradeoffs of existing reliability enhancement techniques such as data re-read, intra-SSD redundancy, and data scrubbing. We observe that an uncoordinated use of these techniques adversely affects the performance of the SSD, and careful management of the techniques is necessary for a graceful performance degradation while maintaining a high reliability standard.
  Figure 1: SSD reliability enhancement techniques. (Labels: SSD redundancy scheme, data management, data re-read, data scrubbing, threshold voltage tuning, error correction code, flash memory, host system; RBER of underlying memories: 10^-4 ~ 10^-2; UBER requirement for SSDs: 10^-17 ~ 10^-15.)
  ory [10]. The high error rates in today’s flash memory are caused by various reasons, from wear-and-tear [11,19,27], to gradual charge leakage [11,14,51] and data disturbance [13,
- 
  What Is Data Preparation?
  Analytics Canvas Tutorial: What is Data Preparation?
  nModal Solutions Inc. All Rights Reserved
  Analytics Canvas Tutorial: Course Introduction
  What is Data Preparation?
  Myth: Data preparation is a time-consuming manual process that is heavily prone to errors.
  Fact: Data preparation, when done with the right tools, empowers users to maintain control, while accelerating the process and providing a better foundation for decision-making. Better decision-making starts with better data.
  To an extent, data preparation is synonymous with, or related to: data pre-processing, data scrubbing, business intelligence, data cleansing/cleaning, and ETL (“Extract, Transform, and Load”). In this course, we are going to take a broad view of data preparation. Here is the definition we will work with: data preparation is a step in the data analysis process, in which data from one or more sources is cleaned, and transformed and enriched to improve its quality prior to its use.
  Figure 1: Data Preparation in the Data Analysis Process (labels: Data-driven decisions, Knowledge discovery, Data Science, Descriptive analytics and reporting, Data preparation, Data Sources)
  The figure above illustrates the position of the data preparation in the data analysis process. It is easy to see the importance of data preparation - as the foundation of business decisions.
  Conclusion
  Thank you for reading this tutorial. We invite you to continue with the tutorial training to learn more about using Analytics Canvas.
- 
  Data Storage on Unix
  Patrick Louis, 2017-11-05. Published online on venam.nixers.net
  © Patrick Louis 2017. This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of the rightful author. First published eBook format 2017. The author has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
  Contents
    Introduction
    Ideas and Concepts
    The Overall Generic Architecture
    Lowest Level - Hardware & Limitation
      The Medium
      Connectors
      The Drivers
      A Mention on Block Devices and the Block Layer
    Mid Level - Partitions and Volumes Organisation
      What’s a partitions
    High Level
      A Big Overview of FS
      FS Examples
      A Bit About History and the Origin of Unix FS
      VFS & POSIX I/O Layer
    POSIX I/O
    Management, Commands, & Forensic
    Conclusion
    Bibliography
  Introduction
  Libraries and banks, amongst other institutions, used to have a filing system, some still have them. They had drawers, holders, and many tools to store the paperwork and organise it so that they could easily retrieve, through some documented process, at a later stage whatever they needed. That’s where the name filesystem in the computer world emerges from and this is one of the subjects of this episode. We’re going to discuss data storage on Unix with some discussion about filesystem and an emphasis on storage device.
- 
  Lustre / ZFS at Indiana University
  HPC-IODC Workshop, Frankfurt, June 28, 2018
  Stephen Simms, Manager, High Performance File Systems, [email protected]
  Tom Crowe, Team Lead, High Performance File Systems, [email protected]
  Greetings from Indiana University
    Indiana University has 8 Campuses
    Fall 2017 Total Student Count 93,934
    IU has a central IT organization
    • UITS
    • 1200 Full / Part Time Staff
    • Approximately $184M Budget
    • HPC resources are managed by Research Technologies division with $15.9M Budget
  Lustre at Indiana University
    2005 – Small HP SFS Install, 1 Gb interconnects
    2006 – Data Capacitor (535 TB, DDN 9550s, 10Gb)
    2008 – Data Capacitor WAN (339 TB, DDN 9550, 10Gb)
    2011 – Data Capacitor 1.5 (1.1 PB, DDN SFA-10K, 10Gb) PAINFUL MIGRATION using rsync
    2013 – Data Capacitor II (5.3 PB, DDN SFA12K-40s, FDR)
    2015 – Wrangler with TACC (10 PB, Dell, FDR)
    2016 – Data Capacitor RAM (35 TB SSD, HP, FDR-10)
  2016 DC-WAN 2
    We wanted to replace our DDN 9550 WAN file system
    • 1.1 PB DC 1.5 DDN SFA10K
    • 5 years of reliable production, now aging
    • Recycled our 4 x 40 Gb Ethernet OSS nodes
    • Upgrade to Lustre with IU developed nodemap
    • UID mapping for mounts over distance
    We wanted to give laboratories bounded space
    • DDN developed project quota were not available
    • Used a combination of ZFS quotas and Lustre pools
    We wanted some operational experience with Lustre / ZFS
  Why ZFS?
    • Extreme Scaling
    • Max Lustre File System Size
    • LDISKFS – 512 PB
    • ZFS – 8 EB
    • Snapshots
    • Supported in Lustre version 2.10
    • Online Data Scrubbing
    • Provides error detection and
- 
  Solaris ZFS Administration Guide
  Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. Part No: 819–5461–12. June 2007
  Copyright 2007 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more U.S. patents or pending patent applications in the U.S. and in other countries. U.S. Government Rights – Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements. This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, the Solaris logo, the Java Coffee Cup logo, docs.sun.com, Java, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. Legato NetWorker is a trademark or registered trademark of Legato Systems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc.
- 
  Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory
  Wonil Choi, Pennsylvania State University, [email protected]
  Mohammad Arjomand, Pennsylvania State University, [email protected]
  Myoungsoo Jung, Yonsei University, [email protected]
  Mahmut Kandemir, Pennsylvania State University, [email protected]
  ABSTRACT: Storage-class memory (SCM) combines the benefits of a solid-state memory, such as high-performance and robustness, with the archival capabilities and low cost of conventional hard-disk magnetic storage. Among candidate solid-state nonvolatile memory technologies that could potentially be used to construct SCM, flash memory is a well-established technology and has been widely used in commercially available SCM incarnations. Flash-based SCM enables much better tradeoffs between performance, space and power than disk-based systems. However, write endurance is a significant challenge for a flash-based SCM (each act of writing a bit may slightly damage a cell, so one flash cell can be written 10^4–10^5 times, depending on the flash technology, before it becomes unusable).
  HDD [11, 16], and this gap has become larger with the recent technology advances. Bridging this gap has the potential to boost system performance in all kinds of computing systems. This is possible with a high-performance, high-density and low-cost non-volatile memory (NVM) technology whose access time (as well as its cost-per-bit) falls between DRAM and HDD, and is called Storage Class Memory (SCM) [4, 10]. Despite the recent advances in NVM technologies (such as Phase Change Memory [35] and Magnetic RAM [39]), it is quite unlikely to exploit them in SCM in any near future (because of their high fabrication costs). Instead, this paper assumes a NAND flash-based SCM which has been widely used in various kinds of commercial systems ranging from laptops and desktops to
- 
  Tivoli IT Asset Management Portfolio
  Deployment Guide Series: Tivoli IT Asset Management Portfolio
  Learn about the IBM Tivoli IT Asset Management portfolio of products. Prepare to install asset management products. Plan for a deployment.
  Bart Jacob, Rajat Khungar, Carlos J. Otálora Y., James Pittard, TP Raghunathan, David Stephenson
  ibm.com/redbooks
  International Technical Support Organization. July 2008. SG24-7602-00
  Note: Before using this information and the product it supports, read the information in “Notices” on page xiii.
  First Edition (July 2008). This edition applies to Tivoli Asset Management for IT Version 7.1, Tivoli License Compliance Manager Version 2.3, and Tivoli License Manager for z/OS Version 4.2.
  © Copyright International Business Machines Corporation 2008. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
  Contents
    Figures
    Tables
    Notices
      Trademarks
    Preface
      The team that wrote this book
      Become a published author
      Comments welcome
    Part 1. Service Management and Asset Management Overview
    Chapter 1. IBM Service Management platform
      1.1 IBM Service Management
        1.1.1 Why Do Businesses Need IT Service Management
        1.1.2 What is IBM Service Management
      1.2 Information Technology Infrastructure Library (ITIL)
        1.2.1 ITIL Version 3
        1.2.2 Critical Success Factors to Implement ITIL
      1.3 IBM Tivoli Unified Process
        1.3.1 ITUP Composer
    Chapter 2. IBM Tivoli IT Asset Management portfolio
- 
  Protecting Oracle Exadata Database Machine with Oracle ZFS Storage Appliance Configuration Best Practices
  ORACLE WHITE PAPER | JUNE 2017
  Table of Contents
    Executive Overview
    Introduction
    Selecting a Data Protection Strategy
    Traditional Backup Strategy
    Incrementally Updated Backup Strategy
    Leveraging Data Deduplication
    Best Practices for Oracle ZFS Storage Appliance Systems
    Configuring an Oracle ZFS Storage Appliance System
    Choosing a Controller
    Choosing the Correct Disk Shelves
    Choosing a Storage Profile
    Configuring the Storage Pools
    Using Write-Flash Accelerators and Read-Optimized Flash
    Selecting InfiniBand or 10 GbE Connectivity
    Choosing Direct NFS Client
    Configuring IP Network Multipathing
    Configuring Oracle Exadata
    Choosing NFSv3 or NFSv4
    Configuring Direct NFS Client
    Configuring Oracle RMAN Backup Services
    InfiniBand Best Practices
    Preparing Oracle Database for Backup
    Best Practices for Traditional Oracle RMAN Backup
    Configuring the Network Shares
    Oracle RMAN Configuration
    Best Practices for an Incrementally Updated Backup Strategy
    Configuring an Oracle ZFS Storage Appliance System
    Configuring the Oracle RMAN Environment
    Oracle RMAN Backup and Restore Throughput Performance Sizing for Oracle ZFS Storage ZS5-4 and Oracle ZFS Storage ZS5-2
    Performance Sizing with Deduplication
    Conclusion
    References
  Executive Overview
  Protecting the mission-critical data that resides on Oracle Exadata Database Machine (Oracle Exadata) is a top priority.
The Oracle ZFS Storage Appliance family of products is ideally suited for this task due to superior performance, enhanced reliability, extreme network bandwidth, powerful features, simplified management, and cost-efficient configurations.