Big Data Storage IGATE

Apurva Vaidya, Anshul Kosarwal
IGATE
2014 Storage Developer Conference. © IGATE. All Rights Reserved.

Agenda
• Abstract
• Big Data and its characteristics
• Key requirements of Big Data Storage
• Triage and Analytics Framework
• Big Data Storage Choices
• Hyper scale, Scale Out NAS, Object Storage
• Impact of Flash memory and tiered storage technologies
• Big Data Storage Architecture
• Performance Optimization
• Conclusion

Abstract
Big Data is a key buzzword in IT. The amount of data to be stored and processed has exploded to a mind-boggling degree. Although "Big Data Analytics" has evolved to get a handle on this information content, the data must be appropriately stored for easy and efficient retrieval. And for this, proper storage options should be in place to handle Big Data.

"The pessimist complains about the wind. The optimist expects it to change. The leader adjusts the sails."
John Maxwell

Big Data & Its Characteristics

Storage Capacity
[Figure: Global Information Storage Capacity, 1986 to 2007, analog vs. digital. Labels: 1986, analog 2.6 exabytes and digital 0.02 exabytes (1% digital); 1993, 3% digital; 2000, 25% digital; 2002, 50% digital, "Beginning of the Digital age"; 2007, analog 19 exabytes and digital 280 exabytes (94% digital); 2017, digital 7,235 exabytes.]
Source: Business Computing World (http://www.businesscomputingworld.co.uk/storage-predictions-for-2014/)

Key Characteristics of Big Data
The Four V's of Big Data:
• Volume (scale of data): It's estimated that 2.5 trillion gigabytes of data are created each day. Terabytes, records, transactions, tables, files.
• Variety (different forms of data): Structured, unstructured, semistructured, sensor data.
• Velocity (analysis of data): Batch, near time, real time, streams. By 2016 there will be 18.9 billion network connections.
• Veracity (uncertainty of data): 1 in 3 business leaders don't trust the information they use to make decisions.
Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Prioritizing Things
Concerns
• Data growth
• System performance and scalability
• Network congestion and connectivity
Business Priorities
• Enterprise growth
• Attracting & retaining new customers
• Reducing enterprise costs
Technology Preferences
• Analytics/business intelligence
• Mobile technologies
• Cloud computing

Requirements of Big Data Storage

Key Requirements of Big Data Storage
• Provide tiered storage
• Scalable
• Self-managing
• Highly available
• Widely accessible
• Security
• Self healing
• Integrate with legacy applications

Triage and Analytics Framework

TAF Overview
[Diagram: Triage and Analytics Framework. Labels: Device Logs, Sensor Data, Log Analysis, Sensor Data Analytics, Pattern Identification, Real-time Analytics, Triage, Predictive Analysis, Query Expression, Pattern Matching, Columnar Storage, Hadoop, MPP Database, Actions.]

The way we discovered our needs
• IGATE has its "Triage and Analytics framework" for proactive diagnostics and predictive analysis.
• It ran on Hadoop with DAS storage and worked well.
• But as data grew, there was a performance drop.
• Maybe it was time for us to think about the storage underneath and its scalability.
• So, in order to optimize storage, we went on to discover storage options which can scale:
• Fast: for computation
• Hadoop: leverages individual nodes, servers, CPU, memory and RAM
• RAID support
• Scale-out: nodes on a Hadoop framework
• Efficient: for cost and performance
• Global namespace: accessed and managed as a single system
• High availability

Big Data Storage Choices

Storage Options
• Traditional Storage (DAS, NAS, SAN, tape, etc.)
• Scale Out NAS
• Object Storage
• Hyper Scale Storage

Traditional Storage Options
Traditional storage options fail when it comes to Big Data.
• Data sources: business data, human information, machine data
• Data needs: unstructured data, semi-structured data, structured data
• Legacy storage options (traditional storage): disk storage, tape storage, optical jukebox

Scale out NAS, Object Storage and HSS

Scale Out NAS
Built to:
• Deliver a flexible and scalable platform for Big Data.
• Simplify management of big data storage infrastructure.
• Increase efficiency and reduce costs.
Why can it be useful?
• Simple to scale: Scale Out NAS is a very simple-to-scale platform for Big Data.
• Predictable: Performance and reliability are predictable.
• Efficient: It leverages all the resources in an efficient way.
• Available: High availability support is built into the file system and hence can be guaranteed.
• Enterprise-proven: The technology is mature and time-tested.

Object Storage
Built to:
• Provide optimized object storage systems purpose-built for petabyte scale, which also enable organizations to build cost-efficient storage pools for all their unstructured data needs.
• Provide highly reliable, highly scalable storage with flexible pricing options, and the ability to store, retrieve, and leverage data the way you want to.
It provides:
• Maximum portability: A set of interfaces including C++ and Java APIs, REST for web applications, and file gateways to support file-based workflows.
• Lowest cost of ownership: It is delivered on a commodity-based platform for lowest cost of ownership.
• Scalable to petabytes and beyond: Object storage systems are scalable to petabytes of data, billions of files and beyond.
• Index and search: Indexing of the metadata allows extremely fast filtering and search capabilities.
• Immutable data: It provides the best option for unstructured immutable data.
• Cloud capability (multi-tenancy): Storage can be carved up into virtual containers based on who owns the data.

Hyper Scale Storage
Built for:
The term was coined to represent the scale-out design of IT systems with the ability to expand massively under a controlled, efficiently managed, software-defined framework. It enables the construction of storage facilities which cater for very high volumes of processing and which manage and store petabytes (PB) or even greater quantities of data.
Key Criteria:
• Elasticity: It treats data as independent from the hardware, making it easy to grow or shrink volumes as needed, with no interruption in services.
• Scalability: It supports not just terabytes, but petabytes of data, right from the start.
• High performance and availability: It eliminates hot spots and I/O bottlenecks and reduces latency, hence offering very fast access to information.
• Standards based and open source: It supports any and every file type, benefiting from the collective collaboration inherent in any open-source system, resulting in constant testing, feedback and improvements.
• Leading-edge architecture and design: It supports or includes a complete storage OS stack, hence has a modular, stackable architecture which enables data to be stored in native formats and which also avoids metadata indexing, instead storing and locating data algorithmically.

Analytical Comparison
Scalability
• Scale Out NAS: Scale Out NAS addresses the problem with a software management approach that uses a virtualization layer to make the nodes behave like a single system.
• Object Storage: Object storage can scale seamlessly in number of objects, to exabyte capacity, and across distributed sites.
• Hyper Scale Storage: Hyperscale architectures typically include dozens to hundreds of generic servers running either a clustered application or a hypervisor.
Accessibility / Availability
• Scale Out NAS: Assuming that hardware will fail, Scale Out infrastructure is built (on commodity hardware) and designed to deal with a higher rate of hardware failures.
• Object Storage: This type of storage offers omni-protocol support (reconfigurable bit-stream processing in high-speed networks). It also supports a wide choice of protocols and a rich application ecosystem.
• Hyper Scale Storage: Automatic replication delivers a high level of data protection and resiliency. It also eliminates hot spots and I/O bottlenecks and reduces latency, hence offering very fast access to information.
Efficiency
• Scale Out NAS: Scale-out NAS for big data is intelligent enough to leverage all the resources in the storage system, regardless of where they are. By moving data around, it optimizes performance and capacity.
• Object Storage: Object storage provides the highest efficiency with low overhead, no bandwidth kill and lowest management cost.
• Hyper Scale Storage: Hyperscale gives stripped storage which provides rapid and efficient expansion to handle big data in different use cases, such as Web serving or some database applications.
Reliability
• Scale Out NAS: Reliability in Scale Out NAS is achieved by features like hardware RAID, redundant hot-...
• Object Storage: Provides all-round data protection with smart ...
• Hyper Scale Storage: Reliability with hyper scale can be achieved by installing a number of hyper scale manager servers.
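
The Hyper Scale Storage criteria above mention avoiding metadata indexing and instead storing and locating data algorithmically. The slides do not name a specific algorithm, so the following is only a minimal sketch of one well-known approach, rendezvous (highest-random-weight) hashing, in which every client computes an object's location from its key and the node list without consulting a central index. The node names, the `locate` function, and the replica count are hypothetical and chosen for illustration only.

```python
import hashlib
from typing import List


def _score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def locate(key: str, nodes: List[str], replicas: int = 2) -> List[str]:
    """Return the nodes that should hold `key`, highest score first.

    No lookup table is kept: any client holding the same node list
    computes the same answer for the same key.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and len(nodes)")
    ranked = sorted(nodes, key=lambda n: _score(n, key), reverse=True)
    return ranked[:replicas]


# Hypothetical cluster of four storage nodes.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = locate("sensor-logs/2014-09-16.csv", nodes)
print(placement)
```

With this scheme, adding or removing a node only moves the objects whose top-ranked nodes change, which is one way a system could grow or shrink without rebuilding an index. Real hyperscale products may use different placement algorithms.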