Intelligent Storage for HPC: Sun StorageTek QFS and Sun StorageTek Storage Archive Manager


Harriet Coverston, Distinguished Engineer, Sun Microsystems, Inc.
May 2008

HPC – Why It Matters to Sun
• What happens in HPC today often becomes mainstream beyond HPC in the years ahead
  > HPC drives leading-edge technologies
• HPC applications all have similar requirements
  > Consolidated storage
  > Performance and scalability
  > Parallel processing
• Expect this same phenomenon in data management
  > HPC is tackling HUGE data requirements/issues
• HPC is a growth target at Sun (one of four)

Sun's Advanced HPC Data Management Products
• Sun StorageTek QFS – SAN file system
  > High-performance parallel file system
  > Transparent user interface
  > Production ready
  > http://www.sun.com/storagetek/management_software/data_management/qfs
• Sun StorageTek Storage Archive Manager (SAM)
  > Policy-based automatic data migration and protection
  > Full device streaming
  > Tiered storage
  > http://www.sun.com/storagetek/management_software/data_management/sam

Open Source
• Sun opened the source for SAM and Shared QFS, metadata server and clients
  > http://opensolaris.org/os/project/samqfs
• Plans are to post SAM and Shared QFS 5.0 development code periodically
  > We are developing in the open
  > We are expanding our community
• Sun opened Libsam, the APIs that let an application program manage data in a SAM-QFS file system
  > http://developers.sun.com/solaris/articles/libsam.html
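As a hedged illustration of the Libsam API just mentioned (not taken from the deck): the program below stats a file with sam_stat(), then stages it back to disk cache if it is offline or releases its disk blocks if it is online. The install paths, header and macro names, build line, and the option letters passed to sam_stage() and sam_release() are assumptions; verify them against the libsam man pages and the article linked above.

/*
 * Minimal Libsam sketch.  Assumed build line on a default install:
 *   cc -I/opt/SUNWsamfs/include samdemo.c -L/opt/SUNWsamfs/lib -lsam -o samdemo
 */
#include <stdio.h>

#include "stat.h"   /* struct sam_stat, sam_stat(), SS_ISOFFLINE() */
#include "lib.h"    /* sam_stage(), sam_release() */

int
main(int argc, char **argv)
{
        struct sam_stat sb;

        if (argc != 2) {
                (void) fprintf(stderr, "usage: %s <file-in-sam-qfs>\n", argv[0]);
                return (1);
        }

        /* SAM-aware stat: reports archive copies and online/offline state. */
        if (sam_stat(argv[1], &sb, sizeof (sb)) != 0) {
                perror("sam_stat");
                return (1);
        }

        if (SS_ISOFFLINE(sb.attr)) {
                /* Data is on archive media only; bring it back to disk cache. */
                /* "w" (wait for the stage to complete) is assumed; see sam_stage(3). */
                if (sam_stage(argv[1], "w") != 0)
                        perror("sam_stage");
        } else {
                /* Data is online; ask SAM to release the disk copy. */
                /* "i" (release immediately) is assumed; see sam_release(3). */
                if (sam_release(argv[1], "i") != 0)
                        perror("sam_release");
        }
        return (0);
}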
Shared QFS (SQFS)
• Large, existing, and loyal customer base
  > Stable base, shipping since August 2002
• Targets large enterprises, grid, and HPC
  > Clients run on Solaris (SPARC, x64, and x86) and Linux
  > Metadata server runs on Solaris (SPARC and x64)
  > HA option with SunCluster
• Built-in HSM with SAM, with optional WORM functionality for business compliance
• SQFS currently supports 256 nodes
• Next release, SQFS will support thousands of nodes
  > Targets HPC clusters

Shared QFS with SAM: Integrated Data Management
[Architecture diagram: clients read and write data directly over the SAN to the block storage devices (data path), while the metadata server handles name services and space management over the network (metadata path); an SL500 tape library behind the metadata server provides the archive tier.]

Shared QFS Customer Benefits
• Data consolidation with SAN file sharing
  > HBO: 5,000 hours of programming to manage
  > "Provided the scalability to store and manage large files created by program-length video with the performance necessary to meet HBO's demanding throughput goals"
  > http://www.sun.com/customers/storage/hbo.xml
• Performance and scalability
  > Tune the file system to the application
  > Near raw I/O performance
  > File system I/O performance scales linearly with the hardware
• Parallel processing with multi-node read/write access
• SAM provides automatic and continuous data protection

Shared QFS Certified with SunCluster
• SunCluster HA failover support
  > Standalone QFS
  > HA-NFS over QFS
  > Shared QFS metadata server failover
  > Supports clients outside the cluster
• Oracle RAC runs on Shared QFS with SunCluster for high availability
  > Oracle certified on 9i and 10g
  > Shared QFS license is free for this configuration
• Shared QFS transactional performance matches raw device performance

SAM Customer Benefits
• Policy-based automatic data migration
  > Media can be disk, tape, or optical
  > Local or remote copies
  > Classification is by path, owner, group, size, wildcard, and access
• Media format is tar
  > Small files are put into a tar container so data is streamed out to tape at device speeds
• Keeps all data available, but not on high-cost storage
  > Moves data across the tiers according to access patterns
• On-demand, transparent file retrieval
• Continuous backup – no waiting until midnight

Policy-Driven SAM Processes
• Transparently archive from disk cache to removable media, without operator intervention, based on policies
  > Time-based archiving
• Manage disk space and release archived files from disk cache based on policies
• Automatically stage released files back to disk cache when accessed
  > Option to pre-stage and option to bypass disk cache
• Recycle removable media by repacking media
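The archiving, release, and staging behavior described on the last two slides is configured through SAM's policy files rather than code. The fragment below is a purely illustrative sketch of an archiver.cmd policy: the path, file-system name, archive-set name, copy ages, media types, and VSN patterns are all invented for the example, and the authoritative syntax is the SAM-QFS archiver.cmd documentation.

# Illustrative archiver.cmd fragment (example values only; path assumed to be
# /etc/opt/SUNWsamfs/archiver.cmd on a default install).

logfile = /var/opt/SUNWsamfs/archiver.log

fs = samfs1                  # policies for the samfs1 file system
projdata .                   # archive set "projdata" rooted at the mount point
    1 4h                     # copy 1 four hours after creation/modification
    2 1d                     # copy 2 a day later, normally to a second medium

vsns
projdata.1  dk  diskvol01    # copy 1 to a disk-archive volume
projdata.2  li  VOL0[0-9]    # copy 2 to tape volumes matching this pattern
endvsns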
SAM Migration Facility
• Move from a foreign HSM to SAM
  > Import metadata into a SAM-QFS file system
  > Copy foreign HSM data to SAM in the background
  > Production up and active during the migration process
• Migrated the German Weather Service from Amass/DataMgr to SAM; Sun partnered with reseller HMK
  > Moved 10 million files into a SAM-QFS file system in 8 hours
  > Successfully migrated 700+ TB of Amass/DataMgr data to SAM in 161 days
  > Production operational during migration
• Migrations for DMF, UniTree, Amass, and Veritas

Support for Monitoring SAM and QFS
• The monitoring console (screenshot in the original slide) lets admins quickly understand their SAM environment
  > Potential trouble spots are indicated by severity icons in the left-hand panel
• E-mail notifications can be configured to alert admins of problems with file systems, archiving, and archive media
• System metrics provide archive media reports and file data distribution charts
• Faults provide a record of adverse conditions that have occurred in the system (including tape alerts)

SAM's Archives are OPEN!
• Media format is open, not proprietary – tar format
  > Files can be recovered with or without SAM
• Metadata about the data is on the archives
  > If file system metadata is lost, the archives can be recovered with a procedure we call the "Ultimate Disaster Recovery"

Store Data Forever!
• Future-proof data storage for data preservation
  > Archive files are self-describing and standard
  > No lock-in; open tar format
  > Move data to newer media over time, transparently

QFS Scalability for HPC

Coping with HPC Storage Complexity
• Increasing bandwidth requirements
  > 2 GB/sec per TF = 2 TB/sec peak for a petaflop machine
  > 1 TB/sec sustained I/O bandwidth for a petaflop machine
• Increasing demand for metadata ops
  > Finding any one file among trillions of files
  > Finding anything in the petabytes of data (data mining)
• Extreme concurrency
  > HPC compute scaling means more processors, which means more concurrent threads, which means more concurrent I/O requests
• Seek efficiency
  > Disk drive latency is about the same as it was for 1990 drives, and this is the bottleneck
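Spelling out the bandwidth rule of thumb from the slide above (the only added assumption is 1 PF = 1000 TF):

\[
2\ \mathrm{GB/s\ per\ TF} \times 1000\ \mathrm{TF} = 2000\ \mathrm{GB/s} = 2\ \mathrm{TB/s}\ \text{peak},
\qquad \text{with } 1\ \mathrm{TB/s} \text{ sustained as the stated working target.}
\]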
Intelligent Storage for HPC
• Intelligent, secure storage (T10/1729-D OSD-2) helps solve storage complexities
  > Move higher-level storage functions out to the devices
  > Execute these functions in parallel in order to scale
  > Support secure client access, because credentials are checked on every access
• Data-aware intelligent storage can support
  > Object caching and pre-fetching
  > Object search
  > Object repair
• AND, most important, it is a standard (ANSI T10)!

Object Storage
• Distributes space allocation
  > Allows bandwidth to scale up with capacity increases
• Knowledge of the data is pushed to the object storage device (OSD)
  > Better resource utilization in multi-host configurations
  > Key to multi-host quality-of-service policies
  > Guaranteed-rate I/O, guaranteed latency, etc.
• Security at the OSD
• Standards based (ANSI T10/1355)

Scale SQFS with Intelligent Storage
• T10/OSD provides standards-based intelligent object-based storage
  > Version 1 ratified in 2004; version 2 expected soon
• Intelligent storage increases horizontal scale
  > Space allocation moves to the storage nodes
  > Space allocation is done in parallel by the storage nodes
  > Bandwidth scales up as capacity increases
• Current roadmap plans include releasing Shared QFS with object-based storage at the end of this year in an OpenSolaris HPC distro
  > Plans are to support thousands of nodes

History of T10 OSD at Sun
• May 2004: speaker at DTC, "Intelligent Storage in Commercial HPC"
• May 2006: speaker at DTC, "Object Storage at Sun"
  > DARPA Phase 2 file system projects included a T10 OSD petascale distributed file system, storage pools, and an archive metadata database for semantic access
• May 2007: speaker at DTC, "OpenSolaris T10 OSD Reference Implementation"
• May 2008: speaker at DTC, "Intelligent Storage for HPC"

RDMA for InfiniBand
• Storage access bandwidth, overhead, and latency are all limited by iSCSI over TCP/IP
• InfiniBand RDMA increases the bandwidth and reduces overhead
• Current roadmap plans include releasing InfiniBand RDMA support for Shared QFS at the end of this year, 2008
  > iSER initiator and target supported in Solaris

Scale Shared QFS
[Two diagram-only slides showing Shared QFS scaling; no text content.]

Intelligent Storage Components
• Initiator
  > Shared QFS enhanced to support T10 OSD; new file system type "mb"
  > Solaris initiator device driver, sosd
  > Sun Common SCSI Architecture
  > iSCSI enhanced for iSER RDMA
• Target
  > iSCSI target enhanced for iSER RDMA
  > Common Multiprotocol SCSI Target (COMSTAR)
  > T10 LUN provider
  > Object QFS file system, allocators only, no namespace; new file system type "mat"

OpenSolaris.org Projects
• All of the object-based storage projects are being developed in the open:
  > SAM-QFS