Intelligent Storage for HPC: Sun StorageTek QFS and Sun StorageTek Storage Archive Manager

Harriet Coverston
Distinguished Engineer
Sun Microsystems, Inc.
May 2008

Shared QFS and SAM

HPC – Why It Matters to Sun
• What happens in HPC today often becomes mainstream beyond HPC in the years ahead
  > HPC drives leading-edge technologies
• HPC applications all have similar requirements
  > Consolidated storage
  > Performance & scalability
  > Parallel processing
• Expect this same phenomenon in data management
  > HPC is tackling HUGE data requirements/issues
• HPC is a growth target at Sun (one of four)

Sun's Advanced HPC Data Management Products
• Sun StorageTek QFS – SAN
  > High performance parallel file system
  > Transparent user interface
  > Production ready
  > http://www.sun.com/storagetek/management_software/data_management/
• Sun StorageTek Storage Archive Manager (SAM)
  > Policy based automatic data migration and protection
  > Full device streaming
  > Tiered storage
  > http://www.sun.com/storagetek/management_software/data_management/sam

Open Source
• Sun opened the source for SAM and Shared QFS, metadata server and clients
  > http://opensolaris.org/os/project/samqfs
• Plans are to post SAM and Shared QFS 5.0 development code periodically
  > We are developing in the open
  > We are expanding our community
• Sun opened Libsam, the APIs that allow the user to manage data in a SAM-QFS file system from within an application program
  > http://developers.sun.com/solaris/articles/libsam.html

Shared QFS (SQFS)
• Large, existing, and loyal customer base
  > Stable base, shipping since Aug 2002
• Targets large enterprises, grid, and HPC
  > Clients run on Solaris (SPARC, x64 & x86) & Linux
  > Metadata server runs on Solaris (SPARC & x64)
  > HA option with SunCluster
• Built-in HSM through SAM, with optional WORM functionality for business compliance
• SQFS currently supports 256 nodes
• In the next release, SQFS will support thousands of nodes
  > Targets HPC clusters

Shared QFS with SAM: Integrated Data Management
[diagram: clients 1..n on the LAN access data directly from block storage devices over the SAN (data path); the metadata server handles name services and space management (metadata path); SL500 tape library]

Shared QFS Customer Benefits
• Data consolidation with SAN file sharing
  > HBO – 5000 hours of programming to manage
  > “Provided the scalability to store and manage large files created by program-length video with the performance necessary to meet HBO's demanding throughput goals”
    http://www.sun.com/customers/storage/hbo.xml
• Performance and scalability
  > Tune file system to the application
  > Near raw I/O performance
  > File system I/O performance scales linearly with the hardware
• Parallel processing w/ multi-node read/write access
• SAM provides automatic & continuous data protection

Shared QFS Certified w/SunCluster
• SunCluster HA failover support
  > Standalone QFS
  > HA-NFS over QFS
  > Shared QFS Metadata Server failover
  > Supports clients outside the cluster
• Oracle RAC runs on Shared QFS with SunCluster for high availability
  > Oracle certified on 9i and 10g
  > Shared QFS license is free for this configuration
• Shared QFS transactional performance matches raw device performance

SAM Customer Benefits
• Policy based automatic data migration
  > Media can be disk, tape, or optical
  > Local or remote copies
  > Classification is by path, owner, group, size, wildcard & access
• Media format is tar
  > Small files are put into a tar container so data is streamed at device speeds out to the tape
• Keeps all data available, but not on high cost storage
  > Moves data across the tiers according to access patterns
• On-demand, transparent file retrieval
• Continuous backup – no waiting until midnight

Policy Driven SAM Processes
● Transparently Archive from disk cache to removable media without operator intervention based on policies
  – Time based archiving
● Manage disk space and Release archived files from disk cache based on policies
● Automatically Stage released files back to disk cache when accessed
  – Option to pre-stage and option to bypass disk cache
● Recycle removable media by repacking media
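The Archive, Release and Stage processes listed above can be pictured with a small model. This is only an illustrative sketch, not SAM's implementation: the class names, the age-based archive rule, and the `high`/`low` fill thresholds are assumptions made for the example, and Recycle is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    size: int               # size in arbitrary units
    age_s: int              # seconds since last modification
    archived: bool = False  # a copy exists on archive media
    online: bool = True     # data is resident in the disk cache


@dataclass
class DiskCache:
    capacity: int
    files: dict = field(default_factory=dict)

    def used(self) -> int:
        return sum(f.size for f in self.files.values() if f.online)

    def archive(self, min_age_s: int) -> None:
        """Archive: copy files at least min_age_s old to archive media (hypothetical time-based policy)."""
        for f in self.files.values():
            if not f.archived and f.age_s >= min_age_s:
                f.archived = True

    def release(self, high: float, low: float) -> None:
        """Release: once usage exceeds `high`, free archived files (oldest first) until usage is at or below `low`."""
        if self.used() <= high * self.capacity:
            return
        for f in sorted(self.files.values(), key=lambda e: -e.age_s):
            if self.used() <= low * self.capacity:
                break
            if f.archived and f.online:
                f.online = False  # only the disk copy is dropped; the archive copy remains

    def read(self, name: str) -> bool:
        """Stage: bring a released file back on access. Returns True if staging was needed."""
        f = self.files[name]
        staged = not f.online
        f.online = True
        return staged
```

In this model a file is never released before it has been archived, which mirrors the slide's point that all data stays available while only some of it occupies the disk cache.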

SAM Migration Facility
• Move from foreign HSM to SAM
  > Import metadata into a SAM-QFS file system
  > Copy foreign HSM data to SAM in the background
  > Production up and active during the migration process
• Migrated German Weather from Amass/DataMgr to SAM – Sun partnered with reseller HMK
  > Moved 10 million files into a SAM-QFS in 8 hours
  > Successfully migrated 700+ TB of Amass/DataMgr to SAM in 161 days
  > Production operational during migration
• Migrations for DMF, UniTree, Amass, and Veritas
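To put the migration figures above in perspective, the average rates they imply can be worked out directly. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and treats "700+ TB" as exactly 700 TB, so the data rate is a lower bound; the function names are made up for the example.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def files_per_second(files: int, hours: float) -> float:
    """Average metadata import rate."""
    return files / (hours * SECONDS_PER_HOUR)


def avg_mb_per_second(terabytes: float, days: float) -> float:
    """Average background copy rate, assuming 1 TB = 1e6 MB."""
    return terabytes * 1e6 / (days * SECONDS_PER_DAY)


metadata_rate = files_per_second(10_000_000, 8)  # roughly 347 files/sec
data_rate = avg_mb_per_second(700, 161)          # roughly 50 MB/sec, sustained for 161 days
print(f"{metadata_rate:.0f} files/sec, {data_rate:.1f} MB/sec")
```

These are averages over the whole period; because production stayed active during the migration, the background copy presumably ran well below what the hardware could deliver.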

Support for Monitoring SAM and QFS
• The monitoring console lets admins quickly understand their SAM environment
  – Potential trouble spots are indicated by severity icons in the left-hand panel
• E-mail notifications can be configured to alert admins of problems with file systems, archiving and archive media
• System metrics provide archive media reports and file data distribution charts
• Faults provide a record of adverse conditions that have occurred in the system (including tape alerts)

SAM's Archives are OPEN!
• Media format is open, NOT proprietary – tar format
  > Files can be recovered with or without SAM
• Metadata about the data is on the archives
  > If file system metadata is lost, the archives can be recovered with a procedure we call the “Ultimate Disaster Recovery”
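Because the media format is tar, any standard tar reader can get files back. The following sketch uses Python's standard `tarfile` module to bundle several small files into one container and then recover them without any SAM software. It does not reproduce SAM's actual on-media layout or its archive metadata; the function names and the in-memory buffer are assumptions made to keep the example self-contained.

```python
import io
import tarfile


def pack_small_files(files: dict) -> bytes:
    """Bundle many small files (name -> bytes) into a single tar container."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def recover(archive_bytes: bytes) -> dict:
    """Read every regular file back using only the standard tar reader."""
    recovered = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                recovered[member.name] = tar.extractfile(member).read()
    return recovered
```

Writing one large container instead of many tiny files is also what lets a tape drive keep streaming, which is the point made under "Media format is tar" earlier in the deck.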

Store Data Forever!
Future-proof Data Storage for Data Preservation
  > Archive files are self-describing, standard
  > No lock-in, open TAR format
  > Move data to newer media over time, transparently

QFS Scalability for HPC

Coping with HPC Storage Complexity
• Increasing bandwidth requirements
  > 2 GB/sec per TF = 2 TB/sec peak for a petaflop machine
  > 1 TB/sec sustained I/O bandwidth for a petaflop machine
• Increasing demand for metadata ops
  > Finding any one file among trillions of files
  > Finding anything in the petabytes of data (data mining)
• Extreme concurrency
  > HPC compute scaling means more processors, which means more concurrent threads, which means more concurrent I/O requests
• Seek efficiency
  > Disk drive latency is about the same as that of 1990 drives, and this is the bottleneck

Intelligent Storage for HPC
• Intelligent, secure storage (T10/1729-D OSD-2) helps solve storage complexities
  > Move higher-level storage functions out to the devices
  > Execute these functions in parallel in order to scale
  > Support secure client access because credentials are checked on every access
• Data-aware intelligent storage can support
  > Object caching and pre-fetching
  > Object search
  > Object repair
• AND most important, it is a standard (ANSI T10)!
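The bandwidth rule of thumb quoted under "Coping with HPC Storage Complexity" is simple arithmetic, sketched here for checking or for plugging in other machine sizes. The function name is made up for the example, and decimal units (1 TB = 1000 GB, 1 petaflop = 1000 TF) are assumed.

```python
def peak_io_bandwidth_tb_s(teraflops: float, gb_per_s_per_tf: float = 2.0) -> float:
    """Peak I/O bandwidth in TB/sec implied by a GB/sec-per-teraflop rule of thumb."""
    return teraflops * gb_per_s_per_tf / 1000.0


PETAFLOP_IN_TF = 1000

peak = peak_io_bandwidth_tb_s(PETAFLOP_IN_TF)  # 2.0 TB/sec peak
sustained_tb_s = 1.0                           # sustained figure quoted on the slide
sustained_fraction = sustained_tb_s / peak     # sustained is half of peak
print(peak, sustained_fraction)
```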

• Distributes space allocation
  > Allows bandwidth to scale up with capacity increases
• Knowledge of the data pushed to object storage device (OSD)
  > Better resource utilization in multi-host configurations
  > Key to multi-host quality-of-service policies
  > Guaranteed rate I/O, guaranteed latency, etc.
• Security at the OSD
• Standards based (ANSI T10/1355)
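"Security at the OSD" means the device itself validates each request rather than trusting the client. The sketch below shows the general idea with a keyed hash: a security manager signs a capability, and the device recomputes the signature on every access. It is a deliberately simplified illustration and not the T10 OSD wire format or algorithm; the function names, the capability fields, and the choice of HMAC-SHA256 are all assumptions.

```python
import hashlib
import hmac


def issue_capability(secret: bytes, object_id: str, rights: str) -> dict:
    """Security manager side: sign (object_id, rights) with a secret shared with the storage device."""
    msg = f"{object_id}|{rights}".encode()
    tag = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def osd_allows(secret: bytes, cap: dict, object_id: str, op: str) -> bool:
    """Device side: recompute the tag on every request, then check the object and the operation."""
    msg = f"{cap['object_id']}|{cap['rights']}".encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, cap["tag"])
        and cap["object_id"] == object_id
        and op in cap["rights"]  # rights is a string of single-letter ops, e.g. "r" or "rw"
    )
```

A client that edits its capability to claim more rights fails the check, because it cannot recompute the tag without the shared secret.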

Scale SQFS with Intelligent Storage
• T10/OSD provides standards-based intelligent object-based storage
  > Version 1 ratified 2004; version 2 expected soon
• Intelligent storage increases horizontal scale
  > Space allocation moves to the storage nodes
  > Space allocation is done in parallel by the storage nodes
  > Bandwidth scales up as capacity increases
• Current roadmap plans include releasing Shared QFS with object-based storage at the end of this year in an OpenSolaris HPC distro
  > Plans are to support 1000s of nodes

History of T10 OSD at Sun
• May 2004. Speaker at DTC, “Intelligent Storage in Commercial HPC”
• May 2006. Speaker at DTC, “Object Storage at Sun”
  > DARPA Phase 2 file system projects included T10 OSD petascale distributed file system, Storage Pools, and Archive Metadata Database for semantic access
• May 2007. Speaker at DTC, “OpenSolaris T10 OSD Reference Implementation”
• May 2008. Speaker at DTC, “Intelligent Storage for HPC”

RDMA for InfiniBand
• Storage access bandwidth, overhead, and latency are all limited by iSCSI over TCP/IP
• InfiniBand RDMA increases the bandwidth and reduces overhead
• Current roadmap plans include releasing InfiniBand RDMA support for Shared QFS at the end of this year, 2008
  > iSER initiator and target supported in Solaris

Scale Shared QFS
[diagrams]

Intelligent Storage Components

INITIATOR
• Shared QFS enhanced to support T10 OSD, new file system type “mb”
• Solaris initiator device driver, sosd
• Sun Common SCSI Architecture
• iSCSI enhanced for iSER RDMA

TARGET
• iSCSI target enhanced for iSER RDMA
• Common Multiprotocol SCSI Target (COMSTAR)
• T10 LUN provider
• Object QFS file system, allocators only, no namespace, new file system type “mat”

OpenSolaris.org Projects
• All of the object-based storage projects are being developed in the open:
• SAM-QFS project
  > http://www.opensolaris.org/os/project/samqfs
• OSD project
  > http://www.opensolaris.org/os/project/osd
• iSER project
  > http://www.opensolaris.org/os/project/iser
• Common Multiprotocol SCSI target
  > http://www.opensolaris.org/os/project/comstar

Scale Shared QFS
[diagram]

Intelligent Storage Is The Future ...
• Further de-couple the physical storage from applications
  > A paradigm shift for file systems
  > Storage nodes become peers of compute nodes
  > Storage nodes can be hybrids of disks, tapes, optical, DRAM, etc.
• Intelligent object-based storage enables a platform for new innovations in storage
  > Underlying storage technologies can evolve independently of the data that they store and the protocols that access them!

Harriet Coverston

HPC Storage Solutions: What We Hear from Our Customers

Scalable Compute Engine Data Server (Scratch)
• Maximum aggregate B/W at lowest possible costs
• Scalable to GB/sec B/W
• Low latency, temporary storage, accessible by all compute nodes
• Ease of deployment and administration
• “Reasonable” availability and reliability

Long-term Retention and Archive (Archive)
• Deep repository for massive amounts of data
• Scalable to hundreds of Petabytes
• User access to data regardless of location
• High performance movement in and out of the Compute Engine Data Cache
• Ease of deployment and policy based administration
• Self protecting – no backups

Seamless transfer between the two

HPC Storage Solutions
[diagram: High Bandwidth Scalable Storage Cluster (Compute Cluster, Metadata Servers, IB Network, Load, Data Movers, Object Storage Farm) alongside Long-Term Data Retention with SAM-QFS (Near Line Archive, SAN, Archive, Home Directories, Tier 1 Archive)]

Scalable Storage Cluster: High Bandwidth Data Engine

• Achieves extreme scale and aggregate bandwidth required by “Fusion” for a low price/performance
• Leverages both the SunFire x4500 storage servers & Lustre, the de facto standard HPC parallel file system
• Built with industry standard open technologies: InfiniBand, open source software
• High speed, low-latency interconnects with simplified cabling via Magnum
• Ideal for HPC cache & temp. storage
• Pre-integrated systems – arrive at your site ready to run
[diagram: Compute Cluster, Metadata Servers, IB Network, Load, Data Movers, Object Storage Farm]

Long-Term Retention & Archive: Staging, Storing & Maintaining HPC Data
• Provides a massive on-line/near-line repository to complement the Scalable Storage Cluster
• Leverages Sun StorageTek Tape Libraries, Modular Arrays and SAM-QFS
• Policy driven engine to automate moving data sets in to and results out of the Object Storage Farm
• Enables Tape Libraries as a large near-line repository
• Stores data in open formats (TAR), allowing technology refresh and avoiding vendor lock-in
• Pre-integrated systems – arrive at your site ready to run
[diagram: Near Line Archive, SAN, Archive, Data Movers, Home Directories, Tier 1 Archive]

HPC Solution: Lustre + QFS + SAM

> The combination of Lustre, QFS and SAM, and the X4500 creates a unique storage solution that addresses the broadest range of HPC storage requirements in the industry.

http://www.blogs.sun.com/HPC/entry/video_open_storage_for_hpc

TACC Storage: HPC Storage Solutions

Compute Engine Data Cache
• Will scale to over
  > 72 GB/sec sustained bandwidth
  > 1.728 Petabytes of raw capacity
• Configuration includes
  > 72 SunFire x4500s
  > Over 3,000 500 GB drives
  > 8 racks

Long-term Retention and Archive
• Will scale to over
  > 200 Petabytes of near-line
  > 3.1 Petabytes of on-line
• Configuration includes
  > 5 StorageTek SL8500s
  > 48 StorageTek T10000Bs
  > 10 StorageTek 6540s
  > 6 SunFire Metadata servers with SAM-QFS

Seamless transfer between the two
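The TACC data cache figures are consistent with each other, which a quick calculation confirms. The sketch assumes 48 drive bays per SunFire x4500 (a hardware detail not stated on the slide) and decimal units (1 PB = 1,000,000 GB); the variable names are made up for the example.

```python
DRIVES_PER_X4500 = 48  # assumption: drive bays per SunFire x4500, not stated in the deck

servers = 72
drive_gb = 500
sustained_gb_s = 72

total_drives = servers * DRIVES_PER_X4500  # 3456, i.e. "over 3,000" drives
raw_pb = total_drives * drive_gb / 1e6     # 1.728 PB of raw capacity
gb_s_per_server = sustained_gb_s / servers # 1 GB/sec sustained per x4500

print(total_drives, raw_pb, gb_s_per_server)
```

Seen this way, the 72 GB/sec target comes from scaling out: each storage server only has to sustain about 1 GB/sec.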

Customer Snapshot: Media, Entertainment & Internet Services

HBO
• Digital repository for its SD and HD programming
• Sun StorageTek QFS high-performance file-sharing software provided the scalability needed to store the large program-length video files and the powerful performance required to meet HBO's demanding throughput goals.
• Sun technology ensured cross-platform compatibility and aligned with HBO's strategy to develop its Video Network systems in the Java programming language.
• http://www.sun.com/customers/storage/hbo.xml

Customer Success Story

Challenge
● Deliver secure, highly cost-effective access to user image files
● Consolidate management of numerous Line-of-Business storage silos
● Simplify storage administration while maintaining flexibility
● Fix the “backup problem”

Solution
● Partnered with ACS, designed a new SAM-QFS policy-managed archive with integrated protection. 30 PB build-out
● Used extensive PS-led POC to prove performance, availability and scale
● Tied to Sun exclusive offerings: NICs with TOE engines (Atlas and Neptune), X4600 “movers”

Benefits
● Reduced backup time of main Photo archive from “impossible” to zero
● 100X scale with same architecture
● Capital savings estimated at $400k over 2 years
● Media migration no longer an issue

Customer Success Story

Challenge
● Ensure secure, highly available access to medical image files
● Consolidate management of numerous departmental application and storage silos
● Simplify storage administration while maintaining flexibility

Solution
● Partnering with Philips, designed a new PACS system and storage infrastructure
● Used the SAN to drive consolidation and promote shared storage across the company
● Deployed a comprehensive ILM solution to take advantage of disk and tape storage tiers with SAM-FS as central mgmt.

Benefits
● Reduced backup time of image database by 80%
● Improved compliance with HIPAA
● Capital savings of $400k over 5 years
● Improved image access time by 300%

Customer Success Story: Digital Content Archiving
● Industry leader in video post-production
● Locations in US and EAME
● Digital media environment
  > Managing massive amounts of shared digital content
  > View, edit, store uncompressed data between global facilities
● Implemented tiered storage solution from Sun
  > SAM-QFS, 6540, X4500, SL8500, T10000
● Streamlined digital file-based workflows
  > Archiving content cost effectively
  > Generating new revenue streams

Customer Success Story
• Digital Asset Management w/Open Text Artesia
  > Retrieve, reuse, repurpose content in real time
  > Implemented in 4 months
  > Improved production productivity 10%-40%

Customer Success Story
Cost-effectively Running the Extended Enterprise

Industry: Healthcare
Revenues: N/A
Sun Products & Services: Storage & Services

Challenge
• Meet HIPAA data protection requirements
• Centralize data storage
• Reduce network expense and complexity

Solution
• NYU Medical Center implemented the Sun StorageTek ILM storage infrastructure and created a tier 3 storage system to provide disaster recovery capabilities.

Results
• Patient records retrievable in seconds
• Improved collaboration among healthcare providers
• ILM strategy provides automated data retention
• Improved data integrity and accessibility

“Sun StorageTek created a tailored information lifecycle management solution to meet our regulatory, business and budgetary objectives. Our PACS application performs better and is a key element to delivering better patient care and enhanced practitioner productivity.”
– Chris Petillo, Director of PACS, NYU Medical Center