HPC Storage Systems
Nasir YILMAZ
Cyberinfrastructure Engineer

HPC Storage
● Performance (input/output operations per second, IOPS)
● Availability (uptime)
● File System (FS)
● Recovery requirements (snapshot / backup / replication)
● Cost of the media (SAS, SATA, tape)

Refer to Storage and Quota on our Google site:
https://sites.google.com/a/case.edu/hpcc/servers-and-storage/storage

Available Storage Options on HPC

Tier 0: HPC Storage
  Typical workload: High performance / parallel computing
  Storage system / file system: Panasas (PanFS)
  Accessibility: mounted in HPC as /home, /mnt/pan, /scratch

Tier 1: Research Storage
  Typical workload: High performance / parallel computing
  Storage system / file system: Dell Fluid FS / Qumulo FS
  Accessibility: mounted in HPC as /mnt/projects

Tier 2: Research Dedicated Storage (RDS)
  Typical workload: High volume storage
  Storage system / file system: ZFS (Zettabyte FS)
  Accessibility: mounted in HPC as /mnt/rds

Tier 3: Research Archive
  Typical workload: Nearline storage for archival purposes
  Storage system / file system: Spectra BlackPearl (object storage)

Tier 4: Cloud Storage
  Typical workload: Cloud storage
  Storage system / file system: Google Drive, Box, Amazon S3
  Accessibility: Google Drive and Box are free for Case and can be accessed from HPC via Rclone or WebDAV

Storage Mount Points in HPC (df -hTP)

/home
  Quota: 700/920 GiB for the HPC group; 150/260 GiB for guest users
  Lifetime: valid as long as the HPC account
  Comments: default user space in HPC; snapshots, tape

/scratch/pbsjobs and /scratch/users
  Quota: 1 TiB on /scratch/pbsjobs per group; 1 TiB on /scratch/users for members
  Lifetime: 14 days
  Comments: space for temporary job files; no backups

/mnt/pan
  Quota: according to the storage space acquired by the PI
  Lifetime: lease term
  Comments: snapshot

/mnt/projects
  Quota: according to the storage space acquired by the PI
  Lifetime: lease term
  Comments: snapshot, replication

/mnt/rds
  Quota: according to the storage system acquired by the PI
  Lifetime: 5-year warranty
  Comments: snapshot, replication

DATA LIFECYCLE
[figure: data lifecycle diagram, not reproduced]

Powers of 2
[slide on power-of-2 size units, not reproduced]

Useful Commands - id
id - print real and effective user and group IDs
  -g, --group    print only the effective group ID
  -n, --name     print a name instead of a number (use with -g, e.g., id -gn)

Useful Commands - du
du - estimate file space usage
  -h, --human-readable    print sizes in human-readable format
  --max-depth=1           limit the report to first-level directories (e.g., du -h --max-depth=1 | sort -hr)
  -m, --block-size=1M     report sizes in MiB
  -k, --block-size=1K     report sizes in KiB
  -s                      display only a total per argument (e.g., du -sk)
  --time                  show the last modification time

Useful Commands - df
df - report file system disk space usage
  -h, --human-readable    print sizes in human-readable format (e.g., 1K 234M 2G)
  -T, --print-type        print the file system type
  -H                      likewise, but use powers of 1000, not 1024

Monitoring your Quota

Panasas /home & /scratch
● Check /home and /scratch space usage for your group, including you (updated a few times a day). For instantaneous usage, use the panfs_quota command.
  $ quotachk <CaseID>
● Check the breakdown of usage for all users in a group (updated a few times a day):
  $ quotagrp <PI_CaseID>

HPC Storage (Panasas) /mnt/pan
  $ pan_df /mnt/pan/...

Research Storage (FluidFS)
  $ quotaprj <PI_CaseID>

If you want to know which directory is occupying most of your space:
  $ du -h --max-depth=1

Reference: HPC Storage & Quota
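To make the du check above concrete, here is a minimal sketch of finding the largest first-level directories; the paths (~/projects, run42) are illustrative examples, not actual HPC directories.

  # List first-level directories under your home area, largest first.
  $ du -h --max-depth=1 ~ 2>/dev/null | sort -hr | head -n 10
  # Drill into a hypothetical large subdirectory.
  $ du -h --max-depth=1 ~/projects | sort -hr
  # Summarize a single directory in KiB.
  $ du -sk ~/projects/run42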
Managing your Space

Important note: you will receive a soft-quota warning email, with the respective subject line, if anybody in the group exceeds the soft quota limit. Once the hard quota limit is reached, nobody in the group is able to run jobs.

As soon as you receive the soft-quota warning, manage the space by:
● Compressing the directory using the tar command as shown:
  $ tar -czf <dir-name>.tar.gz <dir-name>
● Deleting files that you no longer need:
  $ rm -r <unnecessary folder>
● If you continually have many jobs running, deleting the scratch space immediately by including the following line at the end of the job file to limit /scratch usage:
  $ rm -r "$PFSDIR"/*
● Transferring files from HPC to your local space
  ○ Visit the HPC site: Transferring Files @HPC
● Contacting us at [email protected] for questions
● Emptying the scratch space via the job script as soon as the job completes:

  #!/bin/bash
  #SBATCH -N 1 -n 1
  #SBATCH --time=2:10:00

  module load ansys
  cp flowinput-serial.in flow-serial.cas $PFSDIR   # copy input files to scratch
  cd $PFSDIR
  # Run fluent
  fluent 2ddp -g < flowinput-serial.in
  cp -ru * $SLURM_SUBMIT_DIR                       # copy results back from scratch
  rm -r $PFSDIR/*                                  # empty the scratch space

● If you are keeping the scratch space for analysis and want to delete it later, use:
  $ find /scratch/pbsjobs -maxdepth 1 -user <CaseID> -name "*" | xargs rm -rf

How to Check Usage of /scratch/pbsjobs

For instantaneous usage, use:
● panfs_quota /scratch/pbsjobs
● panfs_quota -G /scratch/pbsjobs    # for the group quota
For the breakdown of usage (updated a few times a day):
● quotachk <CaseID>
● quotagrp <PI_CaseID>
For the most detailed output (it may take longer, but it gives instantaneous usage):
● find /scratch/pbsjobs/ -user <caseid> -exec du -sh {} + 2>/dev/null > myscratch.log
  -- find searches recursively in the target directory
  -- '-user <caseid>' targets the directories owned by the account
  -- '-exec du -sh {} +' performs the disk-usage evaluation
  -- '2>/dev/null' redirects standard error away from standard output, necessary here because many of the job directories belong to other accounts
  -- '> myscratch.log' redirects standard output to a file

Data Transfer Tools
● Globus
● SFTP/SCP tools such as:
  ○ Cyberduck, FileZilla, WinSCP, WebDrive
● Rclone (https://rclone.org/)
  ○ Google Drive
  ○ Box
  ○ Dropbox
(Example command-line sketches for Rclone and scp/sftp appear after the Questions slide.)

HPC Guide to Data Transfer

Storage Type               | Supported at CWRU | Approved: Public Data | Approved: Internal Use Only Data | Approved: Restricted Data | Approved: Restricted Regulated Data | Cost (TiB/$)
HPC Storage                | ✔ | ✔ | ✔ | ❌ | ❌ |
Research Storage           | ✔ | ✔ | ✔ | ❌ | ❌ |
Research Dedicated Storage | ✔ | ✔ | ✔ | ❌ | ❌ |
Research Archive           | ✔ | ✔ | ✔ | ❌ | ❌ |
SRE                        | ✔ | ✔ | ✔ | ✔ | ✔ |
Amazon S3                  | ✔ | ✔ | ✔ | ✔ | ❌ |
Dropbox                    | ❌ | ❌ | ❌ | ❌ | ❌ |
BOX hosted by CWRU         | ✔ | ✔ | ✔ | ❌ | ❌ | Free
Google Drive File Stream   | ✔ | ✔ | ✔ | ❌ | ❌ | Free

CLOUD STORAGE from Amazon

AWS S3
● Object storage, accessed via a web interface
● Can be publicly accessible
● Scalable
● Slower than EBS and EFS
● Good for storing backups

AWS EBS
● Block storage with a file-system interface
● Accessible only via the given EC2 machine
● Hardly scalable
● Faster than S3 and EFS
● Meant to be an EC2 drive

AWS EFS
● File storage with web and file-system interfaces
● Accessible via several EC2 machines and AWS services
● Scalable
● Faster than S3, slower than EBS
● Good for shareable applications and workloads

AWS Glacier
● Object storage
● Good for archiving

Questions?
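As referenced in the Data Transfer Tools slide, the following is a minimal Rclone sketch for moving data between HPC and Google Drive. It assumes a remote named gdrive has already been created with rclone config; the remote name and the hpc-results folder are illustrative, not pre-existing names.

  $ rclone config                    # one-time interactive setup of a remote (e.g., "gdrive")
  $ rclone lsd gdrive:               # list top-level folders on the remote
  $ rclone copy ~/results gdrive:hpc-results --progress   # push a local directory to Drive
  $ rclone copy gdrive:hpc-results ~/restored --progress  # pull it back to HPC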
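The SFTP/SCP GUI clients listed in the Data Transfer Tools slide wrap the same protocol that command-line scp and sftp use. The sketch below assumes a hypothetical transfer host name (hpctransfer.case.edu) and illustrative file names; substitute the actual HPC login or transfer node.

  $ scp results.tar.gz <CaseID>@hpctransfer.case.edu:/home/<CaseID>/      # push an archive to HPC (hypothetical host)
  $ scp -r <CaseID>@hpctransfer.case.edu:/scratch/users/<CaseID>/run42 .  # pull a directory to your machine
  $ sftp <CaseID>@hpctransfer.case.edu                                    # interactive transfer session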