What Happens When the Science DMZ Meets the Commodity Internet?
Presenter: Joe Breen
Work done by: Brian Haymore, Sam Liston
University of Utah Center for High Performance Computing

Boil and bubble, toil and trouble,
What do you get with a Science DMZ
That's mixed up with unlimited storage,
Fast and free?
Image credit: http://www.zastavki.com/eng/Holidays/Halloween/wallpaper-24660.htm

Start with a researcher.
Offer them candy (unlimited storage).
Tell them it's free.
Mix in a "frictionless" Science DMZ environment with 40G and 10G Data Transfer Nodes.
Image: https://fasterdata.es.net/science-dmz/science-dmz-architecture/
Mix in a well-built commodity cloud service that can consume lots of data quickly.
Throw in an open source parallel tool that knows how to use a cloud provider's API efficiently: rclone (http://rclone.org/), with backends for Google Drive, Amazon S3, OpenStack Swift / Rackspace Cloud Files / Memset Memstore, Dropbox, Google Cloud Storage, Amazon Cloud Drive, and the local filesystem.

What do you get?
• Spikes of 14+ Gb/s and 5-8 Gb/s of sustained traffic
• One 10G commodity pipe fills completely; traffic rolls over to the next closest available peering point and fills it too
• R&E routes yanked temporarily by the cloud vendor's NOC, to keep serving other commodity users and to better understand the nature of the congestion
• An early-morning call from the cloud provider's NOC asking us to stop (at least for a bit)
• Almost 100 TB of data moved in 2.5 days
• Very happy researchers who want more

What's an HPC center to do? Start mixing more...
• Multiple vendors now offer apps/unlimited storage targeted at EDUcation *individual* users.
• The same vendors sell multi-tier cloud storage: archival, mid-level, highly available, and application-specific storage.

Today's HPC researcher use cases for large personal cloud storage
• Using it as another storage bin for point-in-time copies of source code, input files, etc.
• Using it instead of a USB drive, as a temporary location for:
  • Saving snippets of code
  • Looking at single files
  • Moving data from a national resource that requires cleaning
  • Sharing data with a distributed international audience

Other potential use cases for personal or organizational cloud storage
• A large archive of different data sets for an individual researcher
• An archive for a research group or a collection of collaborating groups
• A "service account" with Linux extended ACLs that tars up and backs up multiple collaborating research groups using the same file systems
  • Gives groups a simple formula for replicating data in another location, with flexibility and convenience for users at their own level
• Moving data off scratch drives, shuffling it back and forth between the cloud provider and HPC scratch
  – Accommodated over the Science DMZ DTN box
• ... The list continues …

Some of these vendors peer directly with Research & Education networks; others maintain only commodity peering.
Commodity peering is not designed for bursty research traffic or long-running transfers of large data sets.
[Figure: a large research data set transfer alongside normal commodity traffic]
Commodity peering points serve lots of companies and businesses.

Do we need additional tools in our ecosystem?
How do we protect the community, the vendors, and the collaborative network environments AND encourage innovation?
Should we more aggressively leverage the emerging capabilities of the national Research and Education Software Defined Networking backbone?
Match => Action: match a flow (source IP, source port, destination IP, destination port…), then apply QoS to it or block it, per flow.

Data Transfer Nodes are generally multi-tenant, serving the transfer needs of several groups simultaneously.
Image: https://fasterdata.es.net/science-dmz/science-dmz-architecture/
How might we work with one tenant AND continue to serve other tenants well without null routing whole networks or even individual DTNs?
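The per-flow "Match => Action" idea, applied to one tenant on a shared DTN, can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not how any SDN backbone actually implements it: the `FlowKey`, `Rule`, `TokenBucket`, and `decide` names are hypothetical, rules are evaluated first-match-wins, and a token-bucket policer (a standard policing technique swapped in for illustration) stands in for "apply QoS". Each rule carries one bucket, so a rule that matches the exact 4-tuple polices a single flow. On a real SDN fabric the same match/action pairs would be installed as switch flow rules rather than evaluated in software. The addresses are RFC 5737 documentation addresses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical flow key: the four fields named on the Match => Action slide."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class TokenBucket:
    """Token-bucket policer, used here as one possible 'apply QoS' action."""

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate_bps = rate_bps      # refill rate, bits per second
        self.burst_bits = burst_bits  # bucket capacity, bits
        self.tokens = burst_bits      # start with a full bucket
        self.last = 0.0               # caller-supplied clock in seconds, assumed to start at 0

    def allow(self, size_bits: float, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at the burst size.
        self.tokens = min(self.burst_bits,
                          self.tokens + (now - self.last) * self.rate_bps)
        self.last = now
        if size_bits <= self.tokens:
            self.tokens -= size_bits
            return True
        return False  # over rate: drop this packet


@dataclass
class Rule:
    """One match => action entry; a field left as None is a wildcard."""
    src_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_ip: Optional[str] = None
    dst_port: Optional[int] = None
    action: str = "forward"                # "forward", "block", or "police"
    policer: Optional[TokenBucket] = None  # one bucket per rule

    def matches(self, key: FlowKey) -> bool:
        pairs = [(self.src_ip, key.src_ip), (self.src_port, key.src_port),
                 (self.dst_ip, key.dst_ip), (self.dst_port, key.dst_port)]
        return all(want is None or want == got for want, got in pairs)


def decide(rules: List[Rule], key: FlowKey, size_bits: float, now: float) -> str:
    """Return 'forward' or 'drop' for one packet; the first matching rule wins."""
    for rule in rules:
        if rule.matches(key):
            if rule.action == "block":
                return "drop"
            if rule.action == "police" and rule.policer is not None:
                return "forward" if rule.policer.allow(size_bits, now) else "drop"
            return "forward"
    return "forward"  # no rule matched: default forwarding, untouched


if __name__ == "__main__":
    DTN, CLOUD = "192.0.2.10", "198.51.100.20"  # RFC 5737 documentation addresses
    heavy = FlowKey(DTN, 50000, CLOUD, 443)           # one tenant's bulk upload
    other = FlowKey(DTN, 50001, "203.0.113.5", 2811)  # another tenant on the same DTN
    # Police only the heavy flow's exact 4-tuple: 1 Gb/s with a 10 Mbit burst.
    rules = [Rule(src_ip=DTN, src_port=50000, dst_ip=CLOUD, dst_port=443,
                  action="police", policer=TokenBucket(1e9, 1e7))]
    jumbo = 9000 * 8  # one 9000-byte jumbo frame, in bits
    sent = sum(decide(rules, heavy, jumbo, now=0.0) == "forward" for _ in range(200))
    print(sent, decide(rules, other, jumbo, now=0.0))  # prints: 138 forward
```

Because the policer is attached to the heavy transfer's exact 4-tuple, the second tenant's flow falls through to the default forward action. The blunt alternative the slide warns against, null routing the DTN, would be a single rule such as `Rule(src_ip=DTN, action="block")`, which drops every tenant's traffic at once.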
Does the community need a more precise tool?
OR
Image: http://onsurg.com/instrument-handling-10-blade/

Summary
• Science DMZs are frictionless; commodity connections to cloud vendors are not.
• Cloud vendors are coming up with new business plans that researchers like to use creatively.
• Do emerging technologies allow us to create new tools that might enable innovation and protect the ecosystem?