What happens when the Science DMZ meets the Commodity?

Presenter: Joe Breen
Work done: Brian Haymore, Sam Liston
Center for High Performance Computing

Boil and Bubble, Toil and Trouble,
What do you get with a Science DMZ
That's mixed up with unlimited storage,
Fast and free?

Image credit: http://www.zastavki.com/eng/Holidays/Halloween/wallpaper-24660.htm

Start with a researcher
Offer him candy (unlimited storage)
Tell her it's free

Mix in a "frictionless" Science DMZ environment with 40G and 10G Data Transfer Nodes

Image: https://fasterdata.es.net/science-dmz/science-dmz-architecture/

Mix in a well-built commodity service that can consume lots of data quickly

Throw in a parallel tool that knows how to efficiently utilize a cloud provider’s API

Google Drive
S3
Openstack Swift / files / Memset Memstore
Amazon Cloud Drive
The local filesystem
http://rclone.org/

What do you get?
• Spikes of 14+ Gb/s and 5-8 Gb/s of sustained traffic
• One 10G commodity pipe fills completely; traffic rolls to the next available close peering point and fills it too
• R&E routes yanked temporarily by the Cloud vendor NOC to allow service to other commodity users and to better understand the nature of the congestion
• A call from a cloud provider NOC early in the morning asking us to stop (at least for a bit)
• Almost 100 TB of data moved in 2.5 days
• Very happy researchers who want more

What's an HPC center to do?

Start mixing more...
• Multiple vendors now offer apps and unlimited storage targeted at *individual* education (EDU) users
• The same vendors offer multi-tier Cloud storage for purchase: archival storage, mid-level storage, highly available storage, and application-specific storage

Today's HPC researcher use cases for large personal cloud storage
• Using it as another storage bin for keeping point-in-time copies of source code, input files, etc.
• Using it instead of a USB drive, as a temporary location for:
• Saving snippets of code
• Looking at single files
• Moving data from a national resource that requires cleaning
• Sharing data with a distributed international audience

Potential other use cases for personal cloud storage or organization cloud storage
• Use as a large archive of different data sets for an individual researcher
• Use as an archive for a research group or a collection of collaborating groups
• A "service account" with extended ACLs to tar up and back up multiple collaborating research groups using the same file systems
• Allows a simple formula for groups to replicate data in another location, giving users flexibility and convenience at their level
• Back off info from scratch drives: shuffle back and forth between the Cloud provider and HPC scratch
– Accommodate over DMZ DTN
• ...
The list continues …

Some of these vendors have Research & Education peering directly; some of them maintain only Commodity peering

Commodity peering is not designed for bursty research traffic or long-term transfers of large data sets

[Figure: large research data set transfer compared with normal Commodity traffic]
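For a sense of scale, the transfer reported earlier (almost 100 TB in 2.5 days) can be turned into an average rate and compared with a single 10G commodity pipe. The sketch below is only back-of-the-envelope arithmetic. It assumes decimal terabytes (1 TB = 10^12 bytes) and treats "almost 100 TB" as exactly 100 TB, so the result is an upper bound. The function name is illustrative.

```python
# Back-of-the-envelope arithmetic for the transfer described in the slides.
# Assumptions (not from the slides): decimal units, 1 TB = 10**12 bytes,
# and "almost 100 TB" rounded up to exactly 100 TB.

def average_gbps(terabytes: float, days: float) -> float:
    """Average throughput in gigabits per second over the whole period."""
    bits = terabytes * 1e12 * 8
    seconds = days * 24 * 3600
    return bits / seconds / 1e9


avg = average_gbps(100, 2.5)
print(f"average rate: {avg:.1f} Gb/s")           # average rate: 3.7 Gb/s
print(f"share of one 10G pipe: {avg / 10:.0%}")  # share of one 10G pipe: 37%
```

Even averaged over the full 2.5 days, a single researcher's transfer would occupy roughly a third of a 10G commodity link, before counting the 5-8 Gb/s sustained periods and the 14+ Gb/s spikes.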

Commodity peering points serve lots of companies and businesses

Do we need additional tools in our ecosystem?

How do we protect the community, the vendors, and the collaborative network environments AND encourage the innovation?

Should we more aggressively leverage the emerging capabilities of the national Research and Education Software Defined Networking backbone?

Match = Action

Match flow (source IP, source port, destination IP, destination port…) => Action: apply QoS or block per flow

Data Transfer Nodes are generally multi-tenant, serving the transfer needs of several groups simultaneously.
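The match => action idea, applied to a multi-tenant DTN, can be sketched as a small first-match rule table. This is an illustration only, not any SDN controller's API. The `Flow` and `Rule` classes, the action strings, and the addresses (taken from the reserved documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Flow:
    """One flow, identified by the fields named in the slide."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    """Hypothetical match => action rule; a field left as None is a wildcard."""
    action: str
    src_net: Optional[str] = None
    src_port: Optional[int] = None
    dst_net: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, flow: Flow) -> bool:
        if self.src_net is not None and ip_address(flow.src_ip) not in ip_network(self.src_net):
            return False
        if self.src_port is not None and flow.src_port != self.src_port:
            return False
        if self.dst_net is not None and ip_address(flow.dst_ip) not in ip_network(self.dst_net):
            return False
        if self.dst_port is not None and flow.dst_port != self.dst_port:
            return False
        return True


def lookup(rules: List[Rule], flow: Flow, default: str = "forward") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(flow):
            return rule.action
    return default


# Hypothetical policy: apply QoS only to traffic from one DTN to one cloud
# prefix over HTTPS; everything else from the same DTN is left untouched.
RULES = [
    Rule(action="rate-limit", src_net="192.0.2.10/32",
         dst_net="198.51.100.0/24", dst_port=443),
]

cloud_upload = Flow("192.0.2.10", 50000, "198.51.100.7", 443)
other_tenant = Flow("192.0.2.10", 50001, "203.0.113.5", 2811)
print(lookup(RULES, cloud_upload))
print(lookup(RULES, other_tenant))
```

The point of the sketch is the granularity: the rule touches one class of flows on the DTN, while other tenants' transfers from the same host fall through to the default action.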

Image: https://fasterdata.es.net/science-dmz/science-dmz-architecture/

How might we work with one tenant AND continue to service other tenants well without null routing whole networks or even individual DTNs?

Does the community need a more precise tool?

OR

Image: http://onsurg.com/instrument-handling-10-blade/

Summary

• Science DMZs are frictionless; Commodity connections to Cloud vendors are not
• Cloud vendors are coming up with new business plans that researchers like to use creatively
• Do emerging technologies allow us to create new tools that might enable the innovation and protect the ecosystem?