
ESnet Planning, Status, and Future Issues ASCAC, August 2008

William E. Johnston, ESnet Department Head and Senior Scientist Joe Burrescia, General Manager Mike Collins, Chin Guok, and Eli Dart, Engineering Brian Tierney, Advanced Development Jim Gagliardi, Operations and Deployment Stan Kluz, Infrastructure and ECS Mike Helm, Federated Trust Dan Peterson, Security Officer Gizella Kapus, Business Manager and the rest of the ESnet Team Lawrence Berkeley National Laboratory

[email protected], this talk is available at www.es.net

Networking for the Future of Science

1 DOE Office of Science and ESnet – the ESnet Mission • ESnet is an Office of Science (“SC”) facility in the Office of Advanced Scientific Computing Research (“ASCR”) • ESnet’s primary mission is to enable the large-scale science that is the mission of the Office of Science (SC) and that depends on: – Sharing of massive amounts of data – Thousands of collaborators world-wide – Distributed data processing – Distributed data management – Distributed simulation, visualization, and computational steering • In order to accomplish its mission ESnet provides high-speed networking and various collaboration services to Office of Science laboratories – As well as to many other DOE programs on a cost recovery basis

2 ESnet Stakeholders and their Role in ESnet • SC/ASCR Oversight of ESnet – High-level oversight through the budgeting process – Near term input is provided by weekly teleconferences between the ASCR ESnet Program Manager and ESnet – Indirect long term input comes from ESnet observing and projecting the network utilization of its large-scale users – Direct long term input comes through the SC Program Offices Requirements Workshops (more later) • Site input to ESnet – Short term input through many daily (mostly email) interactions – Long term input through bi-annual ESCC (ESnet Coordinating Committee – all of the Lab network principals) meetings • SC science collaborators input – Through numerous meetings, primarily with the networks that serve the science collaborators – mostly US and European R&E networks

3 Talk Outline I. How are SC program requirements communicated to ESnet and what are they

II. ESnet response to SC requirements

III. Re-evaluating the ESnet strategy and identifying issues for the future

IV. Research and development needed to secure the future

4 I. SC Science Program Requirements • Requirements are determined by 1) Exploring the plans and processes of the major stakeholders: • 1a) Data characteristics of instruments and facilities – What data will be generated by instruments coming on-line over the next 5-10 years (including supercomputers)? • 1b) Examining the future process of science – How and where will the new data be analyzed and used – that is, how will the process of doing science change over 5-10 years? 2) Observing current and historical network traffic patterns • What do the trends in network patterns predict for future network needs?

5 (1) Exploring the plans of the major stakeholders • Primary mechanism is the SC network Requirements Workshops • Workshop agendas and invitees are determined by the SC science Program Offices • Two workshops per year • Workshop schedule – BES (2007 – published) – BER (2007 – published) – FES (2008 – published) – NP (2008 – published) – IPCC (Intergovernmental Panel on Climate Change) special requirements (BER) (August, 2008) – ASCR (Spring 2009) – HEP (Summer 2009) • Future workshops – ongoing cycle – BES, BER – 2010 – FES, NP – 2011 – ASCR, HEP – 2012 – (and so on...) • Workshop reports: http://www.es.net/hypertext/requirements.html

Major Facilities Examined (some of these are examined outside of, in addition to, the Requirements Workshops)
• Advanced Scientific Computing Research (ASCR) – NERSC (supercomputer center) (LBNL) – NLCF (supercomputer center) (ORNL) – ALCF (supercomputer center) (ANL)
• Basic Energy Sciences (BES) – Advanced Light Sources – Macromolecular Crystallography – Chemistry/Combustion – Spallation Neutron Source (ORNL)
• Biological and Environmental Research (BER) – Bioinformatics/Genomics – Climate Science – IPCC
• Fusion Energy Sciences (FES) – Magnetic Fusion Energy/ITER
• High Energy Physics (HEP) – LHC (Large Hadron Collider, CERN), Tevatron (FNAL)
• Nuclear Physics (NP) – RHIC (Relativistic Heavy Ion Collider) (BNL)
• These are representative of the data generating "hardware infrastructure" of DOE science

7 Requirements from Instruments and Facilities • Bandwidth – Adequate network capacity to ensure timely movement of data produced by the facilities • Connectivity – Geographic reach sufficient to connect users and analysis systems to SC facilities • Services – Guaranteed bandwidth, traffic isolation, end-to-end monitoring – Network service delivery architecture • SOA / Grid / "Systems of Systems"

8 Requirements from Instruments and Facilities - Services • Fairly consistent requirements are found across the large-scale sciences • Large-scale science uses distributed systems in order to: – Couple existing pockets of code, data, and expertise into "systems of systems" – Break up the task of massive data analysis into elements that are physically located where the data, compute, and storage resources are located • Such systems – are data intensive and high-performance, typically moving terabytes a day for months at a time – are high duty-cycle, operating most of the day for months at a time in order to meet the requirements for data movement – are widely distributed – typically spread over continental or inter-continental distances – depend on network performance and availability, but these characteristics cannot be taken for granted, even in well run networks, when the multi-domain network path is considered (see the sketch below) • The system elements must be able to get guarantees from the network that there is adequate bandwidth to accomplish the task at hand • The systems must be able to get information from the network that allows graceful failure and auto-recovery and adaptation to unexpected network conditions that are short of outright failure – See, e.g., [ICFA SCIC]
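To see why availability "cannot be taken for granted ... when the multi-domain network path is considered": if the domains in a path fail roughly independently, end-to-end availability is approximately the product of the per-domain availabilities. A minimal sketch with hypothetical (not measured) per-domain numbers:

    # Rough end-to-end availability of a multi-domain path, assuming the
    # domains fail independently. Availabilities below are hypothetical,
    # chosen only to illustrate how "good" domains compound.
    domain_availability = {
        "site LAN": 0.9995,
        "ESnet": 0.9999,
        "transoceanic link": 0.999,
        "European R&E network": 0.9995,
        "remote campus": 0.995,
    }

    end_to_end = 1.0
    for name, a in domain_availability.items():
        end_to_end *= a

    downtime_hours = (1.0 - end_to_end) * 365 * 24
    print(f"end-to-end availability ~ {end_to_end:.4f} "
          f"(~{downtime_hours:.0f} hours/year unavailable)")

Even with every individual domain at "three or four nines," the composite path comes out noticeably lower, which is why the systems above need network information that lets them adapt gracefully.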

9 The International Collaborators of DOE's Office of Science Drive ESnet Design for International Connectivity • Most of ESnet's traffic (>85%) goes to and comes from outside of ESnet. This reflects the highly collaborative nature of large-scale science (which is one of the main focuses of DOE's Office of Science).
[Map: the R&E sources and destinations of ESnet's top 100 traffic-exchanging sites (all R&E); the DOE Lab destination or source of each flow is not shown.]

Aside • At present, ESnet traffic is dominated by data flows from large instruments – LHC, RHIC, Tevatron, etc. • Supercomputer traffic is a small part of ESnet's total traffic, though it has the potential to increase dramatically – However not until appropriate system architectures are in place to allow high-speed communication among supercomputers

11 Other Requirements • Assistance and services are needed for smaller user communities that have significant difficulties using the network for bulk data transfer – Part of the problem here is that WAN network environments (such as the combined US and European R&E networks) are large, complex systems like supercomputers and you cannot expect to get high performance when using this “system” in a “trivial” way – this is especially true for transferring a lot of data over distances > 1000km

12 Science Network Requirements Aggregation Summary
(Each entry lists: science area / facility – near-term and 5-year end-to-end bandwidth; traffic characteristics; network services. End-to-end reliability requirements are listed only where the program specified one.)

• ASCR: ALCF – near term 10 Gbps; 5 years 30 Gbps; traffic: bulk data, remote control, remote file system sharing; services: guaranteed bandwidth, deadline scheduling, PKI / Grid
• ASCR: NERSC – near term 10 Gbps; 5 years 20 to 40 Gbps; traffic: bulk data, remote control, remote file system sharing; services: guaranteed bandwidth, deadline scheduling, PKI / Grid
• ASCR: NLCF – near term and 5 years: backbone bandwidth parity; traffic: bulk data, remote control, remote file system sharing; services: guaranteed bandwidth, deadline scheduling, PKI / Grid
• BER: Climate – near term 3 Gbps; 5 years 10 to 20 Gbps; traffic: bulk data, rapid movement of GB-sized files, remote visualization; services: collaboration services, guaranteed bandwidth, PKI / Grid
• BER: EMSL/Bio – near term 10 Gbps; 5 years 50-100 Gbps; traffic: bulk data, real-time video, remote control; services: collaborative services, guaranteed bandwidth
• BER: JGI/Genomics – near term 1 Gbps; 5 years 2-5 Gbps; traffic: bulk data; services: dedicated virtual circuits, guaranteed bandwidth
• BES: Chemistry and Combustion – near term 5-10 Gbps; 5 years 30 Gbps; traffic: bulk data, real time data streaming; services: data movement middleware
• BES: Light Sources – near term 15 Gbps; 5 years 40-60 Gbps; traffic: bulk data, coupled simulation and experiment; services: collaboration services, data transfer facilities, Grid / PKI, guaranteed bandwidth
• BES: Nanoscience Centers – near term 3-5 Gbps; 5 years 30 Gbps; traffic: bulk data, real time data streaming, remote control; services: collaboration services, Grid / PKI
• FES: International Collaborations – near term 100 Mbps; 5 years 1 Gbps; traffic: bulk data; services: enhanced collaboration services, Grid / PKI, monitoring / test tools
• FES: Instruments and Facilities – near term 3 Gbps; 5 years 20 Gbps; traffic: bulk data, coupled simulation and experiment, remote control; services: enhanced collaboration services, Grid / PKI
• FES: Simulation – near term 10 Gbps; 5 years 88 Gbps; traffic: bulk data, coupled simulation and experiment, remote control; services: easy movement of large checkpoint files, guaranteed bandwidth, reliable data transfer

Immediate Requirements and Drivers for ESnet4
• HEP: LHC (CMS and Atlas) – reliability 99.95+% (less than 4 hours per year); near term 73 Gbps; 5 years 225-265 Gbps; traffic: bulk data, coupled analysis workflows; services: collaboration services, Grid / PKI, guaranteed bandwidth, monitoring / test tools
• NP: CMS Heavy Ion (2009) – near term 10 Gbps; 5 years 20 Gbps; traffic: bulk data; services: collaboration services, deadline scheduling, Grid / PKI
• NP: CEBF (JLAB) – near term 10 Gbps; 5 years 10 Gbps; traffic: bulk data; services: collaboration services, Grid / PKI
• NP: RHIC – reliability: limited outage duration to avoid analysis pipeline stalls; near term 6 Gbps; 5 years 20 Gbps; traffic: bulk data; services: collaboration services, Grid / PKI, guaranteed bandwidth, monitoring / test tools

II. ESnet Response to the Requirements • ESnet4 was built to address specific Office of Science program requirements. The result is a much more complex and much higher capacity network.

ESnet3 2000 to 2005: • A routed IP network with sites singly attached to a national core ring • Very little peering redundancy

ESnet4 in 2008: • The new Science Data Network (blue) is a switched network providing guaranteed bandwidth for large data movement • All large science sites are dually connected on metro area rings or dually connected directly to core ring for reliability • Rich topology increases the reliability of the network 16 New ESnet Services • Virtual circuit service providing schedulable bandwidth guarantees, traffic isolation, etc – ESnet OSCARS service • http://www.es.net/OSCARS/index.html • Successfully deployed in early production today • Additional R&D is needed in many areas of this service • Assistance for smaller communities in using the network for bulk data transfer – fasterdata.es.net – web site devoted to information on bulk data transfer, host tuning, etc. established – Other potential approaches • Various latency insensitive forwarding devices in the network (R&D)
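As an illustration of what the OSCARS virtual circuit service provides – schedulable, guaranteed-bandwidth circuits between site edges – the sketch below shows the kind of parameters such a reservation carries (endpoints, bandwidth, start and end times). The function name, field names, and values are invented for this sketch; they are not the actual OSCARS API:

    # Hypothetical illustration of an OSCARS-style virtual circuit request.
    # request_circuit() and its fields are invented for this sketch; a real
    # client would talk to the OSCARS web-service interface instead.
    from datetime import datetime, timedelta

    def request_circuit(src, dst, bandwidth_mbps, start, end, description=""):
        """Pretend to submit a guaranteed-bandwidth circuit reservation."""
        reservation = {
            "source": src,                    # edge router/port at the source site
            "destination": dst,               # edge router/port at the destination
            "bandwidth_mbps": bandwidth_mbps, # guaranteed, traffic-isolated rate
            "start_time": start.isoformat(),
            "end_time": end.isoformat(),
            "description": description,
        }
        # A real client would submit this to the reservation service and get
        # back a reservation ID plus a PENDING/ACTIVE status.
        return reservation

    start = datetime(2008, 9, 1, 0, 0)
    resv = request_circuit("example-lab-edge", "example-collaborator-edge",
                           bandwidth_mbps=2000,
                           start=start, end=start + timedelta(days=30),
                           description="bulk data movement (example)")
    print(resv)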

17 Building the Network as Opposed to Planning the Budget • Aggregate capacity requirements like those above indicate how to budget for a network but do not tell you how to build a network • To actually build a network you have to look at where the traffic originates and ends up and how much traffic is expected on specific paths • So far we have specific bandwidth and path (collaborator location) information for – LHC (CMS, CMS Heavy Ion, Atlas) – SC Supercomputers – CEBF/JLab – RHIC/BNL – this specific information has led to the current and planned configuration of the network for the next several years

18 How do the Bandwidth – Path Requirements Map to the Network? (Core Network Planning - 2010)

[Map: core network planning for 2010 – committed path capacities (in Gb/s) laid over the ESnet IP core and Science Data Network (SDN) core topology, with hubs including Seattle, Portland, Boise, Sunnyvale, LA, San Diego, El Paso, Albuquerque, Denver, Salt Lake City, KC, Tulsa, Houston, Baton Rouge, Nashville, Atlanta, Jacksonville, Raleigh, Washington DC, Philadelphia, New York, Cleveland, and Chicago; links to PNNL, LLNL, LANL, FNAL, BNL, ORNL, and GA, plus the LHC/CERN and USLHCNet connections. Legend: ESnet IP core, ESnet SDN core (N x 10G), SDN core / NLR links (backup paths), Layer 1 optical nodes (eventual ESnet points of presence and nodes not currently in ESnet plans), lab-supplied links, LHC-related links, MAN links, committed path capacity in Gb/s, and international IP connections.]

ESnet 4 Core Network – December 2008

[Map: the ESnet4 core as of December 2008 – IP core and SDN core hubs interconnected at 20G on the major paths, with LHC/CERN and USLHCNet connections, lab-supplied links, LHC-related links, MAN links, ESnet aggregation switches, and international IP connections.]

20 ESnet4 Metro Area Rings, December 2008

[Map: the West Chicago MAN (600 W. Chicago, Starlight, FNAL, ANL, USLHCNet), Long Island MAN (32 AoA NYC, BNL, USLHCNet), San Francisco Bay Area MAN (LBNL, JGI, SLAC, NERSC, LLNL, SNLL, SUNN), Newport News – Elite (JLab, ELITE, ODU, MATP, Wash. DC), and Atlanta MAN (ORNL, Nashville, 56 Marietta (SOX), 180 Peachtree, Wash. DC).]

• The goal of the MANs is to get the big Labs direct, high-speed, redundant access to the ESnet core network • Upgrade SFBAMAN switches – 12/08-1/09 • LI MAN expansion, BNL diverse entry – 7-8/08 • FNAL and BNL dual ESnet connection – ?/08 • Dual connections for large data centers (FNAL, BNL)

21 ESnet 4 As Planned for 2010

[Map: ESnet4 as planned for 2010 – core paths built out to 30G, 40G, and 50G between the IP core and SDN core hubs, with the same site, LHC-related, lab-supplied, MAN, and international connections as above. This growth in the network capacity is based on the current ESnet 5 yr. budget as submitted by SC/ASCR to OMB.]

22 MAN Capacity Planning - 2010

[Map: planned 2010 wave counts per core segment (typically 3λ-5λ of 10G each), MAN ring capacities, and committed LHC-related capacities among 600 W. Chicago, Starlight, FNAL, ANL, BNL (32 AoA NYC), USLHCNet, and CERN; Internet2 circuit numbers are shown per segment.]

ESnet Provides Global High-Speed Connectivity for DOE Facilities and Collaborators (12/2008)

[Map: the full ESnet site and peering topology. Connected sites include Office of Science sponsored (22), NNSA sponsored (13+), joint sponsored (3), other sponsored (NSF LIGO, NOAA) (6), laboratory sponsored, and ~45 end user sites. Peerings include commercial peering points (Equinix, PAIX-PA, etc.), specific R&E network peers, and other R&E peering points. International connectivity includes GÉANT (Germany, France, Italy, UK, etc.), USLHCNet to CERN (DOE+CERN funded), CA*net4 (Canada), SINet (Japan), AARNet (Australia), TANet2/ASCGNet (Taiwan), SINGAREN, Transpac2, GLORIAD (Russia, China, Korea/Kreonet2), KAREN/REANNZ, ODN Japan Telecom America, CLARA/AMPATH (S. America), CUDI, and NSF/IRNC-funded links. The link legend distinguishes international (1-10 Gb/s), 10 Gb/s SDN core (Internet2, NLR), 10 Gb/s IP core, MAN rings, lab-supplied links, OC12 / GigEthernet / OC3 (155 Mb/s), and 45 Mb/s and less. Geography is only representational; much of the utility (and complexity) of ESnet is in its high degree of interconnectedness.]

One Consequence of ESnet's New Architecture is that Site Availability is Increasing

[Charts: ESnet availability, 2/2007 through 1/2008, and ESnet 12-month customer availability, 8/2007 through 7/2008 – total site outage minutes (all causes) by month and per-site availability. Most sites are at "3 nines" (>99.5%) or better, many at "4 nines" (>99.95%), and several (e.g. ANL, FNAL, ORNL, SLAC, PNNL, LLNL) at or near "5 nines" (>99.995%).]

III. Re-evaluating the Strategy and Identifying Issues • The current strategy (which led to the ESnet4 2012 plans) was developed primarily from the information gathered in the 2002 and 2003 network workshops, and their updates in 2005-6 (including LHC, climate, RHIC, SNS, fusion, the supercomputers, and a few others) [Workshops] • So far the more formal requirements workshops have largely reaffirmed the ESnet4 strategy developed earlier

• However – is this the whole story?

26 Where Are We Now? How do the requirements identified by the science programs compare to the network capacity planning? • The current network is built to accommodate the known, path-specific needs of the programs • However this is not the whole picture: the core path capacity planning (see map above) so far only accounts for 405 Gb/s out of the 789 Gb/s of identified aggregate requirements provided by the science programs

Synopsis of "Science Network Requirements Aggregation Summary," 6/2008 (aggregate Gb/s): 5 year requirements – 789; accounted for in current ESnet path planning – 405; unaccounted for – 384

ESnet Planned Aggregate Capacity (Gb/s), Based on 5 yr. Budget

Year:               2006   2007  2008  2009  2010  2011  2012  2013
ESnet "aggregate":  57.50  192   192   842   1442  1442  1442  2042

• The planned aggregate capacity growth of ESnet matches the known requirements • The "extra" capacity indicated above is needed to account for the fact that there is much less than complete flexibility in mapping specific path requirements onto the planned aggregate network capacity, and we won't know the specific paths until several years into building the network

• Whether this approach works is TBD, but indications are that it probably will

27 Is ESnet Planned Capacity Adequate? E.g. for LHC? (Maybe So, Maybe Not) • Several Tier2 centers (mostly at universities) are capable of 10 Gbps now – Many Tier2 sites are building their local infrastructure to handle 10 Gbps – We won't know for sure what the "real" load will look like until the testing stops and the production analysis begins ⇒ Scientific productivity will follow high-bandwidth access to large data volumes ⇒ incentive for others to upgrade • Many Tier3 sites are also building 10 Gbps-capable analysis infrastructures – this was not in LHC plans a year ago – Most Tier3 sites do not yet have 10 Gbps of network capacity – It is likely that this will cause a "second onslaught" in 2009 as the Tier3 sites all upgrade their network capacity to handle 10 Gbps of LHC traffic ⇒ It is possible that the USA installed base of LHC analysis hardware will consume significantly more network bandwidth than was originally estimated – N.B. Harvey Newman (HEP, Caltech) predicted this eventuality years ago

28 Reexamining the Strategy: The Exponential Growth of HEP Data is "Constant" • For a point of "ground truth" consider the historical growth of the size of HEP data sets – the trends, as typified by the FNAL traffic, will continue.

[Chart: historical and estimated HEP experiment generated data set sizes, in bytes, 1980-2018, on a log scale with an exponential fit – growing from well below a petabyte historically toward the exabyte scale by the end of the projection period. Data courtesy of Harvey Newman, Caltech, and Richard Mount, SLAC.]

Reexamining the Strategy • Consider network traffic patterns – "ground truth" – What do the trends in network patterns predict for future network needs?

[Chart: log plot of ESnet monthly accepted traffic, January 1990 – April 2008, with an exponential fit and a projection two years forward. ESnet traffic increases by 10X every 47 months, on average: 100 MBy/mo in Aug 1990, 1 TBy/mo in Oct 1993 (38 months), 10 TBy/mo in Jul 1998 (57 months), 100 TBy/mo in Nov 2001 (40 months), 1 PBy/mo in Apr 2006 (53 months), and a projected 10 PBy/mo in Mar 2010.]

Where Will the Capacity Increases Come From? • ESnet4 planning assumes technology advances will provide 100 Gb/s optical waves (they are 10 Gb/s now), which gives a potential 5000 Gb/s core network by 2012 • The ESnet4 SDN switching/routing platform is designed to support new 100 Gb/s network interfaces • With capacity planning based on the ESnet 2010 wave count, together with some considerable reservations about the affordability of 100 Gb/s network interfaces, we can probably assume some fraction of the 5000 Gb/s of potential core network capacity by 2012 depending on the cost of the equipment – perhaps 20% – about 1000-2000 Gb/s of aggregate capacity ⇒ Is this adequate to meet future needs? Not necessarily! (See the traffic projection sketch below.)
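The 10X-every-47-months trend above can be turned into a rough forward projection. A minimal sketch, assuming the trend simply continues (the ~1 PBy/month anchor in April 2006 is taken from the chart; everything past the observation period is extrapolation, not a measurement):

    # Extrapolate ESnet monthly accepted traffic from the observed trend:
    # roughly 10X growth every 47 months, anchored at ~1 PBy/month in April 2006.
    GROWTH_PERIOD_MONTHS = 47.0
    ANCHOR_PBY_PER_MONTH = 1.0      # April 2006, from the chart
    ANCHOR_YEAR = 2006 + 3.5 / 12   # mid-April 2006 as a decimal year

    def projected_traffic_pby(year):
        months_elapsed = (year - ANCHOR_YEAR) * 12.0
        return ANCHOR_PBY_PER_MONTH * 10 ** (months_elapsed / GROWTH_PERIOD_MONTHS)

    for year in (2010, 2012, 2014, 2016):
        print(f"{year}: ~{projected_traffic_pby(year):.0f} PBy/month (extrapolated)")

The 2010 value comes out near the 10 PBy/month projection shown on the chart, which is the sanity check; the later values are purely an "if the trend holds" exercise.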

31 Network Traffic, Physics Data, and Network Capacity

[Chart: ESnet traffic, HEP experiment data, climate modeling data, and projected ESnet capacity, Jan 1990 – Jan 2015, all normalized to "1" at Jan 1990 and plotted on a log scale with exponential fits. Ignore the units of the quantities being graphed – they are normalized to 1 in 1990; just look at the long-term trends: all of the "ground truth" measures are growing significantly faster than the projected ESnet capacity. The 2010 values correspond to roughly 40 PBy of HEP experiment data and 4 PBy of climate modeling data.]

32 Issues for the Future Network • The current estimates from the LHC experiments and the supercomputer centers have the currently planned ESnet 2011 wave configuration operating at capacity, and there are several other major sources that will be generating significant data in that time frame (e.g. climate) • The significantly higher exponential growth of traffic (total accepted bytes) vs. total capacity (aggregate core bandwidth) means traffic will eventually overwhelm the capacity – "when" cannot be directly deduced from aggregate observations, but add to this the fact that the nominal average load on the busiest backbone paths in June 2006 was ~1.5 Gb/s; based on current trends the average load will be ~15 Gb/s in 2010 and 150 Gb/s in 2014 ⇒ My (wej) guess is that capacity problems will start to occur by 2015-16 without new technology approaches (the sketch below gives a rough cross-check)
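The "when" in the last bullet can be cross-checked from the quoted load figures (about 1.5 Gb/s on the busiest paths in mid-2006, growing roughly 10X every 4 years). A minimal sketch; the per-path capacities tested below are illustrative assumptions, not ESnet plans:

    import math

    # Busiest-path load: ~1.5 Gb/s in mid-2006, growing ~10X every 4 years
    # (1.5 -> ~15 Gb/s in 2010 -> ~150 Gb/s in 2014, per the slide).
    LOAD_2006_GBPS = 1.5
    GROWTH_PER_YEAR = 10 ** (1 / 4.0)   # 10X per 4 years

    def year_load_reaches(capacity_gbps):
        """Year when the busiest-path load would reach the given capacity."""
        years = math.log(capacity_gbps / LOAD_2006_GBPS, GROWTH_PER_YEAR)
        return 2006.5 + years

    # Illustrative per-path capacities (not ESnet commitments):
    for capacity in (100, 200, 500):
        print(f"{capacity} Gb/s path saturated around {year_load_reaches(capacity):.1f}")

With per-path capacities in the 200-500 Gb/s range the crossover lands in roughly 2015-2017, consistent with the 2015-16 guess above.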

33 Issues for the Future Network • The "casual" increases in overall network capacity based on straightforward commercial channel capacity that have sufficed in the past are less likely to easily meet future needs, due to the (potential) un-affordability of the hardware – the few existing examples of >10 Gb/s interfaces are ~10x more expensive than the 10G interfaces (~$500K each – not practical)

34 Where Do We Go From Here? • The Internet2-ESnet partnership optical network is built on dedicated fiber and optical equipment

– The current optical network is configured with 10 × 10G waves / fiber path, and more waves will be added in groups of 10, up to 80 waves • The current wave transport topology is essentially static or only manually configured – our current network infrastructure of routers and switches assumes this • With completely flexible traffic management extending down to the optical transport level we should be able to extend the life of the current infrastructure by moving significant parts of the capacity to the specific routes where it is needed (a toy illustration follows) ⇒ We must integrate the optical transport with the "network" and provide for dynamism / route flexibility at the optical level in order to make optimum use of the available capacity
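The toy sketch below illustrates the point referenced above: if waves can be reassigned among routes dynamically, a fixed pool of 10G waves can follow shifting demand instead of sitting stranded on lightly loaded paths. The route names, demands, and allocation rule are all invented for illustration; this is not the ESnet/Internet2 control plane design:

    # Toy model: reassign a fixed pool of 10 Gb/s waves to the routes that
    # currently need them most. Route names and demands are invented.
    WAVE_GBPS = 10
    TOTAL_WAVES = 20

    demand_gbps = {                 # forecast demand per route (illustrative)
        "route A (Chicago-NYC)": 95,
        "route B (Sunnyvale-El Paso)": 30,
        "route C (Denver-KC)": 15,
        "route D (Atlanta-Wash DC)": 40,
    }

    def allocate_waves(demand, total_waves):
        """Give each route the waves its demand requires, largest demand
        first, until the pool is exhausted."""
        allocation = {route: 0 for route in demand}
        remaining = total_waves
        for route, gbps in sorted(demand.items(), key=lambda kv: -kv[1]):
            needed = -(-gbps // WAVE_GBPS)          # ceiling division
            allocation[route] = min(needed, remaining)
            remaining -= allocation[route]
        return allocation, remaining

    allocation, spare = allocate_waves(demand_gbps, TOTAL_WAVES)
    for route, waves in allocation.items():
        print(f"{route}: {waves} x {WAVE_GBPS}G = {waves * WAVE_GBPS} Gb/s")
    print(f"spare waves: {spare}")

A static wave layout would have to provision every route for its peak; the point of integrated L1/L2/L3 management is that the same pool can be re-pointed as the demand pattern changes.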

35 Internet2 and ESnet Optical Node in the Future

[Diagram: ESnet IP core routers (M320, T640) and ESnet SDN core switches connected to the Internet2 Infinera DTN optical platform (Internet2/Infinera/Level3 national optical infrastructure, fiber east/west/north/south); a Ciena CoreDirector grooming device connecting ESnet metro-area networks and R&E regional nets; support devices for measurement, out-of-band access, monitoring, and security, with an optical interface to those support devices; dynamically allocated and routed waves; and an ESnet and Internet2 managed control plane for dynamic wave management.]

36 IV. Research and Development Needed to Secure the Future • In order for "R&D" to be useful to ESnet it must be "directed R&D" – that is, R&D that has ESnet as a partner so that the result is deployable in the production network, where there are many constraints arising out of operational requirements – Typical undirected R&D either produces interesting results that are un-deployable in a production network or that have to be reimplemented in order to be deployable

37 Research and Development Needed to Secure the Future: Approach to R&D • Partnership R&D is a successful "directed R&D" approach that is used with ESnet's OSCARS virtual circuit system, which provides bandwidth reservations and integrated layer 2/3 network management – OSCARS is a partnership between ESnet, Internet2 (university network), USC/ISI, and several European network organizations – because of this it has been successfully deployed in several large R&E networks

Research and Development Needed to Secure the Future: Approach to R&D • OSCARS … • DOE has recently informed ESnet that funding for OSCARS R&D will end with this year – presumably because their assessment is that the research is "done" and the R&D program will not fund development • This is a persistent problem in the DOE R&D programs and is clearly described in the ASCAC networking report [ASCAC, Stechel and Wing]: "In particular, ASCR needs to establish processes to review networking research results, as well as to select and fund promising capabilities for further development, with the express intent to accelerate the availability of new capabilities for the science community."

Research and Development Needed to Secure the Future: Example Needed R&D • To best utilize the total available capacity we must integrate the optical (L1) transport with the "network" (L2 and L3) and provide for dynamism / route flexibility at all layers – The L1 control plane manager approach currently being considered is based on an extended version of the OSCARS dynamic circuit manager – but a good deal of R&D is needed for the integrated L1/2/3 dynamic route management – For this – or any such new approach to routing – to be successfully (and safely) introduced into the production network, it will first have to be developed and extensively tested in a testbed that has characteristics (e.g. topology and hardware) very similar to the production network

Research and Development Needed to Secure the Future: Example Needed R&D • It is becoming apparent that another aspect of the most effective utilization of the network requires the ability to transparently direct routed IP traffic onto SDN – There are only ideas in this area at the moment

Research and Development Needed to Secure the Future • End-to-end monitoring as a service: provide useful, comprehensive, and meaningful information on the state of end-to-end paths, or potential paths, to the user – perfSONAR, and associated tools, provide real time information in a form that is useful to the user (via appropriate abstractions) and that is delivered through standard interfaces that can be incorporated into SOA type applications (see [E2EMON] and [TrViz]) – Techniques need to be developed to: 1) use "standardized" network topology from all of the networks involved in a path to give the user an appropriate view of the path, and 2) monitor virtual circuits based on the different VC approaches of the various R&E nets (e.g. MPLS in ESnet, VLANs, TDM/grooming devices such as Ciena CoreDirectors, etc.), and then integrate this into a perfSONAR framework

42 Research and Development Needed to Secure the Future: Data Transfer Issues Other than HEP and an Approach • Assistance and services are needed for smaller user communities that have significant difficulties using the network for bulk data transfer • This issue cuts across several SC Science Offices • These problems MUST be solved if scientists are to effectively analyze the data sets produced by petascale machines • Consider some case studies …..

43 Data Transfer Problems – Light Source Case Study • Light sources (ALS, APS, NSLS, etc.) serve many thousands of users – Typical user is one scientist plus a few grad students – 2-3 days of beam time per year – Take data, then go home and analyze the data – Data set size up to 1 TB, typically 0.5 TB • Widespread frustration with network-based data transfer among light source users – WAN transfer tools not installed – Systems not tuned – Lack of available expertise for fixing these problems – Network problems at the "other end" – typically a small part of a university network • Users copy data to portable hard drives or burn stacks of DVDs today, but data set sizes will probably exceed hard disk sizes in the near future

44 Data Transfer Problems – Combustion Case Study • Combustion simulations generate large data sets • User awarded INCITE allocation at NERSC, 10 TB data set generated • INCITE allocation awarded at ORNL → need to move the data set from NERSC to ORNL • Persistent data transfer problems – Lack of common toolset – Unreliable transfers, low performance – Data moved, but it took almost two weeks of babysitting the transfer (see the transfer-time sketch below)
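Some quick arithmetic makes the scale of these two case studies concrete. The sustained rates below are illustrative assumptions (an untuned end system versus a well-configured data transfer host), not measurements:

    # Transfer time for the case-study data set sizes at illustrative
    # sustained rates.
    def transfer_hours(size_terabytes, rate_gbps):
        bits = size_terabytes * 1e12 * 8
        return bits / (rate_gbps * 1e9) / 3600

    for size_tb, label in [(0.5, "light source user (0.5 TB)"),
                           (10.0, "combustion INCITE data set (10 TB)")]:
        for rate_gbps in (0.05, 1.0, 4.0):   # 50 Mb/s, 1 Gb/s, 4 Gb/s (illustrative)
            hours = transfer_hours(size_tb, rate_gbps)
            print(f"{label} at {rate_gbps:g} Gb/s: {hours:.1f} hours")

At a sustained 50 Mb/s the 10 TB data set takes on the order of weeks, which matches the "two weeks of babysitting" experience; at gigabit-class sustained rates the same transfer is a matter of hours.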

45 Data Transfer Problems – Fusion Case Study • Large-scale fusion simulations (e.g. GTC) are run at both NERSC and ORNL • Users wish to move data sets between supercomputer centers • Data transfer performance is low, workflow software unavailable or unreliable • Data must be moved between systems at both NERSC and ORNL – Move data from storage to WAN transfer resource – Transfer data to other supercomputer center – Move data to storage or onto computational platform

46 Proper Configuration of End Systems is Essential • Persistent performance problems exist throughout the DOE Office of Science – Existing tools and technologies (e.g. TCP tuning, GridFTP) are not deployed on end systems or are inconsistently deployed across major resources – Performance problems impede productivity – Unreliable data transfers soak up scientists’ time (must babysit transfers) • Default system configuration is inadequate – Most system administrators don’t know how to properly configure a computer for WAN data transfer – System administrators typically don’t know where to look for the right information – Scientists and system administrators typically don’t know that WAN data transfer can be high performance, so they don’t ask for help – WAN transfer performance is often not a system administration priority 47 High Performance WAN Data Transfer is Possible • Tools and technologies for high performance WAN data transfer exist today – TCP tuning documentation exists – Tools such as GridFTP are available and are used by sophisticated users – DOE has made significant contribution to these tools over the years • Sophisticated users and programs are able to get high performance – User groups with the size and resources to “do it themselves” get good performance (e.g. HEP, NP) – Smaller groups do not have the internal staff and expertise to manage their own data transfer infrastructures, and so get low performance • The WAN is the same in the high and low performance cases but the end system configurations are different
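The core of the end-system configuration problem above is the TCP bandwidth-delay product: to keep a high-bandwidth, high-latency path full, the TCP window (socket buffer) must be at least bandwidth × round-trip time, which is far larger than typical OS defaults. A sizing sketch (the RTTs and target rates are illustrative; fasterdata.es.net documents how to raise the corresponding host limits):

    # Required TCP window = bandwidth x round-trip time (the bandwidth-delay
    # product). Default buffers of a few hundred KB cannot fill a long fat pipe.
    def required_window_mb(target_gbps, rtt_ms):
        bdp_bits = target_gbps * 1e9 * (rtt_ms / 1000.0)
        return bdp_bits / 8 / 1e6   # megabytes

    for rtt_ms, path in [(10, "within a region"),
                         (70, "across the US"),
                         (150, "US to Europe")]:
        print(f"{path} (RTT {rtt_ms} ms): "
              f"{required_window_mb(1, rtt_ms):.1f} MB for 1 Gb/s, "
              f"{required_window_mb(10, rtt_ms):.0f} MB for 10 Gb/s")

This is why a host that performs well on a campus LAN can be orders of magnitude slower on a continental-scale path unless its buffers are tuned.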

48 Data Transfer Issues Other than HEP and an Approach • DOE/SC should task one entity with development, support and advocacy for WAN data transfer software – Support (at the moment GridFTP has no long-term funding) – Port to new architectures – we need these tools to work on petascale machines and next-generation data transfer hosts – Usability – scientific productivity must be the goal of these tools, so they must be made user-friendly so scientists can be scientists instead of working on data transfers – Consistent deployment – all major DOE facilities must deploy a common, interoperable, reliable data transfer toolkit (NERSC, ORNL, light sources, nanocenters, etc) – Workflow engines, GridFTP and other file movers, test infrastructure • These problems MUST be solved if scientists are to effectively analyze the data sets produced by petascale machines

49 Research and Development Needed to Secure the Future • Artificial (network device based) reduction of the end-to-end latency seen by the user application is needed in order to allow small, unspecialized systems (e.g. a Windows laptop) to do "large" data transfers with good throughput over national and international distances (the sketch below shows why latency dominates) – There are several approaches possible here and R&D is needed to determine the "right" direction – The answer to this may be dominated by deployment issues that are somewhat outside ESnet's realm – for example, deploying data movement "accelerator" systems at user facilities such as the Light Sources and Nanotechnology Centers
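The reasoning behind network-device-based latency reduction: a single TCP stream's throughput is roughly bounded by window / RTT, so an untuned host with a small default window is limited by the end-to-end RTT it sees, and shrinking that apparent RTT (e.g. terminating the connection at an accelerator device near the user facility) raises throughput without touching the host. The window sizes and RTTs below are illustrative assumptions:

    # Throughput ceiling of a single TCP stream ~ window / RTT.
    def max_throughput_mbps(window_kb, rtt_ms):
        return (window_kb * 1024 * 8) / (rtt_ms / 1000.0) / 1e6

    for window_kb, host in [(64, "default, untuned laptop"),
                            (8192, "tuned data transfer node")]:
        for rtt_ms in (80, 5):   # coast-to-coast RTT vs. RTT to a nearby device
            print(f"{host}, RTT {rtt_ms} ms: "
                  f"~{max_throughput_mbps(window_kb, rtt_ms):.0f} Mb/s ceiling")

A 64 KB default window across an 80 ms path tops out at a few Mb/s; the same untuned host talking to something 5 ms away can run more than an order of magnitude faster, which is the effect an "accelerator" near the facility would exploit.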

50 New in ESnet – Advanced Technologies Group / Coordinator

• Up to this point individual ESnet engineers have worked in their “spare” time to do the R&D, or to evaluate R&D done by others, and coordinate the implementation and/or introduction of the new services into the production network environment – and they will continue to do so • In addition to this – looking to the future – ESnet has implemented a more formal approach to investigating and coordinating the R&D for the new services needed by science – An ESnet Advanced Technologies Group / Coordinator has been established with a twofold purpose: 1) To provide a unified view to the world of the several engineering development projects that are on-going in ESnet in order to publicize a coherent catalogue of advanced development work going on in ESnet. 2) To develop a portfolio of exploratory new projects, some involving technology developed by others, and some of which will be developed within the context of ESnet. • A highly qualified Advanced Technologies lead – Brian Tierney – has been hired and funded from current ESnet operational funding, and by next year a second staff person will be added. Beyond this, growth of the effort will be driven by new funding obtained specifically for that purpose. 51 Needed in ESnet – Science User Advocate • A position within ESnet to act as a direct advocate for the needs and capabilities of the major SC science users of ESnet – At the moment ESnet receives new service requests and requirements in a timely way, but no one acts as an active advocate to represent the user's point of view once ESnet gets the requests – Also, the User Advocate can suggest changes and enhancements to services that the Advocate sees are needed to assist the science community even if the community does not make this connection on their own

52 Summary • Transition to ESnet4 is going smoothly – New network services to support large-scale science are progressing – The OSCARS virtual circuit service is being used, and the service functionality is adapting to unforeseen user needs – Measurement infrastructure is rapidly becoming widely enough deployed to be very useful • Re-evaluation of the 5 yr strategy indicates that the future will not be qualitatively the same as the past – and this must be addressed – R&D, testbeds, planning, new strategy, etc. • New ECS hardware and service contract are working well – Plans to deploy a replicated service are delayed to early CY 2009 • Federated trust – PKI policy and Certification Authorities – Service continues to pick up users at a pretty steady rate – Maturing of the service – and of PKI use in the science community generally

53 References

[OSCARS] For more information contact Chin Guok ([email protected]). Also see http://www.es.net/oscars
[Workshops] See http://www.es.net/hypertext/requirements.html
[LHC/CMS] http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?view=global
[ICFA SCIC] "Networking for High Energy Physics." International Committee for Future Accelerators (ICFA), Standing Committee on Inter-Regional Connectivity (SCIC), Professor Harvey Newman, Caltech, Chairperson. http://monalisa.caltech.edu:8080/Slides/ICFASCIC2007/
[E2EMON] GÉANT2 E2E Monitoring System – developed and operated by JRA4/WI3, with implementation done at DFN. http://cnmdev.lrz-muenchen.de/e2e/html/G2_E2E_index.html and http://cnmdev.lrz-muenchen.de/e2e/lhc/G2_E2E_index.html
[TrViz] ESnet PerfSONAR Traceroute Visualizer. https://performance.es.net/cgi-bin/level0/perfsonar-trace.cgi
[ASCAC] "Data Communications Needs: Advancing the Frontiers of Science Through Advanced Networks and Networking Research." An ASCAC Report: Ellen Stechel, Chair, Bill Wing, Co-Chair, February 2008