ENERGY SCIENCES NETWORK

ESnet: Advanced Networking for Science

Researchers around the world using advanced computing for scientific discovery are connected via the DOE-operated Energy Sciences Network (ESnet). By providing a reliable, high-performance communications infrastructure, ESnet facilitates the large-scale, collaborative science endeavors fundamental to Office of Science missions.

In many ways, the dramatic achievements of 21st century scientific discovery—often involving enormous data handling and remote collaboration requirements—have been made possible by accompanying accomplishments in high-performance networking. As increasingly advanced supercomputers and experimental research facilities have provided researchers with powerful tools with unprecedented capabilities, advancements in networks connecting scientists to these tools have made these research facilities available to broader communities and helped build greater collaboration within these communities. The DOE Office of Science (SC) operates the Energy Sciences Network (ESnet). Established in 1985, ESnet currently connects tens of thousands of researchers at 27 major DOE research facilities to universities and other research institutions in the U.S. and around the world (figure 1; sidebar “What Is ESnet?,” p50).

ESnet’s mission is to provide an interoperable, effective, reliable, high-performance network communications infrastructure, along with selected leading-edge Grid-related and collaboration services in support of large-scale, collaborative science. These include: sharing of massive amounts of data, supporting thousands of collaborators worldwide, distributed data processing and data management, distributed simulation, visualization, and computational steering, and collaboration with the U.S. and international research and education (R&E) communities.

To ensure that ESnet continues to meet the requirements of the major science disciplines, a new approach and a new architecture are being developed. This new architecture includes elements supporting multiple, high-speed national backbones with different characteristics—redundancy, quality of service, and circuit-oriented services—all the while allowing interoperation of these elements with the other major national and international networks supporting science.

Evolving Science Environments
Large-scale collaborative science—big facilities, massive amounts of data, and thousands of collaborators—is a key element of the DOE SC scientific mission. The science community that participates in DOE’s large collaborations and facilities is almost equally divided between laboratories and universities, and also has a significant international component.


[Figure 1 map: “ESnet—Connecting DOE Labs to the World of Science.” The backbone hubs shown include Seattle, Sunnyvale, San Diego, Los Angeles, Boise, Denver, Albuquerque, Kansas City, Houston, Chicago, Cleveland, Atlanta, Jacksonville, Washington, Boston, and ORNL, with international peerings to Canada (CANARIE), Russia and China (GLORIAD), Europe (GÉANT), CERN (USLHCNet), Asia-Pacific (Sinet, Singaren, TWAREN, ASCC, Kreonet2, WIDE), Australia (AARnet), South America (AMPATH), and Latin America (CUDI/CLARA). Legend: Production IP Core (10 Gb/s); Science Data Network (SDN) Core (20–30–50 Gb/s); MANs (20–30 Gb/s) or Backbone Loops for Site Access; SDN Hubs; Future Hubs; Major International Connections. Illustration: A. Tovey.]

Figure 1. ESnet connects the DOE laboratories to collaborators at research and education institutions around the world.

This component consists of very large international facilities and international teams of collaborators participating in U.S.-based experiments, often requiring the movement of massive amounts of data between the laboratories and these international facilities and collaborators. Distributed computing and storage systems for data analysis, simulations, and instrument operation are becoming common; for data analysis in particular, Grid-style distributed systems predominate.

This science environment is very different from that of a few years ago and places substantial new demands on the network. High-speed, highly reliable connectivity between laboratories and U.S. and international R&E institutions (sidebar “U.S. and International Science,” p51) is required to support the inherently collaborative, global nature of large-scale science. Increased capacity is needed to accommodate a large and steadily increasing amount of data that must traverse the network to get from instruments to scientists and to analysis, simulation, and storage facilities. High network reliability is required for interconnecting components of distributed large-scale science computing and data systems and to support various modes of remote instrument operation. New network services are needed to provide bandwidth guarantees for data transfer deadlines, remote data analysis, real-time interaction with instruments, and coupled computational simulations.

Requirements from Data, Instruments, Facilities, and Science Practice
There are some 20 major instruments and facilities currently operated or being built by the DOE SC, in addition to the Large Hadron Collider (LHC) at CERN (figure 2, p50) and the ITER fusion research project in France (figure 4, p51). To date, ESnet has characterized 14 of these for their future networking requirements.


What Is ESnet?

As the primary DOE network providing production Internet service to almost all of DOE’s laboratories and other sites, ESnet serves an estimated 50,000 to 100,000 DOE users, as well as more than 18,000 non-DOE researchers from universities, other government agencies, and private industry who use SC facilities. The connections with university researchers are critical. As the single largest supporter of basic research in the physical sciences in the United States, the DOE SC supports 25,500 Ph.D.s, postdoctoral and graduate students, and 21,500 users of facilities, half of whom are at universities. The goal is to provide the same level of service and connectivity from a DOE laboratory to U.S. and European R&E institutions as the connectivity between any laboratory-to-laboratory or university-to-university link. The keys to ensuring this are engineering, operations, and constant monitoring. ESnet has worked with the U.S. and the international R&E community to establish a suite of monitoring tools that can be used to provide a full mesh of paths that continuously check all major interconnection points.
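The “full mesh of paths” idea can be illustrated with a small, hypothetical sketch: each monitoring host probes every other major interconnection point and flags unreachable or unusually slow paths. The host names, the use of plain ping, and the latency threshold are all invented for illustration; ESnet’s production monitoring uses its own measurement infrastructure.

```python
import subprocess
from typing import Optional

# Hypothetical interconnection points; names are placeholders.
HUBS = ["seattle.example.net", "chicago.example.net",
        "sunnyvale.example.net", "washington.example.net"]

def rtt_ms(host: str) -> Optional[float]:
    """Average round-trip time to `host` in ms, or None if unreachable."""
    try:
        out = subprocess.run(["ping", "-c", "3", "-q", host],
                             capture_output=True, text=True, timeout=30)
        for line in out.stdout.splitlines():
            if "min/avg/max" in line:          # summary line on Linux/macOS
                return float(line.split("=")[1].split("/")[1])
    except (subprocess.TimeoutExpired, OSError):
        pass
    return None

def probe_peers(local_hub: str, hubs, max_rtt_ms: float = 120.0) -> None:
    """Probe every other hub from this one; run on each hub for a full mesh."""
    for peer in hubs:
        if peer == local_hub:
            continue
        rtt = rtt_ms(peer)
        status = ("UNREACHABLE" if rtt is None
                  else "SLOW" if rtt > max_rtt_ms else "ok")
        print(f"{local_hub} -> {peer}: {rtt} ms [{status}]")

if __name__ == "__main__":
    probe_peers("chicago.example.net", HUBS)
```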

Figure 2. An aerial view of CERN. The Large Hadron Collider (LHC) ring is 27 km in circumference and provides two counter-rotating, 7 TeV proton beams to collide in the middle of the detectors.

DOE facilities—such as the Relativistic Heavy Ion Collider (RHIC) at BNL, the Spallation Neutron Source at ORNL, the National Energy Research Scientific Computing (NERSC) Center at LBNL, and the Leadership Computing Centers at ORNL and ANL, as well as the LHC at CERN—are typical of the hardware infrastructure for science in the 21st century. These facilities generate four types of network requirements: bandwidth, connectivity and geographic footprint, reliability, and network services. Bandwidth needs are determined by the quantity of data produced and the need to move the data for remote analysis. Connectivity and geographic footprint are determined by the location of the instruments and facilities, and the locations of the associated collaborative community, including remote and/or distributed computing and storage used in the analysis systems. These locations also establish requirements for connectivity to the network infrastructure that supports the collaborators. Reliability requirements are driven by how closely coupled the facility is with remote resources.


U.S. and International Science

ESnet is a network in the traditional sense of the word, connecting end user sites to various other networks. A large fraction of all the national data traffic supporting U.S. science is carried by three networks—ESnet, Internet2, and National Lambda Rail. These three entities represent the architectural scope of science-oriented networks. Internet2 is a backbone network connecting U.S. regional networks to each other and to international networks. National Lambda Rail is a collection of light paths, or lambda channels, that are used to construct specialized R&E networks.

Figure 3. The large-scale data flows in ESnet reflect the scope of DOE SC collaborations. ESnet’s top 100 data flows generate 50% of all ESnet traffic (ESnet handles about 3×10⁹ flows/month). Of the top 100 flows, 91 are from the DOE Labs (not shown) to other R&E institutions (shown on the map).

Figure 4. An illustration of the ITER project tokamak—a device used in fusion energy research.

For example, off-line data analysis—where an experiment runs and generates data, and the data is analyzed after the fact—may be tolerant of some level of network outages. On the other hand, when remote operation or analysis must occur within the operating cycle time of an experiment, or when other critical components depend on the connection, then very little network downtime is acceptable. The reliability issue is critical and drives much of the design of the network.

In the past, networks typically provided a single network service—best-effort delivery of data packets—on which are built all of today’s higher-level applications (FTP, email, Web, and socket libraries for application-to-application communication), and best-effort Internet Protocol (IP) multicast (where a single outgoing packet is, sometimes unreliably, delivered to multiple receivers). In considering future uses of the network by the science community, several other network services have been identified as requirements, including bandwidth guarantees, traffic isolation, and reliable multicast. Bandwidth guarantees (sidebar “OSCARS: Guaranteed Bandwidth Service,” p52) are typically needed for online analysis, which always involves time constraints.
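To make the best-effort IP multicast mentioned above concrete, the sketch below shows the standard UDP socket calls for sending one datagram to a multicast group and for joining that group as a receiver. The group address and port are arbitrary examples, and this is generic socket usage rather than anything ESnet-specific.

```python
import socket
import struct

GROUP, PORT = "239.1.2.3", 5007   # example administratively scoped group

def send(message: bytes) -> None:
    """Send one datagram to the multicast group; the network replicates it
    toward every receiver that has joined, with no delivery guarantee."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 8)
    sock.sendto(message, (GROUP, PORT))
    sock.close()

def receive() -> None:
    """Join the group and print datagrams as they arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = sock.recvfrom(65535)
        print(addr, data)

if __name__ == "__main__":
    send(b"single transmission, many receivers")
```

A reliable multicast service would have to add loss detection, retransmission, and ordering on top of this best-effort delivery, which is what the article means when it calls today's IP multicast fragile and limited.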


OSCARS: Guaranteed Bandwidth Service

DOE has funded the On-demand Secure Circuits and Advance Reservation System (OSCARS) project to develop and deploy the various technologies that provide dynamically provisioned circuits and various qualities of service that can be integrated into a production network environment. Such circuits are called “virtual circuits” because they are defined in software and thus are mutable, as opposed to hardware-established circuits.

There are two realms in which OSCARS must operate. The first is intradomain: to establish a schedulable, guaranteed bandwidth circuit service within the boundary of the ESnet network. The second realm is interdomain: to provide end-to-end services between DOE laboratories and U.S. and European universities.

The OSCARS architecture is based on Web/Grid Services and consists of modules for user/application request and claiming; an Authentication, Authorization, and Auditing Subsystem (AAAS) that handles access control, enforces policy, and generates usage records; a Bandwidth Scheduler Subsystem (BSS) that tracks reservations and maps the state of the network (present and future) for managing the allocation of virtual circuits; and a Path Setup Subsystem (PSS) that sets up and tears down the on-demand virtual circuits.

OSCARS and similar systems in the other networks are well into development and prototype deployment. A number of OSCARS circuits are currently being tested between DOE labs and European R&E institutions.
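The division of labor among the OSCARS subsystems can be pictured with a toy model. The classes and checks below are hypothetical simplifications, not the actual OSCARS interfaces: an authorization step stands in for the AAAS, a capacity check over booked reservations stands in for the BSS, and a stubbed-out setup step stands in for the PSS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Reservation:
    """A requested guaranteed-bandwidth virtual circuit (toy model)."""
    user: str
    src: str
    dst: str
    gbps: float
    start: datetime
    end: datetime

LINK_CAPACITY_GBPS = 10.0                     # assumed capacity of the path
authorized_users = {"alice@lab.example"}      # stands in for AAAS policy
booked: list[Reservation] = []                # stands in for the BSS database

def overlapping_load(r: Reservation) -> float:
    """Bandwidth already reserved on the path during the requested window."""
    return sum(b.gbps for b in booked
               if b.src == r.src and b.dst == r.dst
               and b.start < r.end and r.start < b.end)

def request_circuit(r: Reservation) -> bool:
    # AAAS role: access control and policy enforcement.
    if r.user not in authorized_users:
        return False
    # BSS role: check the present and future state of the network.
    if overlapping_load(r) + r.gbps > LINK_CAPACITY_GBPS:
        return False
    booked.append(r)
    # PSS role: at start time, set up the circuit (stubbed out here).
    print(f"scheduled {r.gbps} Gb/s {r.src}->{r.dst} "
          f"from {r.start:%H:%M} to {r.end:%H:%M}")
    return True

now = datetime(2007, 6, 1, 12, 0)
request_circuit(Reservation("alice@lab.example", "FNAL", "CERN",
                            4.0, now, now + timedelta(hours=6)))
```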

Another type of application requiring bandwidth guarantees is distributed workflow systems, such as those used by high energy physics data analysis. The inability of one element, such as a computer, in the workflow system to adequately communicate data to another will ripple through the entire workflow environment, slowing down other participating systems as they wait for required intermediate results, thus reducing the overall effectiveness of the entire system.

Traffic isolation is required because today’s primary transport mechanism, Transmission Control Protocol (TCP), is not ideal for transporting large amounts of data across large distances, such as between continents. While other protocols are better suited to this task, these are not compatible with the fair-sharing of TCP transport in a best-effort network, and are thus typically penalized by the network in ways that reduce their effectiveness. A service that can isolate the bulk data transport protocols from best-effort traffic is needed to address this problem.

Reliable multicast is a service that, while not entirely new, must be enhanced to increase its effectiveness. Multicast provides for delivering a single data stream to multiple destinations without having to replicate the entire stream at the source, as is the case, for example, when using a separate TCP-based connection from the source to each receiver. This is important when the data to be delivered to multiple sites are too voluminous to be replicated at the source and sent to each receiving site individually. Today, IP multicast provides this capability, but in a fragile and limited manner.

Past Traffic Patterns Drive Future Plans
From the analysis of historical traffic patterns, several clear trends emerge that result in requirements for the evolution of the network.

The first and most obvious pattern is the exponential growth of the total traffic handled by ESnet (figure 5 and figure 6). This traffic trend represents a 10-fold increase every 47 months on average since 1990 (figure 6). ESnet traffic just passed the one petabyte per month level, with about 1.5 Gb/s average, steady-state load on the New York–Chicago–San Francisco path. If this trend continues—and all indications are that it will accelerate—the network must be provisioned to handle an average of 15 Gb/s in four years. This implies a minimum backbone bandwidth of 20 Gb/s, because the network peak capacity must be at least 40% higher than the average load in order for today’s protocols to function properly with the bursts of traffic that are normal and expected. In addition, the current traffic trend suggests that 200 Gb/s of core network bandwidth will be required in eight years. This can only be achieved within a reasonable budget by using a network architecture and implementation approach that allows for cost-effective scaling of hub-to-hub circuit bandwidth.
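The provisioning arithmetic in the preceding paragraph can be reproduced in a few lines. This is only a sketch of the projection logic quoted above (roughly a 10-fold increase every 47 months, with peak capacity held about 40% above the average load), starting from the article's figure of about 1.5 Gb/s on the busiest path; it is not an ESnet planning tool.

```python
TENFOLD_MONTHS = 47.0     # observed: traffic grows ~10x every 47 months
HEADROOM = 1.4            # peak capacity ~40% above the average load

def projected_average_gbps(current_gbps: float, months_ahead: float) -> float:
    """Extrapolate average load assuming steady exponential growth."""
    return current_gbps * 10 ** (months_ahead / TENFOLD_MONTHS)

def required_backbone_gbps(current_gbps: float, months_ahead: float) -> float:
    """Minimum backbone capacity: projected average plus burst headroom."""
    return HEADROOM * projected_average_gbps(current_gbps, months_ahead)

current = 1.5  # Gb/s average load on the busiest path (2006 figure)
for years in (4, 8):
    avg = projected_average_gbps(current, 12 * years)
    cap = required_backbone_gbps(current, 12 * years)
    print(f"{years} years: ~{avg:.0f} Gb/s average, ~{cap:.0f} Gb/s capacity")
# Prints roughly 16 Gb/s average / 22 Gb/s capacity at four years and
# roughly 166 Gb/s / 232 Gb/s at eight years, consistent with the article's
# 20 Gb/s and 200 Gb/s round numbers.
```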
The second major change in traffic is the result of a dramatic increase in the use of parallel file mover applications, such as GridFTP. This has led to the most profound change in traffic patterns in the 21-year history of ESnet. Historically, the peak system-to-system, or “workflow,” bandwidth of the largest network users has increased along with the increases in total network traffic. But over the past 18 months, the peak bandwidth of the largest user systems has been coming down and the number of flows that they generate has been going up, while the total traffic continues to increase exponentially. This reduction in peak workflow bandwidth, together with an overall increase in bandwidth, results from the decomposition of single large flows into many smaller parallel flows. In other words, the same types of changes that happened in computational algorithms as parallel computing systems became prevalent are now happening in data movement—that is, parallel input/output (I/O) channels operating across the network.


Figure 5. Total ESnet traffic by month, 2000–2006 (terabytes per month). The segmented bars (from mid-2004 forward) show the fraction of the total traffic in the top 1,000 data flows, generated from large-scale science facilities. There are typically several billion flows per month in total, most of which are minuscule compared to the top 1,000 flows.

The third clear traffic trend is that over two years the impact of the top few hundred workflows has grown from negligible (before mid-2004) to more than 50% of all traffic in ESnet by mid-2006! This is illustrated in figure 5, where the top part of the traffic bars shows the portion of the total generated by the top 100 hosts.

The fourth significant pattern comes from looking at the sources and destinations of the top data transfer systems, an examination that shows two things. First, the vast majority of the transfers can easily be identified as science traffic, since the transfers are between two scientific institutions with systems that are named in ways that reflect the name of the science group.

Second, for the past several years the majority of the large data transfers have been between institutions in the U.S., Europe, and Japan, reflecting the strongly international character of large science collaborations organized around large scientific instruments.

Enabling Future Science
Based both on the projections of the science programs and the changes in observed network traffic and patterns over the past few years, it is clear that the network must evolve substantially in order to meet the needs of the DOE SC mission. The current trend in traffic patterns—the large-scale science projects giving rise to the top 100 data flows that represent about one half of all network traffic—will continue to evolve. As the LHC experiments ramp up in 2006 and 2007, the data to the Tier-1 centers, Fermi National Accelerator Laboratory (FNAL) and BNL, will increase between 200- and 2,000-fold. A comparable amount of data will flow out of the Tier-1 centers to the Tier-2 centers (U.S. universities) for analysis. The DOE National Leadership Class Facility supercomputer at ORNL anticipates a new model of computing in which simulation tasks are distributed between the central facility and a collection of remote “end stations” that will generate substantial network traffic. As climate models achieve the sophistication and accuracy anticipated in the next few years, the amount of climate data that will move into and out of the NERSC Center will increase dramatically. Similarly, the experiment facilities at the new Spallation Neutron Source and Magnetic Fusion Energy facilities will start using the network in ways that require fairly high bandwidth with guaranteed quality of service.

This evolution in traffic patterns and volume will result in the top 100 to 1,000 flows accounting for a very large fraction of the traffic in the network, even as total ESnet traffic volume grows. This means that the large-scale science data flows will overwhelm everything else on the network. The current few Gb/s of average traffic on the backbone will increase to 40 Gb/s (LHC traffic; sidebar “Data Analysis for the Large Hadron Collider,” p54) and then increase to probably double that amount as the other science disciplines move into a collaborative production, simulation, and data analysis mode on a scale similar to the LHC. This will get the backbone traffic to 100 Gb/s as predicted by the science requirements analysis three years ago.

Figure 6. A log plot of ESnet traffic since 1990 (terabytes per month). The milestones marked along the fitted exponential (R² = 0.9898) are 100 MBy/month in August 1990, 1 TBy/month in October 1993, 10 TBy/month in July 1998, 100 TBy/month in November 2001, and 1 PBy/month in April 2006, with intervals of 38, 57, 40, and 53 months marked between successive points.
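As a quick cross-check of the growth rate quoted with figure 6, the intervals marked on the plot average to 47 months, matching the "10-fold every 47 months" figure in the text, and that rate can be re-expressed as a doubling time and an annual growth factor. A small sketch of the arithmetic:

```python
import math

intervals_months = [38, 57, 40, 53]          # intervals marked in figure 6
months_per_tenfold = sum(intervals_months) / len(intervals_months)

doubling_months = months_per_tenfold * math.log10(2)   # time to double
annual_factor = 10 ** (12 / months_per_tenfold)        # growth per year

print(f"10x every {months_per_tenfold:.0f} months")            # 47 months
print(f"doubling roughly every {doubling_months:.1f} months")  # ~14.1 months
print(f"about {annual_factor:.1f}x growth per year")           # ~1.8x per year
```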


Data Analysis for the Large Hadron Collider

The major high energy physics (HEP) experiments of the next 20 years will break new ground in our understanding of the fundamental interactions, structures, and symmetries that govern the nature of matter and space-time. Among the principal goals are to find the mechanism responsible for mass in the Universe, and the Higgs particles associated with mass generation, as well as the fundamental mechanism that led to the predominance of matter over antimatter in the observable cosmos.

The largest collaborations today, such as those building experiments for CERN’s Large Hadron Collider (LHC; figures 2, 7–9) program, each encompass some 2,000 physicists from 150 institutions in more than 30 countries. The current generation of operational experiments—BaBar at the Stanford Linear Accelerator Center (SLAC) and D0 and CDF at FNAL, as well as the experiments at the RHIC program at BNL—face similar challenges. BaBar, for example, has already accumulated datasets approaching a petabyte.

Collaborations on this global scale would not have been attempted if the physicists could not plan on high capacity networks to interconnect the physics groups throughout the lifecycle of the experiment, and to make possible the construction of Data Grids capable of providing access, processing, and analysis of massive datasets. These datasets will increase in size from petabytes to exabytes (10^18 bytes) within the next decade. Equally important is highly capable middleware—the Grid data management and underlying resource access and management services—to facilitate the management of worldwide computing and data resources that must all be brought to bear on the data analysis problem of HEP.

[Figure 7 diagram: the CERN online system handles event data from experiments and simulations at roughly a PByte/s, passing about 100 MByte/s to event reconstruction; reconstructed data moves at 2.5–40 Gb/s to regional centers (Fermilab in the USA; France, Germany, and Italy), at 0.6–2.5 Gb/s on to Tier 2 centers and institutes, and at 100–1,000 Mb/s to physics data caches and workstations. Illustration: A. Tovey; source: H. B. Newman, California Institute of Technology.]

Figure 7. This chart depicts an example of how information flows from HEP facilities, such as CERN, to researchers at centers and institutes around the world. Such HEP experiments will be collecting, cataloguing, and analyzing several petabytes of data per year by 2009.
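To put the tier bandwidths in figure 7 into perspective, the short calculation below (an illustrative aside, not taken from the article) shows how long moving a single petabyte takes at a few of the quoted rates.

```python
def transfer_days(petabytes: float, gbps: float) -> float:
    """Days needed to move `petabytes` of data at a sustained rate of `gbps`."""
    bits = petabytes * 1e15 * 8
    return bits / (gbps * 1e9) / 86400

for rate in (0.6, 2.5, 40.0):          # Gb/s figures from the tier diagram
    print(f"1 PB at {rate:>4} Gb/s: {transfer_days(1, rate):6.1f} days")
# About 154 days at 0.6 Gb/s, 37 days at 2.5 Gb/s, and 2.3 days at 40 Gb/s.
```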

ESnet’s Evolution
The architecture of the network must change to accommodate this growth and the change in the types of traffic. The general requirements for the new architecture are that it provide: high-speed, scalable, and reliable production IP networking; connectivity for university and international collaboration; highly reliable site connectivity to support lab operations as well as science; global Internet connectivity; support for the high-bandwidth data flows of large-scale science, including scalable, reliable, and very high-speed network connectivity to DOE labs; and dynamically provisioned virtual circuits with guaranteed quality of service (for dedicated bandwidth and for traffic isolation).

In order to meet these requirements, the capacity and connectivity of the network must increase to include fully redundant connectivity for every site, high-speed access to the core for every site (at least 20 Gb/s, generally, and 40–100 Gb/s for some sites), and a 100 Gb/s national core/backbone bandwidth by 2008 in two independent backbones.

Meeting the Science Requirements
The strategy for the next-generation ESnet is based on a set of architectural principles that lead to four major network elements and a new network service for managing large data flows. One of the architectural principles involves using ring topologies for path redundancy in every part of the network, rather than just in the core. Another principle provides multiple independent connections everywhere to guard against hardware and fiber failures. A third principle provisions one core network—the IP network—specialized for handling the huge number (3×10⁹ per month) of small data flows (hundreds to thousands of bytes each) of the general IP traffic. Provisioning a second core network, the Science Data Network (SDN), is the last architectural principle.



Figure 8. Several images from the CERN high energy physics facility. At the upper left, a view of the LHC tunnel with a worker inside; at the upper right, detail of the sensor from the first half tracker inner barrel (TIB); the two lower panels are views of the ATLAS Experiment detector. Physicists depend on ESnet to transport data from HEP experiments to researchers around the world.

Figure 9. A simulated event of the collision of two protons in the ATLAS experiment viewed along the beam pipe. The colors of the tracks emanating from the center show the different types of particles emerging from the collision.


“We must build the second generation of the Internet so that our leading universities and national laboratories can communicate in speeds 1,000 times faster than today, to develop new medical treatments, new sources of energy, new ways of working together ...”
President William J. Clinton, 1997 State of the Union Address

Science Services

Authentication and Trust Federation
Cross-site identity authentication and identity trust federation is critical for distributed, collaborative science in order to enable routine sharing of computing and data resources, as well as other Grid services. ESnet provides a comprehensive service to support secure authentication. Managing cross-site trust agreements among many organizations is crucial for authorization in collaborative environments. ESnet assists in negotiating and managing the cross-site, cross-organization, and international trust relationships to provide policies that are tailored to collaborative science.

Public Key Infrastructure
Grid computing and data analysis systems rely on Public Key Infrastructure (PKI) for their security. ESnet provides PKI and X.509 identity certificates that are the basis of secure, cross-site authentication of people and Grid systems. The ESnet PKI is focused entirely on enabling science community resource sharing, and its policies are driven entirely by the science communities that it serves. That is, the trust requirements of the science communities are formally negotiated and encoded in the Certification Policy and Certification Practice Statement of the Certification Authorities.

Voice, Video, and Data Tele-Collaboration Service
The human communication aspect of collaboration, especially in geographically dispersed scientific collaborations, represents an important and highly successful ESnet Science Service. This service provides audio, video, and data teleconferencing with the central scheduling essential for global collaborations. The ESnet collaboration service supports more than a thousand DOE researchers and collaborators worldwide with video conferences, audio conferencing, and data conferencing. Web-based, automated registration and scheduling is provided for all of these services.
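As a concrete illustration of the X.509 identity certificates mentioned in the Public Key Infrastructure section above, here is a brief sketch that reads a certificate's subject, issuer, and validity window using the third-party Python cryptography package. The certificate path is a placeholder, and the code is generic X.509 handling rather than ESnet's PKI tooling.

```python
from cryptography import x509

# Placeholder path: any PEM-encoded X.509 certificate, for example a user
# or host certificate issued by a science-community Certification Authority.
with open("usercert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

print("subject:", cert.subject.rfc4514_string())   # identity the certificate names
print("issuer: ", cert.issuer.rfc4514_string())    # the issuing CA
print("serial: ", hex(cert.serial_number))
print("valid:  ", cert.not_valid_before, "to", cert.not_valid_after)
# A relying party would additionally verify the CA's signature on the
# certificate and check revocation before trusting the identity.
```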

The SDN is specialized for the relatively small number (hundreds to thousands) of massive data flows (gigabytes to terabytes each) of large-scale science, which by volume already accounts for 50% of all ESnet traffic and will completely dominate it in the near future.

These architecture principles lead to four major elements for building the new network. The first element is a high-reliability IP core network based on high-speed, highly capable IP routers to support Internet access for both science and lab operational traffic, some backup for the science data carried by SDN, science collaboration services, and peering with all of the networks needed for reliable access to the global Internet. The second element involves an SDN core network based on layer 2 (Ethernet) and/or layer 1 (optical) switches for: multiple 10 Gb/s circuits with a rich topology for very high total bandwidth to support large-scale science traffic and for the redundancy needed to ensure high reliability; dynamically provisioned guaranteed bandwidth circuits to manage large, high-speed science data flows; dynamic sharing of some optical paths with the R&E community for managing peak traffic situations and for providing specialized services such as all-optical, end-to-end paths for uses that do not yet have encapsulation interfaces (such as InfiniBand); and an alternate path for production IP traffic. The third element, Metropolitan Area Network (MAN) rings, connects labs to the core(s) to provide more reliable (ring) and higher bandwidth (multiple 10 Gb/s circuits) site-to-core connectivity, support for both production IP and large-scale science traffic, and multiple connections between the SDN core, the IP core, and the sites. The fourth element is composed of loops off the core rings to provide for dual connections to remote sites where MANs are not practical.

These elements are structured to provide a network with fully redundant paths for all of the SC Labs. The IP and SDN cores are independent of each other and both are ring-structured for resiliency. These two national cores are interconnected at several locations with ring-structured MANs that also incorporate the DOE labs into the ring. This will eliminate all single points of failure except where multiple fibers may be in the same conduit, as is frequently the case between metropolitan area points of presence and the physical sites. In the places where metropolitan rings are not practical, such as for the geographically isolated labs, resiliency is obtained with dual connections to one of the core rings.
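The redundancy argument above can be made concrete with a toy model: build a graph of two core rings, a metropolitan ring that ties two labs and both cores together, and a dual-connected remote lab, then verify that no single link failure disconnects any node. The node names and ring memberships below are invented for illustration and do not reflect the actual ESnet topology.

```python
def ring(nodes):
    """Edges of a ring through the given nodes."""
    return [frozenset((nodes[i], nodes[(i + 1) % len(nodes)]))
            for i in range(len(nodes))]

# Toy topology (invented names): two independent national core rings,
# a metropolitan ring joining them and two labs, and a dual-homed lab.
ip_core   = ring(["ip-west", "ip-central", "ip-east"])
sdn_core  = ring(["sdn-west", "sdn-central", "sdn-east"])
man_ring  = ring(["ip-central", "sdn-central", "lab-A", "lab-B"])
dual_tail = [frozenset(("lab-C", "ip-west")), frozenset(("lab-C", "sdn-west"))]

edges = set(ip_core + sdn_core + man_ring + dual_tail)
nodes = {n for e in edges for n in e}

def connected(edge_set):
    """True if every node can reach every other node over edge_set."""
    seen, frontier = set(), [next(iter(nodes))]
    while frontier:
        n = frontier.pop()
        if n in seen:
            continue
        seen.add(n)
        frontier += [m for e in edge_set if n in e for m in e if m not in seen]
    return seen == nodes

# Remove each link in turn: a ring-based design should survive any one cut.
single_points_of_failure = [e for e in edges if not connected(edges - {e})]
print("single points of failure:", single_points_of_failure or "none")
```

Running the sketch prints "none": every link lies on a ring, so any one fiber or hardware failure leaves all sites reachable, which is exactly the property the ring-structured cores, MANs, and dual tail connections are meant to provide.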


History of ESnet

ESnet has been supporting the scientific missions of DOE for nearly 20 years. ESnet was originally developed to serve the needs of the HEP and Magnetic Fusion Energy (MFE) research communities, which had been dependent upon their own networks, HEPnet and MFEnet, respectively. In the mid-1980s, HEP and MFE researchers recognized the need for substantially improved computer network facilities. In response, Dr. Alvin Trivelpiece, then Director of Energy Research, recommended that the HEPnet and MFEnet initiatives be combined in order to provide a more functional and efficient network to serve the Energy Research community in general. The Energy Sciences Network Steering Committee was formed, and an evolutionary model—endorsing a phased approach aimed at long-term goals—for the development of the new network was proposed. ESnet was initially constructed to link U.S. centers of Energy Research, but then also began providing international connectivity in collaboration with the NSF, NASA, and the Defense Advanced Research Projects Agency (DARPA). Now an integral part of the Internet, ESnet is still expanding, currently with the upgrade to ESnet4. More details about the history of ESnet are available online at: http://www.es.net/hypertext/esnet-history.html

Services Supporting Science
The evolution of ESnet is being guided by the results of several studies to determine the key requirements of the DOE research community. These studies identified various middleware services that, in addition to the network and its services, need to be in place in order to provide an effective distributed science environment.

These services are called “science services,” and are simply services that support the practice of science. Examples include trust management for collaborative science, cross-site trust policies negotiation, long-term key and proxy credential management, human collaboration communication, end-to-end monitoring for Grid/distributed application debugging and tuning, and persistent hierarchy roots for metadata and knowledge management systems.

Because of its established characteristics, ESnet is a natural provider for a number of these services. For example, ESnet is trusted, persistent, and has a large (nearly comprehensive within DOE) user base. ESnet also has the facilities to provide reliable access and high availability of services through assured network access to replicated services at geographically diverse locations. However, given the small staff of an organization like ESnet, a constraint on the scope of such services is that they must be scalable in the sense that, as the service user base grows, ESnet interaction with the users does not grow.

There are three such services that ESnet provides to the DOE and/or its collaborators (sidebar “Science Services”). Federated trust is policy established by the international science collaboration community to meet its needs. Public Key Infrastructure certificates provide remote, multi-institutional identity authentication. Human collaboration services involve technologies such as video, audio, and data conferencing.

Conclusions
ESnet can trace its origins to a dialup modem service provided to users of the Magnetic Fusion Energy Computer Center, known today as the NERSC Center. Over the years, remote access terminals replaced the dialups, and leased telephone lines were deployed and then replaced with satellite connections. In 1985, DOE responded to the growing demand for networking by combining separate fusion energy and high energy physics networking initiatives to lay the foundation for ESnet.

Today, as an integral part of the DOE SC, ESnet provides seamless, multiprotocol connectivity among a variety of scientific facilities and computing resources in support of collaborative research, both nationwide and internationally. High-performance computing has now become a critical tool for scientific and engineering research. In many fields of research, computational science and engineering have become as important as the more traditional methods of theory and experiment. Additionally, the construction of large experimental facilities used by international collaborations has driven requirements for large-scale data transfer, often to multiple sites at the same time. Progress and productivity in such fields depend on interactions between people and machines located at widely dispersed sites, interactions that can only occur rapidly enough via high-performance computer networks. The ubiquity of networks has provided researchers with unexpected capabilities and unique opportunities for collaborations.

These benefits have only whetted the scientific community’s appetite for still higher levels of network performance to support wider network usage, the transmission of ever-greater volumes of information at faster rates, and the use of more sophisticated applications. The scientific community is also increasingly sensitive to the importance of securely protecting privacy and intellectual property. The mission of ESnet is to satisfy these needs as fully as possible for DOE researchers and, with its new architecture, ESnet is working hard to meet these needs for years to come.

Contributor: Dr. William E. Johnston is the ESnet Department Head and Senior Scientist at LBNL.

Further Reading
http://www.es.net/
