A common connection: CMS (LHC@CERN) Computing

Kajari Mazumdar
Department of High Energy Physics, Tata Institute of Fundamental Research, Mumbai

Accelerating International Collaboration and Science through Connective Computation

University of Chicago Centre, Delhi, March 10, 2015

Introduction

• Raison d'être of the LHC: discover or rule out the existence of the Higgs boson (did it ever exist in nature?). Yes, indeed: about 1 picosecond after the Big Bang.

• Today, 13.7 billion years later, the CERN-LHC recreates the conditions of the very early universe.

• The Higgs boson was discovered within 3 years of the start of data taking at the LHC.

• Experiments study the aftermath of violent collisions of protons/ions, using very complicated, mammoth detectors with ~1 million electronic channels taking data every 25 nanoseconds → a digital summary of the information is recorded as a collision event.

• ~300 publications per experiment with the data collected in Run 1 (~4 years).

• The LHC Computing Grid is the backbone of the success of the LHC project.

CERN-LHC 2009-2035: Physics exploitation

Higgs boson discovery at CERN-LHC in 2012 → Nobel Prize in 2013, …?

What happens in an LHC experiment

Selection: only ~1 in 10^13 events is interesting, and we do not know which one → we need to check them all!

Enormous computing resources needed.

In hard numbers:

• 2 big experiments: ATLAS & CMS.

• The LHC produces 600-800 million proton collisions per second, for several years.

• Digital information per collision: 1-10 MB.

• Capacity for information storage: ~1000 collisions/second.

• Physicists must sift through ~20 petabytes (1 PB = 10^15 bytes) of data annually.

Higgs search: like looking for one particular person in a thousand times today's world population (a back-of-envelope check of these numbers follows below).
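A back-of-envelope check of the numbers quoted above, as a minimal Python sketch. Values are taken from the slides where given; the 2 MB event size and the 10^7 "live" seconds per year are illustrative assumptions within the quoted ranges.

```python
# Back-of-envelope check of the slide numbers (illustrative, not an
# official CMS/ATLAS accounting).

crossing_period_ns = 25                       # detector read-out every 25 ns
crossings_per_sec = 1e9 / crossing_period_ns  # ~40 million crossings/second
print(f"bunch crossings: {crossings_per_sec:.0e}/s")

stored_per_sec = 1e3          # ~1000 collisions/s can be stored
event_size_mb = 2             # digital summary: 1-10 MB per collision (assumed 2)
live_sec_per_year = 1e7       # assumed accelerator live time per year

annual_pb = stored_per_sec * event_size_mb * live_sec_per_year / 1e9
print(f"annual data volume: ~{annual_pb:.0f} PB")   # ~20 PB, as quoted

# "One person in a thousand world populations":
# 1000 x ~7 billion people ~ 7e12, i.e. of order the 1-in-10^13 selection.
print(f"analogy scale: {1000 * 7e9:.0e}")
```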

Large Collaborative Research

• Distributed analysis/computing is essential, and the system is designed accordingly.
• Made possible by sharing resources across the globe → crossing organisational, national, and international boundaries.

• High Energy Physics (HEP) is often compute- as well as data-intensive.
• Experiments are performed by several thousand scientists over several years.
• The CERN-LHC project is an excellent example of large-scale collaboration dealing with a huge amount of data.

• The WWW was born in early 1989 to satisfy the needs of the previous generation of high-energy experiments carried out by scientists across the world: sharing information. A fine example of societal benefit from fundamental research.

Such infrastructure is becoming necessary in various other fields of research.

From the Web to the Worldwide LHC Computing Grid (WLCG, conceptualized ~2000) → a natural evolution of internet technology for data analysis.

1. Share more than information.
2. Provide resources and services to store/serve ~20 PB of data per year.
3. Provide access to all interesting physics events to ~4000 collaborators per experiment.

Solution through the WLCG: efficient use of resources at many institutes.
• Minimize constraints due to user localisation and resource variety.
• Decentralize control and costs of the computing infrastructure.

→ Much faster delivery of physics!

LHC Computing Grid

• Key to grid computing: LAN speed is comparable to the speed of computing processors.

• Success is due to high-speed links across countries. India's participation also pays dividends: it contributes at the few-percent level to the computing efforts of the CMS experiment.

• Typical numbers at the LHC, for each experiment:
  - grid (WLCG) users: ~1000/week, running ~1M jobs/week
  - datasets available for analysis across the grid: ~50k

• Today ~200 sites across the world are active 24×7 for the LHC experiments, with ~100 PB of disk.

• Increasing bandwidth → fast evolution of the LHC computing model in terms of management and processing of the data volume → from a hierarchical structure to parallel/horizontal connections and opportunistic usage of resources.

Initial structure of the LHC Grid → connecting computers across the globe:

• Tier 0: the experimental site at CERN, Geneva; online data recording at several petabytes/sec.
• Tier 1 (10 Gbps links): national centres, e.g. CERN, Germany, USA, Italy, France, Asia (Taiwan).
• Tier 2 (1-2.5 Gbps links): regional groups in a continent/nation, e.g. India (Indiacms, T2_IN_TIFR), China, Pakistan, Korea, Taiwan.
• Tier 3: different universities/institutes in a country, e.g. BARC, Delhi, Panjab Univ., TIFR.
• Individual scientist's PC, laptop, …

(A sketch of what these link speeds imply follows below.)
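As a rough illustration of what these link speeds imply, a minimal Python sketch: the link speeds are from the diagram above, the 10 TB dataset size is a hypothetical example, and real transfers share links and carry protocol overhead.

```python
# Idealized transfer times over the tier links quoted above.

def transfer_hours(size_tb: float, link_gbps: float) -> float:
    """Time to move size_tb terabytes over a link_gbps link, in hours."""
    size_bits = size_tb * 1e12 * 8            # TB -> bits
    return size_bits / (link_gbps * 1e9) / 3600

dataset_tb = 10                               # hypothetical dataset size
print(f"Tier 0 -> Tier 1 (10 Gbps):  {transfer_hours(dataset_tb, 10):.1f} h")
print(f"Tier 1 -> Tier 2 (2.5 Gbps): {transfer_hours(dataset_tb, 2.5):.1f} h")
```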

Example: Tier-3 Center for the CMS Experiment @ University of Delhi

→ Caters to the needs of the High Energy Physics Group of the Physics Department of the University of Delhi.

→ Allows one to fetch CMS data from Tier-2 centres, process data locally, generate Monte Carlo samples, etc. (Slides: Kirti Ranjan.)

Tier-3@DU: Hardware Resources (a typical situation)

Master server: HP DL180 G6; processor: 2× Intel Xeon Quad Core (64-bit); memory: 32 GB; hard disk: 146 GB.

Worker nodes (3 servers): HP DL160 G6; processor: 4× Intel Xeon Quad Core (64-bit); memory: 16 GB each; hard disk: 146 GB.
New node added: HP DL160 G8; processor: 2× Intel Xeon Quad Core (64-bit); memory: 16 GB; hard disk: 500 GB.

One KVM switch, console, and 1 Gb Ethernet switch. Storage disk space: 12 TB, plus 24 TB of storage newly added.

Power backup: 2× 6 kVA UPS. Cooling: 2 air conditioners of 2 tons each.

Behind the scenes

Tier 0: 1M jobs/day. Peak data-transfer rate: 10 gigabytes per second (about 2 full DVDs of data per second; see the check below).
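A quick sanity check of the DVD analogy, assuming the standard 4.7 GB single-layer DVD capacity:

```python
# Peak Tier-0 transfer rate from the slide vs. single-layer DVD capacity.
peak_gb_per_s = 10        # quoted peak data-transfer rate
dvd_gb = 4.7              # standard single-layer DVD capacity (assumed)
print(f"{peak_gb_per_s / dvd_gb:.1f} DVDs per second")   # ~2.1
```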

Users don't worry about the source of computing → plug your computer into the Internet and it will get the computing power you need to do the job!

The Grid infrastructure links together computing resources such as PCs, workstations, servers, and storage elements, and provides the mechanisms needed to access them.

The pervasive nature of the Grid → simply access the Grid through a web browser.

The Grid is a utility: you ask for computing power or storage capacity, and you get it.

Middleware is the technical "glue" that allows different computers to "stick" together.
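To make the "glue" metaphor concrete, here is a toy sketch of the central middleware task: matching a job's requirements against heterogeneous sites, preferring a site that already hosts the input data. This is plain illustrative Python, not the actual gLite/GLOBUS/Condor API; all names and numbers are invented.

```python
# Toy illustration of what grid middleware does: match a job's
# requirements against heterogeneous sites, preferring sites that
# already host the input dataset (data locality). Real middleware
# adds security, data catalogues, and scheduling services on top.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Site:
    name: str
    free_cores: int
    free_disk_tb: float
    datasets: set                    # datasets already hosted at the site

@dataclass
class Job:
    cores: int
    disk_tb: float
    dataset: str

def match(job: Job, sites: list) -> Optional[Site]:
    """Return a site that satisfies the job, preferring data locality."""
    fits = [s for s in sites
            if s.free_cores >= job.cores and s.free_disk_tb >= job.disk_tb]
    local = [s for s in fits if job.dataset in s.datasets]
    chosen = local or fits
    return chosen[0] if chosen else None

# Hypothetical sites and job (site names modeled on the tier list above).
sites = [Site("T2_IN_TIFR", 200, 50.0, {"/cms/higgs/2012"}),
         Site("T1_DE_KIT", 1000, 400.0, set())]
job = Job(cores=8, disk_tb=1.0, dataset="/cms/higgs/2012")
print(match(job, sites).name)        # -> T2_IN_TIFR (data already there)
```

The point of the sketch: the user states what the job needs, and the middleware, not the user, decides where it runs.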

We now have experts here offering to make computing from India easy.

Examples of applications of Grid technology

• WISDOM: search for a cure for malaria
• MammoGrid: a grid for hospitals to share and analyse mammograms, to improve breast cancer treatment
• AstroGrid: astronomy
• BIRN: human disease
• CaBIG: cancer
• MathCell: a grid-managed multi-purpose environment for further research in biology and bioinformatics
• neuGRID: neuroscience
• outGRID: neuroscience
• P12S2: learn more about the spread of plant diseases
• Climateprediction.net: climate research
• ViroLab: infectious disease
• Compute Against Cancer: cancer research
• FightAIDS@Home: HIV/AIDS research
• FusionGrid: fusion energy
• Folding@Home: disease research
• NEES: earthquakes
• LHC@home: high energy physics
• GridRepublic: many different research projects
• GIMPS: mathematics
• SETI@home: extraterrestrial intelligence
• The …: many different projects, all with humanitarian aims

Summary

• We, from India, are lucky to be part of the LHC family.

• Grateful for the support received to place India on the LHC Grid map.

• It is time for India to leverage its LHC experience.

• Grid computing is changing the way the world does science (as well as business, entertainment, social science, and more).

• Efforts must start now to make it happen in India → first in the academic community.

• Experts are ready to help us make the leap → the computing aspect of your research can be made easier/bigger!

Warm welcome to the participants.

Many thanks to the University of Chicago Center, Delhi.

Middleware projects

3G Bridge, Alchemi, BioGrid, Condor, DCache, DOE SciDAC, EMI, ESnet, GLOBUS, gLite, GRIDBUS, GridSphere, IGE

LHC Grid Map: ~200 sites across the world, active 24×7; ~100 PB disk, 300,000 cores.

CMS and ALICE Tier-2 Grid computing centres at TIFR (Mumbai) and VECC (Kolkata).