
France Grilles, the national grid initiative

V. Breton
Credit: L. Gydé, L. Maigne, F. Malek, T. Priol

Grid paradigm: all users are equal…

 Grids are about sharing computing and storage resources
• Resources are distributed
• Access to the resources is distributed
 Access to grid resources requires
• A certificate delivered by a national certificate authority
• Registration in a Virtual Organization
• An account on a User Interface
 All users on Earth of a Virtual Organization
• access the same resources
• share the same services
• Only difference: network performance

RENATER: the French National Research and Education Network

Scientific disciplines most actively using the grid in

 Complex sciences
 Earth Sciences
 High Energy Physics
 Life Sciences
 Universe sciences

Visit the France-Grilles web site to see presentations from the France-Grilles inaugural day: www.france-grilles.fr

LHC: 4 experiments, 7 TeV collisions, 10^31 particles/sec/cm^2

Complex sciences: the example of cheese refining

1.5 months of simulation done in 2 days on 1600 CPUs to define the best environment for camembert refinement

Credit: CEMAGREF - INRA

A human enterprise…

 64 engineers and technicians in 23 sites in France to operate the grid
• 32 FTE
 Today about 1000 owners of active grid certificates

[Figure: grid certificates issued in 2009 (%), by organization: BRGM, CEA, CEMAGREF, CNRS, engineering schools, foreign institutions, INRA, INRIA, INSERM, private organizations, universities]

Resources of the production grid

[Figure: map of France showing, per site, the number of cores (<100, 100-1000, 1000-10000, >10000), the storage in TB (same ranges) and the number of certificates issued in 2009. Sites and regional grids labelled: Lille, Strasbourg, Île de France (GRIF), Brest, Nantes, Annecy, Clermont-Fd (AuverGrid), TIDRA, Bordeaux, CIGri, Marseille]

 22,000 cores
 15 PB of storage
 LCG ≈ backbone
 CC-IN2P3 ≈ marrow
 The second largest contributor to EGI

[Figure: absolute contribution (in KSI2K.hours) per month since September 2005, and relative contribution (in %) since September 2005]

A National Grid Infrastructure integrating…

 Thematic grids
• LCG (High Energy Physics)
• Decrypthon (life sciences)
• GRISBI (bioinformatics)
 Regional Grids

 Virtual Organizations

Resources seen through Virtual Organizations

 Thematic VOs
• LCG France ≈ 100 million KSI2K.hours since September 2009
 Regional or Campus VOs

[Figure: CPU time used from September 2009 to August 2010 (in KSI2K.hours) by thematic VO: Astrophysics, Astroparticles, Biomedical sciences, Computational Chemistry, Earth Sciences, Complex Systems]

[Figure: CPU time used from September 2009 to August 2010 (in KSI2K.hours) by regional or campus VO: auvergrid, cppm, apc, ipnl, ipno, iscpif, lal, lpnhe]

Training

Contact: Virginie Dutruel [email protected]

 Target audience: system administrators and users
 Strategy
• Rotating training sessions hosted at the sites
• Inclusion in the institutions' continuing-education programmes

[Figure: trainees, person.days and courses per year, 2011 to 2020]

WARNING: TUTORIALS in FRANCE are given in FRENCH!

International collaborations on the grid

 Through European projects
• African / Mediterranean countries

o EUMedGrid FP6 & FP7 projects
• South America
o EELA FP6 and GISELA FP7 projects
 International Associated Laboratories
• Asia (China, Japan, Korea, Vietnam)
• Grid as an infrastructure for collaborations

• Vietnam
• Korea

International Associated Laboratory – LIA

 An LEA or an LIA is a "laboratory without walls" and is not a legal entity. It brings together at most three laboratories from CNRS and other foreign countries. These laboratories contribute human and material resources to a common, jointly-defined project designed to "add value" to their individual pursuits.

An LEA/LIA agreement is for 4 years, renewable twice. The laboratories comprising an LEA or LIA retain their independence, their regular status, their director and their separate locations. An overall director of the LEA/LIA is appointed on a revolving basis if so desired. This type of collaboration does not include long-term research stays by the researchers involved in the project.

An LEA/LIA receives earmarked funding from the CNRS and the partner institution, for equipment, scientific missions, associate research posts, etc. It is coordinated by a scientific management committee, which determines the research program to be submitted to the steering committee. The latter is composed of representatives of the two partner institutions as well as established scientists from outside the LEA/LIA.

When to apply for LEA/LIA status?
Proposals for the creation of an LEA/LIA may be filed at any time with a laboratory's scientific department.

Who makes the decision to approve an LEA/LIA proposal?
The decision to create an LEA/LIA is made by the CNRS and its foreign partner institution. When the proposal has been accepted, an agreement is established between the Director General of the CNRS and the supervisory board of the partner institution.

Priorities for 2011

 Improving the experience of grids for new communities
 Improving the relationships between system administrators and users
 Easing access through a national multidisciplinary VO with additional services
 Integrating academic clouds into the production grid
 Strengthening collaboration with experimental grid and HPC communities
 Collaboration with to develop web services running on grid and HPC resources

Life sciences activities on the French NGI

 Early adoption of the grid paradigm: 2001
 Topics addressed:
• Bioinformatics (phylogenetics, proteomics)
• Structural biology
• In silico drug discovery
• Neurosciences
• Epidemiology
• Medical physics
• Medical imaging

Life sciences applications on regional grid initiatives

AUVERGRID:~1554 cores/271TB

CIMENT: 2200 cores/290TB

GRIF: ~1500 cores/350TB

Grille Aquitaine: ~200 cores/8TB

MSFG (Montpellier Sud France Grilles): ~104 cores /10TB

Strasbourg Grand Est: ~1200 cores/550TB

Tidra: ~10000 cores/50PB

Biomed community in France

• Clermont-Ferrand: LPC, LIMOS
• Montpellier: LPTA
• Lyon: CREATIS, BBE, IBCP
• Nice: I3S
• Strasbourg: IPHC
• Grenoble: TIMC, INSERM 438, RMN Bioclinique, LECA, LPMMC, IN, LPSC, LSP, SIMAP

GRISBI: Specific infrastructure for bioinformatics
DECRYPTHON: Grid to help cure muscular dystrophy

Life Sciences Grid community

 Offer community-level services
• e.g. Medical data management
 Make the grid accessible to non-specialists, deliver adapted user interfaces
• GUI and Web portals
 Integrate into existing problem solving environments
• Deal with legacy
• Help porting existing applications

Examples

• Bioinformatics (phylogenetics, proteomics)
• Structural biology
• In silico drug discovery
• Neurosciences
• Epidemiology
• Medical physics
• Medical imaging

Bioinformatics: monitoring influenza virus evolution (CNRS, IBT, IFI)

[Figure: workflow with the following elements: daily download from the NCBI international database of influenza genome sequences; local database of influenza genome sequences; DB parsing; grid; phylogenetic analysis; daily updated data on virus evolution; virtual screening; new candidate drugs]

Structural biology: recalculating protein 3D structures in PDB

 The PDB database gathers publicly available 3D protein structures
• Full of bugs
 Goal: redo the structures by recalculating the diffraction patterns

PDB files:                  42,752
X-ray structures:           36,124
Successfully recalculated:  ~36,000
Improved R-free:            12,500/17,000
CPU time estimate:          21.7 CPU years
Real time estimate:         1 month on Embrace VO on EGEE
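The CPU-time and real-time estimates above imply an average level of parallelism on the grid. A minimal sketch of that arithmetic follows; the function name is illustrative, and the 30-day month and 365-day year are assumptions that the slide does not state.

```python
def average_concurrent_cpus(cpu_years: float, wall_days: float,
                            days_per_year: float = 365.0) -> float:
    """Average number of CPUs kept busy to finish cpu_years of work in wall_days."""
    if wall_days <= 0:
        raise ValueError("wall_days must be positive")
    # Convert the CPU time to CPU-days, then spread it over the wall-clock days.
    return cpu_years * days_per_year / wall_days


# Figures from the slide: 21.7 CPU years completed in about 1 month.
# The month is taken as 30 days here (assumption).
cpus = average_concurrent_cpus(21.7, 30.0)
print(f"about {cpus:.0f} CPUs busy on average")
```

Under these assumptions the recalculation kept a few hundred CPUs busy on average for the whole month.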

R.P. Joosten et al., Journal of Applied Crystallography, (2009) 42, 1-9

In silico drug discovery

Docking compounds coming from biodiversity (CNRS, IFI, INPC, IOIT)

PDB database: > 50,000 3D structures including biological targets for cancer, malaria, AIDS...

INPC (Hanoi): local database of natural chemical products extracted from local biodiversity

Question: are these products potentially active against cancer, malaria, AIDS?

Answer: focused list of biological targets on which the compound is most active in silico

Epidemiology

Cancer surveillance network (ANR GINSENG)

Medical physics

GateLab (ANR VIP)

 User interface for launching Gate on distributed environments
 Execution on GPUs, CPUs, clusters
 Current functionalities
• Parses the simulation (mac) file
• Finds local input files and copies them to the grid
• Submits the simulation
• Keeps track of the simulation history
• Lets the user choose the Gate executable from a list of releases
• Splits the simulation automatically into a number of jobs depending on an estimate of the total CPU time
• Stop and merge (new feature)

[Figure: Gate application areas: PET camera, radiotherapy, small animal imaging]

Medical Imaging and Medical Image Processing

MammoGrid & MammoGrid+
• Second Opinion
• Cancer Screening
• Education and Training
• Reference Database / Repository
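The automatic splitting listed among the GateLab functionalities can be pictured with a small sketch. This is not GateLab's code: the function name, the one-hour target per job and the even split of events across jobs are all assumptions made for illustration.

```python
import math


def split_simulation(total_events: int, est_cpu_seconds: float,
                     target_job_seconds: float = 3600.0) -> list[int]:
    """Return the number of events per job so each job runs for roughly target_job_seconds.

    Hypothetical helper: the job count is derived from the estimated total CPU time.
    """
    if total_events <= 0 or est_cpu_seconds <= 0 or target_job_seconds <= 0:
        raise ValueError("all arguments must be positive")
    # Enough jobs so that no job exceeds the target duration.
    n_jobs = max(1, math.ceil(est_cpu_seconds / target_job_seconds))
    # Never create a job with zero events.
    n_jobs = min(n_jobs, total_events)
    base, extra = divmod(total_events, n_jobs)
    # The first `extra` jobs take one additional event so the total is preserved.
    return [base + 1 if i < extra else base for i in range(n_jobs)]


jobs = split_simulation(total_events=1_000_000, est_cpu_seconds=36_000)
print(len(jobs), sum(jobs))
```

Keeping the per-job event counts in a list makes a later merge step straightforward, since the partial results only need to be combined in proportion to the events each job simulated.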

The Life Sciences Grid Community Support Centre

 International Support Centre for Life Science grid/cloud users
• To act as a single, representative contact point for international infrastructures worldwide (such as EGI)
• To act as a single, representative contact point for the funding agencies (such as the European Commission) and other stakeholders in order to promote the requirements and needs of the community
• To accompany users and deliver the best experience
• At the interface between VOs and operations
 Other benefits
• Reduces the operational burden
• Prevents community scattering in a decentralized EGI nebula
 Grouping several VOs
• biomed, enmr, lsgrid, vlemed
• including the generic “biomed” VO that enables international collaboration and large-scale computations
 Supported by several NGIs and projects
• Dutch, French, German, Italian, Spanish, Swiss
• EGI-Inspire, Lifewatch ESFRI
 See https://dav.healthgrid.org/lsvrc/LSVRC_proposition_09-08-2010-final.pdf

Focus on system radiobiology

Impact of radiation on living systems

 Biological effects of radiation have been known for more than 100 years
• Radiation-induced mutations contribute to biological evolution
• High exposure to radiation may cause cancer
 Understanding the biological response to radiation exposure is needed for
• low-dose radiation carcinogenesis
• better cancer treatment, with fewer secondary effects
• evaluating the impact of very long-term storage of radioactive waste on the local ecosystem

Virtuous cycle

 Modeling the impact of radiation on living cells: Geant4 DNA
 Validation: need for relevant observables to characterize biological systems
• Cell survival rate
• DNA single or double strand breaks
• Molecular biology: genomic mutations, gene expression
 Experimental protocol: compare observables after controlled radiation exposure
• In normal lab conditions
• After beam irradiation (γ, e-, p, α)
• Need for a reference point at zero-radiation: Modane

[Figure: cycle linking biological models, Geant4 DNA, experimental data and data analysis, annotated with the groups involved: biologists, computer scientists, chemists, physicists]

Partners

 Biologists
• France: Univ. Blaise Pascal, INRA
• Korea: KBSI, Chonnam National University
 Chemists: Geant4 DNA collaboration
 Computer scientists
• Univ. Blaise Pascal
 Medical Physicists
• France: Centre Jean Perrin
• Korea: National Cancer Centre
 Physicists
• Geant4 DNA collaboration
o CENBG, IPHC, IRSN, LPC, LPMC, LRad

[Figure: cycle annotated with partners. Biological models: INRA, KBSI. Geant4 DNA: Geant4 DNA collaboration. Experimental data: γ, e- beams at Centre Jean Perrin and LPC; proton and alpha beams at CENBG and NCC; radiation-free environment at LSM. Data analysis.]