BSC-Microsoft Research Centre Overview

Osman Unsal

Moscow, 11 June 2009

Outline

• Barcelona Supercomputing Center

• BSC – Microsoft Research Centre

Barcelona Supercomputing Center

• Mission – Investigate, develop and manage technology to facilitate the advancement of science.

• Objectives
  – Operate the national supercomputing facility
  – R&D in Supercomputing and Computer Architecture
  – Collaborate in e-Science R&D

• Consortium
  – the Spanish Government (MEC)
  – the Catalonian Government (DURSI)
  – the Technical University of Catalonia (UPC)

Location

MareNostrum

● MareNostrum 2006
● 10,240 PowerPC 970 cores
● 2,560 JS21 nodes at 2.3 GHz
● 20 TB of memory (8 GB per node)
● 380 TB storage capacity
● 3 networks: Myrinet, Gigabit, 10/100 Ethernet
● Linux 2.6 (SuSE)

Spanish Supercomputing Network

Magerit, Altamira, CaesarAugusta, LaPalma, Picasso, Tirant

Access Committee – Resource utilization

• Resource distribution
  – 20% internal use
  – 80% assigned via Access Committee
• Unique Access Committee
  – 44 Spanish researchers
  – Core Team
  – 4 panels: Astrophysics, Space and Earth Sciences; Biomedicine and Life Sciences; Physics and Engineering; Chemistry and New Materials
  – New applications every four months
  – Committee renewed every 2 years
  – Designated by the Spanish Ministry
  – Technical assistance by BSC staff

[Charts: applications received and granted per allocation period (2006 P2 – 2008 P1); total core hours requested, granted (class A) and used per period; usage by science area (QCM, FI, BCV, Khoras, AECT) during February–April 2008.]

Industry using MareNostrum

● Average utilization of 1000 processors ● Using dedicated network

● Oil prospecting using 2,000 processors; each execution lasts for 5 weeks

• Provide HPC computing resources with the following specifications:
  – 1 million CPU-hours
  – Disk space: 2 Terabytes capacity
  – Disaster back-up for 20 Terabytes capacity

Staff Evolution

As of October 2007, BSC-CNS had 195 members from 21 different countries (Argentina, Belgium, Brazil, Bulgaria, Colombia, China, Cuba, France, Germany, India, Ireland, Italy, Lebanon, Mexico, Poland, Russia, Serbia, Turkey, the United Kingdom, the United States and Spain).

BSC Departments

• Computer Science – Computer Architecture
• Life Sciences
• Earth Sciences
• Computer Applications

Towards Petaflop computing systems

Analysis and prediction of: • Applications • Parallel execution environments • OS activity

[Diagram: evolution towards future Petaflop systems – from the Grid and large cluster systems (distributed-memory programming model: scalable MPI) through small DMM, on-board SMP and cc-NUMA (shared-memory programming models: Mercurium OpenMP compiler, NthLib, NanosDSM, transactional memory) to multithreaded chips with 10s, 100s, 1000s of cores.]

Hardware and software technologies for:
• Grand Challenge applications
• Killer applications (Google, Amazon, eBay, …)

CIRI R&D Philosophy

• Technology Web

[Diagram: CIRI R&D technology web linking Applications (ST-ORM, steel stamping, structural analysis), metacomputing, MPI (MGPOM, UTE2paraver, MPIRE, OMPITrace), OpenMP (collectives, mapping, scheduling, nested parallelism, precedences, indirect access), performance tools (OMPTrace, Dimemas, Paraver: activity visualization, hardware counters), computer architecture, and run-time/system scheduling (self analysis, performance-driven processor allocation, dynamic load balancing, process and memory control, page migration).]

Amber on MareNostrum

[Chart: measured Amber performance on MareNostrum versus predictions for 5x faster CPUs, ideal network bandwidth, and both combined.]

BSC Departments

• Computer Science – Computer Architecture
• Life Sciences
• Earth Sciences
• Computer Applications

Computer Architecture is …

• About the interface between what technology can provide and what people/the market demand
• At that interface we have our design point:
  – Reliability
  – Availability
  – Cost
  – Power
  – Performance

Yale Patt, University of Texas at Austin

Superscalar Processor

[Diagram: superscalar processor pipeline – Fetch, Decode, Rename, register file, instruction window, wakeup + select, bypass, data cache, register write, Commit.]

Fetch of multiple instructions every cycle. Registers are renamed to eliminate false dependencies. Instructions wait for source operands and for functional units. Out-of-order execution, but in-order graduation. Branch prediction and speculative execution.

J. E. Smith and S. Vajapeyam, “Trace Processors…”, IEEE Computer, Sept. 1997, pp. 68-74.

Superscalar Processors

• Out of order (IPC ≤ 3)

[Diagram: overlapped pipelined execution of many in-flight instructions (F D R E … W stages) over time.]

Execution time: T = N · (1/IPC) · t_c  (N instructions, clock cycle time t_c)
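As a quick numeric reading of this formula (illustrative values only, not taken from the slides):

    T = N \cdot \frac{t_c}{\mathrm{IPC}}, \qquad \text{e.g. } N = 10^{9},\ \mathrm{IPC} = 3,\ t_c = 0.5\,\mathrm{ns} \;\Rightarrow\; T = 10^{9} \cdot \frac{0.5\,\mathrm{ns}}{3} \approx 0.17\,\mathrm{s}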

Future High-Performance Processors

• Future Applications: Integer, FP, Commercial and Multimedia
• Future Technology: Moore's Law
• No New Paradigm for Computing
• Pipeline Design and Optimization:
  – Memory Hierarchy: Cache, Prefetching
  – Branch Predictors
  – Wire Delays
  – Power
  – Design
  – Verification
• ILP improvements

Future Architectures

• Superscalar • VLIW (EPIC) • Multithreaded • Multiprocessors on a chip

Which architecture will dominate the future?

Advanced ILP Techniques for Superscalar Processors

• Value Prediction
• Reuse
• Helper Threads
• Trace Processors
• Control/Data Speculation
• Kilo-instruction Processors

BSC Departments

• Computer Science – Computer Architecture
• Life Sciences
• Earth Sciences
• Computer Applications

Life Sciences Department

Our goal is to understand Biology from a biomedical, molecular and evolutionary point of view, covering from the atom to the Earth.

Group – Research line

Victor Guallar – Atomic (and electronic) modeling of protein biochemistry and biophysics
Modesto Orozco – Molecular dynamics: understanding the motion of proteins
Juan Fernández – Identification of the structural bases of protein-protein interaction
Patrick Aloy – Protein-protein interaction networks
Josep Ll. Gelpí – Web services
David Torrents – Analysis of genomes and networks to model diseases, systems and evolution of organisms

Analysis of protein and function diversity on Earth – Computational genomics group (collaboration with EMBL, Heidelberg)

Molecular Biology: identification and classification of new and known proteins from known genomes and metagenomic data; providing new functional associations and hypotheses for basic research (IMPACT + FIELDS FP7 European project).

Biomedicine: identification of new microorganisms, pathways, proteins and molecules; new therapeutic molecules (antibiotics, protein targets).

Evolution: understanding the mechanisms of evolution of proteins and organisms.

Ecology: identification of new pathways and organisms for new possibilities in biodegradation.

Understanding the structural basis of the interaction between proteins and with therapeutic drugs.

Biomedical and molecular goals:
- Drug design
- Identification of therapeutic protein targets
- Finding functional associations between proteins
- Function prediction for new proteins
- Structure prediction of proteins

Life Sciences Department

• Library of protein MD simulations
  – 1,400 at the moment
  – >900 CPU years
  – 10 TB of data

[Diagram: relational schema of the simulation repository – Simulation, Analysis, SimulationAnalysis, TimeSlice, ForceField, Monomer and Fragment tables (residue, GeneOntology, EnzymeCode, SCOP, CATH, Pfam, active-site and other fragments), with status codes tracking each simulation from setup through execution on MareNostrum to completed analysis.]

Rueda et al., PNAS 2007

BSC: Life Sciences R&D – The ENCODE Project and The Earth Proteome

• Genomes of all organisms + all metagenomes
• 1.4×10⁶ sequences
• 32 TB of RAM used
• 4,000 processors
• 38 CPU years
• 9 TB of output data
• GRID Superscalar
• A 3.5-day job!

BSC Departments

• Computer Science – Computer Architecture
• Life Sciences
• Earth Sciences
• Computer Applications

BSC Earth Sciences Department

Air Quality and Meteorological Modelling:

 The group offers a pan-European (12 km) and Spanish (4 km) air quality forecasting and assessment service to end-users that takes advantage of the high spatial and temporal resolution of the air quality modelling system (http://www.bsc.es/caliope).
 Development of the HERMES high-resolution emission model for Spain (1 km and 1 hr).

BSC Earth Sciences Department

Mineral Dust:
 Daily operational forecasts of mineral dust for the Euro-Mediterranean and East Asia regions based on the DREAM model (http://www.bsc.es/projects/earthscience/DREAM).
 Leading a World Meteorological Organization initiative to create a Regional Centre for the Sand and Dust Storm Warning System.
 Initial steps towards the development of a global-to-regional, hydrostatic/non-hydrostatic dust model based on the UMO.

Climate Modelling:
 Global climate modelling with NASA GISS ModelE and NCAR WACCM on MareNostrum at a resolution of 2º×2.5º.
 Implementation of a regional climate model (RCM) based on the WRF/CMAQ/DREAM system for the Mediterranean Sea and Europe (20 km resolution), in order to run simulations to ascertain the regional impact of climate change on the trends of extreme events.

Asia dust forecasting at BSC – SDS WS

Operational products (http://www.bsc.es/projects/earthscience/DREAM/)

Model predictions (72-h):
 Horizontal distribution: PM2.5, PM10, TSP at surface and height; total column mass (dust load); dust aerosol optical depth; wet, dry and total deposition; visibility (soon available); meteorological variables
 Vertical distribution: cross sections; fixed point/time profiles
 Fixed point (selected sites/cities): dustgrams; meteograms

Request-only basis:
 Numerical data
 Climatology

World Meteorological Organization (WMO): Sand and Dust Storm Warning and Assessment System (SDS-WAS)

• To enhance the ability of participating countries to establish and improve systems for forecasting and warning, so as to suppress the impact of sand and dust storms, by
• Establishing a coordinated global network of sand and dust storm forecasting centres delivering products useful to a wide range of users in understanding and reducing the impacts of SDS

WMO Regional Centres:
• North Africa, Middle East and Europe: BSC, AEMET, CSIC (Spain)
• Asia: China Meteorological Administration (CMA), Xiao Ye Zhang

BSC Departments

• Computer Science – Computer Architecture
• Life Sciences
• Earth Sciences
• Computer Applications

CASE R&D lines

• Objective: Grand Challenge problems with quasi-linear scalability – 10,000 processors with 95% parallel efficiency

• Flow coupled problems (Aero Acoustics, Reactive flows, building energetic efficiency, …)

• Biomechanics (Haemodynamics, Cardiac model, Nasal airflow, …)

• Ab-initio Molecular dynamics and spectroscopy

• Plasma physics (Gyro-kinetic PIC codes, …)

• Seismic imaging (RTM, tomography, …)

Biomechanics

● Cardiac Simulator

● By-pass blood flow

CASE R&D connection with China

Invited speakers in: AEROCHINA-2 Short Course, co-funded by Min. of Science and Technology of China and European Commission

Topics: Complex multi-physics HPC simulations in Aerospace Engineering

BSC talk: “HPC in Aerospace Industry for Large Scale: Aerodynamics and Aeroacoustics”

Marseille, June 16-18, 2008

Seismic Imaging: Kaleidoscope Project (Proyecto Caleidoscopio)

Selected by IEEE Spectrum as one of the 5 most innovative technologies in 2007

BSC General Information

BSC Projects income

• Industry: 11,368,925 € – IBM (MareIncognito), Microsoft, Repsol, Airbus, SUN, others
• EU: 8,445,744 € – DEISA, SARC, PRACE, VELOX, XtreemOS, MITIN, MERASA, others
• Spain: 3,735,332 € – ICTS 2006, CONSOLIDER, RES, Caliope, Genoma2, others

Objective

• Design a 10+ Petaflops Supercomputer for 2010-11

• BSC – IBM Cooperation
• Spanish position with PRACE

[Diagram: PRACE ecosystem pyramid – tier 0 and tier 1 systems, Principal Partners (including GENCI) and General Partners.]

CELL processor
• 235 Mtransistors, 235 mm² die
• Powers the Roadrunner supercomputer at Los Alamos National Laboratory

Project structure

Applications: 4 relevant apps – SIESTA, RTM, ALYA, TORB; general kernels
Programming models: StarSs (CellSs, SMPSs), OpenMP@Cell, MPI + OpenMP/StarSs

Load balancing: coordinated scheduling at run time, process and job level
Performance analysis tools: automatic analysis, coarse/fine grain prediction, sampling, clustering, integration with Peekperf
Models and prototype: performance models, power efficiency

Interconnect: contention, collectives, overlap of computation and communication
Processor and node: contribution to new Cell design, support for the programming model, support for load balancing, issues for future processors

Supercomputing and e-Science: research areas and groups (1)

Application scope “Life Sciences”:
• Grupo de Modelización Molecular y Bioinformática (U. de Barcelona, M. Orozco)
• Grup de Bioinformàtica del Genoma (Centre de Regulació Genòmica, R. Guigó)
• Grupo de Biología Estructural Computacional (Centro Nacional de Investigaciones Oncológicas, A. Valencia)

Application scope “Earth Sciences”:
• Grupo de Ciencias de la Tierra (BSC-CNS, J. M. Baldasano)
• Unidad de Contaminación Atmosférica (CIEMAT, F. Martín)
• Grupo de Diagnóstico y Modelización del Clima (U. Complutense de Madrid, R. García-Herrera)

Application scope “Astrophysics”:
• Grupo de Astrofísica Relativista y Cosmología (U. de Valencia, J. M. Ibáñez)
• Grupo de Simulación Numérica de Procesos Astrofísicos (Instituto de Astrofísica de Canarias, F. Moreno)
• Grupo GAIA de Astronomía Galáctica (U. de Barcelona, J. Torra)
• Grupo de Cosmología Computacional (U. Autónoma de Madrid, G. Yepes)

Application scope “Engineering”:
• Grupo de Mecánica de Fluidos Computacional (U. Politécnica de Madrid, J. Jiménez Sendín)

Application scope “Physics”:
• Unidad de Simulación Numérica y Modelización de Procesos (CIEMAT, M. Uhlmann)
• Grupo SIESTA (U. Autónoma de Madrid, J. M. Soler)
• Grupo SIESTA (Laboratorio de Estructura Electrónica de los Materiales, Instituto de Ciencia de Materiales de Barcelona ICMAB-CSIC, P. J. Ordejón)

(Diagram links these application scopes to the basic supercomputing research areas: compilers and tuning of application kernels; programming models and performance tuning tools; architectures and hardware technologies.)

Supercomputing and e-Science: research areas and groups (2)

Basic research in supercomputing:

Compilers and tuning of application kernels:
• Departamento de Tecnologías de la Información (BSC-CNS, M. Valero)
• Grupo Computación de Altas Prestaciones (U. Politècnica de Catalunya, J. M. Llabería)
• Grupo de Arquitectura y Tecnología de Sistemas Informáticos (U. Complutense de Madrid, F. Tirado)
• Grupo de Arquitectura de Computadores (U. de Malaga, E. López Zapata)

Programming models and performance tuning tools:
• Departamento de Tecnologías de la Información (BSC-CNS, M. Valero)
• Grupo Computación de Altas Prestaciones (U. Politècnica de Catalunya, J. M. Llabería)
• Parallel Processing and Distributed Systems group (U. Autónoma de Barcelona, A. Ripoll)

Architectures and hardware technologies:
• Departamento de Tecnologías de la Información (BSC-CNS, M. Valero)
• Grupo Computación de Altas Prestaciones (U. Politècnica de Catalunya, J. M. Llabería)
• Grupo de Arquitectura de Computadores (U. de Zaragoza, V. Viñals)
• Grupo de Arquitectura y Tecnología de Sistemas Informáticos (U. Complutense de Madrid, F. Tirado)
• Grupo de Arquitectura y Tecnología de Computadores (U. de Cantabria, J. R. Beivide)
• Grupo de Arquitectura de Computadores (U. de Malaga, E. López Zapata)
• Grupo de Arquitectura de Computadores (U. de Las Palmas de Gran Canaria, Instituto Universitario de Ciencias y Tecnologías Cibernéticas, E. Fernández)

Supercomputing and e-Science: research areas and groups (1)

• Consolider program proposal
  – 22 groups
  – 119 senior researchers
  – Work structured around: 10 workpackages, 29 tasks, 3 grand challenges

[Diagram: application scopes (Life Sciences, Earth Sciences, Astrophysics, Engineering, Physics) linked to the basic research areas (compilers and tuning of application kernels; programming models and performance tuning tools; architectures and hardware technologies), distinguishing strong interactions from interactions to be created.]

BSC in Europe – DEISA; Mobility – HPC-Europa

[Charts: HPC-Europa mobility – distribution of visitors by scientific area (Astrophysics, Chemistry, Computing, Earth Sciences, Engineering, Human Sciences, Life Sciences, Mathematics, Physics) for each host centre (IDRIS, EPCC, HLRS, SARA, CEPBA, CINECA), and number of visitors per centre (BSC, CINECA, EPCC, HLRS, IDRIS, SARA) from January 2004 to March 2006.]

Outline

• Barcelona Supercomputing Center

• BSC – Microsoft Research Centre

The motivation

• “It is better for Intel to get involved in this now so when we get to the point of having 10s and 100s of cores we will have the answers. There is a lot of architecture work to do to release the potential, and we will not bring these products to market until we have good solutions to the programming problem.”
  Justin Rattner, Intel CTO

• “Now, the grains inside these machines more and more will be multi-core type devices, and so the idea of parallelization won't just be at the individual chip level; even inside that chip we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.”
  Bill Gates, Supercomputing 05 keynote

• Top-down architecture work to include, as design drivers: applications, debugging, programming models, programming languages, compilers, operating systems, runtime environments

MareNostrum
• “Most beautiful supercomputer” (Fortune magazine, Sept. 2006)
• #1 in Europe, #5 in the World
• 100's of TeraFlops with a general purpose Linux supercluster of commodity PowerPC-based Blade Servers

Intel 80-core chip (source: Intel)
• 80 processors in a die of 300 square mm
• Terabytes per second of memory bandwidth
• Note: the Teraflops barrier was first reached by Intel in 1996 using 10,000 Pentium Pro processors contained in more than 85 cabinets occupying 200 square meters; this will be possible in 5 years from now

Microsoft-BSC Project: A quick snapshot

• Project started in April 2006

• Two-year initial duration

• Initial topic: Transactional Memory

• BSC – Microsoft Research Centre inaugurated January 2008

BSC Contributors

Mateo Valero – Professor – Spain – Part-time
Vladimir Gajinov – PhD Student – Serbia – Full-time
Eduard Ayguade – Professor – Spain – Part-time
Gokcen Kestor – PhD Student – Turkey – Full-time
Adrian Cristal – Doctor – Argentina – Full-time
Chinmay Kulkarni – Undergraduate – India – Intern
Osman Unsal – Doctor – Turkey – Full-time
Azam Seyedi – PhD Student – Iran – Full-time
Alex Pajuelo – Doctor – Spain – Part-time
Nikola Markovic – PhD Student – Serbia – Full-time
Xavier Verdu – Doctor – Spain – Part-time
Gulay Yalcin – PhD Student – Turkey – Full-time
Enrique Vallejo – PhD Student – Spain – Part-time
Vesna Smiljovic – PhD Student – Serbia – Full-time
Srdjan Stipic – PhD Student – Serbia – Full-time
Nebosja Miletic – PhD Student – Serbia – Full-time
Ferad Zyulkyarov – PhD Student – Bulgaria – Full-time
Oriol Arcas – M.S. Student – Spain – Intern
Sasa Tomic – PhD Student – Serbia – Full-time
Adria Armejach – M.S. Student – Spain – Intern
Nehir Sonmez – PhD Student – Turkey – Full-time
Otto Pflucker – M.S. Student – Spain – Intern
Cristian Perfumo – PhD Student – Argentina – Full-time
Vasilis Karakostas – M.S. Student – Greece – Intern
Sutirtha Sanyal – PhD Student – India – Full-time
Jasmina Tomic – Webmaster – Serbia – Part-time

Microsoft Contributors

Management (Executive and Advisory Board): Fabrizio Gagliardi, Andrew Herbert, Tony Hey, Daniel Reed
Technical: Tim Harris, Satnam Singh, Doug Burger, Avi Mendelson

• Advent of multicore chips now commonplace
• Parallel programming the initial driving force, but full performance not the final objective
• Simplicity vs. wide applicability

Research on TM

[Diagram, repeated on the following slides: the centre's TM research spans Applications and Benchmarks, Programming models (imperative and functional) and Architecture, with MSRC focusing on software TM (STM) and BSC on hardware TM (HTM).]

• Benchmarks: Wormbench, Atomic Quake, QuakeTM, RMS-TM applications, Haskell transactional benchmarks
• Programming models: OpenMP + TM, TM in system libraries, Haskell STM
• Architecture: power-efficient TM, EazyHTM, dynamic filtering, hardware acceleration of STM, simulators

Major focus on TM applications

• Haskell STM benchmark, RMS-TM, Wormbench, QuakeTM, Atomic Quake

• Requested ahead of time by Intel, SUN, …

• Released publicly through www.bscmsrc.eu

Quake

• Interactive first-person shooter game
• Very popular among gamers
• Challenge for multi-core processors

• Two versions: Atomic Quake and QuakeTM

Atomic Quake – General Overview

• 10 man-months of effort
• 27,400 lines of C code, 56 files
• 63 atomic blocks; 98% of execution time spent in transactions (single thread, 100% loaded)
• Irregular parallelism – requests are dispatched to their handler functions

Quake-TM: The Server

• The main server task - computing a new frame

[Diagram: frame execution loop – SELECT, receive (Rx), read request, process request / physics update, reply, transmit (Tx).]

Execution breakdown (sequential server execution with 8 connected clients):

  Stage                Time [%]
  Request Processing   87.8
  Reply                3.1
  Physics Update       2.1
  Measuring and Info   5.3
  Other                1.7

We concentrate on the request processing stage.

Quake-TM Parallelization

• Each of the three main frame phases can be parallelized separately
  – No intersection between parallel regions
  – We parallelize only the request processing stage
• OpenMP to start parallel execution
• Transactions for synchronization
• Macros to start transactions
  – Support for various kinds of transactional systems
A minimal code sketch of this pattern is shown below.
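The following self-contained sketch illustrates the pattern only; the toy game state, handle_request and the TX_BEGIN/TX_END macros are made-up names, not QuakeTM code. Here the macros fall back to an OpenMP lock, whereas in a QuakeTM-style build the same macros would expand to the begin/commit calls of the selected transactional system:

#include <omp.h>
#include <stdio.h>

#define NUM_PLAYERS   8
#define NUM_REQUESTS  10000

/* Shared world state that the request handlers update. */
static int health[NUM_PLAYERS];
static int frags[NUM_PLAYERS];

/* Transaction markers. In this sketch they are an OpenMP lock; a TM build
 * would map them onto the transactional memory runtime instead. */
static omp_lock_t tx_lock;
#define TX_BEGIN()  omp_set_lock(&tx_lock)
#define TX_END()    omp_unset_lock(&tx_lock)

/* One request handler: applies damage and credits a frag atomically. */
static void handle_request(int attacker, int target, int damage)
{
    TX_BEGIN();
    health[target] -= damage;
    if (health[target] <= 0) {
        health[target] = 100;        /* respawn */
        frags[attacker]++;           /* credit the kill */
    }
    TX_END();
}

int main(void)
{
    omp_init_lock(&tx_lock);
    for (int i = 0; i < NUM_PLAYERS; i++) health[i] = 100;

    /* OpenMP starts the parallel execution of the request-processing
     * stage; each queued request is handled inside a transaction. */
    #pragma omp parallel for
    for (int r = 0; r < NUM_REQUESTS; r++) {
        int attacker = r % NUM_PLAYERS;
        handle_request(attacker, (attacker + 1) % NUM_PLAYERS, 7);
    }

    int total = 0;
    for (int i = 0; i < NUM_PLAYERS; i++) total += frags[i];
    printf("total frags after %d requests: %d\n", NUM_REQUESTS, total);

    omp_destroy_lock(&tx_lock);
    return 0;
}

Keeping the synchronization behind macros is what allows the same handler code to run unchanged on a lock fallback or on different software/hardware TM systems.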

Results: TM and OpenMP

• BSC-Microsoft led the first proposal to integrate TM with OpenMP, the industry standard for writing shared-memory parallel programs
• Prototype uses in-house tools: Mercurium (source-to-source OpenMP compiler)

#pragma omp task transaction [exclude(list)|only(list)] structured-block

• Suitable for the upcoming updates to the OpenMP standard, which include tasking
• Simplifies writing code with tasks and critical sections (a usage sketch follows below)
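A hedged usage sketch of the proposed syntax: the transaction clause is the research proposal shown above, prototyped in the Mercurium source-to-source compiler, and is not accepted by standard OpenMP compilers; the histogram example and its names are made up for illustration:

/* Each created task updates a shared histogram inside a transaction
 * instead of a critical section. */
void histogram(const int *values, int n, int *bins, int nbins)
{
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < n; i++) {
        #pragma omp task transaction
        {
            /* Conflicting increments of the shared bins are resolved by
             * the TM runtime rather than by locks. */
            bins[values[i] % nbins]++;
        }
    }
}

Compared with wrapping the update in a critical section, the transactional form allows non-conflicting updates to different bins to proceed in parallel.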

• Cooperating with other players in the field
• On track to define a unified OpenMP-TM standard
• Participating in the appropriate forums (GCC, OpenMP) to push for industry adoption

Haskell STM

• TM has been a first-class language feature in Haskell for a long time
  – Many STM applications developed

• UnreadTVar: delisting a variable from the read set
• Limits of STM
• Atomic block (AB)-based application profiling
• Dynamic switching between pessimistic and optimistic concurrency

Selected Publications on TM

V. Gajinov, F. Zyulkyarov, A. Cristal, O. Unsal, T. Harris, M. Valero, “QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory”, 23rd International Conference on Supercomputing (ICS 2009), June 2009.
N. Sonmez, A. Cristal, T. Harris, O. Unsal, M. Valero, “Taking the heat off transactions: dynamic selection of pessimistic concurrency control”, 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2009), May 2009.
F. Zyulkyarov, V. Gajinov, O. Unsal, A. Cristal, E. Ayguade, T. Harris, M. Valero, “Atomic Quake: Use Case of Transactional Memory in an Interactive Multiplayer Game Server”, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2009.
C. Perfumo, N. Sonmez, S. Stipic, O. Unsal, A. Cristal, T. Harris, M. Valero, “The Limits of Software Transactional Memory (STM): Dissecting Haskell STM Applications on a Many-Core Environment”, ACM International Conference on Computing Frontiers, May 2008.
E. Vallejo, T. Harris, A. Cristal, O. Unsal, M. Valero, “Hybrid Transactional Memory to Accelerate Safe Lock-based Transactions”, Third ACM SIGPLAN Workshop on Transactional Computing (TRANSACT), February 2008.
M. Milovanović, R. Ferrer, O. Unsal, A. Cristal, X. Martorell, E. Ayguadé, J. Labarta, M. Valero, “Transactional Memory and OpenMP”, International Journal of Parallel Programming.
C. Perfumo, N. Sonmez, O. Unsal, A. Cristal, M. Valero, T. Harris, “Dissecting Transactional Executions in Haskell”, Second ACM Workshop on Transactional Computing (TRANSACT), August 2007.
T. Harris, S. Stipic, “Abstract Nested Transactions”, Second ACM Workshop on Transactional Computing (TRANSACT), August 2007.
T. Harris, A. Cristal, O. Unsal, E. Ayguade, F. Gagliardi, B. Smith, M. Valero, “Transactional Memory: An Overview”, IEEE Micro, May–June 2007.

FP7 TM-Project VELOX

• Goal: unified TM system stack; Duration: 2008-2011; Budget: 5M Euros
• BSC involvement initiated through the BSC-MSR Centre
• Project participants: BSC, EPFL, University of Neuchatel, Technical University of Dresden, Tel Aviv University, Chalmers University of Technology, AMD, Red Hat, VirtualLogix
• Project observers: Microsoft, Intel, SUN

Empowering TM across the system stack

[Diagram: the VELOX TM system stack]
7. Application Environments (application servers, event processing engines, etc.) and TM-based Applications and Benchmarks (legacy software migration, TM-aware development, etc.)
6. Programming Language (Java/C/C++ integration, exception handling, etc.)
5. Compiler (code generation, bytecode instrumentation, etc.)
4. TM Libraries (word-based, object-based, PLS, TBB, etc.)
3. Native Libraries Integration (system libraries)
2. Operating System (low-level TM support)
1. TM Hardware and Adaptation Layer (multi-core CPUs, HW+SW transactions, etc.)
Cross-cutting: TM theory (contention, isolation, algorithms, etc.) and adaptive hybrid approaches.

http://www.velox-project.eu/

Barcelona Multi-core Workshop (BMW), June 2008

• BSCMSRC organized the Barcelona Multi-core Workshop in June 2008
  – 200 people attending
• Presentations by lead architects on new multi-cores:
  – Larrabee (Intel)
  – Rock (SUN)
  – Cell (IBM)
• Presentations on parallel programming:
  – UPCRC, Illinois
  – PPL, Stanford
  – StarSs, BSC
• Presentation on the EU's future plans for computing systems
• Two panel sessions:
  – 50 Billion Transistor Chips: What should the Hardware provide?
  – What Terrifies You More: Multicore Hardware or Software?

Thanks! http://www.bscmsrc.eu