
High Performance Computing in Bioinformatics

Tutorial Outline

PART I: High Performance Computing
Thomas Ludwig ([email protected])
Ruprecht-Karls-Universität Heidelberg

PART II: HPC in Bioinformatics
- Grand Challenges for HPC Bioinformatics
- HPC Bioinformatics by Example of Phylogenetic Inference
Alexandros Stamatakis ([email protected])
Technische Universität München, Computer Science Department

© Thomas Ludwig, Alexandros Stamatakis, GCB'04

PART I: High Performance Computing

Outline

- Introduction
- Architecture
- Top Systems
- Programming
- Problems
- Own Research
- The Future

Introduction: Why High Performance Computing?

- Situation in science and engineering
  - Replace complicated physical experiments by computer simulations
  - Evaluate more fine-grained models
- User requirements
  - Compute masses of individual tasks
  - Compute complicated single tasks
- Available computational power: a single workstation is not sufficient

What Is It?

- High Performance Computing (HPC), Networking and Storage
- Deals with high and highest performance computers, with high speed networks, and with powerful disk and tape storage systems
- Performance improvement compared to personal computers and small workstations: factor 100...10,000

What Do I Need?

- Small scale high performance computing
- Cheapest version: use what you have (workstations with disks and network)
- A bit more expensive: buy PCs, e.g. 16 personal computers with disks and gigabit ethernet
- Software comes for free
- It's mainly a human resources problem: a network of workstations is time consuming to maintain

What Do I Need...?

- Large scale high performance computing
- Buy 10,000 PCs or a dedicated supercomputer
- Buy special hardware for networking and storage
- Add a special building
- Add an electric power station

How Much Do I Have to Pay?

- Small scale (<64 nodes): 1,000 €/node
- Medium scale (64-1024 nodes): 2,000 €/node (multiprocessor, 64-bit), plus 1,500 €/node for a high speed network and 500 €/node for high performance I/O
- Large scale (>1024 nodes): money for a building, money for a power plant
- Current costs range between 20...400 million Euros

Application Fields

- Numerical calculations and simulations: particle physics, computational fluid dynamics, car crash simulations, weather forecast, ...
- Non-numerical computations: chess playing, theorem proving
- Commercial applications

Application Fields...

- All fields of Bioinformatics: computational genomics, computational proteomics, computational evolutionary biology, ...
- In general: everything that runs between 1 and 10,000 days; everything that uses high volumes of data

Measures

- Mega (2^20 ≅ 10^6), Giga (2^30 ≅ 10^9), Tera (2^40 ≅ 10^12), Peta (2^50 ≅ 10^15), Exa (2^60 ≅ 10^18)
- Computational performance (Flop/s): Flop/s = floating point operations per second
  - Modern processor: 3 GFlop/s
  - Nr. 1 supercomputer: 35 TFlop/s (factor 10,000)
- Network performance (Byte/s)
  - Personal computer: 10/100 MByte/s
  - Supercomputer networks: gigabytes/s
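The binary/decimal prefix approximations above are easy to verify; a quick check of how close 2^20, 2^30, ... come to their decimal counterparts:

```python
# Sanity-check the slide's prefix approximations 2^(10k) ≅ 10^(3k).
for exp2, exp10, name in [(20, 6, "Mega"), (30, 9, "Giga"), (40, 12, "Tera"),
                          (50, 15, "Peta"), (60, 18, "Exa")]:
    ratio = 2**exp2 / 10**exp10
    print(f"{name}: 2^{exp2} / 10^{exp10} = {ratio:.4f}")
```

The ratio drifts from about 1.05 (Mega) to about 1.15 (Exa), so the approximation is good to within roughly 15 percent across the whole range.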

Measures...

- Main memory (Byte)
  - Personal computer: 1 GByte
  - Nr. 1 supercomputer: 10 TByte (factor 10,000)
- Disk space (Byte)
  - Single disk 2004: 200 GByte
  - Nr. 1 supercomputer: 700 TByte (factor 3,500)
- Tape storage (Byte)
  - Personal computer: 200 GByte
  - Nr. 1 supercomputer: 1.6 PByte (factor 8,000)

Architecture

- Basic classification concept: how is the main memory organized?
  - Distributed memory architecture
  - Shared memory architecture
- Available systems: dedicated computers, cluster systems

Distributed Memory Architecture

- Autonomous computers connected via a network
- Processes on each compute node have access to local memory only
- A parallel program spawns processes over a set of processors
- Communication between processes via message passing: send()/recv() over the interconnection network
- Called: multi computer system

Distributed Memory Architecture...

- Advantages
  - Good scalability: just buy new nodes; the concept scales up to 10,000+ nodes
  - You can use what you already have
  - Extend the system when you have money and need for more power
- Disadvantages
  - Complicated programming: parallelization of formerly sequential programs (including complicated debugging, performance tuning, load balancing, etc.)

Shared Memory Architecture

- Several processors in one box (e.g. multiprocessor motherboard)
- Each process on a processor sees the complete address space
- Communication between processes via shared variables
- Called: multiprocessor system, symmetric multiprocessing system (SMP)

Shared Memory Architecture...

- Advantages: much easier programming
- Disadvantages
  - Limited scalability: up to 64 processors; reason: the interconnection network becomes a bottleneck
  - Limited extensibility
  - Very expensive due to the high performance interconnection network

Hybrid Architectures

- Use several SMP systems: a combination of shared memory systems and distributed memory systems
- The good thing: performance scales according to your financial budget
- The bad thing: programming gets even more complicated (hybrid programming)
- The reality: vendors like to sell these systems, because they are easier to build

Supercomputers vs. Clusters

- Supercomputers (distributed/shared memory)
  - Constructed by a major vendor (IBM, HP, ...)
  - Use custom components (processor, network, ...)
  - Custom (Unix-like) operating systems
- Clusters (networks of workstations, NOWs)
  - Assembled by vendor or users
  - Commodity-off-the-shelf components (COTS)
  - Linux

Supercomputers vs. Clusters...

- Supercomputers
  - Very expensive to buy
  - Usually high availability and scalability
- Clusters
  - Factor 10 cheaper to buy, but very expensive to own
  - Lower overall availability and scalability

TOP500 List

- Lists the world's 500 most powerful systems
- www.top500.org
- Updated in June and November
- Ranking based on a numerical benchmark
- In 6 months almost half of the systems fall off the list
- The majority of systems now are clusters

TOP500 Rank 1-10 / TOP500 Rank 11-20

(Table slides: the top 20 systems of the current TOP500 list)

TOP500 Performance / TOP500 Performance Trends

(Chart slides: aggregate performance development, a factor of 1000 in 11 years)

EarthSimulator / NEC

(Photo slide: compute node cabinets (320), networking cabinets (65), disks, data archive / tape robots, air cooling, cables, power supply, earthquake protection, "my notebook")

EarthSimulator / NEC

- 640 nodes x 8 processors = 5,120 processors
- 10 TByte main memory, 700 TByte disk, 1.6 PByte tapes
- 200 Mio USD for the computer, 200 Mio USD for the building and electric power station
- 83,000 copper cables; 2,800 km / 220 t of cabling
- Building: 3,250 m², earthquake protected
- Power: 7 MW
- Application field: climate simulations

BlueGene / IBM

- IBM intends to break the 1 PFlop/s barrier by the end of 2006
- Power consumption and floor space problems solved by new packing technology (less than 30 EarthSimulators! ☺)
- Application fields: bioinformatics
  - Ab initio protein folding
  - Molecular dynamics on a millisecond to second time scale
  - Protein structure prediction

Programming: Parallelization Paradigms

- What do we have? Many processors, one program, much data
- How do we proceed?
  - Start one instance of the program on each processor (called a process)
  - Give it one part of the data
  - Collect the result
- Is this all? Well, sometimes yes ☺, sometimes no

Parallelization Paradigms...

- Remember the two categories
  1. Compute masses of individual tasks
  2. Compute complicated single tasks
- Category 1: embarrassingly parallel
- Master/worker concept:
  - Start the master on one processor
  - Start workers on all other processors
  - The master sends data to the workers
  - The master collects results from the workers
- Example: analysis of different molecules
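The master/worker concept above can be sketched in a few lines. This toy version uses Python's multiprocessing pool as a stand-in for MPI processes, and the "analysis" function is a made-up placeholder, not anything from the slides:

```python
from multiprocessing import Pool

def analyze(molecule):
    # Stand-in for an expensive per-task computation. Category 1 is
    # embarrassingly parallel: the tasks do not depend on each other.
    return molecule, sum(ord(c) for c in molecule)

if __name__ == "__main__":
    molecules = ["H2O", "CO2", "CH4", "NH3"]   # the individual tasks
    with Pool(processes=4) as pool:            # the "workers"
        # The master distributes tasks and collects the results.
        results = pool.map(analyze, molecules)
    print(dict(results))
```

Because the tasks are independent, the speedup is essentially linear in the number of workers, which is exactly why this category is the easiest case for HPC.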

Parallelization Paradigms...

- Category 2: non-trivial parallelism
- Usually: data partitioning
  - Data is partitioned amongst the processes
  - Each process computes results
  - During the computation the processes have to coordinate with each other: done by communication
  - A selected process organizes input/output
- Example: molecular dynamics simulations

Parallelization Support

- Parallelization via automatically parallelizing compilers: complicated, seldom used, inefficient, not scalable
- Parallelization for shared memory architectures
  - Language support via OpenMP
  - Only for small node numbers
- Parallelization for distributed memory architectures
  - Done manually: decorate the program with message passing calls
  - Only for experienced programmers

Message Passing

- Basic principle: use send and receive calls to transfer data

  process proc1:           process proc2:
  start:                   start:
    computeData();           computeData();
    send(proc2,data);        send(proc1,data);
    recv(proc2,data);        recv(proc1,data);
    goto start;              goto start;

Message Passing...

- Message passing standard: MPI (Message Passing Interface)
  - Conceived in the mid 90s
  - Language bindings: C, C++, Fortran (others available)
- Divided into two parts
  - MPI: just message passing
  - MPI-2: process management, input/output, ...
- Available implementations
  - Vendor versions on all major supercomputers
  - Open source versions MPICH and LAM for workstation clusters
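The send/recv idea from the pseudocode above can be demonstrated without an MPI installation. This sketch uses Python's multiprocessing pipes, which are not MPI, but follow the same principle: two processes with no shared memory exchanging explicit messages:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()        # blocking receive, analogous to recv(...)
    conn.send(data * 2)       # send a result back, analogous to send(...)
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send(21)           # no shared variables: only messages
    print(parent.recv())      # -> 42
    p.join()
```

Note that in the slide's pseudocode both processes send before they receive; with blocking, unbuffered sends that ordering can deadlock, which is one reason MPI offers non-blocking variants.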

Message Passing...

- How big is MPI/MPI-2? Several hundred library calls!
- Don't worry: half a dozen are enough to start with
- What makes MPI so copious? All sorts of library calls that make programming more comfortable:
  - Collective calls: e.g. compute a global sum over all processes and distribute it to all of them
  - Special communication calls: e.g. non-blocking calls provide means to overlap computation and communication

Message Passing... The Dark Side

- Parallel programming in general introduces a new error category: time dependent errors
- The reason: processes run in parallel but are not synchronized; timing depends on the concrete activities on the nodes
- The consequences:
  - An erroneous program crashes only sometimes
  - When you slow it down to observe it, it does not crash
  - The precondition for the crash cannot be reproduced (nondeterminism)
- The solution? Well, next question

Performance Limitations: Amdahl's Law

- How to evaluate parallel performance: run with 1 and with n processors
- Speedup: S(n) = t(1) / t(n)
- Efficiency: E(n) = S(n) / n
- Theoretical maximum: S_max = n
- Practical maximum given by Amdahl's Law

Performance Limitations: Amdahl's Law...

- Every parallel program has a non-parallel part (input/output, initialization, etc.)
- Let f be the fraction of the non-parallel part
- Maximum speedup with Amdahl on p processors: S_Amdahl = 1 / (f + (1-f)/p); as p grows, S_Amdahl approaches 1/f
- Examples:
  - f = 0.01 ⇒ S_Amdahl ≤ 100
  - f = 0.001 ⇒ S_Amdahl ≤ 1000
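The bound is easy to check numerically; a small sketch, with f and p as defined above:

```python
def speedup(f, p):
    """Amdahl's Law: maximum speedup on p processors when a
    fraction f of the program cannot be parallelized."""
    return 1.0 / (f + (1.0 - f) / p)

# The limit 1/f is approached as p grows:
for f in (0.01, 0.001):
    print(f"f={f}: p=1000 -> {speedup(f, 1000):.1f}, limit -> {1/f:.0f}")
```

For f = 0.01 even 1000 processors yield a speedup of only about 91, which illustrates why shaving down the sequential part matters more than adding nodes.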

Problems: Fault Tolerance

- What happens when a node crashes during runtime? (The probability increases with the number of components and the execution time)
- Today
  - The program just crashes
  - The program crashes but can be restarted from a checkpoint
- In the future
  - The program continues its execution and uses a different set of nodes

High Performance Input/Output

- How can we handle terabytes and petabytes of input/output data?
- Today
  - Not yet too efficiently; often a master process handles all I/O
- In the future
  - All processes perform I/O calls (parallel I/O)
  - All nodes have efficient access to high performance I/O hardware

Own Research: High Performance Parallel I/O

- High performance I/O is a major challenge today
- Research activities
  - Investigate load balancing mechanisms for parallel I/O systems (i.e. distribute data to disks according to load)
  - Provide performance measurement tools to see the influence of I/O library calls in the source code
  - Adapt parallel programs to parallel I/O concepts

The Future: Petaflops and Petabytes

- The new frontiers of supercomputing
  - Petaflops: maybe by the end of 2006?
  - Petabytes: next year
- Number of processors: 10,000 and more
- However: do we have the right programs for that?

Grid Computing

- The new hype of supercomputing
- Idea: as with the power grid
  - Computational performance should be available everywhere
  - Computational performance can be produced at various places
- Concept
  - Join parallel computers, clusters, compute centers
  - Offer their aggregate compute performance
- Problems: programming, management, security, ...

Limitations of High Performance Computing

- Reconsider: "everything that runs between 1 and 10,000 days"
- Current supercomputers can reduce program runtime by a maximum factor of 5 orders of magnitude
- What to do if you want to compute e.g. billions of molecular variations?
- There is only one answer:
  - First: improve the algorithm!
  - Second: use supercomputers

PART II: HPC in Bioinformatics

- Grand Challenges in HPC Bioinformatics
- HPC Bioinformatics by Example of Phylogenetic Inference
  - phylogenetic analysis
  - maximum likelihood
  - bayesian inference
  - sequential codes
  - parallel & distributed RAxML
  - future developments

The Classic Slide: GenBank Data Growth

(Chart: number of base pairs, growing over the years 1980-2004)

Increase in Nucleotide Substitution Model Complexity

(Chart: amount of arithmetic operations, increasing from JC69 via F81 and HKY85 to GTR and GTR + Γ, over the years 1969-2004)

Grand Challenges

- Protein folding & structure prediction
- Homology search
- Multiple alignment
- Genomic sequence analysis
- Gene finding
- Gene expression data analysis
- Drug discovery
- Phylogenetic inference

Grand Challenges

- Protein folding & structure prediction
- Homology search
- Multiple alignment
- Genomic sequence analysis
- Gene finding
- Gene expression data analysis
- Drug discovery
- Phylogeny construction ← main focus of this talk!

Multiple Alignment

- The prerequisite for phylogenetic analysis
- Computational effort increases exponentially in time and space with the standard dynamic programming approach and the Sum-of-Pairs score
- Which is the adequate score function?

Multiple Alignment

- The prerequisite for phylogenetic analysis
- Computational effort increases exponentially in time and space with the standard dynamic programming approach and the Sum-of-Pairs score →
  - Good heuristics
  - Parallel algorithms
    - Fine-grained, e.g. on the alignment matrix level
    - FPGA implementations for pairwise alignment
  - Heuristics & parallel algorithm: coarse-grained divide-and-conquer
- A type of reverse engineering: how can I change the algorithm to be able to parallelize it with MPI?
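For concreteness, the Sum-of-Pairs score mentioned above is simple to compute once an alignment is given; a toy sketch, where the match/mismatch/gap values are arbitrary illustrative choices, not taken from the slides:

```python
from itertools import combinations

def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-Pairs score: sum the pairwise column scores over
    all sequence pairs of a multiple alignment (rows of equal length)."""
    score = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x == '-' and y == '-':
                continue                 # gap-gap columns are ignored
            elif x == '-' or y == '-':
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score

print(sp_score(["ACG-T", "AC-GT", "ACGGT"]))
```

Scoring a given alignment is cheap; the exponential cost arises from searching for the alignment that maximizes this score over all possible gap placements.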

An Example: DCA

- Stoye et al (1997): Divide-and-Conquer Alignment algorithm (DCA)
  1. Divide the sequences into smaller subsequence-sets
  2. If the length of a subsequence-set is below a predefined threshold value L, compute the optimal subalignments (in parallel)
  3. Concatenate the subalignments to the whole alignment
- The art consists in the design of intelligent decomposition heuristics to obtain a near-optimal concatenated alignment → a non-trivial problem

(Diagram: sequences S1, S2, S3 are divided, the parts are aligned, and the subalignments are concatenated)

Phylogenetic Analysis

- Motivation
  - Tree-of-life
  - New insights in medical & biological research
  - CIPRES: NSF-funded 11.6 million $ tree-of-life project (www.phylo.org)
- Applications of phylogenetic trees
  - Bader et al (2001) Industrial applications of high-performance computing for phylogeny reconstruction.
  - Baker et al (1994) Which whales are hunted? A molecular genetic approach to whaling.

Phylogenetic Methods

- Input: a "good" multiple alignment
- Output: an unrooted binary tree
- Various models for phylogenetic inference; the models differ in computational complexity & accuracy of the final trees
  - Fast & simple models: Neighbor Joining, Parsimony (MP)
  - Slow & complex models: Maximum Likelihood (ML), Bayesian Methods ← we focus on these

Remember!

- The input need not be DNA or protein sequence data → gene order data
  - Moret et al (2001) GRAPPA: a high performance computational tool for phylogeny reconstruction from gene-order data.
- The model need not be a tree → networks
  - Gusfield et al (2003) Efficient reconstruction of phylogenetic networks with constrained recombination.
- The output need not be a strictly bifurcating tree → multifurcating tree

Remember!

- We focus on the computation of strictly bifurcating phylogenetic trees with maximum likelihood for DNA and protein sequence data!

Example: Phylogeny of the Great Apes

(Tree: a common ancestor, with time running downward, branching into Orangutan, Gorilla, Chimpanzee, and Human)

The Algorithmic Problem: The Number of Trees Explodes!

- The number of potential trees grows exponentially with the number of taxa:

  # Taxa   # Trees
       5   15
      10   2,027,025
      15   7,905,853,580,625
      50   2.84 * 10^76   (BANG! ≈ the number of atoms in the universe, 10^80)
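The table follows from the closed form for the number of distinct unrooted binary trees on n taxa, (2n-5)!! (the double factorial 3 * 5 * ... * (2n-5)); a quick check:

```python
def num_unrooted_trees(n):
    """Number of distinct unrooted binary trees on n taxa: (2n-5)!!"""
    count = 1
    for k in range(3, 2 * n - 4, 2):   # product 3 * 5 * ... * (2n-5)
        count *= k
    return count

for n in (5, 10, 15):
    print(n, num_unrooted_trees(n))
```

Each added taxon multiplies the count by the next odd number (the number of branches it can attach to), which is exactly why exhaustive search over topologies becomes hopeless so quickly.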

Maximum Likelihood

- Maximum Likelihood calculates:
  1. Topologies
  2. Branch lengths v[i]
  3. The likelihood of the tree
- Goal: obtain the topology with the maximum likelihood value
- Problem I: the number of possible topologies is exponential in n
- Problem II: the computation of the likelihood function is expensive
- Solution: algorithmic optimizations + new heuristics + HPC

(Example tree: tips S1...S5 with branch lengths v1...v7)
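To make "computing the likelihood" concrete, here is a minimal sketch of the per-site likelihood of two aligned sequences separated by a single branch under the Jukes-Cantor (JC69) model mentioned earlier. This is my illustration, not code from the slides; real programs evaluate this over whole trees with Felsenstein's pruning algorithm:

```python
import math

def jc69_p_same(t):
    """JC69: probability a site is unchanged after branch length t
    (t in expected substitutions per site)."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)

def jc69_p_diff(t):
    """Probability of observing one specific different base."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)

def log_likelihood(seq1, seq2, t):
    """Log likelihood of two sequences on a single branch of
    length t, with uniform base frequencies (0.25 each)."""
    ll = 0.0
    for a, b in zip(seq1, seq2):
        p = jc69_p_same(t) if a == b else jc69_p_diff(t)
        ll += math.log(0.25 * p)
    return ll

print(log_likelihood("ACGTACGT", "ACGTACGA", 0.1))
```

Even in this two-taxon toy, the branch length t must be optimized numerically to maximize the likelihood; on a full tree that optimization runs over all 2n-3 branches for every candidate topology, which is where the cost of Problem II comes from.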

Bayesian Inference

- Uses Bayesian statistics:

  P(t + m | d) = P(d | t + m) x P(t + m) / P(d)

  - P(d | t + m): the likelihood of the tree
  - P(t + m): the prior probability of the tree; must be assumed; usually all possible trees are considered to be equally probable
  - P(d): the problematic term; it equals the sum of likelihood x prior probability of the tree + model over all possible trees → impossible to calculate!

Bayesian Inference

- Solution: Metropolis-Coupled Markov Chain Monte Carlo simulation (MC³)
- Advantages compared to ML:
  - Straightforward statistical measure of the phylogeny
  - Avoids bootstrapping & provides straightforward support values
- Disadvantages compared to ML:
  - Requires priors for tree & model
  - MC³ convergence problem
  - More difficult to parallelize: Feng et al (2003) Parallel algorithms for bayesian phylogenetic inference.
- The art in the design of a Bayesian phylogenetic analysis lies in the tree proposal mechanism.

MC³ Algorithm

- Start from a random starting tree
- Run a cold chain and a heated chain in parallel (states t_1_1, t_2_1, ...)
- A tree proposal mechanism generates new candidate trees

MC³ Algorithm...

- From the current trees t_1_1 (cold chain) and t_2_1 (heated chain), propose new trees and compute the acceptance ratio, e.g. R = L(t_1_2) / L(t_1_1):
  - R > 1 → accept the tree
  - R < 1 → accept the tree if random(0,1) < R
- If L(t_1_2) > L(t_2_3) → swap the chain states
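The accept/reject step above can be sketched as follows, working with log likelihoods as real programs do; a toy illustration, not MrBayes code:

```python
import math, random

def metropolis_accept(log_l_new, log_l_old, rng=random):
    """Metropolis rule: accept if the proposal is better (R >= 1),
    otherwise accept with probability R = L(new) / L(old)."""
    log_r = log_l_new - log_l_old
    if log_r >= 0:
        return True
    return rng.random() < math.exp(log_r)

# A better tree is always accepted; a worse one only sometimes,
# which lets the chain escape local optima:
print(metropolis_accept(-100.0, -105.0))   # True
```

Occasionally accepting worse trees is the whole point: a pure hill-climber would get stuck, while the heated chain, whose acceptance probabilities are flattened further, explores even more aggressively and swaps good states back to the cold chain.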

MC³ Algorithm...

(Diagram: both chains advance through states t_1_1 → t_1_2 → t_1_3 and t_2_1 → ... → t_2_4, occasionally swapping states)

ML vs. Bayes

(Chart: likelihood over a model parameter x, e.g. the transition/transversion ratio; the ML value marks the maximum of the curve, with the Bayesian view also indicated)

MC³ Convergence Problem

(Charts: log likelihood over time; one run shows an area of apparent stationarity before improving further)

Real-World Example

(Chart: log likelihood over time for runs started from an ML starting tree and from a random starting tree, compared against an ML analysis reference value)

ML Phylogeny Program Development

- Develop a fast sequential algorithm with new heuristics & optimizations
- Phylogenetics is an algorithmic discipline

ML Phylogeny Program Development

- Develop a fast sequential algorithm with new heuristics & optimizations, then iterate towards a parallel program

Basic Algorithms

- Two basic classes of algorithms
  - I. Progressive algorithms: progressive insertion of sequences into the tree, e.g. stepwise addition
  - II. Global algorithms:
    - Use an NJ or parsimony starting tree
    - Optimize the tree by applying standard topological alterations:
      - NNI: Nearest Neighbor Interchange
      - TBR: Tree Bisection Reconnection
      - Subtree rearrangements

NNI

(Figure slides: a sequence of Nearest Neighbor Interchange moves on an example tree)

TBR

(Figure slides: a sequence of Tree Bisection Reconnection moves on an example tree)

Subtree Rearrangements

(Figure slides: subtree ST6 is pruned from a tree with subtrees ST1...ST6 and re-inserted into neighboring branches at increasing distances (+1, +2) from its original position)

State-of-the-Art Sequential Phylogeny Programs I

- PHYML: fast & accurate on simulated data
  - Guindon et al (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.
- RAxML-III: fast & accurate on real data
  - Stamatakis et al (2004) RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees.
- MetaPigA: fastest genetic search algorithm
  - Lemmon et al (2002) The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation.

State-of-the-Art Sequential Phylogeny Programs II

- IQPNNI: accurate on real & simulated data; slower than PHYML/RAxML
  - Vinh et al (2004) IQPNNI: Moving fast through tree space and stopping in time.
- PAUP*: many options for MP & ML searches; very slow on ML; not available free of charge
  - Swofford (1998) PAUP* 4.0 - Phylogenetic Analysis Using Parsimony (*and Other Methods).
- MrBayes: fast bayesian inference
  - Huelsenbeck (2001) MrBayes: Bayesian inference of phylogenetic trees.

Performance of Phylogeny Programs

- Quantitative measures
  - Accuracy
  - Time consumption
  - Memory requirements (memory consumption is an important, often underestimated, problem!)
- Qualitative measures
  - Number of implemented evolutionary models
  - Ability to optimize evolutionary model parameters
  - Availability (negative examples: TNT, DCM)
  - Code for various platforms

Performance of Phylogeny Programs: Simulated Data

- Generate a simulated "true" tree (standard program: r8s)
- Generate a simulated alignment for the tree (standard program: Seq-Gen)
- Compute a tree with the phylogeny program
- Measure the topological distance to the true tree (standard measure: Robinson-Foulds distance)
- Problems
  - Perfect world: no gaps, no sequencing errors
  - The evolutionary model is known a priori

Performance of Phylogeny Programs: Real Data

- Compute trees with the phylogeny programs on real data alignments
- Compare the final tree scores
  - Significance of a small δ between final ML scores: apply likelihood ratio tests
  - Remember that the programs return log-likelihood values
  - High score-accuracy required: 99.99%
- Problems
  - The real tree is not known
  - The evolutionary model is not known
  - Applies to one class of model (ML, MP) only

Performance of Phylogeny Programs...

- Current research issue: when to stop the analysis?
- The optimization of trees for real data is generally significantly harder than for simulated data!

Performance of Phylogeny Programs...

- Current issue: a standard real-data benchmark set is required!

When to Stop the Analysis?

(Chart: log likelihood over time)

(Chart annotation: Is this improvement worth the extra time?)

Survey of Sequential Programs

Comparison on small simulated alignments

Williams et al. (2003): An investigation of phylogenetic likelihood methods.

Evaluated RAxML, PHYML, MrBayes on 50 simulated alignments with 100 taxa

Used 9 real-world alignments with 101–1000 taxa

Results:

RAxML best & fastest on real data

MrBayes best on simulated data

MrBayes significantly slower than PHYML, RAxML


Sequential Results: Simulated Data

PHYML:   0.0796 / 35.21 secs
RAxML:   0.0808 / 131.05 secs
MrBayes: 0.0741 / 945.32 secs
RAxML:   0.0818 / 29.27 secs

Sequential Results: Real Data

data      | PHYML     | secs  | MrBayes   | secs   | RAxML     | secs  | R > PHY secs | PAxML     | secs
101_SC    | -74097.6  | 153   | -77191.5  | 40527  | -73919.3  | 617   | 31           | -73975.9  | 47
150_SC    | -44298.1  | 158   | -52028.4  | 49427  | -44142.6  | 390   | 33           | -44146.9  | 164
150_ARB   | -77219.7  | 313   | -77196.7  | 29383  | -77189.7  | 178   | 67           | -77189.8  | 300
200_ARB   | -104826.5 | 477   | -104856.4 | 156419 | -104742.6 | 272   | 99           | -104743.3 | 775
250_ARB   | -131560.3 | 787   | -133238.3 | 158418 | -131468.0 | 1067  | 249          | -131469.0 | 1947
500_ARB   | -253354.2 | 2235  | -263217.8 | 366496 | -252499.4 | 26124 | 493          | -252588.1 | 7372
1000_ARB  | -402215.0 | 16594 | -459392.4 | 509148 | -400925.3 | 50729 | 1893         | -402282.1 | 9898
218_RDPII | -157923.1 | 403   | -158911.6 | 138453 | -157526.0 | 6774  | 244          | n/a       | n/a


Sequential Results: Real Data

(Chart annotation: state-of-the-art parallel program in 2002!)

Memory Requirements

        | 1000 taxa | 10000 taxa
RAxML   | 200 MB    | 750 MB
PHYML   | 900 MB    | 8.8 GB
MrBayes | 1150 MB   | not available

Sequential RAxML

Compute randomized parsimony starting tree with dnapars from PHYLIP

Advantage of RAxML: the search starts from distinct points in search space

Apply exhaustive subtree rearrangements; RAxML performs fast lazy rearrangements

Iterate while the tree improves

RAxML uses Subtree Equality Vectors: Stamatakis et al. (2002): Accelerating Parallel Maximum Likelihood-based Phylogenetic Tree Calculations using Subtree Equality Vectors.
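The "iterate while the tree improves" search can be sketched as a generic hill-climbing loop (this is a schematic, not RAxML's actual code; the toy score and neighbour functions below are hypothetical stand-ins for log-likelihood evaluation and subtree rearrangements):

```python
def hill_climb(tree, score, neighbours):
    """Iterate while the tree improves: per round, evaluate every candidate
    rearrangement of the current best tree and keep the best-scoring one."""
    best, best_score = tree, score(tree)
    improved = True
    while improved:
        improved = False
        for cand in neighbours(best):
            s = cand_score = score(cand)
            if cand_score > best_score:   # log-likelihoods: larger is better
                best, best_score = cand, cand_score
                improved = True
    return best, best_score

# Toy stand-in: 'trees' are integers, the score peaks at 17,
# and 'rearrangements' are +/-1 moves.
tree, lnl = hill_climb(0, lambda t: -abs(t - 17), lambda t: [t - 1, t + 1])
print(tree, lnl)
```

The randomized parsimony starting trees matter because a hill climber like this stops at the first local optimum; distinct starting points explore distinct basins of the search space.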

Outline

Grand Challenges in HPC Bioinformatics

HPC Bioinformatics by Example of Phylogenetic Inference

phylogenetic analysis
maximum likelihood
bayesian inference
sequential codes
parallel & distributed RAxML
future developments

Parallel & Distributed RAxML

Design goals:
- minimize communication overhead
- attain good speedup

Master-worker architecture with 2 computational phases:
I. Computation of #workers parsimony trees
II. Rearrangement of subtrees at each worker

The program is non-deterministic → every run yields a distinct result, even with a fixed starting tree

Parallel RAxML: Phase I

Distribute alignment file & compute parsimony trees

Receive parsimony trees & select best as starting tree

(Diagram: master process with workers)

Parallel RAxML: Phase II

Distribute currently best tree

Workers issue work requests

Distribute subtree IDs: only one integer must be sent to each node!
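The Phase II pattern can be sketched as follows; a thread pool stands in for the MPI workers, and `rearrange()` is a hypothetical stand-in for a worker that rearranges the named subtree on its local copy of the tree (in the real program, only the integer subtree ID crosses the network):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: given a subtree ID (one integer!), rearrange that
# subtree on the locally stored tree and return (ID, resulting log-likelihood).
def rearrange(subtree_id: int) -> tuple[int, float]:
    # stand-in computation; a real worker evaluates lazy rearrangements here
    return subtree_id, -74000.0 - (subtree_id % 7)

def master(num_subtrees: int, num_workers: int = 4) -> tuple[int, float]:
    """Phase II sketch: distribute subtree IDs to the workers, collect the
    resulting scores, and keep the best rearrangement."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(rearrange, range(num_subtrees)))
    return max(results, key=lambda r: r[1])   # best (ID, score) pair

best_id, best_lnl = master(20)
print(best_id, best_lnl)
```

Because each worker already holds the alignment and the current best tree (distributed at the start of the phase), the per-request message shrinks to a single integer, which keeps communication overhead minimal.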


(Diagram: workers rearrange subtrees ST1, ST2, ST3, ST4 in parallel)

Parallel RAxML: Phase II

Receive result trees and continue with best tree

Speedup

(Chart: speedup measurements)

Slightly superlinear speedup due to non-determinism!

RAxML: Biological Research

1. Parallel inference of five 10,000-taxon trees containing Bacteria, Eukarya, Archaea on a Linux PC cluster
   Accumulated CPU hours per tree ≈ 3200
   Largest ML analysis to date
   Major clades correctly identified

2. Sequential analysis of 2415 mammals (cytochrome-b sequences)
   "Traditional" reference tree available
   Error about 10–13%


Outline

Grand Challenges in HPC Bioinformatics

HPC Bioinformatics by Example of Phylogenetic Inference

phylogenetic analysis

maximum likelihood

bayesian inference

sequential codes

parallel & distributed RAxML

future developments


Future Developments

Visualization

Divide-and-conquer algorithms
I. Divide alignment into sub-alignments
II. Infer subtrees for sub-alignments
III. Merge subtrees into comprehensive tree by application of supertree methods
IV. Resolve multifurcations & optimize supertree
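The four steps can be sketched schematically; `infer()` and `merge()` below are toy stand-ins (a caterpillar tree and a plain root multifurcation), not the ML inference and supertree methods a real pipeline would use:

```python
# Schematic of the four divide-and-conquer steps on a toy taxon set.
def divide(taxa, size):
    """I. Divide the taxon set into overlapping subsets (overlap of one taxon,
    so the subtrees share anchor taxa for the later merge)."""
    subsets = [taxa[i:i + size] for i in range(0, len(taxa) - 1, size - 1)]
    return [s for s in subsets if len(s) > 1]

def infer(subset):
    """II. Infer a subtree per sub-alignment (here: a trivial nested-tuple
    caterpillar instead of a real ML search)."""
    tree = subset[0]
    for taxon in subset[1:]:
        tree = (tree, taxon)
    return tree

def merge(subtrees):
    """III./IV. Merge subtrees into one comprehensive tree; the unresolved
    root multifurcation would then be resolved and re-optimized."""
    return tuple(subtrees)

taxa = ["t%d" % i for i in range(7)]
supertree = merge([infer(s) for s in divide(taxa, 3)])
print(supertree)
```

The overlap between subsets is the crucial design choice: without shared taxa, the supertree step has no information about how the subtrees fit together.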


Problem 1: Currently only 2 methods are available for alignment division: RAxML & DCM

Problem 2: Resolving multifurcations is hard; optimization of the entire tree is required

Distributed D & C

Construct guide tree & perform tree-based alignment division

Compute subtrees

Merge subtrees into guide tree & re-divide alignment

Compute subtrees

Recent Work on D & C

Rec-I-DCM3: very fast on parsimony
Roshan (2004): Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees.

PhyNav: zoom-in/zoom-out technique
Vinh et al. (2004): PhyNav: A novel approach to reconstruct large phylogenies.

BWD: new supertree reconstruction method using distances
Stephen J. Willson (2004): Constructing rooted supertrees using distances.

Future Developments

Visualization
Divide-and-conquer algorithms
Shared memory or vector processors

Shared Memory Parallelism

(Diagram: tree with virtual root and inner nodes P, Q, R)

P[i] = f( g(Q[i]) , g(R[i]) )

This operation accounts for ≈ 90% of total execution time! → simple fine-grained parallelisation
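Since every P[i] depends only on Q[i] and R[i], the index range can be split into independent chunks, which is exactly the simple fine-grained parallelisation the slide refers to. A sketch of the decomposition (f and g are toy scalar stand-ins for the per-site likelihood functions; real shared-memory implementations use OpenMP, while Python threads here only illustrate the chunking, since the GIL prevents actual speedup):

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Toy stand-ins for the per-site likelihood computations at nodes Q, R, P.
def g(x: float) -> float:
    return math.log1p(x * x)

def f(a: float, b: float) -> float:
    return a + b

def combine(Q, R, workers=4):
    """Compute P[i] = f(g(Q[i]), g(R[i])) for all i, splitting the index
    range into one contiguous chunk per worker."""
    n = len(Q)
    P = [0.0] * n

    def chunk(lo: int, hi: int) -> None:
        for i in range(lo, hi):
            P[i] = f(g(Q[i]), g(R[i]))

    bounds = [(k * n // workers, (k + 1) * n // workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for lo, hi in bounds:
            pool.submit(chunk, lo, hi)   # the with-block joins all chunks
    return P

P = combine([0.1] * 8, [0.2] * 8)
```

No synchronisation is needed inside the loop because each chunk writes to a disjoint slice of P; the only barrier is the implicit join at the end.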



Future Developments

Combination of ML hill-climbing and MC³-like sampling algorithms

Potential advantages:
1. Large number of sampled trees to build a consensus tree
2. Faster convergence to the best-known likelihood?

Combined Method

(Chart annotation: Start sampling?)

Recent HW-specific Optimization

Like every ML program, RAxML makes many calls to exp() and log() on vectors → expensive

Utilisation of the Intel™ MKL (Math Kernel Library) on a Xeon 2.4 GHz CPU

Performance boost of over 30% …

… in half a day of work
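The MKL gain comes from restructuring the code so that exp() is evaluated over a whole array in one library call instead of once per element. A stdlib-only sketch of that interface change (`branch_probs_*` are hypothetical names, and `vd_exp` only mimics the shape of MKL's vdExp; there is no actual speedup without the real vector library):

```python
import math

# Per-element style: one exp() call per rate category.
def branch_probs_scalar(rates, t):
    return [math.exp(-r * t) for r in rates]

# Vector style: collect all arguments first, then hand the whole array to a
# single vectorized call (a stdlib stand-in for a routine like MKL's vdExp).
def vd_exp(xs):
    return list(map(math.exp, xs))

def branch_probs_vector(rates, t):
    return vd_exp([-r * t for r in rates])

rates = [0.5, 1.0, 2.0]
assert branch_probs_scalar(rates, 0.1) == branch_probs_vector(rates, 0.1)
```

The numerical results are identical; the restructuring only matters because a vector math library can amortise call overhead and use SIMD across the whole array, which is why the reported 30% boost took only half a day of work.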


On-Line Resources

HPC Resources

TOP500 List: www.top500.org
BlueGene: www.research.ibm.com/bluegene/
EarthSimulator: www.es.jamstec.go.jp
MPI-Forum: www.mpi-forum.org
Global Grid Forum: www.gridforum.org
MPICH: www-unix.mcs.anl.gov/mpi/mpich/
OpenMP: www.openmp.org

Bioinformatics Resources

CIPRES project: www.phylo.org
Felsenstein's phylogeny page: evolution.genetics.washington.edu/phylip/software.html
PHYML download: www.lirmm.fr/~guindon/phyml.html
RAxML download: wwwbode.cs.tum.edu/~stamatak/research.html
IQPNNI & PhyNav download: www.bi.uni-duesseldorf.de/software

