HPC Hardware & Software Development at IBM Research & Development Lab
Heiko J Schick – IBM Deutschland R&D GmbH
August 2010

HPC HW & SW Development at IBM Research & Development Lab
© 2010 IBM Corporation

Agenda
. Section 1: Hardware and Software Development Process
. Section 2: Hardware and Firmware Development
. Section 3: Operating System and Device Driver Development
. Section 4: Performance Tuning
. Section 5: Cluster Management
. Section 6: Project Examples

IBM Deutschland Research & Development GmbH

Overview
. One of IBM's largest Research & Development sites
. Founded: 1953
. Employees: ~2,000
. Headquarters: Böblingen
. Managing Director: Dirk Wittkopp

Focus Areas
. Skills: Hardware, Firmware, Operating Systems, Software and Services
. More than 60 hardware and software projects
. Technology consulting
. Cooperation with research institutes and universities

[Map; labels: Berlin, Mainz, Walldorf, Böblingen, München]

IBM Research and Development Sites
[World map; © Copyright IBM Corporation 2009]
. Research: Zurich, Watson, Almaden, China, Tokyo, Haifa, Austin, India
. Hardware Development: Greenock, Rochester, Boulder, Böblingen, Toronto, Fujisawa, Endicott, Burlington, La Gaude, San Jose, East Fishkill, Poughkeepsie, Yasu, Tucson, Haifa, Yamato, Austin, Raleigh, Bangalore
. Software Development: Krakow, Moscow, Vancouver, Dublin, Hursley, Minsk, Rochester, Böblingen, Beaverton, Toronto, Paris, Endicott, Santa Teresa, Foster City, Rome, Lenexa, Littleton, Beijing, Poughkeepsie, Haifa, Yamato, Austin, Raleigh, Costa Mesa, Cairo, Shanghai, Taipei, Pune, Bangalore, São Paulo, Gold Coast, Perth, Sydney

University Relations
. Targets:
  – Networking and knowledge sharing between science and business
  – Cooperation in and/or management of research projects between IBM and scientific institutions
. Facts:
  – Contacts in most German states and three European countries
  – More than 21 different institutions
  – 35 running projects, 14 of them PhDs (CAS)
  – Lectures by 81 IBMers on over 100 topics at different academic institutions
. Results:
  – Numerous publications and patents
  – High impact on products and projects
  – 2009: 66% of new hires came from former internships, diplomas or theses
  – 2009: 8 university awards granted
Friedrich-Schiller-Universität Jena

Open Systems Design & Development, STG
Dr. Michael Malms, Manager Open Systems Design and Development

Skills
. System Design, Hardware Architecture & Design
. Firmware Design & Development
. Bringup, Integration & Verification (Chip & System)
. Platform Performance
. Green Computing / Energy-efficient Design
. High Performance Systems
. System Integration Test Tool Development

Products/Projects
. POWER based Blade Servers JS20 and JS21
. Cell based Blade Servers QS21 and QS22
. POWER based Appliances & Accelerators
. System Reference Platforms for expedited Systems Development
. Open Firmware for POWER based Servers and Accelerators
. QPACE – an HPC Cluster Co-Development with academic Partners
. BlueGene – Processor Verification
. Storage Products – SAN Volume Controller

Hardware and Software Development Process - QPACE Development Plan

Hardware Development
. System Ownership
  – Cell/B.E. based Systems
    • BladeCenter QS21 and QS22
    • QPACE Node Card
  – PowerPC based System
    • BladeCenter JS20 and JS21
. PCB Design, Hardware Bring-up and VHDL Programming
  – Processor RIT Protection
  – HSS and Memory Characterization
  – Cell/B.E. Processor -- FPGA High-Speed Interface
. Blue Gene:
  – Hardware Verification and Bring-up (I/O network and network interface, pervasive unit, floating point unit, etc.)

Hardware Development - QPACE Node Card
[Figure: QPACE node card; labels: Network Processor (FPGA), Network PHYs, PowerXCell 8i Processor, Memory]

Firmware Development
[Figure: firmware stack diagram; recoverable labels: SFB; IBM; Operating System; QPACE Linux Adoptions; Boot Loader; Protocol Stacks; Service; Nano Kernel; Framework; QS22 Boot Drivers; QPACE Boot Drivers; QS22 Specific RTAS; QPACE Specific RTAS; RTAS Framework; User Interface; Device Interface; Client Interface; QS22 Specific Code & Device Tree; QPACE Specific Code & Device Tree; Slimline Open Firmware; Forth Engine; Open Firmware Layer; Low Level Firmware Layer (CPU Init, Mem Init, I/O MMU, I/O Sub)]

Operating System and Device Driver Development
. Device Drivers:
  – eHCA and eHEA Linux Device Driver
  – e1000 and NetXen (Boot) Firmware Device Driver
  – directAttach: Remote-Controls Accelerator Chips (e.g. Cell/B.E.)
  – OpenFabrics kernel and user level Verbs API including Hardware Specific Driver
. PowerPC / POWER Architecture:
  – Open Firmware Development for Cell/B.E. and PowerPC 970
  – Linux Platform Implementation for Cell/B.E.
. Network Communication and Protocols:
  – Bring-up and Debugging for PCI Express and InfiniBand (Mellanox ConnectX Chip)
  – QPACE Torus Interface (+ Open MPI)
  – Development, Bring-up and Debugging for QLogic InfiniPath InfiniBand Adapter on PowerPC

Performance Tuning
. Architecture:
  – Exploration of complete processor, memory, I/O and network communication path
  – Hardware Development (e.g. QPACE DMA Engine)
. Network:
  – QPACE Torus Network
    • Low-Level Communication (e.g. Message Transfers, etc.)
    • MPI Point-to-Point and Collectives
. Applications:
  – High Performance LINPACK
    • Memory Considerations (Stored Matrix in Main Memory)
    • MPI Communication
    • Efficient Level 3 Middleware Function (e.g. BLAS Routines: DGEMM, DTRSM, …)
  – Micro-benchmarks (e.g. processor specific functions like LS-to-LS communication and computation acceleration units)

Cluster Management - Distributed Image Management for Linux Clusters
  – Automated IP and naming configuration via XML
    • Description of the cluster network and cluster naming taxonomy
  – Network boot environment for nodes running diskless
    • Manages configuration of BOOTP / PXE, DHCP, NFS, DNS
  – File system management for the Linux operating system
    • Can keep multiple images concurrently, e.g. Red Hat and SUSE
    • Allows fast incremental maintenance of file system images
  – Efficient command line tools for BladeCenter and rack mounted servers
    • Power on/off/cycle
    • Get event logs, firmware levels, MAC addresses, boot order, health status, etc.
  – DIM is lightweight
    • Footprint: 2 MB / 100 files (xCAT: >100 MB / 1800 files; CSM: 2000 files, Java)
    • No extra daemons
    • Simple, no overhead
    • Easy to customize and adaptable to any Linux distribution
    • Available for free at IBM alphaWorks

Project Examples - Mare Nostrum

Project Examples - Roadrunner

Project Examples - Blue Gene

Project Examples - QPACE

Thank you very much for your attention.

Disclaimer
. IBM®, DB2®, MVS/ESA, AIX®, S/390®, AS/400®, OS/390®, OS/400®, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere®, Netfinity®, Tivoli®, Informix and Informix® Dynamic Server™, IBM, BladeCenter, POWER and others are trademarks of the IBM Corporation in the US and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linux is a trademark of Linus Torvalds in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others. The information and materials are provided on an "as is" basis and are subject to change.