Algoritmos paralelos y distribuidos para resolver ecuaciones matriciales de Lyapunov en problemas de reducción de modelos (Parallel and distributed algorithms for solving Lyapunov matrix equations in model reduction problems)

Total pages: 16

File type: PDF, size: 1020 KB

UNIVERSIDAD POLITÉCNICA DE VALENCIA
DEPARTAMENTO DE SISTEMAS INFORMÁTICOS Y COMPUTACIÓN

Algoritmos paralelos y distribuidos para resolver ecuaciones matriciales de Lyapunov en problemas de reducción de modelos

Doctoral thesis presented by José M. Claver Iborra. Supervised by Dr. Vicente Hernández García and Dr. Enrique S. Quintana Ortí. Valencia, June 1998.

Acknowledgements

In these brief lines I wish to thank the many people who offered me their help and support while this thesis was being carried out. First of all, I thank my supervisors, Dr. Vicente Hernández and Dr. Enrique S. Quintana, for their invaluable support. In particular, I thank Vicente Hernández for introducing me to this field of research, for his masterful way of teaching, and for his constant concern for my work, and Enrique S. Quintana for his constant attention and advice during the final stages of this thesis. To my colleagues in the Computer Science department of the Universitat Jaume I, and especially to the members of my research group: I have learned a great deal from them, and they have always been ready to help me. To my parents, for their unconditional support and encouragement. And finally, my special thanks to my wife, María José, for always believing in me and standing by my side at every moment.

Contents

1 The model reduction problem. Fundamental concepts
  1.1 Introduction
    1.1.1 Objectives and justification
    1.1.2 Structure of the thesis
  1.2 Basic concepts of linear algebra
    1.2.1 Notation
    1.2.2 Eigenvalues and eigenvectors
    1.2.3 Invariant and deflating subspaces
    1.2.4 Singular values and singular vectors
    1.2.5 Conditioning and numerical stability
  1.3 Basic concepts of the analysis and design of linear control systems (SLC)
    1.3.1 The state-space model
    1.3.2 Stability
    1.3.3 Controllability and stabilizability
    1.3.4 Observability and detectability
    1.3.5 Minimal realizations
    1.3.6 The Gramian matrices and the Hankel singular values of an SLC
  1.4 The model reduction problem
    1.4.1 Problem statement
    1.4.2 The importance of model reduction
    1.4.3 Methods used for model reduction
2 State-space truncation methods
  2.1 Introduction
  2.2 Balanced realizations
    2.2.1 Open-loop systems
    2.2.2 Closed-loop systems
  2.3 The Schur method
  2.4 Conclusions
3 Methods for solving Lyapunov equations
  3.1 Introduction
  3.2 Conditioning of the problem
  3.3 The Bartels-Stewart method
  3.4 Hammarling's method
    3.4.1 Solving continuous-time Lyapunov equations
      3.4.1.1 Real eigenvalues
      3.4.1.2 Complex eigenvalues
    3.4.2 Solving discrete-time Lyapunov equations
      3.4.2.1 Real eigenvalues
      3.4.2.2 Complex eigenvalues
  3.5 The matrix sign function method
    3.5.1 The matrix sign function and the Lyapunov equation
    3.5.2 Perturbation analysis of the matrix sign function method
    3.5.3 Accelerating convergence: scaling and Halley's iteration
    3.5.4 Locally convergent iterations
    3.5.5 Algorithm for solving the Lyapunov equation
    3.5.6 Algorithm for solving the Lyapunov equation for the Cholesky factor
    3.5.7 Algorithms for solving the generalized Lyapunov equation
    3.5.8 Algorithm for solving coupled Lyapunov equations
    3.5.9 Algorithm for solving the discrete-time Lyapunov equation
  3.6 Conclusions
4 Parallel architectures and algorithms
  4.1 Introduction
  4.2 High-performance architectures
    4.2.1 Classification of architectures
    4.2.2 Programming methodologies
    4.2.3 Performance of parallel algorithms
  4.3 Multicomputers
    4.3.1 Data distribution
    4.3.2 Communications
    4.3.3 The PVM and MPI communication libraries
    4.3.4 Parallel computational kernels (ScaLAPACK)
  4.4 Parallel computers
    4.4.1 Alliant FX/80
    4.4.2 SGI PowerChallenge
    4.4.3 Cray T3D
    4.4.4 IBM SP2
  4.5 Conclusions
5 Parallel algorithms for solving the Lyapunov equation
  5.1 Introduction
  5.2 Parallelization of Hammarling's method
    5.2.1 Data dependencies in Hammarling's algorithm
    5.2.2 Shared-memory multiprocessors
      5.2.2.1 Fine-grain algorithms for superscalar processors
      5.2.2.2 Medium-grain algorithms for vector processors
    5.2.3 Distributed-memory multiprocessors
      5.2.3.1 Wavefront algorithms
      5.2.3.2 Adaptive wavefront algorithms
      5.2.3.3 Cyclic algorithms
      5.2.3.4 Modified cyclic algorithms
      5.2.3.5 Cyclic wavefront algorithms
  5.3 Parallelization of the matrix sign function method
    5.3.1 An example of parallelization with ScaLAPACK: LU factorization
    5.3.2 Algorithm for the solution of the Lyapunov equation
    5.3.3 Algorithm for obtaining the Cholesky factor of the Lyapunov equation
    5.3.4 Algorithm for solving the generalized Lyapunov equation
    5.3.5 Algorithm for solving coupled Lyapunov equations
    5.3.6 Theoretical analysis of the algorithms
  5.4 Conclusions
6 Analysis of the experimental results
  6.1 Introduction
  6.2 Hammarling's method
    6.2.1 Shared-memory multiprocessors
    6.2.2 Distributed-memory multiprocessors
      6.2.2.1 Wavefront algorithms
      6.2.2.2 Adaptive wavefront algorithms
      6.2.2.3 Cyclic algorithms
    6.2.3 Analysis of results
  6.3 The matrix sign function method
    6.3.1 Analysis of numerical reliability
    6.3.2 The generalized Lyapunov equation
    6.3.3 Coupled generalized Lyapunov equations
    6.3.4 Analysis of results
  6.4 Conclusions
7 Conclusions and future work
  7.1 Conclusions
    7.1.1 Model reduction: state-space truncation
    7.1.2 Methods for solving Lyapunov equations
    7.1.3 High-performance computers
    7.1.4 Parallel algorithms based on Hammarling's method
    7.1.5 Parallel algorithms based on the matrix sign function
  7.2 Publications arising from this thesis
  7.3 Future lines of research
  7.4 Acknowledgements
Bibliography

Chapter 1. The model reduction problem. Fundamental concepts

1.1 Introduction

At present the scientific and technical community is devoting a great deal of effort to solving engineering problems within given limits of time, accuracy and cost. These problems are of ever-growing size, in some cases too large to be solved within the desired time and accuracy limits, and of ever-growing complexity.
The need to solve such problems is driving the growing use of parallel and distributed computing systems [Perrot 92]. On the one hand, hardware systems with greater performance are being designed that can execute the developed code more efficiently. On the other hand, new numerical algorithms [Kailath 80, Petkov 91] and programming strategies better suited to extracting the most from existing computers are being developed [Dowd 93].

One of the crucial topics in engineering is the modelling of complex dynamical systems. However, for the treatment of real problems to be feasible, approximations are needed that transform them into simpler problems. A complex dynamical system is usually nonlinear, distributed and time-varying, so one of the first approximations classically made is its transformation into a linear time-invariant system. The linear time-invariant systems resulting from this initial approximation are usually large systems with a great number of state variables [Skelton 88]. It is therefore necessary to look for simpler mathematical models that approximate the behaviour of the original system as closely as possible. Such a model, which has fewer states than the original system, is called a reduced model or reduced-order model, and the procedure used to obtain it is called model reduction [Fortuna 92]. Model reduction, which a few years ago was an open problem in systems theory, is nowadays one of its fundamental topics [Skelton 88]. The usual approach to obtaining reduced-order models is essentially the same for continuous-time and for discrete-time systems. We will normally refer to the first type of system, since in most cases the conclusions carry over to the discrete-time case without excessive difficulty.

1.1.1 Objectives and justification

Lyapunov equations are of fundamental importance in many analysis and synthesis algorithms in control theory. They arise naturally in linear control problems governed by autonomous linear first-order ordinary differential equations (ODEs).
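For a stable linear time-invariant system x'(t) = A x(t) + B u(t), y(t) = C x(t), the controllability and observability Gramians are the solutions of the Lyapunov equations A Wc + Wc A^T + B B^T = 0 and A^T Wo + Wo A + C^T C = 0, and the square roots of the eigenvalues of Wc Wo are the Hankel singular values on which balanced truncation is based. The following sketch only illustrates these quantities serially on a small made-up system using SciPy's standard Lyapunov solver; it is not the parallel Hammarling or matrix sign function algorithms developed in the thesis.

```python
# Minimal serial sketch (not the thesis's parallel algorithms): compute the
# controllability and observability Gramians of a small, arbitrary stable
# system by solving two continuous-time Lyapunov equations, then obtain the
# Hankel singular values that balanced truncation uses to decide which states
# a reduced-order model can discard.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, eigvals

# Example system x' = A x + B u, y = C x (A is stable: eigenvalues -1 and -2)
A = np.array([[-1.0, 0.5],
              [ 0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# Controllability Gramian Wc:  A Wc + Wc A^T + B B^T = 0
Wc = solve_continuous_lyapunov(A, -B @ B.T)
# Observability Gramian Wo:    A^T Wo + Wo A + C^T C = 0
Wo = solve_continuous_lyapunov(A.T, -C.T @ C)

# Hankel singular values: square roots of the eigenvalues of Wc @ Wo
hsv = np.sqrt(np.sort(eigvals(Wc @ Wo).real)[::-1])
print("Hankel singular values:", hsv)
```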
Recommended publications
  • Through the Years… When Did It All Begin?
    Through the years… When did it all begin? 1974? 1978? 1963?
    CDC 6600 – 1974: NERSC started service with the first supercomputer. A well-used system (Serial Number 1), on its last legs, designed and built in Chippewa Falls. Launch date: 1963. Load/store architecture, the first RISC computer, the first CRT monitor, Freon cooled. State-of-the-art remote access at NERSC: via 4 acoustic modems, manually answered, capable of 10 characters/sec.
    50th anniversary of the IBM / Cray rivalry: "Last week, CDC had a press conference during which they officially announced their 6600 system. I understand that in the laboratory developing this system there are only 32 people, 'including the janitor'… Contrasting this modest effort with our vast development activities, I fail to understand why we have lost our industry leadership position by letting someone else offer the world's most powerful computer…" T.J. Watson, August 28, 1963.
    CDC 7600 – 1975: delivered in September; 36 Mflop peak, ~10 Mflop sustained (10x the sustained performance of the CDC 6600); fast memory plus slower core memory; Freon cooled (again). Major innovations: 65KW memory, 36.4 MHz clock, pipelined functional units.
    Cray-1 – 1978: NERSC transitions users to vector architectures. Serial 6; a fairly easy transition for application writers; LTSS was converted to run on the Cray-1 and became known as CTSS (Cray Time Sharing System); Freon cooled (again); a 2nd Cray-1 added in 1981. Major innovations: vector processing, dependency
  • Unix and Linux System Administration and Shell Programming
    Unix and Linux System Administration and Shell Programming, version 56 of August 12, 2014. Copyright © 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2011, 2012, 2013, 2014 Milo. This book includes material from the http://www.osdata.com/ website and the text book on computer programming.
    Distributed on the honor system. Print and read free for personal, non-profit, and/or educational purposes. If you like the book, you are encouraged to send a donation (U.S. dollars) to Milo, PO Box 5237, Balboa Island, California, USA 92662.
    This is a work in progress. For the most up-to-date version, visit the website http://www.osdata.com/ and http://www.osdata.com/programming/shell/unixbook.pdf. Please add links from your website or Facebook page.
    Professors and teachers: feel free to take a copy of this PDF and make it available to your class (possibly through your academic website). This way everyone in your class will have the same copy (with the same page numbers) despite my continual updates. Please try to avoid posting it to the public internet (to avoid old copies confusing things) and take it down when the class ends. You can post the same or a newer version for each succeeding class. Please remove old copies after the class ends to prevent confusing the search engines. You can contact me with a specific version number and class end date and I will put it on my website.
    Chapter 0: This book looks at Unix (and Linux) shell programming and system administration.
  • Tour de HPCycles
    Tour de HPCycles. Wu Feng ([email protected]), Los Alamos National Laboratory; Allan Snavely ([email protected]), San Diego Supercomputing Center.
    Abstract: In honor of Lance Armstrong's seven consecutive Tour de France cycling victories, we present Tour de HPCycles. While the Tour de France may be known only for the yellow jersey, it also awards a number of other jerseys for cycling excellence. The goal of this panel is to delineate the "winners" of the corresponding jerseys in HPC. Specifically, each panelist will be asked to award each jersey to a specific supercomputer or vendor, and then to justify their choices.
    The Jerseys:
    • Green Jersey (a.k.a. Sprinters Jersey): fastest consistently in miles/hour.
    • Polka Dot Jersey (a.k.a. Climbers Jersey): ability to tackle difficult terrain while sustaining as much of peak performance as possible.
    • White Jersey (a.k.a. Young Rider Jersey): best "under 25 year-old" rider with the lowest total cycling time.
    • Red Number (Most Combative): most aggressive and attacking rider.
    • Team Jersey: best overall team.
    • Yellow Jersey (a.k.a. Overall Jersey): best overall supercomputer.
    Panelists:
    • David Bailey, LBNL – Chief Technologist. IEEE Sidney Fernbach Award.
    • John (Jay) Boisseau, TACC @ UT-Austin – Director. 2003 HPCwire Top People to Watch List.
    • Bob Ciotti, NASA Ames – Lead for Terascale Systems Group. Columbia.
    • Candace Culhane, NSA – Program Manager for HPC Research. HECURA Chair.
    • Douglass Post, DoD HPCMO & CMU SEI – Chief Scientist. Fellow of APS.
    Ground Rules for Panelists: each panelist gets SEVEN minutes to present his position (or solution).
  • Cray Supercomputers Past, Present, and Future
    Cray Supercomputers Past, Present, and Future. Hewdy Pena Mercedes, Ryan Toukatly. Advanced Comp. Arch. 0306-722, November 2011.
    Cray Companies:
    • Cray Research, Inc. (CRI), 1972. Seymour Cray.
    • Cray Computer Corporation (CCC), 1989. Spin-off. Bankrupt in 1995.
    • Cray Research, Inc. bought by Silicon Graphics, Inc. (SGI) in 1996.
    • Cray Inc. formed when Tera Computer Company (pioneer in multi-threading technology) bought Cray Research, Inc. in 2000 from SGI.
    Seymour Cray:
    • Joined Engineering Research Associates (ERA) in 1950 and helped create the ERA 1103 (1953), also known as the UNIVAC 1103.
    • Joined the Control Data Corporation (CDC) in 1960 and collaborated in the design of the CDC 6600 and 7600.
    • Formed Cray Research Inc. in 1972 when CDC ran into financial difficulties.
    • First product was the Cray-1 supercomputer, faster than all other computers at the time. The first system was sold within a month for US$8.8 million. Not the first system to use a vector processor, but the first to operate on data in registers instead of memory.
    Vector Processor:
    • A CPU that implements an instruction set that operates on one-dimensional arrays of data called vectors.
    • Appeared in the 1970s; formed the basis of most supercomputers through the 80s and 90s.
    • In the 60s the Solomon project of Westinghouse wanted to increase math performance by using a large number of simple math co-processors under the control of a single master CPU. The University of Illinois used the principle on the ILLIAC IV.
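As a loose software analogy to the vector style described in the excerpt above (an illustration added here, not taken from the excerpt's source and not a model of Cray hardware), an array library such as NumPy lets a single whole-array expression replace an explicit element-by-element loop:

```python
# Loose software analogy to vector processing (illustration only): one
# whole-array expression replaces an explicit element-by-element loop.
import numpy as np

n = 100_000
a = 2.5
x = np.random.rand(n)
y = np.random.rand(n)

# Scalar style: one element handled per loop iteration
z_scalar = np.empty(n)
for i in range(n):
    z_scalar[i] = a * x[i] + y[i]

# Vector style: the whole arrays are operated on at once (an AXPY-type update)
z_vector = a * x + y

assert np.allclose(z_scalar, z_vector)
```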
  • Supercomputers: the Amazing Race Gordon Bell November 2014
    Supercomputers: The Amazing Race. Gordon Bell, November 2014. Technical Report MSR-TR-2015-2. Gordon Bell, Researcher Emeritus, Microsoft Research, Microsoft Corporation, 555 California, 94104 San Francisco, CA. Version 1.0, January 2015. Submitted to STARS IEEE Global History Network.
    Supercomputers: The Amazing Race Timeline (the top 20 significant events; constrained for draft IEEE STARS article):
    1. 1957 Fortran introduced for scientific and technical computing.
    2. 1960 Univac LARC, IBM Stretch, and Manchester Atlas finish 1956 race to build largest "conceivable" computers.
    3. 1964 Beginning of Cray era with CDC 6600 (.48 MFlops) functional parallel units. "No more small computers" – S. R. Cray. "First super" – G. A. Michael.
    4. 1964 IBM System/360 announcement. One architecture for commercial & technical use.
    5. 1965 Amdahl's Law defines the difficulty of increasing parallel processing performance based on the fraction of a program that has to be run sequentially.
    6. 1976 Cray 1 vector processor (26 MF), vector data. Sid Karin: "1st super was the Cray 1".
    7. 1982 Caltech Cosmic Cube (4 node, 64 node in 1983). Cray 1 cost performance x 50.
    8. 1983-93 Billion dollar SCI (Strategic Computing Initiative of DARPA IPTO) response to Japanese Fifth Gen.; 1990 redirected to supercomputing after failure to achieve AI goals.
    9. 1982 Cray XMP (1 GF), Cray shared memory vector multiprocessor.
    10. 1984 NSF establishes Office of Scientific Computing in response to scientists' demand and to counteract the use of VAXen as personal supercomputers.
    11. 1987 nCUBE (1K computers) achieves 400-600 speedup, Sandia winning first Bell Prize; stimulated Gustafson's Law of scalable speed-up, Amdahl's Law corollary.
    12.
  • HPC at NCAR: Past, Present and Future
    HPC at NCAR: Past, Present and Future. Tom Engel, Computational and Information Systems Laboratory, National Center for Atmospheric Research. 26 May 2010.
    ABSTRACT: The history of high-performance computing at NCAR is reviewed from Control Data Corporation's 3600 through the current IBM p575 cluster and new Cray XT5m, but with special recognition of NCAR's relationship with Seymour Cray and Cray Research, Inc. The recent acquisition of a Cray XT5m is discussed, along with the rationale for that acquisition. NCAR's plans for the new NCAR-Wyoming Supercomputing Center in Cheyenne, Wyoming, and the current status of that construction project, are also described.
    KEYWORDS: Cray-1A, Cray-3, legacy systems, XT5m, NCAR
    1. Introduction. NCAR's use of high-performance computing (HPC) resources predates the coining of that term, as well as many others in routine use in recent years, such as "supercomputer" and "Moore's Law." Indeed, NCAR has had a long relationship with the Cray name, whether it is the man, his machines, his companies, or those companies keeping his namesake and legacy alive. From the CDC 6600, the first system installed at the NCAR mesa laboratory, to the Cray-1A and ultimately the Cray-3, NCAR's Computational and Information Systems Laboratory (CISL), formerly known as the Scientific Computing Division (SCD), used Seymour Cray's systems […]
    […] supercomputer manufacturer NEC. However, the ACE procurement was suspended when SGI/Cray filed a dumping complaint with the Commerce Department. A lengthy legal battle ensued, with anti-dumping duties imposed upon Japanese supercomputers and subsequent appeals, which culminated in a U.S. Supreme Court decision in February 1999 that let the dumping decision stand. During the three-year hiatus, NCAR continued the necessary incremental augmentation of its HPC capacity with various Cray, SGI and IBM systems.
    In 2000, SCD issued the Advanced Research Computing System (ARCS) RFP, hoping to recoup the HPC capacity losses that Moore's Law had imposed during the dumping investigation.
  • CRAY T90 Series Differences: Tutorial
    CRAY T90 Series Differences: Tutorial. Michael Burwell, Cray Research, Inc., Eagan, Minnesota, USA.
    ABSTRACT: One of the flagship products of Cray Research, the CRAY T90 series of computer systems represents a different hardware architecture from other mainframe offerings. Differences also exist in the way interprocess communication is handled on CRAY T90 systems. Additionally, these systems use a variety of system configuration tools that are different from those operating on the rest of the Cray Research line of mainframe products. The tutorial summarized in this article provides an overview of these hardware architecture, interprocess communication, and software configuration differences. The presentation concludes with a summary of user-level differences, particularly as compared with CRAY C90 series systems.
    Introduction. The CRAY T90 series of computer systems is the current generation of Cray Research's high-performance parallel vector processing (PVP) systems. They are the successors to the CRAY C90 series of high-end supercomputer systems. Because of the importance of the CRAY T90 series among the suite of products offered by Cray Research, coupled with the fact that they are still relatively recent offerings, this tutorial is provided. Among the salient characteristics of CRAY T90
    Architecture Overview. The major hardware components of a CRAY T90 series system are as follows:
    • Mainframe cabinet
    • Input/output subsystem (IOS) / solid-state storage device (SSD) cabinet
    • Heat exchange unit (HEU)
    • AC/DC converter (HVDC-40)
  • Transferring Ecosystem Simulation Codes to Supercomputers
    NASA Technical Memorandum 4662. Transferring Ecosystem Simulation Codes to Supercomputers. J. W. Skiles, Johnson Controls World Services, Inc., Cape Canaveral, Florida; C. H. Schulbach, Ames Research Center, Moffett Field, California. February 1995. National Aeronautics and Space Administration, Ames Research Center, Moffett Field, California 94035-1000.
    CONTENTS: Summary; Introduction; Assumptions; Computer Platform Description; Parallelism in Computer Architecture; Flynn's Classification Scheme
  • Jaguar: Powering and Cooling the Beast
    Jaguar: Powering and Cooling the Beast. Buddy Bland. 2009 Conference on High-Speed Computing, The Salishan Lodge, Gleneden Beach, Oregon, April 30, 2009.
    Outline: Jaguar's features for performance and efficiency; historical overview of cooling systems on Cray's computers; implications for the future.
    Outstanding launch for petascale computing in Office of Science and ORNL at SC'08, only 41 days after assembly of a totally new 150,000 core system:
    • Jaguar beat the previous #1 performance on Top500 with an application running over 18 hours on the entire system.
    • Jaguar had two real applications running over 1 PF: DCA++, 1.35 PF, superconductivity problem; LSMS, 1.05 PF, thermodynamics of magnetic nanoparticles problem.
    Cray XT5 "Jaguar" is showing impressive stability. Within days of delivery, the system was running 84 full-system jobs for many hours. [Chart: HPL percent efficiency versus hours for two full-system runs, HPL 1.059 PF and HPL 1.004 PF.]
    Gordon Bell prize awarded to ORNL team; three of six GB finalists ran on Jaguar:
    • A team led by ORNL's Thomas Schulthess received the prestigious 2008 Association for Computing Machinery (ACM) Gordon Bell Prize at SC08 for attaining the fastest performance ever in a scientific supercomputing application.
    • Simulation of superconductors achieved 1.352 petaflops on ORNL's Cray XT Jaguar supercomputer.
    • By modifying the algorithms and software design of the DCA++ code, the team was able to boost its performance tenfold.
    Gordon Bell finalists: DCA++ (ORNL), LS3DF (LBNL), SPECFEM3D (SDSC), RHEA (TACC), SPaSM (LANL), VPIC (LANL).
  • 20190228 Mobile-Barcelona-2019
    Mobiles: on the road to Supercomputers. GoingDigital Community, Mobile World Congress 19. Prof. Mateo Valero, BSC Director, Feb 2019. Barcelona Supercomputing Center / Centro Nacional de Supercomputación.
    BSC-CNS objectives: supercomputing services to Spanish and EU researchers; R&D in Computer, Life, Earth and Engineering Sciences; PhD programme, technology transfer, public engagement. BSC-CNS is a consortium that includes the Spanish Government (60%), the Catalonian Government (30%) and Univ. Politècnica de Catalunya (UPC) (10%).
    Technological Achievements: Transistor (Bell Labs, 1947); DEC PDP-1 (1957); IBM 7090 (1960); Integrated circuit (1958); IBM System 360 (1965); DEC PDP-8 (1965); Microprocessor (1971): Intel 4004.
    Technology Trends: Microprocessor Capacity. Moore's Law: 2X transistors/chip every 1.5 years. Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months. Called "Moore's Law". Microprocessors have become smaller, denser, and more powerful. Not just processors: bandwidth, storage, etc.
    In the beginning ... there were only supercomputers. Built to order (very few of them), special purpose hardware (very expensive). Control Data Cray-1 – 1975, 160 MFLOPS, 80 units, 5-8 M$. Cray X-MP – 1982, 800 MFLOPS. Cray-2 – 1985, 1.9 GFLOPS. Cray Y-MP – 1988, 2.6 GFLOPS. ...Fortran + vectorizing compilers.
    "Killer microprocessors": [Chart: MFLOPS (10 to 10,000) versus year (1974-1999) for Cray-1, Cray-C90, NEC SX4, SX5, Alpha AV4, EV5, Intel Pentium, IBM P2SC and HP PA8200.]
    • Microprocessors killed the vector supercomputers.
    • They were not faster ... but they were significantly cheaper and greener.
    • 10 microprocessors approx. 1 vector CPU.
    • SIMD vs.
  • Vector Microprocessors
    Vector Microprocessors, by Krste Asanović. B.A. (University of Cambridge) 1987. A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley. Committee in charge: Professor John Wawrzynek, Chair; Professor David A. Patterson; Professor David Wessel. Spring 1998. Copyright 1998 by Krste Asanović.
    Abstract. Most previous research into vector architectures has concentrated on supercomputing applications and small enhancements to existing vector supercomputer implementations. This thesis expands the body of vector research by examining designs appropriate for single-chip full-custom vector microprocessor implementations targeting a much broader range of applications. I present the design, implementation, and evaluation of T0 (Torrent-0): the first single-chip vector microprocessor. T0 is a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle. T0 demonstrates that vector architectures are well suited to full-custom VLSI implementation and that they perform well on many multimedia and human-machine interface tasks. The remainder of the thesis contains proposals for future vector microprocessor designs. I show that the most area-efficient vector register file designs have several banks with several ports, rather than many banks with few ports as used by traditional vector supercomputers, or one bank with many ports as used by superscalar microprocessors.
  • LINPACK Benchmark?
    Center for Information Services and High Performance Computing (ZIH). Performance Analysis of Computer Systems. Benchmarks: TOP 500, Stream, and HPCC. Nöthnitzer Straße 46, Raum 1026, Tel. +49 351 - 463 - 35048. Holger Brunst ([email protected]), Matthias S. Mueller ([email protected]).
    Summary of Previous Lecture. Different workloads: test workload, real workload, synthetic workload. Historical examples for test workloads: addition instruction, instruction mixes, kernels, synthetic programs, application benchmarks.
    Excursion on Speedup and Efficiency Metrics. Comparison of sequential and parallel algorithms.
    Speedup: Sn = T1 / Tn, where n is the number of processors, T1 is the execution time of the sequential algorithm, and Tn is the execution time of the parallel algorithm with n processors.
    Efficiency: Ep = Sp / p. Its value estimates how well-utilized p processors are in solving a given problem. Usually between zero and one. Exception: superlinear speedup (later).
    Amdahl's Law. Find the maximum expected improvement to an overall system when only part of the system is improved. Serial execution time = s + p; parallel execution time = s + p/n; so Sn = (s + p) / (s + p/n). Normalizing with respect to serial time (s + p = 1) results in Sn = 1 / (s + p/n). Speedup drops off rapidly as the serial fraction increases; the maximum speedup possible is 1/s, independent of n, the number of processors! Bad news: if an application has only 1% serial work (s = 0.01) then you will never see a speedup greater than 100.
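As a small numeric illustration added here (not part of the excerpt), evaluating the normalized formula above for the quoted case of 1% serial work shows how quickly the speedup saturates at the 1/s bound:

```python
# Numeric illustration of Amdahl's law from the excerpt: with serial fraction
# s and normalized serial time (s + p = 1), speedup on n processors is
# S(n) = 1 / (s + (1 - s)/n), bounded above by 1/s.
s = 0.01  # 1% serial work
for n in (1, 10, 100, 1000, 10000):
    speedup = 1.0 / (s + (1.0 - s) / n)
    print(f"n = {n:5d}  speedup = {speedup:6.1f}  (limit 1/s = {1.0 / s:.0f})")
```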