Optimization in Computational Systems Biology Via High Performance Computing Techniques

Optimization in Computational Systems Biology via High Performance Computing Techniques

Doctoral Thesis

David Rodríguez Penas

March 2017

PhD Advisors: Julio Rodríguez Banga, Patricia González Gómez, Ramón Doallo Biempica

PhD Program in Information Technology Research

Dr. Julio Rodríguez Banga, Research Professor, Instituto de Investigacións Mariñas, Consejo Superior de Investigaciones Científicas (CSIC); Dra. Patricia González Gómez, Associate Professor, Department of Computer Engineering, Universidade da Coruña; and Dr. Ramón Doallo Biempica, Full Professor, Department of Computer Engineering, Universidade da Coruña,

CERTIFY that this dissertation, entitled "Optimization in computational systems biology via high performance computing techniques", was carried out by David Rodríguez Penas under our supervision at the Department of Computer Engineering of the Universidade da Coruña, and constitutes the Doctoral Thesis that he submits to qualify for the degree of Doctor in Computer Engineering with the International Doctor mention.

A Coruña, 30 March 2017

Signed: Julio Rodríguez Banga (Thesis Advisor), Patricia González Gómez (Thesis Advisor), Ramón Doallo Biempica (Thesis Advisor), David Rodríguez Penas (Author)

To Tamara, and to the nephew and niece who are on their way: Sergio and Marinha.

Acknowledgments

First of all, I would especially like to thank my advisors, Julio, Patricia and Ramón, both for the opportunity they gave me to carry out this thesis and for the dedication, help and support they offered me throughout all these years. It has been an honor to work by your side, and the most valuable thing I take away from this thesis is everything I learned with you.
I also want to thank all my colleagues in the Process Engineering Group of the Instituto de Investigacións Mariñas for all the help and support they gave me throughout this time. I am very grateful to all of you for the day-to-day companionship, the camaraderie, and the good moments we shared. I also want to acknowledge all the help I received from David Henriques, the colleague in the group with whom I worked most closely and who provided the systems biology problems optimized in this thesis; from Professor Jose Egea, both for his contributions during the development of the methods proposed in Chapter 3 and for his advice on all the statistics used in this thesis; and from Professor Julio Sáez, both for the collaboration during the preparation of Chapter 4 and for the warm welcome I received from him and the members of his group during my stay in Aachen.

I also want to extend my thanks to the Computer Architecture Group of the Universidade da Coruña and to the Computer Science Department of the Instituto de Investigacións Mariñas (CSIC) in Vigo, for the use of their infrastructure and all the support received.

Above all, beyond the academic sphere, I have to thank my family for all the encouragement and affection they gave me so selflessly during all this time. First, my Tamara, my greatest support from the beginning to the end of this adventure; you were and will be my greatest motivation. Also my parents, for enduring my frustrations and celebrating my achievements; they will always be my best teachers, and with their unconditional support everything is much easier. And finally, my siblings, who throughout my life have been and will be an example to follow.
Finally, I thank the following organizations for funding and supporting this work: the Ministerio de Economía y Competitividad, through projects DPI2011-28112-C04, DPI2014-55276-C5, TIN2013-42148-P and TIN2016-75845-P (all co-financed with FEDER funds) and through the FPI research training program (Ref: BES-2012-059201, linked to project DPI2011-28112-C04); the Xunta de Galicia, through its grants for the consolidation and structuring of competitive research units (Networks ref. R2014/041 and ref. R2016/045, and Competitive Reference Group GRC2013/055); the Centro de Supercomputación de Galicia (CESGA) and the European Bioinformatics Institute (EBI), for access to their computing resources; and Microsoft Research, for access to the Microsoft Azure platform through a sponsored account.

Resumo

The aim of computational systems biology is to generate knowledge about complex biological systems by combining experimental data with mathematical models and advanced computational techniques. The development of dynamic (kinetic) models, also known as reverse engineering, is one of the key topics in this area. In recent years much research has focused on scaling up these models, which makes the estimation of their parameters, also known as model calibration, a complex task. This complexity requires efficient tools and methods in order to reach good results in a reasonable computation time. In general, global optimization methods are used to solve this kind of problem, and metaheuristics in particular have emerged as efficient methods for the most costly ones. Even so, for most real applications metaheuristics still require a great deal of computation time to obtain acceptable results.
This thesis presents the design, implementation and evaluation of new parallel metaheuristics, specialized in solving parameter estimation problems within the context of systems biology. In particular, new metaheuristics based on the differential evolution and scatter search algorithms are proposed. The new proposals aim to strike a balance between the exploration and exploitation capabilities of the algorithms. They also demonstrate how cooperation between concurrent searches improves the behavior of the algorithms, raising the quality of the solutions and reducing the execution time. Adaptive strategies were also studied to increase the robustness of the proposals. Both traditional HPC architectures and new cloud infrastructures were used in the evaluation, and very good results were obtained on optimization problems of great dimension and complexity.

Resumen

The aim of computational systems biology is to generate knowledge about complex biological systems by combining experimental data with mathematical models and advanced computational techniques. The development of dynamic (kinetic) models, also known as reverse engineering, is one of the key topics in this field. In recent years great interest has arisen in scaling up these kinetic models, making parameter estimation, also known as model calibration, a very difficult task that requires efficient tools and methods to reach good results in a reasonable time. In general, global optimization methods are used to solve these problems; in particular, metaheuristics emerge as efficient algorithms for the most complex ones. However, in most real applications metaheuristics still require a great deal of computation time to obtain acceptable results.
This thesis presents the design, implementation and evaluation of new parallel metaheuristics, specialized above all in solving parameter estimation problems in systems biology. In particular, new metaheuristics based on the differential evolution and scatter search algorithms are proposed. The new proposals aim to strike a balance between the exploration and exploitation capabilities of the algorithms. They also demonstrate how cooperation between concurrent searches improves the behavior of the algorithm, raising the quality of the solutions and reducing the execution time. Adaptive strategies to increase the robustness of the proposals have also been studied. Both traditional HPC architectures and new cloud infrastructures have been used for the evaluation, and very good results have been obtained on problems of great dimension and complexity.

Abstract

The aim of computational systems biology is to generate new knowledge and understanding about complex biological systems by combining experimental data with mathematical modeling and advanced computational techniques. The development of dynamic models (also known as reverse engineering) is one of the current key issues in this area. In recent years, research has been focused on scaling up these kinetic models. In this context, the problem of parameter estimation (model calibration) remains a very challenging task. The complexity of the underlying models requires the use of efficient solvers to achieve adequate results in reasonable computation times. Global optimization methods are used to solve these types of problems. In particular, metaheuristics have emerged as an efficient way of solving these hard global optimization problems. However, in most realistic applications, metaheuristics still require a large computation time to obtain acceptable results.
This thesis presents the design, implementation and evaluation of novel parallel metaheuristics with a focus on parameter estimation problems in computational systems biology. In particular, we propose new cooperative
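The abstracts describe calibrating kinetic models with differential evolution. As a rough illustration of how such a metaheuristic estimates parameters from data, the sketch below (an illustration under simplifying assumptions, not the thesis's actual algorithm) uses a classic DE/rand/1/bin loop to fit the two parameters of a toy exponential-decay model y = a·exp(-b·t) by minimizing the sum of squared residuals:

```python
import math
import random

random.seed(0)

# Synthetic "experimental" data generated from y = a*exp(-b*t), a=2.0, b=0.5
TRUE = (2.0, 0.5)
TS = [0.1 * i for i in range(50)]
DATA = [TRUE[0] * math.exp(-TRUE[1] * t) for t in TS]

def cost(p):
    """Sum-of-squares mismatch between model prediction and data."""
    a, b = p
    return sum((a * math.exp(-b * t) - y) ** 2 for t, y in zip(TS, DATA))

def differential_evolution(bounds, pop_size=20, f=0.8, cr=0.9, gens=200):
    """DE/rand/1/bin: mutate with a scaled difference of two random members,
    apply binomial crossover, keep the trial only if it is no worse."""
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [cost(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            r1, r2, r3 = random.sample([j for j in range(pop_size) if j != i], 3)
            jrand = random.randrange(dim)  # force at least one mutated gene
            trial = []
            for j in range(dim):
                if random.random() < cr or j == jrand:
                    v = pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            tf = cost(trial)
            if tf <= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, tf
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

best, err = differential_evolution([(0.0, 5.0), (0.0, 5.0)])
print(best, err)
```

On this two-parameter toy problem the population converges quickly; the thesis's point is that realistic kinetic models, with many parameters and expensive ODE simulations inside `cost`, make exactly this loop the computational bottleneck that parallelization targets.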
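The cooperation scheme the abstracts highlight — concurrent searches that exchange promising solutions — can be caricatured with an island model. This sketch (a deliberately simplified illustration, not the thesis's implementation; the function, step sizes, and broadcast-style migration are all assumptions) runs several independent stochastic hill-climbers on the multimodal Rastrigin function and periodically migrates the overall best point into every island:

```python
import math
import random

random.seed(1)

def rastrigin(p):
    # Classic multimodal benchmark; global minimum is 0 at the origin.
    return 10 * len(p) + sum(x * x - 10 * math.cos(2 * math.pi * x) for x in p)

def perturb(p, step=0.3):
    # Gaussian move proposal around the current point.
    return [x + random.gauss(0, step) for x in p]

def island_search(n_islands=4, rounds=20, steps=50, migrate_every=5):
    """Each island hill-climbs from its own random start; every few rounds
    the best solution found so far is broadcast to all islands (migration)."""
    islands = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(n_islands)]
    history = []  # global best value after each round
    for r in range(rounds):
        for i in range(n_islands):
            for _ in range(steps):
                cand = perturb(islands[i])
                if rastrigin(cand) < rastrigin(islands[i]):
                    islands[i] = cand  # accept only improvements
        best = min(islands, key=rastrigin)
        history.append(rastrigin(best))
        if (r + 1) % migrate_every == 0:
            islands = [list(best) for _ in range(n_islands)]  # share the best
    return best, history

best, history = island_search()
print(rastrigin(best))
```

In a real parallel setting each island would be a process or MPI rank and migration a message exchange; the trade-off the thesis studies is that sharing solutions speeds convergence while too-aggressive sharing (as in this broadcast caricature) erodes the diversity that global search needs.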