Evolution of Programming Approaches for High-Performance Heterogeneous Systems


Jacob Lambert, University of Oregon ([email protected])
Advisor: Allen D. Malony, University of Oregon ([email protected])
External Advisor: Seyong Lee, Oak Ridge National Lab ([email protected])

Area Exam Report
Committee Members: Allen Malony, Boyana Norris, Hank Childs
Computer Science, University of Oregon, United States
December 14, 2020

ABSTRACT

Nearly all contemporary high-performance systems rely on heterogeneous computation. As a result, scientific application developers are increasingly pushed to explore heterogeneous programming approaches. In this project, we discuss the long history of heterogeneous computing and analyze the evolution of heterogeneous programming approaches, from distributed systems to grid computing to accelerator-based supercomputers.

1 INTRODUCTION

Heterogeneous computing is paramount to today's high-performance systems. The top and next generation of supercomputers all employ heterogeneity, and even desktop workstations can be configured to utilize heterogeneous execution. The explosion of activity and interest in heterogeneous computing, as well as the exploration and development of heterogeneous programming approaches, may seem like a recent trend. However, heterogeneous programming has been a topic of research and discussion for nearly four decades. Many of the issues faced by contemporary heterogeneous programming approach designers have long histories and many connections with now-antiquated projects.

In this project, we explore the evolution and history of heterogeneous computing, with a focus on the development of heterogeneous programming approaches. In Section 2, we take a deep dive into the field of distributed heterogeneous programming, the first application of hardware heterogeneity in computing. In Section 3, we briefly explore the resolutions of distributed heterogeneous systems and approaches, and discuss the transitional period for the field of heterogeneous computing. In Section 4, we provide a broad exploration of contemporary accelerator-based heterogeneous computing, specifically analyzing the different programming approaches developed and employed across different accelerator architectures. Finally, in Section 5, we take a zoomed-out look at the development of heterogeneous programming approaches, introspect on some important takeaways and topics, and speculate about the future of next-generation heterogeneous systems.

2 DISTRIBUTED HETEROGENEOUS SYSTEMS, 1980-1995

Even 40 years ago, computer scientists realized heterogeneity was needed due to diminishing returns in homogeneous systems. In the literature, the first references to the term "heterogeneous computing" revolved around the distinction between single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) machines in a distributed computing environment.

Several machines dating back to the 1980s were created and advertised as heterogeneous computers. Although these machines were conceptually different from today's heterogeneous machines, they still were created to address the same challenges: using optimized hardware to execute specific algorithmic patterns.

The Partitionable SIMD/MIMD System (PASM) [270], developed at Purdue University in 1981, was initially built for image processing and pattern recognition applications. PASM was unique in that it could be dynamically reconfigured into either a SIMD or a MIMD machine, or a combination thereof. The goal was to create a machine that could be optimized for different image processing and pattern recognition tasks, configuring either more SIMD or more MIMD capability depending on the requirements of the application.

However, like many early heterogeneous computing systems, programmability was not the primary concern. The programming environment for PASM required the design of a new procedure-based structured language similar to TRANQUIL [2], the development of a custom compiler, and even the development of a custom operating system.

Another early heterogeneous system was TRAC, the Texas Reconfigurable Array Computer [264], built in 1980. Like PASM, TRAC could weave between SIMD and MIMD execution modes. But also like PASM, programmability was not a primary concern with the TRAC machine, as it relied on now-arcane Job Control Languages and APL source code [197].

The lack of focus on programming approaches for early heterogeneous systems is evident in the difficulty of finding information on how the machines were typically programmed. However, as the availability of heterogeneous computing environments increased throughout the 1990s, so did the research and development of programming environments. Throughout the 80s and early 90s, this environment expanded to include vector processors, scalar processors, graphics machines, etc. To this end, in this first major section we explore distributed heterogeneous computing.

Although the first heterogeneous machines consisted of mixed-mode machines like PASM and TRAC, mixed-machine heterogeneous systems became the more popular and accessible option throughout the 1990s. Instead of a single machine with the ability to switch between a synchronous SIMD mode and an asynchronous MIMD mode, mixed-machine systems contained a variety of different processing machines connected by a high-speed interconnect. Examples of machines used in mixed-machine systems include graphics and rendering-specific machines like the Pixel Planes 5 and Silicon Graphics 340 VGX, SIMD and vector machines like the MasPar MP-series and the CM 200/2000, and coarse-grained MIMD machines like the CM-5, Vista, and Sequent machines.

It was well understood that different classes of machines (SIMD, MIMD, vector, graphics, sequential) excelled at different tasks (parallel computation, statistical analysis, rendering, display), and that these machines could be networked together in a single system. However, coordinating these distributed systems to execute a single application presented significant challenges, which many of the projects in the next section began to address.

In this section, we explore different programming frameworks developed to utilize these distributed heterogeneous systems. In Section 2.1, we review several surveys to gain contextualized insight into the research consensus during the time period. Then in Section 2.2, we review the most prominent and impactful programming systems introduced during this time. Finally, in Section 2.3, we discuss the evolution of distributed heterogeneous computing and how it relates to the subsequent sections.

2.1 Distributed Heterogeneous Architectures, Concepts, and Themes

For insight into high-level perspectives, opinions, and the general state of the area of early distributed heterogeneous computing, we include discussions from several survey works published during the targeted time period. We aim to extract general trends and overarching concepts that drove the development of early systems and early heterogeneous programming approaches.

The work by Ercegovac [106], Heterogeneity in Supercomputer Architectures, represents one of the first published works specifically surveying the state of high-performance heterogeneous computing. They define heterogeneity as the combination of different architectures and system design styles into one system or machine, and their motivation for heterogeneous systems is summed up well by the following direct quote:

    Heterogeneity in the design (of supercomputers) needs to be considered
    when a point of diminishing returns in a homogeneous architecture is
    reached.

As we see throughout this work, this drive for specialization [...] early surveyed works related to distributed heterogeneous computing, and they heavily influenced the heterogeneous systems created and the heterogeneous software and programming approaches used. Ercegovac [106] lists how, at the time, the three different approaches were combined in different ways to form the five following heterogeneous approaches:

(1) Mainframes with integrated vector units, programmed using a single instruction set augmented by vector instructions.

(2) Vector processors having two distinct types of instructions and processors, scalar and vector. An early example is the SAXPY system, which could be classified as a Matrix Processing Unit.

(3) Specialized processors attached to the host machine (AP). This approach closely resembles accelerator-based heterogeneous computing, the subject of Section 4. The ST-100 and ST-50 are early examples of this approach.

(4) Multiprocessor systems with vector processors as nodes, or scalar processors augmented with vector units as nodes. For example, in PASM, mentioned earlier in this section, the operating system supported multi-tasking at the FORTRAN level, and the programmer could use fork/join API calls to exploit MIMD-level parallelism (a minimal sketch of this pattern appears after this list). CEDAR [172] represented another example of a multiprocessor cluster with eight processors, each processor modified with an Alliant FX/8 mini-supercomputer. This allowed heterogeneity within clusters, among clusters, and at the level of instructions, supporting vector processing, multiprocessing, and parallel processing.

(5) Special-purpose architectures that could contain heterogeneity at both the implementation and function levels. The Navier-Stokes computer (NSC) [262] is an example. The nodes [...]
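Item (4) describes the fork/join style of MIMD programming that PASM exposed at the FORTRAN level. As a rough modern analogue, the following minimal sketch expresses the same fork/join pattern using POSIX threads in C. It illustrates the pattern only; the task-per-thread mapping and all names here are illustrative assumptions, not PASM's actual FORTRAN interface.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_TASKS 4

    /* Each forked task runs its own instruction stream on its own data;
       that independence is what makes the execution MIMD rather than SIMD. */
    static void *task(void *arg) {
        int id = *(int *)arg;
        printf("task %d: independent instruction stream\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_TASKS];
        int ids[NUM_TASKS];

        /* "fork": launch the independent parallel tasks */
        for (int i = 0; i < NUM_TASKS; i++) {
            ids[i] = i;
            pthread_create(&threads[i], NULL, task, &ids[i]);
        }

        /* "join": wait for all tasks before serial execution resumes */
        for (int i = 0; i < NUM_TASKS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }

The essential structure matches what item (4) describes: the program forks independent instruction streams, each operating on its own data, then joins them at a synchronization point before continuing serially.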