ECpE Connections
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
Spreading Excellence Report
HIGH PERFORMANCE AND EMBEDDED ARCHITECTURE AND COMPILATION. Project acronym: HiPEAC. Project full title: High Performance and Embedded Architecture and Compilation. Grant agreement no: ICT-217068. DELIVERABLE 3.1: SPREADING EXCELLENCE REPORT, 15/04/2009. Contents: 1. Summary on Spreading Excellence (p. 3); 2. Task 3.1: Conference (p. 4); 2.1. HiPEAC 2008 Conference, Goteborg (p. 4); 2.2. HiPEAC 2009 Conference, Paphos (p. 10); 2.3. Conference Ranking (p. 16); 3. Task 3.2: Summer School (p. 16); 3.1. 4th International Summer School, 2008 (p. 16); 3.2. 5th International Summer School, 2009 (p. 21); 4. Task 3.3: HiPEAC Journal (p. 22); 5. Task 3.4: HiPEAC Roadmap (p. 23); 6. Task 3.5: HiPEAC Newsletter
A Compiler-Compiler for DSL Embedding
A Compiler-Compiler for DSL Embedding. Amir Shaikhha (EPFL, Switzerland, {amir.shaikhha}@epfl.ch), Vojin Jovanovic (Oracle Labs, {vojin.jovanovic}@oracle.com), Christoph Koch (EPFL, Switzerland, {christoph.koch}@epfl.ch). Abstract: In this paper, we present a framework to generate compilers for embedded domain-specific languages (EDSLs). This framework provides facilities to automatically generate the boilerplate code required for building DSL compilers on top of extensible optimizing compilers. We evaluate the practicality of our framework by demonstrating several use-cases successfully built with it. CCS Concepts: • Software and its engineering → Software performance; Compilers. Keywords: Domain-Specific Languages, Compiler-Compiler, Language Embedding. 1 Introduction: "Everything that happens once can never happen again." [...] (EDSLs) [14] in the Scala programming language. DSL developers define a DSL as a normal library in Scala. This plain Scala implementation can be used for debugging purposes without worrying about the performance aspects (handled separately by the DSL compiler). Alchemy provides a customizable set of annotations for encoding the domain knowledge in the optimizing compilation frameworks. A DSL developer annotates the DSL library, from which Alchemy generates a DSL compiler that is built on top of an extensible optimizing compiler. As opposed to the existing compiler-compilers and language workbenches, Alchemy does not need a new meta-language for defining a DSL; instead, Alchemy uses the reflection capabilities of Scala to treat the plain Scala code of the DSL library as the language specification. A compiler expert can customize the behavior of the predefined set of annotations based on the features provided by
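To make the embedding style concrete, here is a minimal sketch of a DSL written as a plain, annotated Scala library in the spirit the excerpt describes. The annotation names (`deep`, `pure`) and their meanings are illustrative assumptions, not Alchemy's actual API:

```scala
import scala.annotation.StaticAnnotation

// Hypothetical annotations standing in for Alchemy's (names assumed):
// `deep` marks a type to be deeply embedded by the generated compiler;
// `pure` marks a method as side-effect free, enabling optimizations.
class deep extends StaticAnnotation
class pure extends StaticAnnotation

// The DSL is an ordinary Scala library: it runs as-is for debugging,
// while a compiler-compiler could treat this annotated source as the
// language specification and generate an optimizing DSL compiler.
@deep case class Vec(data: Vector[Double]) {
  @pure def +(that: Vec): Vec =
    Vec(data.zip(that.data).map { case (a, b) => a + b })
  @pure def dot(that: Vec): Double =
    data.zip(that.data).map { case (a, b) => a * b }.sum
}

object VecDemo extends App {
  val v = Vec(Vector(1.0, 2.0)) + Vec(Vector(3.0, 4.0))
  println(v.dot(Vec(Vector(1.0, 1.0)))) // prints 10.0
}
```

The appeal of this design is that the same source plays two roles: an executable library for debugging, and the specification from which the DSL compiler is generated.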
Computer Architectures: An Overview
Computer Architectures: An Overview. PDF generated using the open source mwlib toolkit (see http://code.pediapress.com/ for more information) on Sat, 25 Feb 2012 22:35:32 UTC. Contents: Microarchitecture (1); x86 (7); PowerPC (23); IBM POWER (33); MIPS architecture (39); SPARC (57); ARM architecture (65); DEC Alpha (80); AlphaStation (92); AlphaServer (95); Very long instruction word (103); Instruction-level parallelism (107); Explicitly parallel instruction computing (108). Microarchitecture: In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture: The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. [Figure: the Intel Core microarchitecture] The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers to complete arithmetic logic units (ALUs) and even larger elements.
CRA Grad Cohort Workshop for Underrepresented Minorities & Persons with Disabilities (URMD)
March 16-17, 2018, San Diego. Grad Cohort for Underrepresented Minorities & Persons with Disabilities (URMD) 2018. www.cra.org. Dear Grad Cohort Participant, We welcome you to the 2018 CRA Grad Cohort Workshop for Underrepresented Minorities + Persons with Disabilities (URMD)! The next few days are filled with sessions where 25 senior computing researchers and professionals will be sharing their strategies and experiences to help increase your graduate school and career success. There will also be plenty of opportunities to meet and network with these successful researchers as well as with graduate students from other universities. We hope that you will take the utmost advantage of this unique experience by actively participating in discussions, developing peer networks, and building mentoring relationships. Since this is the inaugural URMD Grad Cohort Workshop, we are especially interested in hearing your feedback about the program and the experience. Please take time to complete the evaluation form provided after the workshop. We want to learn what you liked and did not like, as well as any suggestions you might have for improving the event in subsequent years. The 2018 CRA-URMD Workshop is made possible through generous contributions by the Computing Research Association, National Science Foundation, AccessComputing, Whova, Google and Association for Computing Machinery. Please join us in thanking them for their kind support. We hope that you take home many new insights and connections from this workshop to help you succeed in
Entrepreneurship Opportunities & Skills
Entrepreneurship Opportunities & Skills. Kunle Olukotun, Stanford University. 2019 CRA URMD Grad Cohort Workshop.
Kunle Olukotun:
- Professor of EE and CS at Stanford since 1992 (Cadence Design endowed chair)
- Computer architecture and parallel computing
- Pioneer in CMP (multicore) processor design (Stanford Hydra project)
- Parallel programming for all programmers: Director of the Pervasive Parallelism Lab (PPL); high-performance Domain Specific Languages (DSLs)
- Data Analytics for What's Next (DAWN): democratizing machine learning
Academic research and startups:
- When you can't find an industrial partner for your ideas, make one
- Technology transfer by PhD: understand the initial market for your product (technology); know your unfair advantage; build more robust technology than research prototypes; the cost of schedule slip is much worse; you don't have much time for research at a start-up
- Pick your team wisely: technical team (lab partners?); management team (who is making decisions); cap table
- Funding issues: investors (smart vs. dumb); angels vs. venture capital firms
Startup pros and cons:
- Pros: you don't have to convince an existing company to change course (until exit)
- Cons: you have to convince investors (repeatedly); you have to build a whole company, not just a development team (finance, sales, marketing, ...); limited resources; impatient capital
Afara WebSystems:
- Founded in 1999, at the height of the internet boom, when large web sites were running out of power and space
- Goal: revolutionize internet data centers (a multi-billion-dollar market)
- Approach: 10x
Implementing and Evaluating Nested Parallel Transactions in Software Transactional Memory
Implementing and Evaluating Nested Parallel Transactions in Software Transactional Memory. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun. Computer Systems Laboratory, Stanford University, Stanford, CA 94305. {wkbaek,nbronson,kozyraki,kunle}@stanford.edu. ABSTRACT: Transactional Memory (TM) is a promising technique that simplifies parallel programming for shared-memory applications. To date, most TM systems have been designed to efficiently support single-level parallelism. To achieve widespread use and maximize performance gains, TM must support the nested parallelism available in many applications and supported by several programming models. We present NesTM, a software TM (STM) system that supports closed-nested parallel transactions. NesTM is based on a high-performance, blocking STM that uses eager version management and word-granularity conflict detection. Its algorithm targets the state and runtime overheads of nested parallel transactions. We also describe several subtle correctness issues in supporting nested parallel transactions in NesTM and discuss their performance impact. 1. INTRODUCTION: Transactional Memory (TM) [13] has surfaced as a promising technique to simplify parallel programming. TM addresses the difficulty of lock-based synchronization by allowing programmers to simply declare certain code segments as transactions that execute in an atomic and isolated way with respect to other code. TM takes responsibility for all concurrency control. The potential of TM has motivated extensive research on hardware, software, and hybrid implementations. We focus on software TM (STM) [8,11,19], because it is the only approach compatible with existing and upcoming multicore chips. Most TM systems, thus far, have assumed that the code within a transaction executes sequentially. However, real-world applications often include the potential for nested parallelism in various forms such as nested parallel loops, recursive function calls, and
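For readers unfamiliar with transaction nesting, here is a small sketch of closed nesting using the ScalaSTM library (co-authored by Bronson; the `org.scala-stm` dependency is assumed on the classpath). Plain ScalaSTM nests transactions sequentially; running the children of one transaction in parallel, which is what NesTM adds, is beyond this sketch:

```scala
import scala.concurrent.stm._

object NestedTxDemo extends App {
  val balance = Ref(100)

  // Outer transaction: atomic and isolated with respect to other threads.
  atomic { implicit txn =>
    balance() = balance() - 30
    // Closed-nested transaction: if it aborts, only its own writes are
    // rolled back; if it commits, its writes merge into the parent and
    // become visible to other threads only when the outermost commits.
    atomic { implicit txn =>
      if (balance() < 0) retry // block until the condition can change
      balance() = balance() + 5
    }
  }

  println(atomic { implicit txn => balance() }) // prints 75
}
```

The correctness subtleties the paper studies arise exactly at this nesting boundary once the inner transactions may run concurrently: conflict detection and version management must distinguish a parent's tentative writes from its children's.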
UltraSPARC T1 (Niagara)
UltraSPARC T1 (Niagara). Talk given in the seminar "Selected Topics in Hardware Design and Optics" ("Ausgewählte Themen in Hardwareentwurf und Optik"), HWS 06, Universität Mannheim. Jochen Kinzel. Contents: Overview; Core; Crossbar; Level 2 cache; DRAM controller; Floating-point unit; Closing remarks; Outlook: Niagara II. Overview. The road to the UltraSPARC T1: the original idea came from Afara Websystems Inc., which was acquired by Sun Microsystems Inc. Aug 2004: Niagara presented at Hot Chips 16. Nov 2005: UltraSPARC T1 completed. Dec 2005: first servers with the UltraSPARC T1. Mar 2006: OpenSPARC T1 released. Facts: 64-bit multicore processor; available with 4, 6, and 8 cores; 4 threads per core; 16 KB L1 instruction cache and 8 KB L1 data cache; 3 MB Level 2 cache; up to 1.2 GHz; 72 W power consumption; 90 nm technology, 9 layers. Application areas: parallel applications. Centrally synchronous applications with structured data-flow parallelism (example: weather simulation; suitable systems: e.g., IBM's BlueGene) versus decentrally asynchronous applications with control-flow parallelism (examples: transaction systems, web server and database applications; suitable systems: e.g., Sun's Fire T2000 with the UltraSPARC T1 processor). Core. The pipeline. Instruction Fetch Unit (IFU): Level 1 instruction cache, 16 KB, 4-way associative, 32-byte cache lines; Instruction Fill Queue (IFQ); Missed Instruction List (MIL); also an instruction Translation Lookaside Buffer. IFU and thread select. Thread Select (TS): two instruction registers per thread; a program counter (PC) per thread; in every clock cycle: renewed
Kunle Olukotun, Cadence Design Systems Professor and Professor of Electrical Engineering
Kunle Olukotun, Cadence Design Systems Professor and Professor of Electrical Engineering (Stanford profile, http://cap.stanford.edu/profiles/Oyekunle_Olukotun/). Contact information: administrative contact: Kathy Robinson, Administrative Associate; email [email protected]; tel (650) 723-1430. Bio: Kunle Olukotun is the Cadence Design Systems Professor in the School of Engineering and Professor of Electrical Engineering and Computer Science at Stanford University. Olukotun is well known as a pioneer in multicore processor design and the leader of the Stanford Hydra chip multiprocessor (CMP) research project. Olukotun founded Afara Websystems to develop high-throughput, low-power multicore processors for server systems. The Afara multicore processor, called Niagara, was acquired by Sun Microsystems. Niagara-derived processors now power all Oracle SPARC-based servers. Olukotun currently directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of heterogeneous parallelism in all application areas using Domain Specific Languages (DSLs). Academic appointments: Professor, Electrical Engineering; Professor, Computer Science; Faculty Affiliate, Institute for Human-Centered Artificial Intelligence (HAI); Member, Wu Tsai Neurosciences Institute. Honors and awards: Fellow, ACM (2007); Fellow, IEEE (2007). Professional education: PhD, Michigan (1991). Links: personal site: http://arsenalfc.stanford.edu/kunle/. Teaching, courses 2021-22: Digital Systems Design Lab: EE 109 (Spr); Parallel Computing: CS 149
Sun's Big Splash
WINNER: INTEGRATED CIRCUITS. Sun's Big Splash. The Niagara microprocessor chip is Sun's best hope for a comeback. By Linda Geppert. The Sunnyvale, Calif., campus of Sun Microsystems Inc. is a quiet and peaceful place with six low-rise buildings connected by tree-lined walkways. But the tranquility masks a frightening reality—Sun is in serious economic trouble. The company was badly splattered by the burst of the dot-com bubble of 2000. Revenues for this once towering colossus of the server industry went south, and its stock plunged from more than US $60 in 2000 to less than $3 in 2002. But Sun, headquartered in Santa Clara, Calif., is still far from its glory days of the last decade. It could use a small miracle to get back solidly on its feet, and at last the company may have one: a new microprocessor chip intended for the volume servers that are the heart of data centers running the information and Web processing for businesses, universities, hospitals, factories, and the like. Sun's engineers have had working chips since last spring and are now heavily into testing and debugging them and making design changes for the next fabrication run in early 2005. The server business generates $50 billion a year, according to Jessica Yang, a research analyst at IDC, Framingham, Mass., and Sun's share recently is about 12 percent—down from 17 percent just four years ago. Sun's new chip, called Niagara for the torrent of data and instructions that flow between the chip and its memory, was designed from the ground up to do away with the impact of latency—the idle time a microprocessor
Energy-Efficient Abundant-Data Computing: The N3XT 1,000×
COVER FEATURE: REBOOTING COMPUTING. Energy-Efficient Abundant-Data Computing: The N3XT 1,000×. Mohamed M. Sabry Aly, Mingyu Gao, Gage Hills, Chi-Shuen Lee, Greg Pitner, Max M. Shulaker, Tony F. Wu, and Mehdi Asheghi, Stanford University; Jeff Bokor, University of California, Berkeley; Franz Franchetti, Carnegie Mellon University; Kenneth E. Goodson and Christos Kozyrakis, Stanford University; Igor Markov, University of Michigan, Ann Arbor; Kunle Olukotun, Stanford University; Larry Pileggi, Carnegie Mellon University; Eric Pop, Stanford University; Jan Rabaey, University of California, Berkeley; Christopher Ré, H.-S. Philip Wong, and Subhasish Mitra, Stanford University. Next-generation information technologies will process unprecedented amounts of loosely structured data that overwhelm existing computing systems. N3XT improves the energy efficiency of abundant-data applications 1,000-fold by using new logic and memory technologies, 3D integration with fine-grained connectivity, and new architectures for computation immersed in memory. The rising demand for high-performance IT services with human-like interfaces is driving the quest for the next generation of energy-efficient computers. These computers will operate on abundant data that can be highly unstructured and often streamed in terabytes. [...] new system technology that promises to breathe new life into computing. Key N3XT components include the following: high-performance and energy-efficient field-effect transistors (FETs) based on atomic-scale [...] enabled by low-temperature layer transfer techniques. This unique approach decouples high-temperature nanomaterial synthesis (to achieve high-quality materials) from low-temperature monolithic 3D integration.
Understanding SPARC Processor Performance
Understanding SPARC Processor Performance. May 15 & 16, 2019, Cleveland Public Auditorium, Cleveland, Ohio. www.neooug.org/gloc
About the speaker: Akiva Lichtner; physics background; twenty years' experience in IT; enterprise production support analyst; Java developer; Oracle query plan manager; spoke here at G.L.O.C. about TDD and Java dynamic tracing.
Audience: developers; system administrators; tech support analysts; IT managers.
Motivation: I have been working in tech support for a large application. We have run SPARC T4 servers and now we run T7 servers: application servers, database servers. Environments are all different. Users complained for years about "environment X" being slow; we finally figured out why. What I learned can be very useful for users of SPARC servers.
What is SPARC? First released in 1987, created by Sun Microsystems to replace the Motorola 68000 in its workstation products. During the .com boom, Solaris/SPARC and Windows/Intel were the only supported platforms for the JVM. In 2000 the bubble burst and Sun server sales plunged. Sun acquired Afara Websystems, which had built an interesting new processor, and renamed it the UltraSPARC T1. It was followed by the T2 through M8, evolutions of the same design. More recently Oracle has added significant new functionality.
Processor design: high core count (even in the early days); many threads per core; a "barrel" processor, designed to switch threads efficiently; non-uniform memory access; per-processor shared cache; core-level shared cache. A picture speaks a thousand
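A rough way to see the barrel-processor trade-off from the JVM is to run the same per-thread work at increasing thread counts and watch wall-clock time. This is an illustrative sketch only (thread counts and iteration counts are arbitrary assumptions, not tuned for any particular SPARC model); on a T-series machine the times should stay comparatively flat as threads are added, even though each individual thread runs slowly:

```scala
import java.util.concurrent.{Executors, TimeUnit}

object ThroughputSketch {
  @volatile private var sink = 0L // keeps the JIT from discarding the work

  // A CPU-bound unit of work with a data dependence between iterations.
  def work(iters: Int): Unit = {
    var acc = 0L
    var i = 0
    while (i < iters) { acc += (acc ^ i) * 2654435761L; i += 1 }
    sink = acc
  }

  // Run `threads` workers, each doing the same amount of work, and report
  // wall-clock time: total work grows with the thread count, so roughly
  // flat times mean aggregate throughput is scaling with the threads.
  def measure(threads: Int, itersPerThread: Int): Long = {
    val pool = Executors.newFixedThreadPool(threads)
    val start = System.nanoTime()
    (1 to threads).foreach(_ => pool.execute(() => work(itersPerThread)))
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.MINUTES)
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit =
    for (n <- Seq(1, 4, 16, 64))
      println(s"$n threads: ${measure(n, 100000000)} ms")
}
```

Comparing the printed times on a T-series box against a conventional high-clock core makes the talk's point concrete: single-thread latency is unimpressive, while throughput under many threads is where the design pays off.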
Advanced Multipurpose Microprocessor
© 2014 IJIRT | Volume 1 Issue 5 | ISSN: 2349-6002. ADVANCED MULTIPURPOSE MICROPROCESSOR. Pankaj Gupta, Ravi Sangwan, IT Department, MDU University. Abstract: The Next Generation Multipurpose Microprocessor (NGMP) is a SPARC V8(E). There have been three major revisions of the architecture. The first published revision was the 32-bit SPARC Version 7 (V7) in 1986. In SPARC Version 8 (V8), the main differences between V7 and V8 were the addition of integer multiply and divide instructions, and an upgrade from 80-bit "extended precision" floating-point arithmetic to 128-bit "quad-precision" arithmetic. This paper describes the baseline architecture, points out key choices that have been made and emphasises design decisions that are still open. The software tools and operating systems that will be available for the NGMP, together with a general overview of the new LEON4FT microprocessor, are also described. I. BACKGROUND: The LEON project was started by the European Space [...] European Deep Sub-Micron (DSM) technology in order to meet increasing requirements on performance and to ensure the supply of European space processors. II. ARCHITECTURAL OVERVIEW: It should be noted that this paper describes the current state of the NGMP. The specification has been frozen and the activity is currently in its architectural design phase. SPARC machines have generally used Sun's SunOS, Solaris or OpenSolaris, but other operating systems such as NeXTSTEP, RTEMS, FreeBSD, OpenBSD, NetBSD, and Linux have also been used. Fig. 1 depicts an overview of the NGMP architecture. The system will consist of five AHB buses: one 128-bit Processor bus, one 128-bit Memory bus, two 32-bit I/O buses, and one 32-bit Debug bus.