Coverstory by Markus Levy, Technical Editor
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
Pipelining and Vector Processing
Chapter 8 Pipelining and Vector Processing 8–1 If the pipeline stages are heterogeneous, the slowest stage determines the flow rate of the entire pipeline. This leads to other stages idling. 8–2 Pipeline stalls can be caused by three types of hazards: resource, data, and control hazards. Re- source hazards result when two or more instructions in the pipeline want to use the same resource. Such resource conflicts can result in serialized execution, reducing the scope for overlapped exe- cution. Data hazards are caused by data dependencies among the instructions in the pipeline. As a simple example, suppose that the result produced by instruction I1 is needed as an input to instruction I2. We have to stall the pipeline until I1 has written the result so that I2 reads the correct input. If the pipeline is not designed properly, data hazards can produce wrong results by using incorrect operands. Therefore, we have to worry about the correctness first. Control hazards are caused by control dependencies. As an example, consider the flow control altered by a branch instruction. If the branch is not taken, we can proceed with the instructions in the pipeline. But, if the branch is taken, we have to throw away all the instructions that are in the pipeline and fill the pipeline with instructions at the branch target. 8–3 Prefetching is a technique used to handle resource conflicts. Pipelining typically uses just-in- time mechanism so that only a simple buffer is needed between the stages. We can minimize the performance impact if we relax this constraint by allowing a queue instead of a single buffer. -
Data-Level Parallelism
Fall 2015 :: CSE 610 – Parallel Computer Architectures Data-Level Parallelism Nima Honarmand Fall 2015 :: CSE 610 – Parallel Computer Architectures Overview • Data Parallelism vs. Control Parallelism – Data Parallelism: parallelism arises from executing essentially the same code on a large number of objects – Control Parallelism: parallelism arises from executing different threads of control concurrently • Hypothesis: applications that use massively parallel machines will mostly exploit data parallelism – Common in the Scientific Computing domain • DLP originally linked with SIMD machines; now SIMT is more common – SIMD: Single Instruction Multiple Data – SIMT: Single Instruction Multiple Threads Fall 2015 :: CSE 610 – Parallel Computer Architectures Overview • Many incarnations of DLP architectures over decades – Old vector processors • Cray processors: Cray-1, Cray-2, …, Cray X1 – SIMD extensions • Intel SSE and AVX units • Alpha Tarantula (didn’t see light of day ) – Old massively parallel computers • Connection Machines • MasPar machines – Modern GPUs • NVIDIA, AMD, Qualcomm, … • Focus of throughput rather than latency Vector Processors 4 SCALAR VECTOR (1 operation) (N operations) r1 r2 v1 v2 + + r3 v3 vector length add r3, r1, r2 vadd.vv v3, v1, v2 Scalar processors operate on single numbers (scalars) Vector processors operate on linear sequences of numbers (vectors) 6.888 Spring 2013 - Sanchez and Emer - L14 What’s in a Vector Processor? 5 A scalar processor (e.g. a MIPS processor) Scalar register file (32 registers) Scalar functional units (arithmetic, load/store, etc) A vector register file (a 2D register array) Each register is an array of elements E.g. 32 registers with 32 64-bit elements per register MVL = maximum vector length = max # of elements per register A set of vector functional units Integer, FP, load/store, etc Some times vector and scalar units are combined (share ALUs) 6.888 Spring 2013 - Sanchez and Emer - L14 Example of Simple Vector Processor 6 6.888 Spring 2013 - Sanchez and Emer - L14 Basic Vector ISA 7 Instr. -
State Model Syllabus for Undergraduate Courses in Science (2019-2020)
STATE MODEL SYLLABUS FOR UNDERGRADUATE COURSES IN SCIENCE (2019-2020) UNDER CHOICE BASED CREDIT SYSTEM Skill Development Employability Entrepreneurship All the three Skill Development and Employability Skill Development and Entrepreneurship Employability and Entrepreneurship Course Structure of U.G. Botany Honours Total Semester Course Course Name Credit marks AECC-I 4 100 Microbiology and C-1 (Theory) Phycology 4 75 Microbiology and C-1 (Practical) Phycology 2 25 Biomolecules and Cell Semester-I C-2 (Theory) Biology 4 75 Biomolecules and Cell C-2 (Practical) Biology 2 25 Biodiversity (Microbes, GE -1A (Theory) Algae, Fungi & 4 75 Archegoniate) Biodiversity (Microbes, GE -1A(Practical) Algae, Fungi & 2 25 Archegoniate) AECC-II 4 100 Mycology and C-3 (Theory) Phytopathology 4 75 Mycology and C-3 (Practical) Phytopathology 2 25 Semester-II C-4 (Theory) Archegoniate 4 75 C-4 (Practical) Archegoniate 2 25 Plant Physiology & GE -2A (Theory) Metabolism 4 75 Plant Physiology & GE -2A(Practical) Metabolism 2 25 Anatomy of C-5 (Theory) Angiosperms 4 75 Anatomy of C-5 (Practical) Angiosperms 2 25 C-6 (Theory) Economic Botany 4 75 C-6 (Practical) Economic Botany 2 25 Semester- III C-7 (Theory) Genetics 4 75 C-7 (Practical) Genetics 2 25 SEC-1 4 100 Plant Ecology & GE -1B (Theory) Taxonomy 4 75 Plant Ecology & GE -1B (Practical) Taxonomy 2 25 C-8 (Theory) Molecular Biology 4 75 Semester- C-8 (Practical) Molecular Biology 2 25 IV Plant Ecology & 4 75 C-9 (Theory) Phytogeography Plant Ecology & 2 25 C-9 (Practical) Phytogeography C-10 (Theory) Plant -
Diploma Thesis
Faculty of Computer Science Chair for Real Time Systems Diploma Thesis Timing Analysis in Software Development Author: Martin Däumler Supervisors: Jun.-Prof. Dr.-Ing. Robert Baumgartl Dr.-Ing. Andreas Zagler Date of Submission: March 31, 2008 Martin Däumler Timing Analysis in Software Development Diploma Thesis, Chemnitz University of Technology, 2008 Abstract Rapid development processes and higher customer requirements lead to increasing inte- gration of software solutions in the automotive industry’s products. Today, several elec- tronic control units communicate by bus systems like CAN and provide computation of complex algorithms. This increasingly requires a controlled timing behavior. The following diploma thesis investigates how the timing analysis tool SymTA/S can be used in the software development process of the ZF Friedrichshafen AG. Within the scope of several scenarios, the benefits of using it, the difficulties in using it and the questions that can not be answered by the timing analysis tool are examined. Contents List of Figures iv List of Tables vi 1 Introduction 1 2 Execution Time Analysis 3 2.1 Preface . 3 2.2 Dynamic WCET Analysis . 4 2.2.1 Methods . 4 2.2.2 Problems . 4 2.3 Static WCET Analysis . 6 2.3.1 Methods . 6 2.3.2 Problems . 7 2.4 Hybrid WCET Analysis . 9 2.5 Survey of Tools: State of the Art . 9 2.5.1 aiT . 9 2.5.2 Bound-T . 11 2.5.3 Chronos . 12 2.5.4 MTime . 13 2.5.5 Tessy . 14 2.5.6 Further Tools . 15 2.6 Examination of Methods . 16 2.6.1 Software Description . -
Instruction-Level Distributed Processing
COVER FEATURE Instruction-Level Distributed Processing Shifts in hardware and software technology will soon force designers to look at microarchitectures that process instruction streams in a highly distributed fashion. James E. or nearly 20 years, microarchitecture research In short, the current focus on instruction-level Smith has emphasized instruction-level parallelism, parallelism will shift to instruction-level distributed University of which improves performance by increasing the processing (ILDP), emphasizing interinstruction com- Wisconsin- number of instructions per cycle. In striving munication with dynamic optimization and a tight Madison F for such parallelism, researchers have taken interaction between hardware and low-level software. microarchitectures from pipelining to superscalar pro- cessing, pushing toward increasingly parallel proces- TECHNOLOGY SHIFTS sors. They have concentrated on wider instruction fetch, During the next two or three technology generations, higher instruction issue rates, larger instruction win- processor architects will face several major challenges. dows, and increasing use of prediction and speculation. On-chip wire delays are becoming critical, and power In short, researchers have exploited advances in chip considerations will temper the availability of billions of technology to develop complex, hardware-intensive transistors. Many important applications will be object- processors. oriented and multithreaded and will consist of numer- Benefiting from ever-increasing transistor budgets ous separately -
073-080.Pdf (568.3Kb)
Graphics Hardware (2007) Timo Aila and Mark Segal (Editors) A Low-Power Handheld GPU using Logarithmic Arith- metic and Triple DVFS Power Domains Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo Department of EECS, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea Abstract In this paper, a low-power GPU architecture is described for the handheld systems with limited power and area budgets. The GPU is designed using logarithmic arithmetic for power- and area-efficient design. For this GPU, a multifunction unit is proposed based on the hybrid number system of floating-point and logarithmic numbers and the matrix, vector, and elementary functions are unified into a single arithmetic unit. It achieves the single-cycle throughput for all these functions, except for the matrix-vector multipli- cation with 2-cycle throughput. The vertex shader using this function unit as its main datapath shows 49.3% cycle count reduction compared with the latest work for OpenGL transformation and lighting (TnL) kernel. The rendering engine uses also the logarithmic arithmetic for implementing the divisions in pipeline stages. The GPU is divided into triple dynamic voltage and frequency scaling power domains to minimize the power consumption at a given performance level. It shows a performance of 5.26Mvertices/s at 200MHz for the OpenGL TnL and 52.4mW power consumption at 60fps. It achieves 2.47 times per- formance improvement while reducing 50.5% power and 38.4% area consumption compared with the lat- est work. Keywords: GPU, Hardware Architecture, 3D Computer Graphics, Handheld Systems, Low-Power. -
On the Hardware Reduction of Z-Datapath of Vectoring CORDIC
On the Hardware Reduction of z-Datapath of Vectoring CORDIC R. Stapenhurst*, K. Maharatna**, J. Mathew*, J.L.Nunez-Yanez* and D. K. Pradhan* *University of Bristol, Bristol, UK **University of Southampton, Southampton, UK [email protected] Abstract— In this article we present a novel design of a hardware wordlength larger than 18-bits the hardware requirement of it optimal vectoring CORDIC processor. We present a mathematical becomes more than the classical CORDIC. theory to show that using bipolar binary notation it is possible to eliminate all the arithmetic computations required along the z- In this particular work we propose a formulation to eliminate datapath. Using this technique it is possible to achieve three and 1.5 all the arithmetic operations along the z-datapath for conventional times reduction in the number of registers and adder respectively two-sided vector rotation and thereby reducing the hardware compared to classical CORDIC. Following this, a 16-bit vectoring while increasing the accuracy. Also the resulting architecture CORDIC is designed for the application in Synchronizer for IEEE shows significant hardware saving as the wordlength increases. 802.11a standard. The total area and dynamic power consumption Although we stick to the 2’s complement number system, without of the processor is 0.14 mm2 and 700μW respectively when loss of generality, this formulation can be adopted easily for synthesized in 0.18μm CMOS library which shows its effectiveness redundant arithmetic and higher radix formulation. A 16-bit as a low-area low-power processor. processor developed following this formulation requires 0.14 mm2 area and consumes 700 μW dynamic power when synthesized in 0.18μm CMOS library. -
18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures
18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 1/28/2015 Agenda for Today & Next Few Lectures n Single-cycle Microarchitectures n Multi-cycle and Microprogrammed Microarchitectures n Pipelining n Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, … n Out-of-Order Execution n Issues in OoO Execution: Load-Store Handling, … 2 Reminder on Assignments n Lab 2 due next Friday (Feb 6) q Start early! n HW 1 due today n HW 2 out n Remember that all is for your benefit q Homeworks, especially so q All assignments can take time, but the goal is for you to learn very well 3 Lab 1 Grades 25 20 15 10 5 Number of Students 0 30 40 50 60 70 80 90 100 n Mean: 88.0 n Median: 96.0 n Standard Deviation: 16.9 4 Extra Credit for Lab Assignment 2 n Complete your normal (single-cycle) implementation first, and get it checked off in lab. n Then, implement the MIPS core using a microcoded approach similar to what we will discuss in class. n We are not specifying any particular details of the microcode format or the microarchitecture; you can be creative. n For the extra credit, the microcoded implementation should execute the same programs that your ordinary implementation does, and you should demo it by the normal lab deadline. n You will get maximum 4% of course grade n Document what you have done and demonstrate well 5 Readings for Today n P&P, Revised Appendix C q Microarchitecture of the LC-3b q Appendix A (LC-3b ISA) will be useful in following this n P&H, Appendix D q Mapping Control to Hardware n Optional q Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ. -
Branch-Directed and Pointer-Based Data Cache Prefetching
Branch-directed and Pointer-based Data Cache Prefetching Yue Liu Mona Dimitri David R. Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA Abstract The design of the on-chip cache memory and branch prediction logic has become an integral part of a microprocessor implementation. Branch predictors reduce the effects of control hazards on pipeline performance. Branch prediction implementations have been proposed which eliminate a majority of the pipeline stalls associated with branches. Caches are commonly used to reduce the performance gap between microprocessor and memory technology. Caches can only provide this benefit if the requested address is in the cache when requested. To increase the probability that addresses are present when requested, prefetching can be employed. Prefetching attempts to prime the cache with instructions and data which will be accessed in the near future. The work presented here describes two new prefetching algorithms. The first ties data cache prefetching to branches in the instruction stream. History of the data stream is incorporated into a Branch Target Buffer (BTB). Since branch behavior determines program control flow, data access patterns are also dependent upon this behavior. Results indicate that combining this strategy with tagged prefetching can significantly improve cache hit ratios. While improving cache hit rates is important, the resulting memory bus traffic introduced can be an issue. Our prefetching technique can also significantly reduce the overall memory bus traffic. The second prefetching scheme dynamically identifies pointer-based data references in the data stream, and records these references in order to successfully prime the cache with the next element in the linked data structure in advance of its access. -
Lecture 14: Gpus
LECTURE 14 GPUS DANIEL SANCHEZ AND JOEL EMER [INCORPORATES MATERIAL FROM KOZYRAKIS (EE382A), NVIDIA KEPLER WHITEPAPER, HENNESY&PATTERSON] 6.888 PARALLEL AND HETEROGENEOUS COMPUTER ARCHITECTURE SPRING 2013 Today’s Menu 2 Review of vector processors Basic GPU architecture Paper discussions 6.888 Spring 2013 - Sanchez and Emer - L14 Vector Processors 3 SCALAR VECTOR (1 operation) (N operations) r1 r2 v1 v2 + + r3 v3 vector length add r3, r1, r2 vadd.vv v3, v1, v2 Scalar processors operate on single numbers (scalars) Vector processors operate on linear sequences of numbers (vectors) 6.888 Spring 2013 - Sanchez and Emer - L14 What’s in a Vector Processor? 4 A scalar processor (e.g. a MIPS processor) Scalar register file (32 registers) Scalar functional units (arithmetic, load/store, etc) A vector register file (a 2D register array) Each register is an array of elements E.g. 32 registers with 32 64-bit elements per register MVL = maximum vector length = max # of elements per register A set of vector functional units Integer, FP, load/store, etc Some times vector and scalar units are combined (share ALUs) 6.888 Spring 2013 - Sanchez and Emer - L14 Example of Simple Vector Processor 5 6.888 Spring 2013 - Sanchez and Emer - L14 Basic Vector ISA 6 Instr. Operands Operation Comment VADD.VV V1,V2,V3 V1=V2+V3 vector + vector VADD.SV V1,R0,V2 V1=R0+V2 scalar + vector VMUL.VV V1,V2,V3 V1=V2*V3 vector x vector VMUL.SV V1,R0,V2 V1=R0*V2 scalar x vector VLD V1,R1 V1=M[R1...R1+63] load, stride=1 VLDS V1,R1,R2 V1=M[R1…R1+63*R2] load, stride=R2 -
Am186em and Am188em User's Manual
Am186EM and Am188EM Microcontrollers User’s Manual © 1997 Advanced Micro Devices, Inc. All rights reserved. Advanced Micro Devices, Inc. ("AMD") reserves the right to make changes in its products without notice in order to improve design or performance characteristics. The information in this publication is believed to be accurate at the time of publication, but AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication or the information contained herein, and reserves the right to make changes at any time, without notice. AMD disclaims responsibility for any consequences resulting from the use of the information included in this publication. This publication neither states nor implies any representations or warranties of any kind, including but not limited to, any implied warranty of merchantability or fitness for a particular purpose. AMD products are not authorized for use as critical components in life support devices or systems without AMD’s written approval. AMD assumes no liability whatsoever for claims associated with the sale or use (including the use of engineering samples) of AMD products except as provided in AMD’s Terms and Conditions of Sale for such products. Trademarks AMD, the AMD logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Am386 and Am486 are registered trademarks, and Am186, Am188, E86, AMD Facts-On-Demand, and K86 are trademarks of Advanced Micro Devices, Inc. FusionE86 is a service mark of Advanced Micro Devices, Inc. Product names used in this publication are for identification purposes only and may be trademarks of their respective companies. -
Class-Action Lawsuit
Case 3:20-cv-00863-SI Document 1 Filed 05/29/20 Page 1 of 279 Steve D. Larson, OSB No. 863540 Email: [email protected] Jennifer S. Wagner, OSB No. 024470 Email: [email protected] STOLL STOLL BERNE LOKTING & SHLACHTER P.C. 209 SW Oak Street, Suite 500 Portland, Oregon 97204 Telephone: (503) 227-1600 Attorneys for Plaintiffs [Additional Counsel Listed on Signature Page.] UNITED STATES DISTRICT COURT DISTRICT OF OREGON PORTLAND DIVISION BLUE PEAK HOSTING, LLC, PAMELA Case No. GREEN, TITI RICAFORT, MARGARITE SIMPSON, and MICHAEL NELSON, on behalf of CLASS ACTION ALLEGATION themselves and all others similarly situated, COMPLAINT Plaintiffs, DEMAND FOR JURY TRIAL v. INTEL CORPORATION, a Delaware corporation, Defendant. CLASS ACTION ALLEGATION COMPLAINT Case 3:20-cv-00863-SI Document 1 Filed 05/29/20 Page 2 of 279 Plaintiffs Blue Peak Hosting, LLC, Pamela Green, Titi Ricafort, Margarite Sampson, and Michael Nelson, individually and on behalf of the members of the Class defined below, allege the following against Defendant Intel Corporation (“Intel” or “the Company”), based upon personal knowledge with respect to themselves and on information and belief derived from, among other things, the investigation of counsel and review of public documents as to all other matters. INTRODUCTION 1. Despite Intel’s intentional concealment of specific design choices that it long knew rendered its central processing units (“CPUs” or “processors”) unsecure, it was only in January 2018 that it was first revealed to the public that Intel’s CPUs have significant security vulnerabilities that gave unauthorized program instructions access to protected data. 2. A CPU is the “brain” in every computer and mobile device and processes all of the essential applications, including the handling of confidential information such as passwords and encryption keys.