Proceedings of CA 2004 the Fall 2004 ENGR 3410 Computer Architecture

Franklin W. Olin College of Engineering Proceedings of CA 2004 the Fall 2004 ENGR 3410 Computer Architecture Class <[email protected]> Needham, Massachusetts, December 2004 Contents Preface Mark L. Chang . ii Digital Signal Processors Grant R. Hutchins, Leighton M. Ige . 2 FPGA Architecture, Algorithms, and Applications Kathleen King, Sarah Zwicker . 7 A History of Nintendo Architecture Jesus Fernandez, Steve Shannon . 13 Graphical Processing Units: An Examination of Use and Function Michael Crayton, Mike Foss, Dan Lindquist, Kevin Tostado . 17 Dual Core On-Chip Multiprocessors and the IBM Power5 Thomas C. Cecil . 24 Pong on a Xilinx Spartan3 Field Programmable Gate Array Kate Cummings, Chris Murphy, Joy Poisel . 28 The PowerPC Architecture: A Taxonomy, History, and Analysis Alex Dorsk, Katie Rivard, Nick Zola . 32 A Brief Introduction to Quantum Computation Michael W. Curtis . 38 Beauty and the Beast: AltiVec and MMX Vector Processing Units Drew Harry, Janet Tsai . 44 VLIW and Superscalar Processing Sean Munson, Clara Cho . 49 i Preface The members of the Fall 2004 ENGR 3410 Computer Architecture class are pleased to present the Proceedings of CA 2004. This is the first offering of the class, and this document represents only a portion of the work done by the students this semester. Presented here are the final papers describing their end of semester projects. The students were given a few weeks to research any Computer Architecture topic that interested them. The deliverables included a brief in-class presenta- tion as well as the documents compiled here. The goal of the project was to encourage group learning of advanced Computer Architecture topics not covered in class and give the students a chance to develop independent research skills. The projects were wonderfully varied, ranging from literature reviews to actual hardware implementations. It is important to note for the reader that several students chose to complete an optional lab as part of the final project and implemented a pipelined MIPS processor in Verilog. This work is not reflected in these proceedings, however the scope of these students’ research projects were adjusted accordingly. As the instructor for CA 2004, I am most proud of the work accomplished by the students this semester. I would like to thank them for their hard work and their creative zeal. For more information regarding Computer Architec- ture at F. W. Olin College of Engineering, please visit our class web page at http://ca.ece.olin.edu. Mark L. Chang Assistant Professor Franklin W. Olin College of Engineering ii PROCEEDINGS OF THE FALL 2004 ENGR 3410: COMPUTER ARCHITECTURE CLASS (CA2004), FRANKLIN W. OLIN COLLEGE OF ENGINEERING 2 Digital Signal Processors Grant R. Hutchins, Leighton M. Ige Abstract— Digital Signal Processors (DSPs) operate on digital components. Many audio receivers offer the capability of signals in realtime applications. They have several benefits real time equalization adjustments to audio playback. The over other processor architectures, due to specialized designs computationally intensive algorithms for transforming audio and instruction sets. This article offers an overview of general properties of DSPs and comments on their applications in the to sound as if it is being played in a large concert hall or a real world. dance hall are handled by DSPs. Additionally, many filtering and noise reduction algorithms can be implemented to clean I. INTRODUCTION up audio signals. Convolution and Fast Fourier Transforms We live in an analog world composed of continuous func- require many sequential multiplications and additions, a pro- tions: the light you see, the sound you hear, and the words you cedure for which DSPs are optimized. speak. However, for many reasons these signals are converted DSPs are also very flexible in their implementation as they into digital representations for processing and transmission. It can be reprogrammed to perform different calculations. For is often cheaper, more accurate, and computationally simpler example, a cell phone provider could decide to support a new to manipulate a digital representation of a continuous signal, data feature on its phones. The DSP could then be updated to since there are discrete time slices of data to process instead decode the new data packets being sent to the phone. of a signal with infinite resolution. The power of the digital While there are currently many DSPs implemented in revolution lies in the ability of digital systems to approximate everyday devices, they are becoming more prevalent due to the analog world. Once in a discrete representation, signals their high performance, flexibility, and continually decreasing can be digitally processed to have different effects, such as prices. The capability to perform complex manipulations on the reduction of background noise in a microphone feed or data in real time can enable many systems to incorporate more the encryption and compression of a cell phone call. inputs from the world. Digital signal processors (DSP) are specific microprocessors that specialize in the manipulation of signals in real time. II. COMMON DSP TASKS In the specific applications of devices that interface with the In digital signal processing, a few simple types of algorithms world, live inputs must be sampled, processed, decoded, and dominate. Thus, DSPs highlight their abilities to perform outputted on the fly. Thus, DSPs must be designed to be able functions to to complete a lot of computations very quickly to be effective. The rate at which these DSPs operate must be at least as fast A. Multiply-Accumulate as the rate of its input, otherwise backlogs will be created. Once the process no longer proceeds in real time you lose the Time domain filters are a common operation in digital signal synchronization between inputs and outputs, with the signal processing. Digital filters are implemented by taking streams becoming decoupled. This scenario can be likened to that of a of current and prior input and output data, scaling them with cell phone conversation in which you had to wait a long time coefficients that perform a certain filter function, and summing for your phone to decode the received audio signal from your them to generate the next output. correspondent. When there is too long of a delay many of the Some filters have hundreds or thousands of coefficients. benefits of a synchronous conversation are lost and one would Each of these multiplication operations must occur in order be no worse off sending a written letter. to generate each output sample. However, data rates in audio Today, by far, the largest implementation of DSPs is in DSP applications are often as high as 96,000 samples per cell phones. Every cell phone sold contains a DSP, which second (video can go much higher), meaning that millions of is necessary to perform all the encoding and decoding of multiplications and accumulations must happen every second. the audio signals which are transmitted through the wireless For this reason, DSPs are optimized for and benchmarked communications network. Additionally, most new phones now on their ability to perform multiply-accumulate operations. A have advanced multimedia capabilities which need to be common metric for DSP processors is MMACS, or millions processed in real time. These functions such as picture taking, of multiply-accumulates per second. music playing, and video recording, along with their non- integrated device counterparts, require DSPs for many fast B. Fast Fourier Transform (FFT) computations. DSPs can be found at the core of many of the In most situations, signal data is represented in the time products in the growing multimedia device segment. domain. That is, each separate word of signal data represents Within the audio domain, DSPs are rather prevalent and a discrete sample of the signal in time. However, many can be found in CD players as well as high end home audio functions, such as equalization and pitch-shifting, operate on PROCEEDINGS OF THE FALL 2004 ENGR 3410: COMPUTER ARCHITECTURE CLASS (CA2004), FRANKLIN W. OLIN COLLEGE OF ENGINEERING 3 Fig. 2. Super Harvard Architecture (taken from [1]) Fig. 1. Typical architecture for a DSP (taken from[1]) 2) Harvard Architecture: The Harvard architecture was developed by a group led by Howard Aiken at Harvard Univer- the frequency domain, where each word represents the amount sity during the 1940s. The Harvard architecture employs two of information present in the signal at a particular frequency. separate memories, one for instructions and one for data. The The Fast Fourier Transform is an algorithm which takes processor has independent buses, so that program instructions time domain data and converts it into corresponding frequency and data can be fetched simultaneously. domain data. In order to do so, the data must be reshuffled Most DSPs use some sort of Harvard architecture. However, and processed through multiple loops. Using a similar routine, most digital signal processing routines are data intensive the Inverse Fast Fourier Transform converts frequency domain and thus require more than one load from data memory for data back into time domain data. each load from instruction memory. The original Harvard architecture does not support III. ARCHITECTURE AND DATAPATH 3) Super Harvard Architecture (SHARC: Some products DSPs have several architectural properties that set them from Analog Devices use what they call a “Super Harvard ar- apart from other classes of processors. Each optimization has chitecture” (SHARC), such as their ADSP-2016x and ADSP- the common applications of DSPs in mind and aims to reduce 211xx families of DSPs. The SHARC processors add an the overhead associated with manipulating large digital signals instruction cache to the CPU and a dedicated I/O controller to in a repetitive fashion. the data memory. A. Memory The SHARC instruction cache takes advantage of the fact that digital signal processing routines make use of long repeti- DSPs access memory in differing ways.

Proceedings of CA 2004 the Fall 2004 ENGR 3410 Computer Architecture

CPU ボードカタログサポート CPU Intel ：Core I7、Xeon-E5 Freescale ：T4240、P4080、MPC8640D AMD ：Radeon HD 6970M、HD 7970M GPGPU NVIDIA ：Fermi、Kepler Architecture GPGPU

Charactersing the Limits of the Openflow Slow-Path

SO630 Manual

Avionics Hardware Issues 2010/11/19 Chih-Hao Sun Avionics Software--Hardware Issue -History

Ebook - Informations About Operating Systems Version: August 15, 2006 | Download

Parallel Rendering on Hybrid Multi-GPU Clusters

Average Memory Access Time: Reducing Misses

Chapter 1. Origins of Mac OS X

Parallel Texture-Based Vector Field Visualization on Curved Surfaces Using GPU Cluster Computers

A Developer's Guide to the POWER Architecture

Contents of DOWNLOAD.ZIP Contents of This

Evolution of the Graphical Processing Unit