Lecture 10A: Digital Signal Processors: a TI Architectural History Collated By: Professor Kurt Keutzer Computer Science 252, Spring 2000 with Contributions From: Dr

Total Page:16

File Type:pdf, Size:1020Kb

Lecture 10A: Digital Signal Processors: a TI Architectural History Collated By: Professor Kurt Keutzer Computer Science 252, Spring 2000 with Contributions From: Dr Lecture 10a: Digital Signal Processors: A TI Architectural History Collated by: Professor Kurt Keutzer Computer Science 252, Spring 2000 With contributions from: Dr. Brock Barton, Clark Hise TI; Dr. Surendar S. Magar, Berkeley Concept Research Corporation 1 DSP ARCHITECTURE EVOLUTION Multipliers (MUL) Multiprocessors (MP) Multi-Processing Video/Imaging W-CDMA Radars DSP Building Blocks Function/Application Specific Digital Radios & Bit Slice Processors (MUL, etc.) ( MP) High-End Control Modems DSP µP and RISC Voice Coding ( MP ) Instruments µC and Analog µ Low-End P Modems Application Examples Application Industrial Control 1980 1985 1990 1995 2 DSP ARCHITECTURE Enabling Technologies Time Frame Approach Primary Application Enabling Technologies Early 1970’s • Discrete logic • Non-real time • Bipolar SSI, MSI procesing • FFT algorithm • Simulation Late 1970’s • Building block • Military radars • Single chip bipolar multiplier • Digital Comm. • Flash A/D Early 1980’s • Single Chip DSP µP • Telecom • µP architectures • Control • NMOS/CMOS Late 1980’s • Function/Application • Computers • Vector processing specific chips • Communication • Parallel processing Early 1990’s • Multiprocessing • Video/Image Processing • Advanced multiprocessing • VLIW, MIMD, etc. Late 1990’s • Single-chip • Wireless telephony • Low power single-chip DSP multiprocessing • Internet related • Multiprocessing 3 Texas Instruments TMS320 Family Multiple DSP µP Generations First Bit Size Clock Instruction MAC MOPS Device density (# Sample speed Throughput execution of transistors) (MHz) (ns) Uniprocessor Based ( Harvard Architecture) TMS32010 1982 16 integer 20 5 MIPS 400 5 58,000 (3µ) TMS320C25 1985 16 integer 40 10 MIPS 100 20 160,000 (2µ) TMS320C30 1988 32 flt.pt. 33 17 MIPS 60 33 695,000 (1µ) TMS320C50 1991 16 integer 57 29 MIPS 35 60 1,000,000 (0.5µ) TMS320C2XXX 1995 16 integer 40 MIPS 25 80 Multiprocessor Based TMS320C80 1996 32 integer/flt. 2 GOPS MIMD 120 MFLOP TMS320C62XX 1997 16 integer 1600 MIPS 5 20 GOPS VLIW TMS310C67XX 1997 32 flt. pt. 5 1 GFLOP VLIW 4 First Generation DSP µP Case Study TMS32010 (Texas Instruments) - 1982 Features u 200 ns instruction cycle (5 MIPS) u 144 words (16 bit) on-chip data RAM u 1.5K words (16 bit) on-chip program ROM - TMS32010 u External program memory expansion to a total of 4K words at full speed u 16-bit instruction/data word u single cycle 32-bit ALU/accumulator u Single cycle 16 x 16-bit multiply in 200 ns u Two cycle MAC (5 MOPS) u Zero to 15-bit barrel shifter u Eight input and eight output channels 5 TMS32010 BLOCK DIAGRAM 6 TMS32010 Program Memory Maps Microcomputer Mode Microprocessor Mode Address 16-bit word 16-bit word 0 Reset 1st Word 0 Reset 1st Word 1 Reset 2nd Word 1 Reset 2nd Word Internal Memory 2 Interrupt 2 Interrupt Space External Memory 1525 Space Internal Memory Space Reserved For Testing 1536 External Memory Space 4095 4095 7 Digital FIR Filter Implementation (Uniprocessor-Circular Buffer) Start each Time here 1st. Cycle 2nd. Cycle End X0 Start a a a a Start n-1 n-2 1 0 X1 X2 a a 0 n-1 X3 X4 X X5 X n-1 End Replace + starting value Acc with new value 8 TMS32010 FIR FILTER PROGRAM Indirect Addressing (Smaller Program Space) Y(n) = x[n-(N-1)] . h(N-1) + x[n-(N-2)] . h(N-2) +…+ x(n) . h(0) For N=50, Indirect Addressing t=42 µs (23.8 KHz) 9 For N=50, Direct Addressing t=21.6 µs (40.2 KHz) TMS320C203/LC203 BLOCK DIAGRAM DSP Core Approach - 1995 10 Third Generation DSP µP Case Study TMS320C30 - 1988 TMS320C30 Key Features u 60 ns single-cycle instruction execution time n 33.3 MFLOPS (million floating-point operations per second) n 16.7 MIPS (million instructions per second) u One 4K x 32-bit single-cycle dual-access on-chip ROM block u Two 1K x 32-bit single-cycle dual-access on-chip RAM blocks u 64 x 32-bit instruction cache u 32-bit instruction and data words, 24-bit addresses u 40/32-bit floating-point/integer multiplier and ALU u 32-bit barrel shifter 11 Third Generation DSP µP Case Study TMS320C30 - 1988 TMS320C30 Key Features (cont.) u Eight extended precision registers (accumulators) u Two address generators with eight auxiliary registers and two auxiliary register arithmetic units u On-chip direct memory Access (DMA) controller for concurrent I/O and CPU operation u Parallel ALU and multiplier instructions u Block repeat capability u Interlocked instructions for multiprocessing support u Two serial ports to support 8/16/32-bit transfers u Two 32-bit timers u 1 µ CDMOS Process 12 TMS320C30 BLOCK DIAGRAM 13 TMS320C3x CPU BLOCK DIAGRAM 14 TMS320C3x MEMORY BLOCK DIAGRAM 15 TMS320C30 Memory Organization Oh Interrupt locations Oh Interrupt locations & reserved (192) & reserved (192) BFh external STRB active BFh COh External COh ROM STRB Active 7FFFFFh 0FFFh (Internal) 800000h 1000h 7FFFFFh Expansion BUS MSTRB Expansion BUS MSTRB 800000h Active (8K) 801FFFh Active (8K) 801FFFh 802000h Reserved Reserved 802000h (8K) 803FFFh (8K) 803FFFh 804000h Expansion Bus Expansion Bus 804000h 805FFFh IOSTRB Active (8K) IOSTRB Active (8K) 805FFFh 806000h Reserved Reserved 806000h 807FFFH (8K) (8K) 807FFFH 80800h Peripheral Bus Memory Mapped Peripheral Bus Memory Mapped 80800h 8097FFh Registers (Internal) (6K) Registers (Internal) (6K) 8097FFh 809800h RAM Block 0 (1K) RAM Block 0 (1K) 809800h 809BFFh (Internal) (Internal) 809BFFh 809C00h RAM Block 1 (1K) RAM Block 1 (1K) 809C00h 809FFFh (Internal) (Internal) 80A00h 809FFFh External 80A00h External 0FFFFFFh STRB Active STRB Active 0FFFFFFh Microprocessor Mode Microcomputer Mode 16 TMS320C30 FIR FILTER PROGRAM Y(n) = x[n-(N-1)] . h(N-1) + x[n-(N-2)] . h(N-2) +…+ x(n) . h(0) For N=50, t=3.6 µs (277 KHz) 17 ‘C54x Architecture 18 TMS320C54x Internal Block Diagram 19 Architecture optimized for DSP #1: CPU designed for efficient DSP processing n MAC unit, 2 Accumulators, Additional Adder, Barrel Shifter #2: Multiple busses for efficient data and program flow n Four busses and large on-chip memory that result in sustained performance near peak #3: Highly tuned instruction set for powerful DSP computing n Sophisticated instructions that execute in fewer cycles, with less code and low power demands 20 Key #1: DSP engine 40 å Y = an * xn n = 1 x a MPY ADD y 21 Key #1: MAC Unit MAC *AR2+, *AR3+, A Data Acc A Temp Coeff Prgm Data Acc A S/U S/U MPY A Fractional B Mode Bit ADD O acc A acc B 22 Key #1: Accumulators + Adder General-Purpose Math example: t = s+e-r A Bus B Bus A B C T D Shifter LD @s, A acc A acc B ALU ADD @e, A MUX U Bus SUB @r, A STL A, @t A B MAC 23 Key #1: Barrel shifter LD @X, 16, A STH @B, Y A B C D Barrel Shifter (-16-+31) S Bus ALU E Bus 24 Key #1: Temporary register LD @x, T MPY @a, A D X EXP A Encoder B Temporary For example: Register A = xa T Bus MAC ALU 25 Key #2: Efficient data/program flow #1: CPU designed for efficient DSP processing n MAC unit, 2 Accumulators, Additional Adder, Barrel Shifter #2: Multiple busses for efficient data and program flow n Four busses and large on-chip memory that result in sustained performance near peak #3: Highly tuned instruction set for powerful DSP computing n Sophisticated instructions that execute in fewer cycles, with less code and low power demands 26 Key #2: Multiple busses MAC *AR2+, *AR3+, A EXTERNAL P MEMORY MM UU D MM XX C UU EE XX E MEMORY INTERNAL SS C D Central Arithmetic T MAC A B ALU SHIFTER Logic Unit M 27 Key #2: Pipeline Prefetch Fetch Decode Access Read Execute P F D A R E K Prefetch: Calculate address of instruction K Fetch: Collect instruction K Decode: Interpret instruction K Access: Collect address of operand K Read: Collect operand K Execute: Perform operation 28 Key #2: Bus usage CNTL PC ARs EXTERNAL P MEMORY MM UU D MM XX C UU EE XX MEMORY E INTERNAL SS Central Arithmetic T MAC A B ALU SHIFTER Logic Unit 29 Key #2: Pipeline performance CYCLES P1 F1 D1 A1 R1 X1 P2 F2 D2 A2 R2 X2 P3 F3 D3 A3 R3 X3 P4 F4 D4 A4 R4 X4 P5 F5 D5 A5 R5 X5 P6 F6 D6 A6 R6 X6 Fully loaded pipeline 30 Key #3: Powerful instructions #1: CPU designed for efficient DSP processing n MAC Unit, 2 Accumulators, Additional Adder, Barrel Shifter #2: Multiple busses for efficient data and program flow n Four busses and large on-chip memory that result in sustained performance near peak #3: Highly tuned instruction set for powerful DSP computing n Sophisticated instructions that execute in fewer cycles, with less code and low power demands 31 Key #3: Advanced applications Symmetric FIR filter FIRS Adaptive filtering LMS Polynomial evaluation POLY Code book search STRCD SACCD SRCCD Viterbi DADST DSADT CMPS 32 C62x Architecture 33 TMS320C6201 Revision 2 Program Cache / Program Memory 32-bit address, 256-Bit data512K Bits RAM Pwr C6201 CPU Megamodule Program Fetch Dwn Control Host Instruction Dispatch Registers Port Instruction Decode Interface Data Path 1 Data Path 2 Control 4- Logic DMA A Register File B Register File Test Emulation L1 S1 M1 D1 D2 M2 S2 L2 Ext. Interrupts Memory Interface 2 Timers 2 Multi- channel Data Memory buffered 32-Bit address, 8-, 16-, 32-Bit data serial ports 512K Bits RAM (T1/E1) 34 C6201 Internal Memory Architecture K Separate Internal Program and Data Spaces K Program n 16K 32-bit instructions (2K Fetch Packets) n 256-bit Fetch Width n Configurable as either w Direct Mapped Cache, Memory Mapped Program Memory K Data n 32K x 16 n Single Ported Accessible by Both CPU Data Buses n 4 x 8K 16-bit Banks w 2 Possible Simultaneous Memory Accesses (4 Banks) w 4-Way Interleave, Banks and Interleave Minimize Access Conflicts 35 C62x Interrupts K 12 Maskable Interrupts , Non-Maskable Interrupt (NMI) K Interrupt
Recommended publications
  • Implementation of Optimized CORDIC Designs
    IJIRST –International Journal for Innovative Research in Science & Technology| Volume 1 | Issue 3 | August 2014 ISSN(online) : 2349-6010 Implementation of Optimized CORDIC Designs Bibinu Jacob Reenesh Zacharia M Tech Student Assistant Professor Department of Electronics & Communication Engineering Department of Electronics & Communication Engineering Mangalam College Of Engineering, Ettumanoor Mangalam College Of Engineering, Ettumanoor Abstract CORDIC stands for COordinate Rotation DIgital Computer, which is likewise known as Volder's algorithm and digit-by-digit method. It is a simple and effective method to calculate many functions like trigonometric, hyperbolic, logarithmic etc. It performs complex multiplications using simple shifts and additions. It finds applications in robotics, digital signal processing, graphics, games, and animation. But, there isn't any optimized CORDIC designs for rotating vectors through specific angles. In this paper, optimization schemes and CORDIC circuits for fixed and known rotations are proposed. Finally an argument reduction technique to map larger angle to an angle less than 45 is discussed. The proposed CORDIC cells are simulated by Xilinx ISE Design Suite and shown that the proposed designs offer less latency and better device utilization than the reference CORDIC design for fixed and known angles of rotation. Keywords: Coordinate rotation digital computer (CORDIC), digital signal processing (DSP) chip, very large scale integration (VLSI) _______________________________________________________________________________________________________ I. INTRODUCTION The advances in the very large scale integration (VLSI) technology and the advent of electronic design automation (EDA) tools have been directing the current research in the areas of digital signal processing (DSP), communications, etc. in terms of the design of high speed VLSI architectures for real-time algorithms and systems which have applications in the above mentioned areas.
    [Show full text]
  • Master's Thesis
    Implementation of the Metal Privileged Architecture by Fatemeh Hassani A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Masters of Mathematics in Computer Science Waterloo, Ontario, Canada, 2020 c Fatemeh Hassani 2020 Author's Declaration I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. ii Abstract The privileged architecture of modern computer architectures is expanded through new architectural features that are implemented in hardware or through instruction set extensions. These extensions are tied to particular architecture and operating system developers are not able to customize the privileged mechanisms. As a result, they have to work around fixed abstractions provided by processor vendors to implement desired functionalities. Programmable approaches such as PALcode also remain heavily tied to the hardware and modifying the privileged architecture has to be done by the processor manufacturer. To accelerate operating system development and enable rapid prototyping of new operating system designs and features, we need to rethink the privileged architecture design. We present a new abstraction called Metal that enables extensions to the architecture by the operating system. It provides system developers with a general-purpose and easy- to-use interface to build a variety of facilities that range from performance measurements to novel privilege models. We implement a simplified version of the Alpha architecture which we call µAlpha and build a prototype of Metal on this architecture.
    [Show full text]
  • A Secure, Low Cost Synchophasor Measurement Device
    University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Doctoral Dissertations Graduate School 8-2015 A 3rd Generation Frequency Disturbance Recorder: A Secure, Low Cost Synchophasor Measurement Device Jerel Alan Culliss University of Tennessee - Knoxville, [email protected] Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss Part of the Power and Energy Commons, Systems and Communications Commons, and the VLSI and Circuits, Embedded and Hardware Systems Commons Recommended Citation Culliss, Jerel Alan, "A 3rd Generation Frequency Disturbance Recorder: A Secure, Low Cost Synchophasor Measurement Device. " PhD diss., University of Tennessee, 2015. https://trace.tennessee.edu/utk_graddiss/3495 This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a dissertation written by Jerel Alan Culliss entitled "A 3rd Generation Frequency Disturbance Recorder: A Secure, Low Cost Synchophasor Measurement Device." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the equirr ements for the degree of Doctor of Philosophy, with a major in Electrical Engineering. Yilu Liu, Major Professor We have read this dissertation and recommend its acceptance: Leon M. Tolbert, Wei Gao, Lee L. Riedinger Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) A 3rd Generation Frequency Disturbance Recorder: A Secure, Low Cost Synchrophasor Measurement Device A Dissertation Presented for the Doctor of Philosophy Degree The University of Tennessee, Knoxville Jerel Alan Culliss August 2015 Copyright © 2015 by Jerel A.
    [Show full text]
  • A Continuación Se Realizará Una Breve Descripción De Los Objetivos De Los Cuales Estará Formado El Trabajo Perteneciente
    UNIVERSIDAD POLITÉCNICA DE MADRID Escuela Universitaria de Ingeniera Técnica de Telecomunicación INTEGRACIÓN MPLAYER – OPENSVC EN EL PROCESADOR MULTINÚCLEO OMAP3530 TRABAJO FIN DE MÁSTER Autor: Óscar Herranz Alonso Ingeniero Técnico de Telecomunicación Tutor: Fernando Pescador del Oso Doctor Ingeniero de Telecomunicación Julio 2012 2 4 AGRADECIMIENTOS Un año y pocos meses después de la defensa de mi Proyecto Fin de Carrera me vuelvo a encontrar en la misma situación: escribiendo estás líneas de mi Trabajo Fin de Máster para agradecer a aquellas personas que me han apoyado y ayudado de alguna forma durante esta etapa de mi vida. Después de un año en el que he hecho demasiadas cosas importantes en mi vida (y sorprendentemente todas bien), ha llegado la hora de dar por finalizado el Máster, el último paso antes de cerrar mi carrera de estudiante. Por ello, me gustaría agradecer en primer lugar al Grupo de Investigación GDEM por darme la oportunidad de formar parte de su equipo y en especial a Fernando, persona ocupada donde las haya, pero que una vez más me ha aconsejado en numerosas ocasiones el camino a seguir para solventar los problemas. Gracias Fernando por tu tiempo y dedicación. Agradecer a mis padres, Fidel y Victoria, y a mi hermano, Víctor, el apoyo y las fuerzas recibidas en todo momento. Sé que no todos podréis estar presentes en mi defensa de este Trabajo Fin de Máster pero da igual. Ya me habéis demostrado con creces lo maravillosos que sois. Gracias por vuestro apoyo incondicional y por recibirme siempre con una sonrisa dibujada en vuestro rostro.
    [Show full text]
  • Arithmetic and Logical Unit Design for Area Optimization for Microcontroller Amrut Anilrao Purohit 1,2 , Mohammed Riyaz Ahmed 2 and R
    et International Journal on Emerging Technologies 11 (2): 668-673(2020) ISSN No. (Print): 0975-8364 ISSN No. (Online): 2249-3255 Arithmetic and Logical Unit Design for Area Optimization for Microcontroller Amrut Anilrao Purohit 1,2 , Mohammed Riyaz Ahmed 2 and R. Venkata Siva Reddy 2 1Research Scholar, VTU Belagavi (Karnataka), India. 2School of Electronics and Communication Engineering, REVA University Bengaluru, (Karnataka), India. (Corresponding author: Amrut Anilrao Purohit) (Received 04 January 2020, Revised 02 March 2020, Accepted 03 March 2020) (Published by Research Trend, Website: www.researchtrend.net) ABSTRACT: Arithmetic and Logic Unit (ALU) can be understood with basic knowledge of digital electronics and any engineer will go through the details only once. The advantage of knowing ALU in detail is two- folded: firstly, programming of the processing device can be efficient and secondly, can design a new ALU architecture as per the various constraints of the use cases. The miniaturization of digital circuits can be achieved by either reducing the size of transistor (Moore’s law) or by optimizing the gate count of the circuit. The first has been explored extensively while the latter has been ignored which deals with the application of Boolean rules and requires sound knowledge of logic design. The ultimate outcome is to have an area optimized architecture/approach that optimizes the circuit at gate level. The design of ALU is for various processing devices varies with the device/system requirements. The area optimization places a significant role in the chip design. Here in this work, we have attempted to design an ALU which is area efficient while being loaded with additional functionality necessary for microcontrollers.
    [Show full text]
  • 8-Bit Barrel Shifter That May Rotate the Data in Both Direction
    © 2018 IJRAR December 2018, Volume 5, Issue 04 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138) Performance Analysis of Low Power CMOS based 8 bit Barrel Shifter Deepti Shakya M-Tech VLSI Design SRCEM Gwalior (M.P) INDIA Shweta Agrawal Assistant Professor, Dept. Electronics & Communication Engineering SRCEM Gwalior (M.P), INDIA Abstract Barrel Shifter isimportant function to optimize the RISC processor, so it is used for rotating and transferring the data either in left or right direction. This shifter is useful in lots of signal processing ICs. The Arithmetic and the Logical Shifters additionally can be changed by the Barrel Shifter Because with the rotation of the data it also supply the utility the data right, left changeall mathemetically or logically.A purpose of this paper is to design the CMOS 8 bit barrel shifter using universal gates with the help of CMOS logic and the most important 2:1 multiplexers (MUX). CMOS based 8 bit barrel shifter has implemented in this paper and compared in terms of delay and power using 90 nm and 45nm technologies. Key Words: CMOS, Low Power, Delay,Barrel Shifter, PMOS, NMOS, Cadence. 1. Introduction In RISC processor ALU plays math and intellectual operations. Number crunching operations operate enlargement, subtraction, addition and division such as sensible operations incorporates AND, OR, NOT, NAND, NOR, XNOR and XOR. RISC processor consist ofnotice in credentials used to keep the operand in store path. The point of control unit is to give a control flag that controls the operation of the processor which tells the small scale building design which operation is done after then the time Barrel shifter is a significant element among rise operation.
    [Show full text]
  • A Lisp Oriented Architecture by John W.F
    A Lisp Oriented Architecture by John W.F. McClain Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degrees of Master of Science in Electrical Engineering and Computer Science and Bachelor of Science in Electrical Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 1994 © John W.F. McClain, 1994 The author hereby grants to MIT permission to reproduce and to distribute copies of this thesis document in whole or in part. Signature of Author ...... ;......................... .............. Department of Electrical Engineering and Computer Science August 5th, 1994 Certified by....... ......... ... ...... Th nas F. Knight Jr. Principal Research Scientist 1,,IA £ . Thesis Supervisor Accepted by ....................... 3Frederic R. Morgenthaler Chairman, Depattee, on Graduate Students J 'FROM e ;; "N MfLIT oARIES ..- A Lisp Oriented Architecture by John W.F. McClain Submitted to the Department of Electrical Engineering and Computer Science on August 5th, 1994, in partial fulfillment of the requirements for the degrees of Master of Science in Electrical Engineering and Computer Science and Bachelor of Science in Electrical Engineering Abstract In this thesis I describe LOOP, a new architecture for the efficient execution of pro- grams written in Lisp like languages. LOOP allows Lisp programs to run at high speed without sacrificing safety or ease of programming. LOOP is a 64 bit, long in- struction word architecture with support for generic arithmetic, 64 bit tagged IEEE floats, low cost fine grained read and write barriers, and fast traps. I make estimates for how much these Lisp specific features cost and how much they may speed up the execution of programs written in Lisp.
    [Show full text]
  • 4-Bit Barrel Shifter Using Transmission Gates
    4-BIT BARREL SHIFTER USING TRANSMISSION GATES M.Muralikrishna1, Nadikuda yadagiri2, GuntupallySai chaitanya3, PalugulaPranay Reddy4 1ECE, Assistant Professor, Anurag Group of Institutions/JNTU University, Hyderabad 2, 3, 4ECE, Anurag Group of Institutions/JNTU University, Hyderabad Abstract In Arithmetic shifter the procedure for Barrel shifter plays an important role in the left shifting is same as logical but in case of floating point arithmetic operations and right shifting the empty spot is filled by signed optimizing the RISC processor and used for bit. Funnel Shifter is combination of all shifters rotating andshifting the data either in left or and rotator In this paper barrel shifter is right direction. This shifter is useful in many designed using multiplexer and implemented signal processing ICs. The arithmetic and using CMOS logic.The MUX used is made with logical shifter are itself a type of barrel the help of universal gates so that the time delay shifter. The main objective of this paper is to and power dissipation is reduced. design a fully custom two bit barrel shifter using 4x1 multiplexer with the help of II. BARREL SHIFTER transmission gates and analyze the A barrel shifter is a digital circuit that can performance on basis of power consumption, shift a data word by a specified number of bits and no of transistors. The tool used to fulfill without the use of any sequential logic, only the purpose is cadence virtuoso using 180nm pure combinatorial logic. One way to technology. implement it is as a sequence of multiplexers Keywords: Barrel shifter, Transmission where the output of one multiplexer is gates, Multiplexer connected to the input of the next multiplexer in a way that depends on the shift distance.
    [Show full text]
  • Tms320c54x DSP Functional Overview
    TMS320C54xt DSP Functional Overview Literature Number: SPRU307A September 1998 – Revised May 2000 Printed on Recycled Paper TMS320C54x DSP Functional Overview This document provides a functional overview of the devices included in the Texas Instrument TMS320C54xt generation of digital signal processors. Included are descriptions of the central processing unit (CPU) architecture, bus structure, memory structure, on-chip peripherals, and the instruction set. Detailed descriptions of device-specific characteristics such as package pinouts, package mechanical data, and device electrical characteristics are included in separate device-specific data sheets. Topic Page 1.1 Features. 2 1.2 Architecture. 7 1.3 Bus Structure. 13 1.4 Memory. 15 1.5 On-Chip Peripherals. 22 1.5.1 Software-Programmable Wait-State Generators. 22 1.5.2 Programmable Bank-Switching . 22 1.5.3 Parallel I/O Ports. 22 1.5.4 Direct Memory Access (DMA) Controller. 23 1.5.5 Host-Port Interface (HPI). 27 1.5.6 Serial Ports. 30 1.5.7 General-Purpose I/O (GPIO) Pins. 34 1.5.8 Hardware Timer. 34 1.5.9 Clock Generator. 34 1.6 Instruction Set Summary. 36 1.7 Development Support. 47 1.8 Documentation Support. 50 TMS320C54x DSP Functional Overview 1 Features 1.1 Features Table 1–1 provides an overview the TMS320C54x, TMS320LC54x, and TMS320VC54x fixed-point, digital signal processor (DSP) families (hereafter referred to as the ’54x unless otherwise specified). The table shows significant features of each device, including the capacity of on-chip RAM and ROM, the available peripherals, the CPU speed, and the type of package with its total pin count.
    [Show full text]
  • The Implementation of Prolog Via VAX 8600 Microcode ABSTRACT
    The Implementation of Prolog via VAX 8600 Microcode Jeff Gee,Stephen W. Melvin, Yale N. Patt Computer Science Division University of California Berkeley, CA 94720 ABSTRACT VAX 8600 is a 32 bit computer designed with ECL macrocell arrays. Figure 1 shows a simplified block diagram of the 8600. We have implemented a high performance Prolog engine by The cycle time of the 8600 is 80 nanoseconds. directly executing in microcode the constructs of Warren’s Abstract Machine. The imulemention vehicle is the VAX 8600 computer. The VAX 8600 is a general purpose processor Vimal Address containing 8K words of writable control store. In our system, I each of the Warren Abstract Machine instructions is implemented as a VAX 8600 machine level instruction. Other Prolog built-ins are either implemented directly in microcode or executed by the general VAX instruction set. Initial results indicate that. our system is the fastest implementation of Prolog on a commercrally available general purpose processor. 1. Introduction Various models of execution have been investigated to attain the high performance execution of Prolog programs. Usually, Figure 1. Simplified Block Diagram of the VAX 8600 this involves compiling the Prolog program first into an intermediate form referred to as the Warren Abstract Machine (WAM) instruction set [l]. Execution of WAM instructions often follow one of two methods: they are executed directly by a The 8600 consists of six subprocessors: the EBOX. IBOX, special purpose processor, or they are software emulated via the FBOX. MBOX, Console. and UO adamer. Each of the seuarate machine language of a general purpose computer.
    [Show full text]
  • A Hardware Monitor Using Tms320c40 Analysis Module
    Disclaimer: This document was part of the DSP Solution Challenge 1995 European Team Papers. It may have been written by someone whose native language is not English. TI assumes no liability for the quality of writing and/or the accuracy of the information contained herein. Implementing a Hardware Monitor Using the TMS320C40 Analysis Module and JTAG Interface for Performance Measurements in a Multi-DSP System Authors: R. Ginthor-Kalcsics, H. Eder, G. Straub, Dr. R. Weiss EFRIE, France December 1995 SPRA308 IMPORTANT NOTICE Texas Instruments (TI™) reserves the right to make changes to its products or to discontinue any semiconductor product or service without notice, and advises its customers to obtain the latest version of relevant information to verify, before placing orders, that the information being relied on is current. TI warrants performance of its semiconductor products and related software to the specifications applicable at the time of sale in accordance with TI’s standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements. Certain application using semiconductor products may involve potential risks of death, personal injury, or severe property or environmental damage (“Critical Applications”). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, INTENDED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT APPLICATIONS, DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. Inclusion of TI products in such applications is understood to be fully at the risk of the customer. Use of TI products in such applications requires the written approval of an appropriate TI officer.
    [Show full text]
  • Algorithms and Hardware Designs for Decimal Multiplication
    Algorithms and Hardware Designs for Decimal Multiplication by Mark A. Erle Presented to the Graduate and Research Committee of Lehigh University in Candidacy for the Degree of Doctor of Philosophy in Computer Engineering Lehigh University November 21, 2008 Approved and recommended for acceptance as a dissertation in partial ful¯llment of the requirements for the degree of Doctor of Philosophy. Date Accepted Date Dissertation Committee: Dr. Mark G. Arnold Chair Dr. Meghanad D. Wagh Member Dr. Brian D. Davison Member Dr. Michael J. Schulte External Member Dr. Eric M. Schwarz External Member Dr. William M. Pottenger External Member iii This dissertation is dedicated to Michele, Laura, Kristin, and Amy. v Acknowledgements Along the path to completing this dissertation, I have traversed the birth of my third daughter, the passing of my mother, and the passing of my father. Fortunately, I have not been alone, for I have been traveling with God, family, and friends. I am appreciative of the support I received from IBM all along the way. There are several colleagues with whom I had the good fortune to collaborate on several papers, and to whom I am very grateful, namely Dr. Eric Schwarz, Mr. Brian Hickmann, and Dr. Liang-Kai Wang. Additionally, I am thankful for the guidance and assistance from my dissertation committee which includes Dr. Michael Schulte, Dr. Mark Arnold, Dr. Meghanad Wagh, Dr. Brian Davison, Dr. Eric Schwarz, and Dr. William Pottenger. In particular, I am indebted to Dr. Schulte for his mentorship, encouragement, and rapport. Now, ¯nding myself at the end of this path, I see two roads diverging..
    [Show full text]