DSP56800 16-Bit Digital Signal Processor Family Manual

DSP56800 Family Manual
16-Bit Digital Signal Controllers
DSP56800FM Rev. 3.1 11/2005
freescale.com

Contents

Chapter 1 Introduction
  1.1 DSP56800 Family Architecture
    1.1.1 Core Overview
    1.1.2 Peripheral Blocks
    1.1.3 Family Members
  1.2 Introduction to Digital Signal Processing
  1.3 Summary of Features
  1.4 For the Latest Information

Chapter 2 Core Architecture Overview
  2.1 Core Block Diagram
    2.1.1 Data Arithmetic Logic Unit (ALU)
    2.1.2 Address Generation Unit (AGU)
    2.1.3 Program Controller and Hardware Looping Unit
    2.1.4 Bus and Bit-Manipulation Unit
    2.1.5 On-Chip Emulation (OnCE) Unit
    2.1.6 Address Buses
    2.1.7 Data Buses
  2.2 Memory Architecture
  2.3 Blocks Outside the DSP56800 Core
    2.3.1 External Data Memory
    2.3.2 Program Memory
    2.3.3 Bootstrap Memory
    2.3.4 IP-BUS Bridge
    2.3.5 Phase Lock Loop (PLL)
  2.4 DSP56800 Core Programming Model

Chapter 3 Data Arithmetic Logic Unit
  3.1 Overview and Architecture
    3.1.1 Data ALU Input Registers (X0, Y1, and Y0)
    3.1.2 Data ALU Accumulator Registers
    3.1.3 Multiply-Accumulator (MAC) and Logic Unit
    3.1.4 Barrel Shifter
    3.1.5 Accumulator Shifter
    3.1.6 Data Limiter and MAC Output Limiter
  3.2 Accessing the Accumulator Registers
    3.2.1 Accessing an Accumulator by Its Individual Portions
    3.2.2 Accessing an Entire Accumulator
      3.2.2.1 Accessing for Data ALU Operations
      3.2.2.2 Writing an Accumulator with a Small Operand
      3.2.2.3 Extension Registers as Protection Against Overflow
      3.2.2.4 Examples of Writing the Entire Accumulator
    3.2.3 General Integer Processing
      3.2.3.1 Writing Integer Data to an Accumulator
      3.2.3.2 Reading Integer Data from an Accumulator
    3.2.4 Using 16-Bit Results of DSC Algorithms
    3.2.5 Saving and Restoring Accumulators
    3.2.6 Bit-Field Operations on Integers in Accumulators
    3.2.7 Converting from 36-Bit Accumulator to 16-Bit Portion
  3.3 Fractional and Integer Data ALU Arithmetic
    3.3.1 Interpreting Data
    3.3.2 Data Formats
      3.3.2.1 Signed Fractional
      3.3.2.2 Unsigned Fractional
      3.3.2.3 Signed Integer
      3.3.2.4 Unsigned Integer
    3.3.3 Addition and Subtraction
    3.3.4 Logical Operations
    3.3.5 Multiplication
      3.3.5.1 Fractional Multiplication
      3.3.5.2 Integer Multiplication
    3.3.6 Division
    3.3.7 Unsigned Arithmetic
      3.3.7.1 Conditional Branch Instructions for Unsigned Operations
      3.3.7.2 Unsigned Multiplication
    3.3.8 Multi-Precision Operations
      3.3.8.1 Multi-Precision Addition and Subtraction
      3.3.8.2 Multi-Precision Multiplication
  3.4 Saturation and Data Limiting
    3.4.1 Data Limiter
    3.4.2 MAC Output Limiter
    3.4.3 Instructions Not Affected by the MAC Output Limiter
  3.5 Rounding
    3.5.1 Convergent Rounding
    3.5.2 Two’s-Complement Rounding
  3.6 Condition Code Generation
    3.6.1 36-Bit Destinations — CC Bit Cleared
    3.6.2 36-Bit Destinations — CC Bit Set
    3.6.3 20-Bit Destinations — CC Bit Cleared
    3.6.4 20-Bit Destinations — CC Bit Set
    3.6.5 16-Bit Destinations
    3.6.6 Special Instruction Types
    3.6.7 TST and TSTW Instructions
    3.6.8 Unsigned Arithmetic

Chapter 4 Address Generation Unit
  4.1 Architecture and Programming Model
    4.1.1 Address Registers (R0-R3)
    4.1.2 Stack Pointer Register (SP)
    4.1.3 Offset Register (N)
    4.1.4 Modifier Register (M01)
    4.1.5 Modulo Arithmetic Unit
    4.1.6.