Run-Time Dynamically-Adaptable FPGA-Based Architecture for High-Performance Autonomous Distributed Systems

Total Page:16

File Type:pdf, Size:1020Kb

Run-Time Dynamically-Adaptable FPGA-Based Architecture for High-Performance Autonomous Distributed Systems Departamento de Automática, Ingeniería Eléctrica y Electrónica e Informática Industrial Escuela Técnica Superior de Ingenieros Industriales Run-Time Dynamically-Adaptable FPGA-Based Architecture for High-Performance Autonomous Distributed Systems Autor: Juan Valverde Alcalá Ingeniero Industrial por la Universidad Politécnica de Madrid Directores: Jorge Portilla Berrueco Doctor por la Universidad Politécnica de Madrid en Ingeniería Electrónica Eduardo de la Torre Arnanz Doctor Ingeniero Industrial por la Universidad Politécnica de Madrid 2015 Tribunal Tribunal nombrado por el Excmo. y Magfco. Sr. Rector de la Universidad Politécnica de Madrid, el día de de 2015. Presidente: Vocales: Secretario: Suplentes: Realizado el acto de lectura y defensa de la Tesis Doctoral el día de de 2015 en la Escuela Técnica Superior de Ingenieros Industriales de la Universidad Politécnica de Madrid. Calificación: EL PRESIDENTE LOS VOCALES EL SECRETARIO Dedicado a Agradecimientos Resumen en castellano Esta tesis doctoral se enmarca dentro del campo de los sistemas embebidos reconfigurables, redes de sensores inalámbricas para aplicaciones de altas prestaciones, y computación distribuida. El documento se centra en el estudio de alternativas de procesamiento para sistemas embebidos autónomos distribuidos de altas prestaciones (por sus siglas en inglés, High- Performance Autonomous Distributed Systems (HPADS)), así como su evolución hacia el procesamiento de alta resolución. El estudio se ha llevado a cabo tanto a nivel de plataforma como a nivel de las arquitecturas de procesamiento dentro de la plataforma con el objetivo de optimizar aspectos tan relevantes como la eficiencia energética, la capacidad de cómputo y la tolerancia a fallos del sistema. Los HPADS son sistemas realimentados, normalmente formados por elementos distribuidos conectados o no en red, con cierta capacidad de adaptación, y con inteligencia suficiente para llevar a cabo labores de prognosis y/o autoevaluación. Esta clase de sistemas suele formar parte de sistemas más complejos llamados sistemas ciber- físicos (por sus siglas en inglés, Cyber-Physical Systems (CPSs)). Los CPSs cubren un espectro enorme de aplicaciones, yendo desde aplicaciones médicas, fabricación, o aplicaciones aeroespaciales, entre otras muchas. Para el diseño de este tipo de sistemas, aspectos tales como la confiabilidad, la definición de modelos de computación, o el uso de metodologías y/o herramientas que faciliten el incremento de la escalabilidad y de la gestión de la complejidad, son fundamentales. La primera parte de esta tesis doctoral se centra en el estudio de aquellas plataformas existentes en el estado del arte que por sus características pueden ser aplicables en el campo de los CPSs, así como en la propuesta de un nuevo diseño de plataforma de altas prestaciones que se ajuste mejor a los nuevos y más exigentes requisitos de las nuevas aplicaciones. Esta primera parte incluye descripción, implementación y validación de la plataforma propuesta, así como conclusiones sobre su usabilidad y sus limitaciones. Los principales objetivos para el diseño de la plataforma propuesta se enumeran a continuación: Estudiar la viabilidad del uso de una FPGA basada en RAM como principal procesador de la plataforma en cuanto a consumo energético y capacidad de cómputo. Propuesta de técnicas de gestión del consumo de energía en cada etapa del perfil de trabajo de la plataforma. Propuestas para la inclusión de reconfiguración dinámica y parcial de la FPGA (por sus siglas en inglés, Dynamic Partial Reconfiguration (DPR)) de forma que sea posible cambiar ciertas partes del sistema en tiempo de ejecución y sin necesidad de interrumpir al resto de las partes. Evaluar su aplicabilidad en el caso de HPADS. Las nuevas aplicaciones y nuevos escenarios a los que se enfrentan los CPSs, imponen nuevos requisitos en cuanto al ancho de banda necesario para el procesamiento de los datos, así como en la adquisición y comunicación de los mismos, además de un claro incremento en la complejidad de los algoritmos empleados. Para poder cumplir con estos nuevos requisitos, las plataformas están migrando desde sistemas tradicionales uni-procesador de 8 bits, a sistemas híbridos hardware-software que incluyen varios procesadores, o varios procesadores y lógica programable. Entre estas nuevas arquitecturas, las FPGAs y los sistemas en chip (por sus siglas en inglés, System on Chip (SoC)) que incluyen procesadores embebidos y lógica programable, proporcionan soluciones con muy buenos resultados en cuanto a consumo energético, precio, capacidad de cómputo y flexibilidad. Estos buenos resultados son aún mejores cuando las aplicaciones tienen altos requisitos de cómputo y cuando las condiciones de trabajo son muy susceptibles de cambiar en tiempo real. La plataforma propuesta en esta tesis doctoral se ha denominado HiReCookie. La arquitectura incluye una FPGA basada en RAM como único procesador, así como un diseño compatible con la plataforma para redes de sensores inalámbricas desarrollada en el Centro de Electrónica Industrial de la Universidad Politécnica de Madrid (CEI- UPM) conocida como Cookies. Esta FPGA, modelo Spartan-6 LX150, era, en el momento de inicio de este trabajo, la mejor opción en cuanto a consumo y cantidad de recursos integrados, cuando además, permite el uso de reconfiguración dinámica y parcial. Es importante resaltar que aunque los valores de consumo son los mínimos para esta familia de componentes, la potencia instantánea consumida sigue siendo muy alta para aquellos sistemas que han de trabajar distribuidos, de forma autónoma, y en la mayoría de los casos alimentados por baterías. Por esta razón, es necesario incluir en el diseño estrategias de ahorro energético para incrementar la usabilidad y el tiempo de vida de la plataforma. La primera estrategia implementada consiste en dividir la plataforma en distintas islas de alimentación de forma que sólo aquellos elementos que sean estrictamente necesarios permanecerán alimentados, cuando el resto puede estar completamente apagado. De esta forma es posible combinar distintos modos de operación y así optimizar enormemente el consumo de energía. El hecho de apagar la FPGA para ahora energía durante los periodos de inactividad, supone la pérdida de la configuración, puesto que la memoria de configuración es una memoria volátil. Para reducir el impacto en el consumo y en el tiempo que supone la reconfiguración total de la plataforma una vez encendida, en este trabajo, se incluye una técnica para la compresión del archivo de configuración de la FPGA, de forma que se consiga una reducción del tiempo de configuración y por ende de la energía consumida. Aunque varios de los requisitos de diseño pueden satisfacerse con el diseño de la plataforma HiReCookie, es necesario seguir optimizando diversos parámetros tales como el consumo energético, la tolerancia a fallos y la capacidad de procesamiento. Esto sólo es posible explotando todas las posibilidades ofrecidas por la arquitectura de procesamiento en la FPGA. Por lo tanto, la segunda parte de esta tesis doctoral está centrada en el diseño de una arquitectura reconfigurable denominada ARTICo3 (Arquitectura Reconfigurable para el Tratamiento Inteligente de Cómputo, Confiabilidad y Consumo de energía) para la mejora de estos parámetros por medio de un uso dinámico de recursos. ARTICo3 es una arquitectura de procesamiento para FPGAs basadas en RAM, con comunicación tipo bus, preparada para dar soporte para la gestión dinámica de los recursos internos de la FPGA en tiempo de ejecución gracias a la inclusión de reconfiguración dinámica y parcial. Gracias a esta capacidad de reconfiguración parcial, es posible adaptar los niveles de capacidad de procesamiento, energía consumida o tolerancia a fallos para responder a las demandas de la aplicación, entorno, o métricas internas del dispositivo mediante la adaptación del número de recursos asignados para cada tarea. Durante esta segunda parte de la tesis se detallan el diseño de la arquitectura, su implementación en la plataforma HiReCookie, así como en otra familia de FPGAs, y su validación por medio de diferentes pruebas y demostraciones. Los principales objetivos que se plantean la arquitectura son los siguientes: Proponer una metodología basada en un enfoque multi-hilo, como las propuestas por CUDA (por sus siglas en inglés, Compute Unified Device Architecture) u Open CL, en la cual distintos kernels, o unidades de ejecución, se ejecuten en un numero variable de aceleradores hardware sin necesidad de cambios en el código de aplicación. Proponer un diseño y proporcionar una arquitectura en la que las condiciones de trabajo cambien de forma dinámica dependiendo bien de parámetros externos o bien de parámetros que indiquen el estado de la plataforma. Estos cambios en el punto de trabajo de la arquitectura serán posibles gracias a la reconfiguración dinámica y parcial de aceleradores hardware en tiempo real. Explotar las posibilidades de procesamiento concurrente, incluso en una arquitectura basada en bus, por medio de la optimización de las transacciones en ráfaga de datos hacia los aceleradores. Aprovechar las ventajas ofrecidas por la aceleración lograda por módulos puramente hardware para conseguir una mejor eficiencia energética. Ser capaces de cambiar los niveles de redundancia de hardware de forma dinámica según las necesidades del sistema en tiempo real y sin cambios para el código de aplicación. Proponer una capa de abstracción entre el código de aplicación y el uso dinámico
Recommended publications
  • AVR32 EVK1105 Evaluation Kit
    Your Electronic Engineering Resource ATMEL - ATEVK1105 - AVR32 EVK1105 Evaluation Kit Product Overview: The AVR32 EVK1105 is an evaluation kit for the AT32UC3A3256 which combines Atmel’s state of art AVR32 microcontroller with an unrivalled selection of communication interface like USB device including On-The-Go functionality, SDcard, NAND flash with ECC and stereo 16-bit DAC. The AVR32 EVK1105 is an evaluation kit for the AT32UC3A0512 which demonstrates Atmel’s state-of-the-art AVR32 microcontroller in Hi-Fi audio decoding and streaming applications. Kit Contents: The kit contains reference hardware and software for generic MP3 player docking stations. Key Features: High Performance, Low Power AVR®32 UC 32-Bit Microcontroller Multi-Layer Bus System Internal High-Speed Flash Internal High-Speed SRAM Interrupt Controller Power and Clock Manager Including Internal RC Clock and One 32KHz Oscillator Two Multipurpose Oscillators and Two Phase-Lock-Loop (PLL), Watchdog Timer, Real-Time Clock Timer External Memories MultiMediaCard (MMC), Secure-Digital (SD), SDIO V1.1 CE-ATA, FastSD, SmartMedia, Compact Flash Memory Stick: Standard Format V1.40, PRO Format V1.00, Micro IDE Interface One Advanced Encryption System (AES) for AT32UC3A3256S, AT32UC3A3128S and AT32UC3A364S Universal Serial Bus (USB) One 8-channel 10-bit Analog-To-Digital Converter, multiplexed with Digital IOs. Legal Disclaimer: The content of the pages of this website is for your general information and use only. It is subject to change without notice. From time to time, this website may also include links to other websites. These links are provided for your convenience to provide further information. They do not signify that we endorse the website(s).
    [Show full text]
  • Schedule 14A Employee Slides Supertex Sunnyvale
    UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 SCHEDULE 14A Proxy Statement Pursuant to Section 14(a) of the Securities Exchange Act of 1934 Filed by the Registrant Filed by a Party other than the Registrant Check the appropriate box: Preliminary Proxy Statement Confidential, for Use of the Commission Only (as permitted by Rule 14a-6(e)(2)) Definitive Proxy Statement Definitive Additional Materials Soliciting Material Pursuant to §240.14a-12 Supertex, Inc. (Name of Registrant as Specified In Its Charter) Microchip Technology Incorporated (Name of Person(s) Filing Proxy Statement, if other than the Registrant) Payment of Filing Fee (Check the appropriate box): No fee required. Fee computed on table below per Exchange Act Rules 14a-6(i)(1) and 0-11. (1) Title of each class of securities to which transaction applies: (2) Aggregate number of securities to which transaction applies: (3) Per unit price or other underlying value of transaction computed pursuant to Exchange Act Rule 0-11 (set forth the amount on which the filing fee is calculated and state how it was determined): (4) Proposed maximum aggregate value of transaction: (5) Total fee paid: Fee paid previously with preliminary materials. Check box if any part of the fee is offset as provided by Exchange Act Rule 0-11(a)(2) and identify the filing for which the offsetting fee was paid previously. Identify the previous filing by registration statement number, or the Form or Schedule and the date of its filing. (1) Amount Previously Paid: (2) Form, Schedule or Registration Statement No.: (3) Filing Party: (4) Date Filed: Filed by Microchip Technology Incorporated Pursuant to Rule 14a-12 of the Securities Exchange Act of 1934 Subject Company: Supertex, Inc.
    [Show full text]
  • Buses – Global Market Trends
    2017 BUSES – GLOBAL MARKET TRENDS Markets – Competition – Companies – Key Figures Extract from the study BUSES – GLOBAL MARKET TRENDS Markets – Competition – Companies – Key figures In all regions across the globe, buses remain the most widespread public transport mode. Their demand goes hand in hand with several, mostly region-specific factors, including demographics, increasing mobility of people and environmental awareness, as well as public funding. Buses are comparatively to other transportation modes cheap and easy to use, since their use does not necessarily require the implementation of a specific infrastructure. This makes buses ideal vehicles for both short- and long-distance services. Based on the current developments, this Multi Client Study offers a comprehensive insight into the structure, volumes and development trends of the worldwide bus market. In concrete terms, the market study “BUSES – GLOBAL MARKET TRENDS” includes: A look at the worldwide market for buses differentiated by region An analysis of the relevant market data including present and future market volumes Information concerning the installed fleet and future procurement potential until 2022 An assessment of current developments and growth drivers of the worldwide bus markets in the individual regions An overview of bus manufacturers including an analysis of the market shares, financial backups as well as a brief description of the current product portfolio and strategy outlook A list of the major production facilities in each of the regions including product range as well as production capacities Presentation of the development stage of alternative propulsions, their manufacturers and their occurrence worldwide The study is available in English from the August 2017 at the price of EUR 3,400 plus VAT.
    [Show full text]
  • Cross-Compiling Linux Kernels on X86 64: a Tutorial on How to Get Started
    Cross-compiling Linux Kernels on x86_64: A tutorial on How to Get Started Shuah Khan Senior Linux Kernel Developer – Open Source Group Samsung Research America (Silicon Valley) [email protected] Agenda ● Cross-compile value proposition ● Preparing the system for cross-compiler installation ● Cross-compiler installation steps ● Demo – install arm and arm64 ● Compiling on architectures ● Demo – compile arm and arm64 ● Automating cross-compile testing ● Upstream cross-compile testing activity ● References and Package repositories ● Q&A Cross-compile value proposition ● 30+ architectures supported (several sub-archs) ● Native compile testing requires wide range of test systems – not practical ● Ability to cross-compile non-natively on an widely available architecture helps detect compile errors ● Coupled with emulation environments (e.g: qemu) testing on non-native architectures becomes easier ● Setting up cross-compile environment is the first and necessary step arch/ alpha frv arc microblaze h8300 s390 um arm mips hexagon score x86_64 arm64 mn10300 unicore32 ia64 sh xtensa avr32 openrisc x86 m32r sparc blackfin parisc m68k tile c6x powerpc metag cris Cross-compiler packages ● Ubuntu arm packages (12.10 or later) – gcc-arm-linux-gnueabi – gcc-arm-linux-gnueabihf ● Ubuntu arm64 packages (13.04 or later) – use arm64 repo for older Ubuntu releases. – gcc-4.7-aarch64-linux-gnu ● Ubuntu keeps adding support for compilers. Search Ubuntu repository for packages. Cross-compiler packages ● Embedded Debian Project is a good resource for alpha, mips,
    [Show full text]
  • Introduction to AVR®
    Introduction to AVR® By BiPOM Electronics, Inc. Revision 1.02 © 2011 BiPOM Electronics, Inc. All rights reserved. All trademark names in this document are the property of their respective owners. AVR® History · The AVR® architecture was conceived by two students at the Norwegian Institute of Technology. · The original AVR® was known as μRISC (Micro RISC). · Among the first of the AVR® line was the AT90S8515, which in a 40-pin DIP package has the same pinout as an 8051 microcontroller. · The creators of the AVR® give no definitive answer as to what the term "AVR" stands for. AVR® Features · Some 8-bit, some 32-bit · TinyAVR, megaAVR, XMEGA, FPSLIC · Harvard Architecture for 8-bit devices: Separate code and data space · Flash, EEPROM and SRAM are all on a single chip, eliminating the need for external memory. · All code executed by the AVR® core must reside in the on-chip flash. · Most instructions take just one or two clock cycles. · The AVR family of processors were designed with the efficient execution of compiled C code in mind. Why AVR® ? · The AVR® instruction set is more powerful than PIC or 8051. · The AVR® runs instructions very fast (can execute 1 instruction in 1 machine clock cycle) · AVR® is a good choice for industrial projects. Frequently Asked Questions Can AVR® run an OS? - Yes, AVR32 can run Linux core 2.6.XX with BusyBox What programming languages are for programming AVR® microcontroller? - There are many different languages but most commonly used are C and BASCOM BASIC. Does BiPOM offer AVR® design services ? – Yes,
    [Show full text]
  • GNU Toolchain for Atmel AVR 32-Bit Embedded Processors
    RELEASE NOTES GNU Toolchain for Atmel AVR 32-bit Embedded Processors Introduction The Atmel AVR® 32-bit GNU Toolchain (3.4.3.820) supports all AVR 32-bit devices. The AVR 32-bit Toolchain is based on the free and open-source GCC compiler. The toolchain includes compiler, assembler, linker, and binutils (GCC and Binutils), Standard C library (Newlib). 32215A-MCU-08/2015 Table of Contents Introduction .................................................................................... 1 1. Installation Instructions .......................................................... 3 1.1. System Requirements ............................................................ 3 1.1.1. Hardware Requirements ............................................. 3 1.1.2. Software Requirements .............................................. 3 1.2. Downloading, Installing, and Upgrading ..................................... 3 1.2.1. Downloading/Installing on Windows .............................. 3 1.2.2. Downloading/Installing on Linux ................................... 3 1.2.3. Upgrading from previous versions ................................ 3 1.2.4. Manifest .................................................................. 3 1.3. Layout ................................................................................. 4 2. Toolset Background ................................................................ 5 2.1. Compiler .............................................................................. 5 2.2. Assembler, Linker, and Librarian .............................................
    [Show full text]
  • Atmel AVR Microcontrollers for Automotive
    Atmel AVR Microcontrollers for Automotive +150°C Qualified Innovative Atmel AVR Several AVR microcontrollers are qualified for operation up to +150°C ambient Microcontroller Solutions for temperature (AEC-Q100 Grade0). These devices enable designers to distribute Automotive intelligence and control functions directly into or near, transfer cases, engine sensors Increasing consumer demands for comfort, safety and reduced actuators, turbochargers and exhaust fuel consumption are driving rapid growth in the market for systems. automotive electronics. All the new functions designed to meet Automotive AVR microcontrollers available these demands require local intelligence and control, which can as Grade0 versions are the Atmel ATtiny45, be optimized by the use of small, powerful microcontrollers. ATtiny87/167, ATtiny261/461/861, ATmega88/168, ATmega16M1, Atmel® leverages its unsurpassed experience in embedded ATmega32M1, ATmega32C1, Flash memory microcontrollers to bring innovative solutions ATmega64M1, and ATmega64C1. for automotive applications. These include a wide range of Atmel AVR® 8- and 32-bit microcontrollers for everything from Automotive AVR microcontrollers are sensor or actuator control to more sophisticated networking available in two different temperature ranges applications. Atmel microcontrollers are fully engineered to to serve various applications: fulfill the quality requirements of OEMs in their drive towards AEC-Q100 Grade1 Z: -40°C to +125°C zero defects. AEC-Q100 Grade0 D, T2: -40°C to +150°C Typical Applications Atmel
    [Show full text]
  • EVC-State-Of-Evs-2020-Report.Pdf
    STATE OF ELECTRIC VEHICLES AUGUST 2020 electricvehiclecouncil.com.au STATE OF ELECTRIC VEHICLES AUGUST 2020 electricvehiclecouncil.com.au 06 2020 HIGHLIGHTS 09 EXECUTIVE SUMMARY CONTENTS 13 CHAPTER 1: MARKET UPDATE 13 Electric vehicle sales 14 COVID-19 15 Consumer attitudes survey 22 Case study: Lane Cove development building now for the electrified future 24 CHAPTER 2: AUSTRALIA'S EV INDUSTRY 24 Passenger vehicles 28 Bikes and scooter 29 Commercial vehicles and buses 36 Case study: East waste makes haste, leading the way on electric garbage 38 CHAPTER 3: CHARGING INFRASTRUCTURE 38 Public charging 42 Home and workplace charging 44 Case study: Evie Networks, the Aussie company building a nation-wide ultra-fast charging network 46 CHAPTER 4: EV MINING AND MANUFACTURING 46 Battery value chain 51 Charger manufacturing 52 Electric vehicle manufacturing 58 Case study: Voltra and BHP break new ground underground CHAPTER 5: EVS, THE ENVIRONMENT, 60 AND THE ENERGY GRID 60 Emissions impact 61 Battery stewardship 64 Built environment 67 Integration with the grid 72 Case study: Realising electric vehicle-to-grid services CHAPTER 2: AUSTRALIA'S EV INDUSTRY 74 CHAPTER 6: EV POLICY 75 Policy progress and scorecard 77 Policy highlights 80 APPENDIX 80 Appendix 1: Carmakers' commitments to electric vehicles 84 Appendix 2: Electric vehicle model availability 90 Appendix 3: Carmakers' investments into secondary battery applications 90 Appendix 4: Bus partnerships in Australia 93 REFERENCES 6 ELECTRIC VEHICLE COUNCIL 2020 highlights In 2019, EV sales increased
    [Show full text]
  • AVR 32-Bit GNU Toolchain: Release 3.4.2.435
    AVR 32-bit GNU Toolchain: Release 3.4.2.435 The AVR 32-bit GNU Toolchain supports all AVR 32-bit devices. The AVR 32- bit Toolchain is based on the free and open-source GCC compiler. The toolchain 8/32-bits Atmel includes compiler, assembler, linker and binutils (GCC and Binutils), Standard C library (Newlib). Microcontrollers About this release Release 3.4.2.435 This is an update release that fixes some defects and upgrades GCC and Binutils to higher versions. Installation Instructions System Requirements AVR 32-bit GNU Toolchain is supported under the following configurations Hardware requirements • Minimum processor Pentium 4, 1GHz • Minimum 512 MB RAM • Minimum 500 MB free disk space AVR 32-bit GNU Toolchain has not been tested on computers with less resources, but may run satisfactorily depending on the number and size of projects and the user's patience. Software requirements • Windows 2000, Windows XP, Windows Vista or Windows 7 (x86 or x86-64). • Fedora 13 or 12 (x86 or x86-64), RedHat Enterprise Linux 4/5/6, Ubuntu Linux 10.04 or 8.04 (x86 or x86-64), or SUSE Linux 11.2 or 11.1 (x86 or x86-64). AVR 32-bit GNU Toolchain may very well work on other distributions. However those would be untested and unsupported. AVR 32-bit GNU Toolchain is not supported on Windows 98, NT or ME. Downloading and Installing The package comes in two forms. • As part of a standalone installer • As Atmel Studio 6.x Toolchain Extension It may be downloaded from Atmel's website at http://www.atmel.com or from the Atmel Studio Extension Gallery.
    [Show full text]
  • Atmel Studio
    Software Atmel Studio USER GUIDE Preface Atmel® Studio is an Integrated Development Environment (IDE) for writing and debugging AVR®/ARM® applications in Windows® XP/Windows Vista®/ Windows 7/8 environments. Atmel Studio provides a project management tool, source file editor, simulator, assembler, and front-end for C/C++, programming, and on-chip debugging. Atmel Studio supports the complete range of Atmel AVR tools. Each new release contains the latest updates for the tools as well as support for new AVR/ARM devices. Atmel Studio has a modular architecture, which allows interaction with 3rd party software vendors. GUI plugins and other modules can be written and hooked to the system. Contact Atmel for more information. Atmel-42167B-Atmel-Studio_User Guide-09/2016 Table of Contents Preface............................................................................................................................ 1 1. Introduction................................................................................................................8 1.1. Features....................................................................................................................................... 8 1.2. New and Noteworthy.................................................................................................................... 8 1.2.1. Atmel Studio 7.0............................................................................................................ 8 1.2.2. Atmel Studio 6.2 Service Pack 2..................................................................................11
    [Show full text]
  • Embedded Processor Selection/Performance Estimation Using FPGA-Based Profiling
    Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2010 Embedded Processor Selection/Performance Estimation using FPGA-based Profiling Fadi Obeidat Virginia Commonwealth University Follow this and additional works at: https://scholarscompass.vcu.edu/etd Part of the Engineering Commons © The Author Downloaded from https://scholarscompass.vcu.edu/etd/2232 This Dissertation is brought to you for free and open access by the Graduate School at VCU Scholars Compass. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of VCU Scholars Compass. For more information, please contact [email protected]. School of Engineering Virginia Commonwealth University This is to certify that the dissertation prepared by Fadi Obeidat entitled EMBEDDED PROCESSOR SELECTION/PERFORMANCE ESTIMATION USING FPGA-BASED PROFILING has been approved by his committee as satisfactory completion of the dissertation requirement for the degree of Doctor of Philosophy Robert Klenke, Ph.D., Committee Chair, Department of Electrical and Computer Engineering, School of Engineering Mike McCollum, Ph.D., Dept. Department of Electrical and Computer Engineering, School of Engineering Afroditi V Filippas, Ph.D., Department of Electrical and Computer Engineering, School of Engineering Kayvan Najarian, Ph.D., Department of Computer Science, School of Engineering Vojislav Kecman, Ph.D., Department of Computer Science, School of Engineering Date i EMBEDDED PROCESSOR SELECTION/PERFORMANCE ESTIMATION USING
    [Show full text]
  • Vincent M. Weaver Sally A. Mckee
    Optimizing for Size: Exploring the Limits of Code Density Sally A. McKee Vincent M. Weaver ASPLOS XIV Poster Session, 8 March 2009 Cornell University Chalmers University of Technology Abstract Hand-Optimized Assembly Results Architectural Size Correlations 0.9020 Minimum possible instruction size Reductions in instruction count can improve cache and bandwidth utiliza- 0.8652 Number of integer registers tion, lower power consumption, and increase overall performance. Nonethe- Total Size of Executable VLIW 2560 RISC less, code density is often overlooked when studying processor architectures. 2048 CISC 0.6623 Architecture has a zero register embedded 1536 8/16-bit 0.5845 Bit-width bytes 1024 We hand-optimize an embedded benchmark for size in assembly language -0.4366 Hardware divide in ALU on 20 different instruction set architectures and investigate the architectural 512 features that contribute most heavily to code density. 0 0.4385 Number of operands in each instruction arm ppc vax sh3 z80 ia64 6502 mips s390 i386 alphaparisc sparc m88k m68k thumb avr32 -0.3356 Unaligned load/store available x86_64 crisv32pdp-11 0.2773 Year the architecture was introduced 256 Size of LZSS Decompression Code VLIW -0.2597 Auto-incrementing addressing scheme Background RISC CISC 192 embedded -0.2597 Hardware status flags (zero/overflow/etc.) 128 8/16-bit bytes -0.1252 Little or Big endian 64 The 20 architectures investigated can be broadly broken into 5 categories: -0.0487 Branch delay slot 0 -0.0079 Maximum possible instruction size Instr Length Opcode arm ppc vax z80 sh3 ia64 mips s390 6502 i386 Type Represented Architectures alpha parisc sparc m88k m68kthumb avr32 (bytes) Args pdp-11 x86_64crisv32 VLIW ia64 16/3 3 Size of String-Search Code VLIW Other Considerations alpha, arm, m88k, mips 256 RISC RISC 4 3 CISC pa-risc, ppc, sparc 192 embedded 128 8/16-bit System libraries and compiler overhead can overshadow the effects of size bytes CISC m68k, s390, vax, x86, x86 64 1-54 2 64 optimization, increasing memory footprint by several orders of magnitude.
    [Show full text]