Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA

Total Page:16

File Type:pdf, Size:1020Kb

Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA I.J.Modern Education and Computer Science, 2013, 5, 49-59 Published Online June 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ ijmecs.2013.05.06 Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA Tina Daghooghi Shahid Chamran University, Ahvaz, Iran E-ma il: [email protected] Abstract— This paper presents the design and architecture must contain all of this state. Based on the development of a high performance and low power current architectural state, the processor executes a MIPS microprocessor and implementation on FPGA. In particular instruction with a particular set of data to this method we for achieving high performance and low produce a new architectural state. Some power in the operation of the proposed microprocessor microarchitectures contain additional nonarchitectural use different methods including, unfolding state to either simplify the logic or improve performance transformation (parallel processing), C-slow retiming [2]. The IBM System z10™ microprocessor is currently technique, and double edge registers are used to get even the fastest running 64-bit CISC (co mple x instruction set reduce power consumption. Also others blocks designed computer) microprocessor. This microprocessor operates based on high speed digital circuits. Because of at 4.4 GHz and provides up to two times performance feedback loop in the proposed architecture C-slo w improvement compared with its predecessor, the System retiming can enhance designs that contain feedback z9® microprocessor. In addition to its ultrahigh- loops. The C-slow retiming is well-known for frequency pipeline, the z10™ microprocessor offers optimization and high performance technique, it such performance enhancements as a sophisticated automatically rebalances the registers in the proposed branch-prediction structure, a large second-level private design. The proposed high performance microprocessor cache, a data-prefetch engine, and a hardwired decimal is modeled and verified using FPGA and simulation floating-point arithmetic unit. The z10 microprocessor results. The proposed methods in microprocessor have also implements new architectural features that allow been successfully synthesized and implemented in better software optimization across compiled Quartus II 9.1 and Stratix II FPGA, to target device applications. These features include new instructions EP2S15F484C3, and power is analyzed with Xpower that help shorten the code path lengths and new facilities analyzer. Results demonstrate that the proposed method for software-directed cache management and the use of has high performance. 1-M B v irtual pages [3]. In recent years, a number of researches have been Index Terms— High Performance, Lo w Power, FPGA , proposed for Implementation of microprocessor by using double edge register, Combinational Logic. the FPGA by some companies [4-11]. In continuing we present some researches, The Aviation Microelectronic Center of NPU (Northwestern Polytechnical University) I. INTRODUCTION has recently completed the development of a 32-bit super-scalar RISC microprocessor, called "Longtium" Microcontrollers and microprocessors are used in R2. In [4], the architecture of "Longtium" R2 is everyday systems. Basically, any electronic systems that presented. As explained in [5], the PowerPC 603e require computation or instruction execution require a microprocessor is a high performance, low cost, low microcontroller or microprocessor. Therefore, at the core power microprocessor designed for use in portable of electronic systems with computational capability is a computers. The 603e is an enhanced version of the microprocessor. Microprocessors have grown from 8 PowerPC 603 microprocessor and extends the bits to 16 bits, 32 bits, and currently to 64 bits. performance range of the PowerPC microprocessor Microprocessor architecture has also grown from family of portable products. The enhancements include complex instruction set computing (CISC) based on increasing the frequency to 100 MHZ doubling the on- reduced instruction set computing (RISC) based on a chip instruction and data caches to 16K bytes each, combination of RISC-CISC based and currently very increasing the cache associativity to 4-way set- long instruction word (VLIW) based [1]. So me associative, adding an extra integer unit, and increasing microprocessors are optimized for high-performance the throughput of stores and misaligned accesses. Three servers, whereas others are optimized for long battery new bus modes are added to allow for more flexibility in life in laptop computers. A computer architecture is system design. The estimated performance of the 603e at defined by its instruction set and architectural state. The 100 MHz is 120 SPECint92 and 105 SPECfp92. In [6], a architectural state of the MIPS processor consists of the new microprocessor design framework, called DOTTA program counter and the 32 registers. Any MIPS micro Copyright © 2013 M ECS I.J. Modern Education and Computer Science, 2013, 5, 49-59 50 Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA (dynamic operation transport triggered array) is way on FPGA. We applied several methods to introduced. An FPGA (field programmable gate array) microarchitecture the performances and the implementation of DOTTA is presented. The aim of this effectiveness of the proposed methods of the throughput new processor framework is to eliminate bottlenecks rate and hardware cost for the proposed structure are introduced by traditional microprocessor architectures. given. This paper is organized as follows. An overview The framework defines a task-specific microprocessor of the MIPS processor will be given in section II. In that is application customisable on the target system it is section III description of the proposed methods is operating on; this has been achieved by using presented. A comparison of performance, power Xilinxtrade FPGA fabric. In [7], a 8-b it CISC consumption, chip utilization that verifies the microprocessor core development which is intended as performance of the proposed work is focus of section IV. an open core for teaching applications in the digital In section V conclusion of the paper is presented. systems laboratory. The core is fully open and therefore, the user can have access to all internal signals as well as the opportunity to make changes to the structure itself II. MIPS PROCESSOR which is very useful when lecturing microprocessor design. The main advantages of the present core, MIPS is a 32-bit architecture, so its a 32-bit data path. The control unit receives the current instruction from the compared with commercially available equivalent cores, data path and tells the datapath how to execute that are not being vendor sensitive allowing its implementation in almost any FPGA family and being instruction. Specifically, the control unit produces an open core, that can be fully monitored and modified multiplexer select, register enable, and memory writes signals to control the operation of the data path. The to fit specific design constraints. Several tests were program counter is an ordinary 32-bit register. Its output, performed to the microprocessor core, including an ’ embedded microcontroller with RAM, ROM and I/O PC, points to the current instruction. Its input, PC , indicates the address of the next instruction. The capabilities. The present development includes a meta- instruction memory has a single read port. It takes a 32- assembler and linker to embed user programs in a ROM, which is automatically generated as a VHDL description. bit instruction address input, A, and reads the 32-bit data (i.e., instruction) from that address onto the read data In [8], a new methodology based on practical sessions × with real devices and chips is proposed. Simple designs output, RD. The 32-element 32-bit register file has of microprocessors are exposed to the students at the two read ports and one write port. The read ports take 5- bit address inputs, A1 and A2, each specifying one of beginning, raising the complexity gradually toward a 5 final design with a multiprocessor integrated in a single 2 =32 reg isters as source operands. They read the 32-bit FPGA chip. In [9], the implementation and delivery of a register values onto read data outputs RD1 and RD2, microprocessor based on laboratory design is presented respectively. The write port takes a 5-bit address input, design laboratory, in an attempt to achieve tighter A3; a 32-bit write data input, WD; a write enable input, integration with theory and improve student's WE3; and a clock. If the write enable is 1, the register performance. The design process follows a hierarchical file writes the data into the specified register on the structure, requiring students to first build basic devices rising edge of the clock. The data memory has a single such as logic gates, multiplexers, one-bit me mo ry cells read/write port. If the write enable, WE, is 1, it writes etc. These basic devices are then used to build an ALU, data WD into address A on the rising edge of the clock. registers, (registers are used to build larger memories), a If the write enable is 0, it reads address A onto RD. The data path and a control unit. Designs are completed instruction memory, register file, and data memory are without any high level programming ensuring that all read comb inationally. In other words, if the address students cannot rely on the compiler to transform changes, the new data appears at RD after some specifications into imp le mentations. In [10], several case propagation delay; no clock is involved. They are studies that examine the effects of various embedded written only on the rising edge of the clock. In this processor memory strategies and peripheral sets. fashion, the state of the system is changed only at the Comparing the benchmark system to a real-world clock edge. The address, data, and write enable must system, the study examines techniques for optimizing setup sometime before the clock edge and must remain the performance and cost of an FPGA embedded stable until a hold time after the clock edge.
Recommended publications
  • Space Expansion of Feature Selection for Designing More Accurate Error Predictors
    Space Expansion of Feature Selection for Designing more Accurate Error Predictors Shayan Tabatabaei Nikkhah Mehdi Kamal Ali Afzali-Kusha Department of Electrical School of Electrical and Computer School of Electrical and Computer Engineering Engineering Engineering Eindhoven University of University of Tehran University of Tehran Technology Tehran, Iran Tehran, Iran Eindhoven, The Netherlands [email protected] [email protected] [email protected] Massoud Pedram Department of Electrical Engineering University of Southern California Los Angeles, CA, USA [email protected] ABSTRACT outputs. These errors strongly depend on application inputs [6], putting an extra emphasis on quality control in approximate Approximate computing is being considered as a promising computing. One of the proposed solutions for managing the design paradigm to overcome the energy and performance output quality is sampling the output elements and changing the challenges in computationally demanding applications. If the case approximate configuration accordingly to meet the quality where the accuracy can be configured, the quality level versus requirements [3], [7]. Given that output errors are strongly energy efficiency or delay also may be traded-off. For this dependent to the inputs, this approach is highly susceptible to technique to be used, one needs to make sure a satisfactory user missing the output elements with large errors [6]. experience. This requires employing error predictors to detect There is another approach emphasized in recent works (e.g., unacceptable approximation errors. In this work, we propose a [6], [8], [9]) which leverages light-weight quality checkers to scheduling-aware feature selection method which leverages the dynamically manage the output quality.
    [Show full text]
  • Model-Order Reduction Using Variational Balanced Truncation with Spectral Shaping Payam Heydari, Member, IEEE, and Massoud Pedram, Fellow, IEEE
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 4, APRIL 2006 879 Model-Order Reduction Using Variational Balanced Truncation With Spectral Shaping Payam Heydari, Member, IEEE, and Massoud Pedram, Fellow, IEEE Abstract—This paper presents a spectrally weighted balanced the reduced system. Extensions of explicit moment-matching truncation (SBT) technique for tightly coupled integrated circuit techniques have been proposed recently, to reduce the linear interconnects, when the interconnect circuit parameters change as time-varying (LTV) as well as nonlinear dynamic systems [7], a result of statistical variations in the manufacturing process. The salient features of this algorithm are the inclusion of the parameter [8]. variation in the RLCK interconnect, the guaranteed passivity of the An alternative model reduction technique is the balanced reduced transfer function, and the availability of provable spec- realization [9]–[11]. While being widely investigated in con- trally weighted error bounds for the reduced-order system. This trol system theory, balanced realization techniques have not paper shows that the variational balanced truncation technique received the same attention as Krylov-based and Pade-based produces reduced systems that accurately follow the time- and fre- quency-domain responses of the original system when variations in model reduction techniques in the context of interconnect the circuit parameters are taken into consideration. Experimental analysis, partly because these methods involve computation-in- results show that the new variational SBT attains, in average, 30% tensive algorithms [9], [11], [12]. Furthermore, it is well known more accuracy than the variational Krylov-subspace-based model- that the balancing transformation may be poorly conditioned order reduction techniques.
    [Show full text]
  • Design Technologies for Low Power VLSI
    To appear in Encyclopedia of Computer Science and Technology, 1995 Design Technologies for Low Power VLSI Massoud Pedram Department of EE-Systems University of Southern California Los Angeles , CA 90089 Abstract Low power has emerged as a principal theme in today’s electronics indus- try. The need for low power has caused a major paradigm shift where power dis- sipation has become as important a consideration as performance and area. This article reviews various strategies and methodologies for designing low power cir- cuits and systems. It describes the many issues facing designers at architectural, logic, circuit and device levels and presents some of the techniques that have been proposed to overcome these difficulties. The article concludes with the future challenges that must be met to design low power, high performance systems. 1. Motivation In the past, the major concerns of the VLSI designer were area, perfor- mance, cost and reliability; power consideration was mostly of only secondary importance. In recent years, however, this has begun to change and, increasingly, power is being given comparable weight to area and speed considerations. Sev- eral factors have contributed to this trend. Perhaps the primary driving factor has been the remarkable success and growth of the class of personal computing devices (portable desktops, audio- and video-based multimedia products) and wireless communications systems (personal digital assistants and personal com- municators) which demand high-speed computation and complex functionality with low power consumption. In these applications, average power consumption is a critical design con- cern. The projected power budget for a battery-powered, A4 format, portable multimedia terminal, when implemented using off-the-shelf components not opti- 2 Design Technologies for Low Power VLSI mized for low-power operation, is about 40 W.
    [Show full text]
  • Block Level Voltage/Frequency Scheduling for Low Power Consumption Under Timing Constraints
    BLOCK LEVEL VOLTAGE/FREQUENCY SCHEDULING FOR LOW POWER CONSUMPTION UNDER TIMING CONSTRAINTS A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF RESEARCH MASTER in the School of Electronic Engineering by Li-Chuan Weng B.Sc. ECE, National Chiao-Tung University, Taiwan September 2004 ©Li-Chuan Weng 2004 DUBLIN CITY UNIVERSITY, IRELAND All rights reserved. No part of this work shall be reproduced, in any form or by any means without the permission from the author. BLOCK LEVEL VOLTAGE/FREQUENCY SCHEDULING FOR LOW POWER CONSUMPTION UNDER TIMING CONSTRAINTS Dissertation of Research Master Degree School of Electronic Engineering Dublin City University Ireland by Li-Chuan Weng September 2004 Supervisor: Dr. XiaoJun Wang, Lecturer, Dublin City University Interanl Examiner: Dr. Noel O’Connor, Lecturer, Dublin City University External Examiner: Dr. Steve Grainger, Professor, Staffordshire University Declaration I hereby certify that this material, which I now submit for assessment on the program of study leading to the award of Research Master Degree of Electronic Engineering is entirely my own work and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work. Signed : ID No, MM* Date : ^ To my parents Acknowledgements This thesis would not have been possible without my advisors, Dr. XiaoJun Wang and Dr. Alan P. Su. Many thanks to Dr. Wang for his inspiration and the time and patience spent in reading this thesis, thanks to Dr. Su for the discussions regarding to my research. Dr. Wang allowing me the freedom to work on topics that I enjoyed.
    [Show full text]
  • SLA-Based, Energy-Efficient Resource Management In
    SLA-based, Energy-Efficient Resource Management in Cloud Computing Systems by Hadi Goudarzi ______________________________________________________________ A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) December 2013 Copyright 2013 Hadi Goudarzi DEDICATION To my family for their everlasting love and support ii ACKNOWLEDGEMENTS The completion of this doctoral dissertation has been a long journey made possible through the inspiration and support of a handful of people. First and foremost, I especially would like to thank my Ph.D. advisor and dissertation committee chairman, Professor Massoud Pedram, for his enthusiasm, mentorship, and encouragement throughout my graduate studies and academic research. He has been a continuous source of motivation and support for me and I want to sincerely thank him for all I have achieved. His passion for scientific discoveries and his dedication in making technical contributions to the computer engineering community have been invaluable in shaping my academic and professional career. Special thanks go to my dissertation and qualification committees, including Professor Sandeep K. Gupta, Professor Murali Annavaram, Professor Viktor K. Prasanna, Professor Aiichiro Nakano, and Professor Mansour Rahimi for their time, guidance, and feedback. Special thanks to Dr. Shahin Nazarian with whom I have collaborated on some projects. I truly appreciate his mentorship, priceless advices, feedback, and help. I would like to thank those who have in many ways contributed to the success of my academic endeavors. They are the Electrical Engineering staff at the University of Southern California, particularly Annie Yu, Diane Demetras, and Tim Boston; the iii members of the System Power Optimization and Regulation Technology (SPORT) Laboratory; and my best and dearest friends.
    [Show full text]
  • Placement and Beyond in Honor of Ernest S. Kuh
    Placement and Beyond in Honor of Ernest S. Kuh C. K. Cheng UC San Diego March 28, 2011 Placement and Beyond • Ernest S. Kuh is a pioneer and giant in physical layout. • Board of Directors, Cadence Design Syy,stems, San Jose, Ca. (1984‐1991). • Chair, Scientific Advisory Committee, Cadence Design StSystems (1988‐1991). • C&C Prize, Japan Society for Promotion of Communication and Computers, 1996. • Phil Kaufman Award of the Electronic Design Automation Consortium, 1998. • EDAA Lifetime Achievement Award, 2008 • Robert Gustav Kirchhoff Award, 2009 Outlines • Placement • Applications • Team • Second Waves • Scaling and Trends 3 Placement • 1‐D Gate Assignment: Interval Graph • Building Block Layout: BBL, BEAR • Gate Array Layout: BAGEL • Standard Cell Placement: RAMP, PROUD • Performance Driven Placement: Congestion, Timing, Low Power 4 4 Placement • Productivity: Theory, Software Package, Application • Core of Physical Synthesis: > $500 Millions Market ESL Synthesis Logic Synthesis Placement DFT, DFM, ECO Timing Analysis Routing 5 5 Building Block Layout • Issues: Representation, Routability • Nonslicing Architecture –Repr esenta tio n: Tile Plaeane –Routability: Routing Order for 100% routing compltiletion • Applications –Digital Equipment Corporation –ECAD, Cadence 6 Standard Cell Placement • Issues: Complexity, Timing Convergence due to Interconnect Dominance • RAMP, PROUD – Analogy of a resistive network – Quadratic wire length minimization • Analytical vs. Iterative Approaches – Quadratic Programming – Simuldlated Annealing (‘83 Kirkk)kpatrick) • Applications – Qplacer 7 Analytical Placement • Quinn and Breuer, 1979: Force Model – Hook’s law for attraction – Repulsive force for pairs without connection • Antreich, Johannes, and Kirsh, 1982: Systematic Formulation • RAMP, 1983: Delete repulsive force – Sparse matrix operations • GALA, 1984: Gate Array Layout, Hughes Aircraft Comp. • PROUD, 1988: Successive Over Relaxation • Qplacer, 1992: Louis Chao, Cadence • R.S.
    [Show full text]
  • Coupling of Synthesis and Layout: Challenges and Solutions
    Coupling of Synthesis and Layout: Challenges and Solutions Jason Cong Computer Science Department University of California at Los Angels 4711 Boelter Hall, Los Angeles, CA 90095-1596 ABSTRACT As the IC technologies move into deep submicron designs, many physical design effects, such as the delay and noise associated with interconnects, are becoming signi®cant, often dominating, factors in determining the overall circuit performance and relia- bility. Therefore, it is critical to consider layout design during high-level and logic-level synthesis in order to meet tight design constraints and achieve the best possible performance in deep submicron designs. This session starts with an embedded tutorial on "Logical-Physical Co-design for Deep Submicron Circuits: Solutions and Challenges" by Prof. Massoud Pedram, followed by discussions by a panel of experts from industry and academia on the following issues: 1. Reality and Needs: Why the conventional design ¯ow with separated synthesis and physical design fail to work well for today's designs? How bad is the convergence problem using the conventional design ¯ow for today's high-performance designs? Can the conventional design ¯ow be ®xed with incremental enhancements, or a revolutionary change in the design ¯ow and design methodology is needed? 2. Challenges: How synthesis tools can consider detailed physical effects, such as interconnect delay, coupling noise, ground bounce, electro-migration failure, etc. early in the design cycle? If merging logic synthesis and physical design is inevitable, how can we to handle over 100 million transistor designs, while each design step itself is already facing serious challenges to meet the design complexity? 3.
    [Show full text]
  • Payam Heydari Address : 830 Childs Way, SUITE #254 Los Angeles, CA 90089 Tel : (213) 740-4437 Home: (323) 730-0883 Fax : (213) 740-7290 E-Mail : [email protected]
    Payam Heydari Address : 830 Childs Way, SUITE #254 Los Angeles, CA 90089 Tel : (213) 740-4437 Home: (323) 730-0883 Fax : (213) 740-7290 E-mail : [email protected] EDUCATION: 1996-present: Ph.D. - Electrical Engineering - University of Southern California - GPA : 3.95/4.0 Ph.D. Advisor : Professor Massoud Pedram Thesis Title: Noise Analysis and Optimization in High-Frequency Mixed-Signal VLSI Circuits 1993-1995: M.Sc. - Electrical Engineering - Sharif University of Technology - GPA : 3.85/4.0 M.Sc. Advisor : Professor Kambiz Nayebi Thesis Title: A High Quality Mixed Excitation LPC Vocoder for Low Bit-rate Speech Coding 1987-1992: B.Sc. - Electrical Engineering - Sharif University of Technology - GPA : 3.5/4.0* * The average undergraduate GPA at the Sharif University of Tech. is 2.9/4.0. INTERNSHIP: Summer 1997: Bell laboratories, Lucent Technologies, Murray Hill, NJ. Summer 1998: IBM T. J. Watson Research Center, Yorktown Heights, NY. HONORS & AWARDS: • Best Iranian Graduate Student in the area of Electrical Engineering in Southern California, 2001 • Technical Excellence Award from Association of Professors and Scolars of Iranian Heritage, 2001 • Best Paper Award, IEEE International Conference on Computer Design, Austin, Texas, 2000. • Recipient of Travel grant, ASP-DAC conference, Yokohama, Japan, Feb. 2001. • Recipient of Young Student Award, Design Automation Conference, Los Angeles, 2000. • Honorable Mention Award, Department of EE-Systems, University of Southern California, 2000. • Ranked 2nd amongst M.Sc. graduate students, Sharif University of Technology, 1995. • Ranked 10th in the nationwide examination for postgraduate studies, Iran, 1993. • Ranked 5th among more than 200,000 applicants in the nationwide entrance examination, Iran, 1987.
    [Show full text]
  • Global Warming – Impacts and Future Perspective
    GLOBAL WARMING – IMPACTS AND FUTURE PERSPECTIVE Edited by Bharat Raj Singh Global Warming – Impacts and Future Perspective http://dx.doi.org/10.5772/2599 Edited by Bharat Raj Singh Contributors Juan Cagiao Villar, Sebastián Labella Hidalgo, Adolfo Carballo Penela, Breixo Gómez Meijide, C. Aprea, A. Greco, A. Maiorino, Bharat Raj Singh, Onkar Singh, Amjad Anvari Moghaddam, Gabrielle Decamous, Oluwatosin Olofintoye, Josiah Adeyemo, Fred Otieno, Silvia Duhau, Sotoodehnia Poopak, Amiri Roodan Reza, Hiroshi Ujita, Fengjun Duan, D. Vatansever, E. Siores, T. Shah, Robson Ryu Yamamoto, Paulo Celso de Mello-Farias, Fabiano Simões, Flavio Gilberto Herter, Zsuzsa A. Mayer, Andreas Apfelbacher, Andreas Hornung, Karl Cheng, Bharat Raj Singh, Alan Cheng Published by InTech Janeza Trdine 9, 51000 Rijeka, Croatia Copyright © 2012 InTech All chapters are Open Access distributed under the Creative Commons Attribution 3.0 license, which allows users to download, copy and build upon published articles even for commercial purposes, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source. Notice Statements and opinions expressed in the chapters are these of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published chapters. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book.
    [Show full text]
  • Dr. Massoud Pedram
    Colorado State University’s Information Science and Technology Center (ISTeC) presents two lectures by Dr. Massoud Pedram Stephen and Etta Varra Professor Department of Electrical Engineering University of Southern California ISTeC Distinguished Lecture In conjunction with the Electrical and Computer Engineering Department and Computer Science Department Seminar Series “Digital Infrastructure: Reducing Energy Cost and Environmental Impacts of Information Processing and Communications Systems” Monday October 21, 2013 Reception with refreshments: 10:30 am Lecture: 11:00 am – 12:00 noon Location: Morgan Library, Event Hall Electrical and Computer Engineering Department Special Seminar Sponsored by ISTeC “Designing Energy-Efficient Information Processing Systems” Monday October 21, 2013 Lecture: 2:00 – 3:00 pm Location: Lory Student Center Rm. 224-226 ISTeC (Information Science and Technology Center) is a university-wide organization for promoting, facilitating, and enhancing CSU's research, education, and outreach activities pertaining to the design and innovative application of computer, communication, and information systems. For more information please see ISTeC.ColoState.edu. Abstracts: Digital Infrastructure: Reducing Energy Cost and Environmental Impacts of Information Processing and Communications Systems Modern society’s dependence on information and communication infrastructure (ICI) is so deeply entrenched that it should be treated on par with other critical lifelines of our existence, such as water and electricity. As is the case with any true lifeline, ICI must be reliable, affordable, and sustainable. Meeting these requirements (especially sustainability) is a continued critical challenge, which will be the focus of my talk. More precisely, I will provide an overview of information and communication technology trends in light of various societal and environmental mandates followed by a review of technologies, systems, and hardware/software solutions required to create a sustainable ICI.
    [Show full text]
  • An Efficient Training and Inference Engine for Intelligent Mobile
    1 JointDNN: An Efficient Training and Inference Engine for Intelligent Mobile Cloud Computing Services Amir Erfan Eshratifar, Mohammad Saeed Abrishami, and Massoud Pedram, Fellow, IEEE Abstract—Deep learning models are being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants, autonomous cars, and smart home services often employ either simple local models on the mobile or complex remote models on the cloud. However, recent studies have shown that partitioning the DNN computations between the mobile and cloud can increase the latency and energy efficiencies. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and cloud for DNNs in both inference and training phase. JointDNN not only provides an energy and performance efficient method of querying DNNs for the mobile side but also benefits the cloud server by reducing the amount of its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward- and backward-propagations in DNNs, which can adapt to mobile battery limitations and cloud server load constraints and quality of service. JointDNN achieves up to 18 and 32 times reductions on the latency and mobile energy consumption of querying DNNs compared to the status-quo approaches, respectively. Index Terms—deep neural networks, intelligent services, mobile computing, cloud computing F 1 INTRODUCTION In the status-quo approaches, there are two methods DNN architectures are promising solutions in achieving for DNN inference: mobile-only and cloud-only.
    [Show full text]
  • SOROUSH ABBASPOUR [email protected] [email protected]
    SOROUSH ABBASPOUR Address: Work Address: 830 Childs Way, Suite #487 [email protected] EEB206, USC Los Angeles, CA 90089 [email protected] Los Angeles, CA 90089 (310) 699-8279 http://atrak.usc.edu/~soroush (213)740-4480 EDUCATION Ph.D. candidate in Electrical Engineering todate University of Southern California, Los Angeles, CA. GPA 3.93/4.00. Thesis: Statistical Timing Analysis for CAD Tools of VLSI Circuits M.S. student in Electrical Engineering December 2004 University of Southern California, Los Angeles, CA. GPA 4.00/4.00. B.Sc. in Electrical Engineering July 2001 Sharif Institute of Technology, Tehran, Iran. EXPERIENCE • Research Assistant, EE Dept., University of Southern 9/01 – todate California, Los Angeles, CA. − Statistical timing and power analysis for CAD tools − Static timing and power analysis for CAD tools • Teaching Assistant, EE Dept., University of Southern 1/05-todate California, Los Angeles, CA. Teaching assistant for EE680, “Computer-Aided Design of Digital Systems I (Physical Design)”, Instructor: Professor Melvin Breuer • Summer Internship, IBM Corp, East Fishkill, NY 5/04-8-04 Statistical timing analysis for CAD tools • Teaching Assistant, EE Dept., University of Southern 1/04-5/04 California, Los Angeles, CA. Teaching assistant for EE680, “Computer-Aided Design of Digital Systems I (Physical Design)”, Instructor: Professor Melvin Breuer • Summer Internship, Magma Design Automation Company, 5/03 – 8/03 Santa Clara, CA Static timing analysis for CAD tools • Teaching Assistant, EE Dept., University of Southern 1/03 – 5/06 California, Los Angeles, CA. − Teaching assistant for EE680, “Computer-Aided Design of Digital Systems I (Physical Design)”, − Instructor: Professor Massoud Pedram • System Design, R&D section, Kharazmi Office Machines Inc., 10/99-7/01 Tehran, Iran.
    [Show full text]