FACULTY OF ENGINEERING SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

Design Automation Methodologies for Extensible Processor Platform

Newton Cheung

A thesis presented to the faculty of the University of New South Wales in candidacy for the degree of Doctor of Philosophy

March 2005

© Copyright 2005 by Newton Cheung

All rights reserved. I hereby declare that this submission is my own work and to the best of my knowledge it contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis.

I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

————————————————

Newton Lim-Tung Cheung

Abstract

This thesis addresses two ubiquitous trends in the embedded system world - the increasing importance of design turnaround time as a design metric, and the move towards closing the design productivity gap. Adopting the right choice of design approach has been recognised as an integral part of the design flow in order to meet desired characteristics such as increasing software content, satisfying the growing complexities of an application, reusing off-the-shelf components, and exploring design metric tradeoffs, which closes the design productivity gap. The importance of design turnaround time is motivated by the intensive competition between manufacturers, especially makers of mainstream electronic consumer products, who shrink the product life cycle and require faster time-to-market to maximise economic benefits.

This thesis presents a suite of design automation methodologies to automatically design embedded systems for an application in the state-of-the-art design approach - the extensible processor platform. These design automation methodologies systematise the extensible processor platform’s design flow, with particular emphasis on solving four challenging design problems: i) code segment identification; ii) instruction generation; iii) architectural customisation selection; and iv) processor evaluation.

Our suite of design automation methodologies includes: i) a semi-automatic design system - to design an extensible processor that maximises the application performance while satisfying the area constraint. By specifying a fitting function to identify suitable code segments within an application, a two-level hierarchy selection algorithm is used to first select a predefined processor and then select the right instructions, and a performance estimator is used to estimate an application's performance; ii) a tool to match instructions - to automatically match pre-designed instructions with computationally intensive code segments, reducing verification time and effort; iii) an instruction estimation model - to estimate the area overhead, latency and power consumption of extensible instructions, exploring a larger design space; and iv) an instruction generation tool - to generate new extensible instructions that maximise speedup while minimising power dissipation.

A number of techniques, such as system decomposition, combinational equivalence checking and regression analysis, have been relied upon heavily in the creation of the final design system. This thesis shows results at every stage to demonstrate the efficacy of our design methodologies in the creation of extensible processors. The methodologies and results presented in this thesis demonstrate that automating the design process for an extensible processor platform results in a significant performance increase - on average, an increase of 4.74× (up to 15.71×) compared to the original base processor. Our system achieves significant design turnaround time savings (2.5% of the full simulation time for the entire design space) with the majority of Pareto points obtained (91% on average), and can lead to fewer and faster design iterations. Our instruction matching tool is 7.3× faster on average compared to the best known approaches to the problem (partial simulations). Our estimation model has a mean absolute error as small as 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption, compared to estimation through the time-consuming synthesis and simulation steps using commercial tools. Finally, the instruction generation tool reduces energy consumption by a further 5.8% on average (up to 17.7%) compared to extensible instructions generated by previous approaches.

Acknowledgments

This thesis could not have been completed without the help and encouragement of many people, directly and indirectly, not all of whom can be mentioned here. I express my greatest gratitude to all, for making this thesis possible.

First, I would like to thank my supervisor, A/Prof. Sri Parameswaran, for his insightful guidance and continuous support throughout the course of my Ph.D. degree.

His ingenious approach to research and passionate attitude to work are qualities that anyone would appreciate. I would also like to thank Prof. Jörg Henkel, from whose invaluable advice and thoughtful comments I was lucky to benefit during the past three years. His knowledge of design automation and system-level design contributed to the development of deeper and more inventive ideas, and his kind encouragement has significantly contributed to this thesis. My working experience at NEC Laboratory America, Inc. helped me gain several practical insights and skills. I am grateful to Prof. Jörg Henkel and A/Prof. Sri Parameswaran for giving me this opportunity to work on the project. Venkata Jakkula's help during the project is greatly appreciated.

I would also like to thank the computer engineering group faculty: A/Prof. Sri Parameswaran, Dr. Oliver Diessel and Dr. Annie Guo; administrative staff: Rochelle McDonald and Karen Corrigan; and the graduate students: Andhi Janapsatya, Jorgen Peddersen, Ashley Partis, Keith So, Usama Malik, George Ferizis, Ivan Lu, Jeremy Chan, Swarna Radhakrishnan, Seng Lin Shee, Lih Wen Koh and Shannon Koh, for providing an excellent environment for learning and research. The time I spent at the school during my Ph.D. degree was memorable to say the least. I am grateful to A/Prof. Jingling Xue, A/Prof. Hossam ElGindy, Dr. Aleksandar Ignjatović, Dr. Frank Engel, Dr. Oliver Diessel, Dr. Annie Guo and Dr. Manuel Chakravarty for various interesting and informative technical discussions. I would like to thank Andhi Janapsatya, a great friend and fellow graduate student, for providing instant feedback and thoughtful comments. Sincere thanks are due to Jorgen Peddersen for kindly proofreading papers and providing constructive improvements. I would also like to thank Keith So for providing algorithmic advice and interesting challenges.

The support and encouragement of my family and friends has been the cornerstone upon which I built my thesis. My parents, David and Kitty Cheung, and my sister, Jane Cheung, have given me their genuine love and caring support for the past twenty-five years; it is to them that I dedicate my achievements. My wonderful girlfriend, Amy Tso, has supported me in a special way that no one else could have. I would like to thank all my classmates at the University of New South Wales, Sydney, and the University of Queensland, Brisbane, for their support and companionship during the course of my university life. I would like to thank all my relatives and friends for their loving care and prayers. Last but not least, I would like to thank God for His merciful grace and endless love through His works in my life.

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
List of Publications

1 Introduction
1.1 Embedded Systems Challenges
The Trends Towards Designing Embedded Systems
Design Approach and Automation
1.2 Extensible Processor Platform
1.2.1 Featuring a Base Processor
1.2.2 Designing Extensible Instructions
1.2.3 Including/Excluding Predefined Blocks
1.2.4 Setting Architectural Parameters
Design Automation in Extensible Processor Platform
1.3 Thesis Overview

2 Literature Review
2.1 Embedded Systems and their Early History
2.2 Design Approaches for Embedded Systems
2.2.1 Application Specific Integrated Circuits
2.2.2 General Purpose Processors
2.2.3 Digital Signal Processors
2.2.4 Field Programmable Gate Arrays
2.2.5 Application Specific Instruction-set Processors
2.3 Architecture of Application Specific Processors
2.3.1 Very Long Instruction Word Processors
2.3.2 Reconfigurable Processors
2.3.3 Extensible Processors
2.4 Problems in Designing Extensible Processors
2.4.1 Code Segment Identification
2.4.2 Extensible Instruction Generation
2.4.3 Architectural Customisation Selection
2.4.4 Processor Evaluation and Estimation

3 Methodology Overview
3.1 Existing Design Flow
3.2 Overview of Our Automation Methodologies
3.3 Modified Design Flow for Extensible Processors
3.4 Contributions

4 Semi-automatic Design System

4.1 Motivations
4.2 System Overview
4.2.1 Phase I: Pre-configured Processor Selection
4.2.2 Phase II: Instruction Identification Model
4.2.3 Phase III: Extensible Instruction Selection
4.2.4 Phase IV: Performance Estimation Model
4.2.5 Overall Design Flow Algorithm
4.3 Experimental Results
4.3.1 Experimental Setup
4.3.2 Evaluation Results
4.4 Conclusions and Future Work

5 Matching Instructions Tool
5.1 Background
5.2 Related Work
5.3 Overview of the MINCE Tool
5.3.1 The Translator
5.3.2 Filtering Algorithm in MINCE
5.3.3 Combinational Equivalence Checking Model
5.4 Experimental Results
5.4.1 Evaluation Results
5.5 Conclusions and Future Work

6 Instruction Estimation Models
6.1 Motivation
6.2 Background and Theory
6.3 Extensible Instructions Model

6.3.1 Overview
6.3.2 Customisation Parameters
6.3.3 Characterisation for Various Constraints
6.3.4 Estimating Characteristics of Extensible Instructions
6.4 Experimental Results
6.4.1 Experimental Setup
6.4.2 Evaluation Results
6.5 Conclusions

7 Instructions Generation
7.1 Motivation
7.2 Background
7.3 Problem Statements and Preliminaries
7.4 Instruction Generation
7.4.1 Instruction Generation Algorithm
7.4.2 Battery-Awareness Algorithm
7.5 Experimental Results
7.5.1 Experimental Setup
7.5.2 Evaluation Results
7.6 Conclusions

8 Conclusions

Bibliography

List of Figures

1.1 Potential design complexity and designer productivity
1.2 A simplified generic design flow of an extensible processor
1.3 Reasons for using an extensible processor platform
1.4 Benefits of a base processor: performance and code size
1.5 Benefits of extensible instructions: performance
2.1 An early history of embedded systems
2.2 Performance and flexibility comparison of different design approaches
2.3 A design flow of field programmable gate arrays
2.4 Different types of application specific instruction-set processors
2.5 A VLIW architecture
2.6 A generic VLIW instruction format
2.7 A reconfigurable processor architecture
2.8 An extensible processor architecture
2.9 A simplified generic design flow of an extensible processor
2.10 An example function for demonstrating the complexity of code segment identification
2.11 A computationally intensive code segment for demonstrating the complexity of instruction generation
2.12 An example demonstrating the complexity of instructions selection
2.13 Accuracy and time tradeoffs between different approaches
3.1 A generic existing design flow of the extensible processor platform
3.2 Our design methodologies for an extensible processor platform
4.1 Motivation example
4.2 Our semi-automatic system design flow for configuring an extensible processor (double square box: commercial tools; grey box: our contributions)
4.3 An example of a code segment to demonstrate how the fitting function works
4.4 Overall algorithm of the semi-automatic design system
4.5 The relationship between the fitting function and speedup/area ratio of the instruction
4.6 GSM decoder's design space and Pareto points
4.7 MPEG2 decoder's design space and Pareto points
5.1 A motivation example: to match pre-designed extensible instructions with code segments of the application
5.2 A generic design flow for designing an extensible processor and how the MINCE tool fits in the design flow
5.3 Verification time distribution
5.4 A code segment and an extensible instruction and their BDD representations
5.5 MINCE: an automated tool for matching extensible instructions
5.6 An example for translating to Verilog HDL in a form that allows matching through the combinational equivalence checking model
5.7 Algorithm filtering for eliminating the number of extensible instructions into the combinational equivalence checking model
5.8 Experimental and verification platform
5.9 Time reduction: the comparison between MINCE and Simulation
5.10 Results in terms of computation time for the instruction matching step: Simulation vs. MINCE (part 1)
5.11 Results in terms of computation time for the instruction matching step: Simulation vs. MINCE (part 2)
6.1 A motivation example: four varieties to design an instruction which replaces a code segment
6.2 An overview for characterising and estimating the models of the extensible instructions
6.3 Experimental methodology
6.4 A design space example of the extensible instructions (around 6ns)
6.5 The accuracy of the estimation models for multiple instructions (sets of instructions: set 1 contains a single instruction; set 2 contains a group of two instructions, etc.)
6.6 The accuracy of the estimation models in real-world applications
7.1 A motivation example: separates instruction to reduce energy consumption
7.2 An overview of the automatic instruction generation tool
7.3 Algorithm InstGen for generating extensible instruction that minimises the execution time
7.4 An example for separating instructions to reduce power dissipation
7.5 An example for utilising the slack of the instruction
7.6 Algorithm BattAware for optimising battery lifetime in the instruction
7.7 An experimental platform for verifying our automatic instruction generation tool
7.8 Trendlines of energy reduction for extensible instructions

List of Tables

2.1 Summary of different design approaches for embedded systems
4.1 Characteristics of pre-configured processors
4.2 A subset of extensible instructions library
4.3 The efficacy of the fitting function
4.4 The efficiency of the heuristic algorithm
4.5 Summary of the semi-automatic design system results
5.1 A subset of complex modules with limited implementations
5.2 Experimental results on hardware instructions on different kinds of software code segments
5.3 Experimental results on hardware instructions on different kinds of software code segments (part 2)
5.4 Number of instructions matched, matching time used and speedup gained by different systems
6.1 Customisation parameters of extensible instructions
6.2 The coefficients of the extensible instructions for the purpose of calibrating through regression
6.3 The mean absolute error of the estimation models in different types of extensible instructions
7.1 The characteristics of the generated instructions (set 1 and set 2) for the fifty code segments
7.2 The characteristics of the application when different instructions generated in the tool are applied

List of Publications

Book Chapter

1. N. Cheung, J. Henkel, and S. Parameswaran, "Rapid Configuration and Instruction Selection for an ASIP: A Case Study", in Embedded Software for SoC (A. A. Jerraya, S. Yoo, and N. Wehn, editors), pages 403–417, Kluwer Academic Publishers, 2003 [56]

Journal Article

1. N. Cheung, S. Parameswaran, and J. Henkel, "Rapid Semi-automatic Design System to Configure Extensible Processors", submitted to IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Conference Papers

1. N. Cheung, S. Parameswaran, and J. Henkel, "Battery-Aware Instruction Generation for Embedded Processors", in Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC), pages 553–556, IEEE Computer Society, January 2005 [60]

2. N. Cheung, S. Parameswaran, and J. Henkel, "A Quantitative Study and Estimation Models for Extensible Instructions in Embedded Processors", in Proceedings of the International Conference on Computer-Aided Design (ICCAD), pages 183–189, IEEE Computer Society, November 2004 [59]

3. N. Cheung, S. Parameswaran, J. Henkel, and J. Chan, "MINCE: Matching INstructions using Combinational Equivalence for Extensible Processor", in Proceedings of Design Automation and Test in Europe (DATE), pages 1020–1025, IEEE Computer Society, February 2004 [61]

4. N. Cheung, S. Parameswaran, and J. Henkel, "INSIDE: INstruction Selection / Identification & Design Exploration for Extensible Processor", in Proceedings of the International Conference on Computer-Aided Design (ICCAD), pages 291–297, IEEE Computer Society, November 2003 [58]

5. N. Cheung, J. Henkel, and S. Parameswaran, "Rapid Configuration and Instruction Selection for an ASIP: A Case Study", in Proceedings of Design Automation and Test in Europe (DATE), pages 802–807, IEEE Computer Society, March 2003 [57]

Chapter 1

Introduction

The growing design productivity gap and intense competition amongst manufacturers present many challenges for the embedded systems industry. The challenges related to the growing design productivity gap include all aspects of design, implementation and verification of integrated circuits. In addition, the intense competition amongst manufacturers, especially makers of mainstream consumer products, shortens the product life cycle significantly. To address these integrated circuit challenges and product life cycle concerns, it is necessary to develop design approaches and automation methodologies for designing embedded systems efficiently.

Some of the recent research in the area of embedded system design has revolved around extensible processors. Extensible processors represent the state-of-the-art in application specific instruction-set processors, consisting of a base processor and a base instruction set. The designer can optimise the base processor through three architectural customisations: instructions extension, inclusion/exclusion of predefined blocks, and parameterisations. Through these customisations, an extensible processor platform creates processors that are able to meet the increasing software content requirement, satisfy the growing complexities of an application, and reuse off-the-shelf components.

Research and development work in extensible processor platforms has shown that such processors have significant performance and power benefits when compared to other design approaches such as application specific integrated circuits, general purpose processors, digital signal processors, and field programmable gate arrays. However, designing an extensible processor requires a great deal of expertise and is often undertaken manually. Therefore, it is very important to develop design automation methodologies and tools for extensible processor platforms. Several studies have shown that large exploration and design times are reduced through automation methodologies and algorithms, which relieves the time-to-market pressure significantly. In addition, automation tools can provide early feedback about designs, allowing tradeoff decisions to be made rapidly, and can explore a large design space. The aim of this thesis is to develop methodologies and tools to design extensible processors automatically, closing the design productivity gap and relieving time-to-market pressures.

1.1 Embedded Systems Challenges

Embedded systems are ubiquitous and constitute the behind-the-scenes computing power in our everyday lives, ranging from office equipment, automotive control systems, security surveillance systems, medical equipment and home automation systems to mainstream consumer products such as mobile phones, personal digital assistants (PDAs), portable music players, digital and video cameras, etc. The number of embedded systems produced annually is in the order of millions, and this number is expected to increase significantly. In fact, embedded systems contributed significantly to the US$200 billion semiconductor industry in 2004 (Gartner, Inc.).

The semiconductor industry is experiencing increasing challenges, many of which are brought about by the growing design productivity gap. That is, while manufacturing capability, measured in terms of the ability to manufacture transistors per chip, grows at an average annual rate of around 58% as predicted by Moore's Law, the productivity of designers, measured in terms of the ability to design and implement correct and testable transistors per staff-month, continues to grow at an average annual rate of under 25%. This leads to a major gap in the ability to effectively utilise the improving silicon manufacturing process.

The semiconductor industry simply does not have sufficient manpower to follow the manufacturing footsteps of the advancing silicon technology. In other words, designers cannot produce sufficiently complex and efficient circuits that utilise and satisfy all the design requirements with the current silicon technology. Furthermore, this design productivity gap is constantly growing. Thus, this unbalanced phenomenon is one of the main challenges for circuit designers. Figure 1.1 shows the potential design complexity and designer productivity.

Figure 1.1: Potential design complexity and designer productivity (logic transistors per chip vs. transistors per staff-month, 1981-2009; source: Sematech)

The challenges related to this growing design productivity gap include all aspects of design, implementation and verification of integrated circuits. First, the need to satisfy tight design constraints and emerging new standards presents a constant challenge. For example, minimum feature size, low power consumption, as well as high performance efficiency are of paramount importance, yet are contradictory goals in today's embedded systems. Failure to address the above concerns may result in reduced portability and incorrect operation due to factors such as system weight degradation, battery life shortening, thermal-related failure, and degraded performance.

Second, the effort and cost to implement integrated circuits are mounting. This is due to phenomena such as the rapid increase in the complexity of integrated circuits and the increasing non-recurring engineering (NRE) costs of making mask sets and migrating to new silicon technology. The cost of making mask sets is reaching US$1 million in the current silicon technology, with an average of only 500 wafers per set. These high costs significantly decrease the number of designs deemed worthy of implementation and thus increase the risk to which the manufacturing company is exposed in the case of implementation failure. In addition, even if designs are deemed worthy of implementation, increased NRE cost applies extreme pressure to the verification of the circuit.

Third, the time and effort invested in verification are sky-rocketing due to factors such as the increasing complexity of integrated circuits as well as the mounting design risks for implementing integrated circuits. As the design productivity gap grows, these challenges become harder to resolve.

Furthermore, challenges for embedded systems are not only caused by the design productivity gap in the semiconductor industry. Rapid growth in the embedded system industry has intensified competition between major manufacturers, especially makers of mainstream consumer products, significantly shortening the life cycle of products using embedded systems. This implies that the time for design, implementation and verification of integrated circuits must be significantly shortened as well. All of these challenges markedly affect the way methodologies and tools are developed in the electronic design automation industry for different design approaches. The design productivity gap and product life cycle concerns pose challenges for embedded system designers, effectively forcing the semiconductor industry, software industry, and the electronic design automation industry to work together to overcome them.

The Trends Towards Designing Embedded Systems

Three significant trends have emerged in recent years towards designing embedded systems to overcome the aforementioned challenges in system design. These can be summarised as increasing software content, customising segments of an application to a specific hardware unit, and reusing off-the-shelf components:

• Increasing software content: As the design time shortens, the trend is to increase software content and components such as microprocessors in embedded systems. This trend is a tradeoff between the high performance and low power consumption that designers can achieve in hardware design, and the time and effort involved in designing a complex integrated circuit. The benefits include decreased design time, reduced NRE costs, and the ability to make late design changes.

• Customising segments of an application to a specific hardware unit: As full custom design takes a significant amount of time, and given that microprocessors cannot achieve all hardware specifications (such as energy consumption, feature area and performance), the trend is to customise smaller segments of integrated circuits - targeting the critical parts which often consume large amounts of energy, occupy a large area, and have a long execution time. The main purpose of customisation (customising small segments instead of a full custom design) is to decrease the complexity of integrated circuits, providing a significant reduction in design time and effort, and hence increasing the productivity of designers. These integrated circuits are often designed for a particular application or domain of applications, in which designers can optimise the integrated circuits by exploiting application characteristics.

• Reusing off-the-shelf components: The trend is to reuse as many off-the-shelf components as possible (even those provided by different manufacturers) in embedded systems. This is achieved through advances in communication methods and protocols between components, as well as the growing number of analysis methods for multiple-component systems (multiprocessors or networks-on-chip). This trend reduces the tedious process of verifying newly designed integrated circuits, thus decreasing design time and effort.

All of these trends significantly reduce the time-to-market pressure and enable different design tradeoffs between performance, power, area and delay. More importantly, these trends are changing the way embedded systems are designed, with design approaches and automation tools for exploring the design space and utilising novel design technology.

Design Approach and Automation

Design approaches have been proposed as tradeoffs between increasing software content, customising segments of an application, and reusing off-the-shelf components in order to utilise silicon technology in the semiconductor industry. These approaches can be classified into five categories: i) Application Specific Integrated Circuits (ASICs) - integrated circuits, a hardware-only solution, that serve a particular function or application; ii) General Purpose Processors (GPPs) - microprocessors that perform a particular function in software; iii) Field Programmable Gate Arrays (FPGAs) - pre-fabricated circuits that designers can electrically configure to meet design functions; iv) Digital Signal Processors (DSPs) - specialised microprocessors which contain hardware architecture specifically designed for digital signal processing; and v) Application Specific Instruction-set Processors (ASIPs) - processors with the capability to customise new instructions as specific hardware units for a particular application. Each of these design approaches provides different design characteristics to fulfill different requirements for embedded systems. These are described in detail in the literature review in Chapter 2. Unfortunately, aggressive time-to-market requirements indicate that developing a design approach alone is not sufficient to handle the challenges in designing modern embedded systems. Hence, the electronic design automation industry has developed design automation methodologies and tools to incorporate these design approaches in order to reduce the design time for embedded systems. Automation tools often reduce tedious design effort and minimise the level of expertise necessary for designing embedded systems. Typical benefits of using automation tools include reducing the design time to relieve the mounting time-to-market pressure and exploring the design space to optimise area, power and delay. As mentioned previously, the challenges of designing embedded systems must be met by a combination of effort from the semiconductor and electronic design automation industries. Therefore, design approaches and automation tools must be carefully developed and researched in order to bridge the growing design productivity gap and handle time-to-market pressure for designing embedded systems efficiently.

1.2 Extensible Processor Platform

Some of the recent research and development efforts in designing embedded systems have revolved around the extensible processor platform. Extensible processors represent the state-of-the-art in application specific instruction-set processors, consisting of a base processor core that contains a base instruction set. These processors can be customised at three architectural levels: i) Instructions extension: the designer can extend this instruction set through new extensible instructions. Extensible instructions are customised to replace computationally intensive code segments (groups of primitive instructions) in the application; ii) Inclusion/exclusion of predefined blocks: the designer can choose to include or exclude predefined blocks as part of the base processor. Predefined block examples include the floating-point unit, digital signal processing unit, special function registers, and multiply-and-accumulate operations block; iii) Parameterisations: the designer can set extensible processor parameters such as instruction and data cache sizes.

Figure 1.2 shows a simplified generic design flow of an extensible processor. The design goal of the extensible processor is typically to maximise performance of an embedded application while meeting area and power constraints. The designer often begins by profiling the given application using an Instruction-Set Simulator (ISS) of the target processor. The profiling reveals computationally intensive code segments for which customisation might improve performance, area and energy characteristics. The designer then defines these architectural customisations in the extensible processor. To evaluate these customisations, the designer usually employs simulation tools (the ISS) to determine whether the application meets design constraints.

The designer then reiterates these steps to explore the design space until constraints are met. Once design constraints are met, the platform uses the base processor configuration and customisations to generate the extensible processor's synthesisable RTL. This synthesisable RTL can be taped out or synthesised for prototyping. However, designing extensible processors requires a great deal of expertise and is therefore often conducted manually.

Figure 1.2: A simplified generic design flow of an extensible processor (compile and profile the application; identify computationally intensive code segments; generate extensible instructions; select extensible instructions, predefined blocks and parameters; evaluate against performance, area and power constraints; generate the processor's synthesisable RTL for prototyping or tape-out)
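To make the shape of this iterative flow concrete, the sketch below recasts it in C. It is an illustration only: the helper routines (profile_application, hot_code_segments, evaluate_with_iss, generate_rtl) are hypothetical stand-ins for the profiler, instruction-set simulator and RTL generator of a real platform, not any vendor's API.

#include <stddef.h>

/* Hypothetical types: the application under design, one candidate
 * customisation (extensible instructions, predefined blocks and
 * parameter settings), and the metrics the ISS reports. */
typedef struct { int id; } Application;
typedef struct { int id; } Customisation;
typedef struct { double cycles, area, power; } Metrics;

/* Stand-ins for the commercial tools named in Figure 1.2. */
void    profile_application(const Application *app);
size_t  hot_code_segments(const Application *app, Customisation *out, size_t max);
Metrics evaluate_with_iss(const Application *app, const Customisation *c);
void    generate_rtl(const Customisation *c);

/* One plausible shape of the manual loop: profile once, then try
 * candidate customisations until one meets the constraints. */
void design_extensible_processor(const Application *app,
                                 double max_area, double max_power)
{
    Customisation cand[64];

    profile_application(app);                 /* reveals hot code segments */
    size_t n = hot_code_segments(app, cand, 64);

    for (size_t i = 0; i < n; i++) {
        Metrics m = evaluate_with_iss(app, &cand[i]);   /* slow ISS run */
        if (m.area <= max_area && m.power <= max_power) {
            generate_rtl(&cand[i]);           /* synthesisable RTL out */
            return;                           /* constraints met: stop */
        }
    }
    /* Otherwise revisit the identified code segments, redefine the
     * customisations and iterate again, exactly as the text describes. */
}

Each pass around this loop costs a full ISS run, which is why the later chapters replace the evaluation step with fast estimation models.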

By featuring a base processor, extending specific instructions, including/excluding predefined blocks, and setting architectural parameters in the extensible processor, the designer is provided with many design options to execute any software application, define specific hardware units, and reuse off-the-shelf components to optimise an embedded system. The benefits of using an extensible processor platform are summarised in Figure 1.3 and described in the next sections.

Figure 1.3: Reasons for using an extensible processor platform (the base processor and base instruction set, extensible instructions, predefined blocks, and architectural parameters, with the benefits of each)

1.2.1 Featuring a Base Processor

A base processor with a base instruction set in the extensible processor platform enables the designer to run any application. The base instruction set consists of a small, yet sufficient, number of instructions to execute any program. Each instruction is compact in bit width. For example, a 32-bit base processor often has a 16-bit and 24-bit base instruction set. A compact instruction set reduces the application's code size, minimises the instruction memory/cache required to execute the application, and reduces energy consumption and chip area. Furthermore, an efficient instruction set reduces code size significantly, enabling the application to execute in a shorter amount of time, leading to an increase in application performance. Figure 1.4 shows the benefits of a base processor in terms of performance and code size. The figure indicates that the base processors of commercial vendors, such as the Xtensa V from Tensilica Inc. [23] and the ARC Tangent A4 from ARC Inc. [1], achieve higher performance as well as smaller code sizes than several well-known general-purpose processors, such as the ARM 1020E from ARM Inc. [9], the Motorola PPC7455 from Motorola Inc. [16], and the MIPS64 from MIPS Inc. [14].

Figure 1.4: Benefits of a base processor: performance and code size (EEMBC Consumer benchmark rating vs. code size; source: EEMBC, updated February 2003)

Using the base processor, a designer can execute any software application written in a high-level language (e.g., C/C++, Java, etc.). The design process of a software application is often automated by the use of a compiler, linker, debugger and simulator. Therefore, the base processor and base instruction set reduce time-to-market pressure. As the product life cycle shrinks and the competition between major manufacturers intensifies, the need to launch a product to market in the shortest amount of time is vital. As a result, manufacturers now release at least two major product lines each year, compared with only one just a few years ago. For example, digital camera manufacturers now produce several new models of digital SLR cameras and compact digital cameras each year. For designers, the time-to-market pressure is mounting by the minute. The base processor and base instruction set allow the majority of the functionality to be designed in software. Thus, design time is significantly reduced in comparison to placing the same functionality in hardware. Increasing software content also increases flexibility, enabling any late changes in design specifications to be handled easily and seamlessly.

Figure 1.5: Benefits of extensible instructions: performance (speedups of a base processor with extensible instructions over ARM10/MIPS64- and ARM9/MIPS32-class processors for ten applications; source: EEMBC and Tensilica, Inc., updated February 2003)

1.2.2 Designing Extensible Instructions

The designer is permitted to extend the base instruction set through new extensible instructions. An extensible instruction is a specific hardware unit that executes a dedicated function in the execution stage of the base processor, replacing computationally intensive code segments in the software application. The main goal of instructions extension is to increase performance and satisfy energy consumption constraints (which the base processor cannot achieve alone). In addition, extensible instructions coexist with the base instruction set. The application is compiled into assembly code composed of base instructions and extensible instructions. The major benefit of designing extensible instructions is the ability to increase the product performance of embedded systems in a small amount of time. As product functionality, and thus the underlying complexity of embedded systems, rapidly increases, the need for high product functionality also rises. Nowadays, embedded systems feature far greater functionality than their core functionality. For example, digital cameras are not limited to taking digital photos and providing automatic focus; they also include video recording, a high-precision image stabiliser, direct print compatibility, a histogram indicator, etc. New features are continuously developed and integrated into next generation products. Thus, instructions extension allows the rapid creation of customised processors to meet tight design constraints. Figure 1.5 shows the benefits of designing extensible instructions. The figure shows the performance comparison of an extensible processor with ARM10/MIPS64- and ARM9/MIPS32-class equivalent processors. In the figure, "Base" symbolises the base processor, which typically has an area ranging from 30k to 80k gates, depending upon the silicon technology and parameter settings. The number of gates refers to the area of the extensible instructions (i.e., their complexity). As the figure shows, the performance is far better than that of off-the-shelf processors when extensible instructions are customised to different applications.
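As a concrete illustration of what instruction extension means at the source level, consider the sketch below. It is not an example from the thesis: the code segment is generic C, and DOT8 is a hypothetical intrinsic standing in for whatever name the platform's compiler would expose once such an instruction is defined.

#include <stdint.h>

/* A computationally intensive code segment of the kind profiling would
 * flag: a multiply-accumulate loop built from primitive base-processor
 * instructions (loads, multiplies, adds, branches). */
int32_t dot8(const int16_t *a, const int16_t *b)
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += (int32_t)a[i] * b[i];    /* one multiply + add per element */
    return acc;
}

/* With an extensible instruction, the whole loop collapses into a single
 * custom opcode executed in the processor's execution stage. DOT8 is a
 * hypothetical compiler intrinsic, not a real vendor API. */
extern int32_t DOT8(const int16_t *a, const int16_t *b);

int32_t dot8_ext(const int16_t *a, const int16_t *b)
{
    return DOT8(a, b);   /* eight multiply-adds in one instruction */
}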

1.2.3 Including/Excluding Predefined Blocks

The designer can choose to include or exclude predefined blocks as part of the base processor. Predefined blocks are pre-designed, commonly used hardware units that perform specific functions for a domain of applications, such as a digital signal processing unit or a floating-point unit. Each predefined block has an isomorphic instruction set which coexists with the base instruction set. Once the predefined blocks are included, the compiler automatically uses the predefined blocks' instructions in the assembly code (if necessary). Communication techniques and methods between the base processor and predefined blocks are also specially designed. Floating-point units, digital signal processing units, and multiply-and-accumulate operation blocks are examples of predefined blocks for the multimedia application domain that boost performance significantly.

The need to include predefined blocks can be questioned, given that predefined blocks and extensible instructions both increase performance. The advantage of using predefined blocks is that their area is smaller than that of a group of instructions, and that verification effort is reduced. As product functionality increases, the complexity and the number of extensible instructions needed also increase. Thus, the verification time and effort for extensible instructions also mount. Therefore, the primary benefit of predefined blocks is not limited to increased performance and a reduction in verification time and effort. In fact, including and excluding predefined blocks can be considered coarse-grained customisation for an application in an extensible processor, whereas designing extensible instructions is fine-grained customisation. Enabling reusability of predefined blocks minimises verification time, thus reducing the design turnaround time for an application in the extensible processor.

1.2.4 Setting Architectural Parameters

The designer is able to set extensible processor parameters in order to further optimise the extensible processor for a particular application. Parameter examples include: configuring the memory management unit; selecting the set associativity and size of local data and instruction caches; configuring RAM and ROM areas for data and instruction storage; setting interface options, the processor interface width, and the number of interrupts and their priorities, etc. These parameters affect the design characteristics of the extensible processor, including area, power and performance. For example, if the code size of an application is 20 kbytes, then the instruction cache does not need to be larger than 20 kbytes. Configuring extra instruction cache increases the area overhead of the processor, consumes unnecessary static and dynamic power, causes extra delay on instruction cache accesses, and in turn decreases the performance of the application.

Furthermore, these settings also affect the product characteristics. For example, a compact feature size leads to a stylish design and a lightweight product; low power dissipation extends battery life and ensures portability in products (e.g., notebooks, mobile phones, personal digital assistants (PDAs), etc.); and high performance shortens execution time and reduces energy consumption. Therefore, the need to customise every aspect of the product is increasing, and the customisations discussed above are vital in optimising the application in the embedded system.
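As a sketch of how the cache-sizing rule above might be automated, assuming a hypothetical ProcessorParams structure (real platforms expose similar knobs through their own configuration tools, not this one):

#include <stdint.h>

/* Hypothetical parameter block for one processor configuration. */
typedef struct {
    uint32_t icache_bytes;
    uint32_t interface_width_bits;
    uint32_t num_interrupts;
} ProcessorParams;

/* Caches come in power-of-two sizes: largest power of two <= v. */
static uint32_t pow2_floor(uint32_t v)
{
    uint32_t p = 1;
    while ((p << 1) != 0 && (p << 1) <= v)
        p <<= 1;
    return p;
}

/* The rule from the text: an instruction cache larger than the
 * application's code only adds area, static/dynamic power and access
 * delay, so never configure more cache than code. Under this rule a
 * 20-kbyte application gets at most a 16-kbyte (power-of-two) cache. */
void size_icache(ProcessorParams *p, uint32_t code_size_bytes,
                 uint32_t platform_max_bytes)
{
    uint32_t size = pow2_floor(code_size_bytes);
    if (size > platform_max_bytes)
        size = platform_max_bytes;
    p->icache_bytes = size;
}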

Design Automation in Extensible Processor Platform

As shown previously, the extensible processor platform enables the design of embedded systems in a short amount of time with significant performance, low energy consumption and compact area. However, the existing design flow shown in Figure 1.2 requires a great deal of expertise and is often conducted manually. Four major problems are associated with the existing design flow: code segment identification, extensible instruction generation, architectural customisation selection, and processor evaluation (these are described in detail in the Chapter 2 literature review). Several research and development works on extensible processor platforms have shown that large exploration times are reduced, and that near-optimal designs are obtained, through automation methodologies and tools [33,58,67,88,193]. Therefore, the development of design automation methodologies is very important for the extensible processor platform.

1.3 Thesis Overview

This thesis presents a suite of methodologies and tools to automatically design extensible processors for embedded systems. The methodologies include generating extensible instructions, selecting predefined blocks and parameters, matching pre-designed instructions, and design space exploration to perform tradeoffs. While design automation methodologies for various parts of extensible processors have been researched in the past, the work presented in this thesis represents some of the first known design automation methodologies for the extensible processor platform.

Chapter 2 of this thesis provides the necessary literature review on various design approaches for embedded systems, ranging from application specific integrated circuits and general-purpose processors to field programmable gate arrays, digital signal processors and application specific instruction-set processors. This chapter also describes the problems encountered in designing extensible processors and some of the proposed solutions.

Chapter 3 presents an overview of the design methodologies and tools that are described in the subsequent chapters. Our methodologies include a semi-automated design system, an instruction matching tool, estimation models, and an instruction generation tool. This chapter describes how our design automation methodologies fit with the existing manual design flow, and how they work as a single system to design an extensible processor for an application. Chapter 3 also outlines the contributions of this thesis.

Chapter 4 describes a semi-automated system for configuring an extensible processor, which maximises the performance of an application, satisfies the area constraint, and significantly reduces the design turnaround time. These results are achieved through the identification of efficient code segments to implement as extensible instructions and a two-level hierarchical selection approach: first, the design space is limited through selection of a pre-configured processor (including predefined blocks); and second, a set of pre-designed extensible instructions is selected from a library for that extensible processor. In addition, execution time estimation for an application program running on an already configured extensible processor is performed. Using our semi-automated system we have demonstrated how ten different real-world benchmarks can be designed within the extensible processor platform. The design space exploration time of the system is, on average, 2.5% of the design space exploration time using full simulation for a given set of benchmarks. The fitting function for identifying the correct code segments relates to the speedup/area ratio of the instruction. In addition, our heuristic algorithm was able to locate, on average, 91% of all Pareto points from the entire design space in all benchmarks. The execution time estimation for the proposed extensible processor is, on average, within 5.68% of results obtained with an ISS, and is typically generated in less than a second. Finally, the application program execution time is reduced by up to 15.71× (4.74× on average), with an average area overhead of 65% on the benchmarks. Although matching code segments with instructions in the library and generating instructions are both performed manually in the system (effectively making it “semi-automatic”), the system is still useful in many real-world applications. Automation tools for these two manual steps are proposed in the following two chapters.
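The text states only that the fitting function relates to the speedup/area ratio of the instruction; the exact definition is given in Chapter 4. One plausible form consistent with that description (an illustration, not the thesis's definition) is:

\[
  F(s) \;=\; \frac{\text{speedup}(s)}{\text{area}(s)}
        \;=\; \frac{T_{\mathrm{sw}}(s) \,/\, T_{\mathrm{hw}}(s)}{A(s)}
\]

where, for a candidate code segment $s$, $T_{\mathrm{sw}}$ is its software execution time on the base processor, $T_{\mathrm{hw}}$ the latency of the extensible instruction replacing it, and $A$ the area overhead of that instruction; segments with the highest $F(s)$ would be implemented first.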

Chapter 5 addresses the matching of code segments with instructions in the library using an automation tool, namely MINCE (Matching INstructions using Combinational Equivalence). Designing extensible instructions for extensible processors is a computationally complex task because of the exposure to a large design space. The task of automatically matching candidate instructions in an application (e.g., written in a high-level language) to a pre-designed library of extensible instructions is especially challenging. Previous approaches have focused on identifying extensible instructions (e.g., through profiling), synthesising extensible instructions, estimating expected performance gains, etc. In this chapter, we introduce our approach of automatically matching extensible instructions - a key, missing step in automating the entire design flow of an extensible processor platform with extensible instruction capabilities. Since matching using simulation is practically infeasible (due to simulation time), and traditional pattern-matching approaches would not yield reliable results (functionally equivalent code can be represented in many different ways), combinational equivalence checking is the preferred option. The MINCE tool consists of a translator, a filtering algorithm and a combinational equivalence checking tool. Matching times of extensible instructions using MINCE are 7.3× faster on average (using Mediabench applications) compared to the best known approach to the problem (partial simulations). In all experiments, MINCE matched correctly, and the outcome of the matching step yielded an average application speedup of 2.47×. The work represents a key step towards automating the whole design flow of an extensible processor with extensible instruction capabilities.
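To see why syntactic pattern matching is unreliable, consider two code segments with identical behaviour but different shapes. The example below is illustrative, not taken from the thesis: both functions compute the overflow-safe average floor((x+y)/2).

#include <stdint.h>

/* A syntactic matcher sees different operators and operand trees in
 * these two functions; a combinational equivalence checker, comparing
 * them as Boolean circuits over all inputs, proves them identical. */
uint32_t avg_a(uint32_t x, uint32_t y)
{
    return (x >> 1) + (y >> 1) + (x & y & 1);   /* halves plus carry-in */
}

uint32_t avg_b(uint32_t x, uint32_t y)
{
    return (x & y) + ((x ^ y) >> 1);            /* same function, new form */
}

In a flow like MINCE's, both segments would be translated into HDL and the equivalence checker would report a match against a library instruction implementing this average, where a tree-comparison matcher would report a mismatch.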

Chapter 6 presents estimation models for extensible instructions that facilitate efficient exploration of the design space. In this chapter, three estimation models for extensible instructions are provided: area overhead, latency, and power consumption under a wide range of customisation parameters. System decomposition and regression analysis are used as the underlying methods to characterise and analyse extensible instructions. These estimation models are verified using automatically and manually generated extensible instructions, plus extensible instructions used in large real-world applications. The mean absolute error of our estimation models is as small as 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption, compared to estimation through the time-consuming synthesis and simulation steps using commercial tools. These estimation models achieve an average speedup of three orders of magnitude over the commercial tools and thus enable the designer to conduct a fast and extensive design space exploration that would otherwise not be possible. These estimation models are integrated into our extensible processor tool suite.
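The text names regression analysis as the underlying method; the concrete models are defined in Chapter 6, so the following is only the generic shape such a model takes. Each characteristic (here area overhead $\widehat{A}$; latency and power have analogous forms) is predicted as a weighted sum of customisation parameters $x_1,\dots,x_n$, with coefficients $\beta_i$ calibrated from synthesised sample instructions (cf. Table 6.2), and accuracy reported as mean absolute percentage error over $m$ test instructions:

\[
  \widehat{A} \;=\; \beta_0 + \sum_{i=1}^{n} \beta_i x_i,
  \qquad
  \text{error} \;=\; \frac{1}{m}\sum_{j=1}^{m}
      \frac{\lvert \widehat{A}_j - A_j \rvert}{A_j} \times 100\%
\]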

Chapter 7 describes a tool for automatic instruction generation, which is an efficient method to satisfy growing performance demands and meet design constraints. A typical approach to instruction generation is to combine a large group of primitive instructions into a single extensible instruction for maximising speedup. However, this approach often leads to large power dissipation and discharge current, posing a challenge for battery-powered products. This chapter details a proposed battery-aware automatic tool to design extensible instructions, which minimises power dissipation distribution by separating an instruction into multiple instructions. The automatic tool is verified using 50 different code segments and five large real-world applications. The tool reduces energy consumption by a further 5.8% on average (up to 17.7%) compared to extensible instructions generated by previous approaches. For real-world applications, energy consumption is reduced by 6.6% on average (up to 16.53%) and performance is increased in most cases. The automatic instruction generation tool is integrated into our extensible processor tool suite.

Chapter 8 concludes with a summary of this thesis, detailing the main achievements and outlining directions for future work.

Chapter 2

Literature Review

This chapter provides the necessary background and a literature review of design automation methodologies for an extensible processor platform. The chapter begins with an overview of embedded systems and their early history, describing various design approaches for embedded systems. The focus is then shifted to application specific instruction-set processors and associated architectures (such as very long instruction word processors, reconfigurable processors, and extensible processors). Finally, the problems related to the design automation of an extensible processor platform and proposed solutions are presented.

2.1 Embedded Systems and their Early History

An embedded system is a specialised computer system designed to perform sophisticated functions for dedicated applications. Embedded systems differentiate themselves from general-purpose computers by having the following characteristics: i) application-specific algorithms - the operation performed by embedded systems is usually very specific. Due to the sophisticated nature of functions in embedded systems, designers often take advantage of the application characteristics to optimise the design. For example, the embedded system that controls a high-definition television set-top box must perform complicated digital signal processing algorithms to optimise the image quality of the high-definition television; ii) real-time systems - embedded systems often must work in real time. That is, if data has not arrived by a certain deadline, then operational failure occurs. For example, if the high-definition television set-top box does not provide sufficient data at a frequency of 100Hz, then image degradation occurs; iii) low manufacturing costs - the cost of manufacturing must be low in many cases. The manufacturing costs are determined by many factors, such as the type of processor and the amount of memory used; iv) low power consumption - embedded systems are often low power designs. Power affects battery life, portability, as well as thermal packaging; v) part of an electronic product - an electronic product can contain multiple or even hundreds of embedded systems. For example, a high-end automobile has hundreds of embedded systems in various parts, including brake control, fuel injection, stability control, climate control, and automated seats and windows. Each part has its specific function to perform and control. Therefore, these five characteristics are closely associated with the way embedded systems are designed, ranging from hardware-only implementations to software-based solutions and hardware/software co-designs.

Electronic embedded systems are said to have begun in 1951 with the MIT Whirlwind computer, the first digital video terminal capable of displaying real-time text and graphics. The Whirlwind was intended for aircraft stability and control flight simulations, and was ultimately adopted by the United States Air Force for use in the Semi-Automatic Ground Environment (SAGE) air defence system, which became operational in 1958 for collecting, tracking and intercepting enemy bomber aircraft.

In 1964, Seiko introduced the first printing timer, built for the 1964 Tokyo Olympic Games, which was the catalyst for the development of the EP-101 digital printer. In 1968, the Apollo Guidance Computer ran the inertial guidance systems in Apollo 7 as it made its debut orbiting the earth. The Apollo Guidance Computer was the first to use integrated circuits, with approximately 9,700 NOR logic gates. In the same year, the Volkswagen 1600 used a microprocessor in its fuel injection system, making it the first microprocessor-based embedded system used in the automotive industry. In 1972, the world's first scientific pocket calculator, the "HP-35", was introduced, with transcendental functions and reverse polish notation; it used a multi-chip CPU, consisting of a control and timing (C&T) chip, an arithmetic and register (A&R) chip, etc. Also in 1972, Intel and Texas Instruments introduced 4-bit microprocessors, namely the Intel 4004 and the Texas Instruments TMC1795, which marked a major advance in general purpose processors in embedded systems. In 1979, mobile phones were tested in Japan and Chicago. Later in the year, Sony launched the first portable headphone stereo cassette Walkman, the TPS-L2. In the same year, NEC started production of the µPD7710, the world's first complete digital signal processor. Four years later in 1983, the digital signal processor produced by Texas Instruments, the TMS32010, proved to be a great success. In 1984, Psion launched the first Personal Digital Assistant (PDA), the Psion Organiser 1, with a serial port for attachment to a modem. In the mid-'80s, Xilinx introduced field programmable gate arrays (FPGAs), which changed the embedded systems industry significantly. In 1989, Sony introduced the first digital camera, the SONY ProMavica MVC-5000, which recorded images as magnetic impulses on a compact 2-inch still-video floppy disk. The early '90s saw the introduction of application specific instruction-set processors, which combined hardware specification with a software-based processor. From the mid-'90s, several variations were developed in the field of ASIPs. In the late '90s, JVC designed an ASIP for their digital video cameras. LG Electronics integrated ASIPs into their mobile phones and portable devices such as PDAs and digital televisions. Consumer electronics companies such as Fujitsu, Olympus, and Epson used ASIPs in their imaging products such as digital cameras and printers. In the new century, NEC launched a TCP/IP offload engine containing ten ASIPs and a W-CDMA infrastructure chip consisting of two ASIPs.

2.2 Design Approaches for Embedded Systems

As discussed in the previous section, different design approaches to embedded systems have been introduced at different times: Application Specific Integrated Circuits in the '50s, general purpose processors in the '70s, digital signal processors in the '80s, field programmable gate arrays in the mid-'80s, and application specific instruction-set processors in the '90s. Figure 2.1 shows the timeline of the early history of embedded systems and the introduction of different design approaches. Throughout the years, design approaches have evolved to adapt to the rapid changes in the characteristics of embedded systems. Two of the major differences between these five design approaches are their performance and flexibility. Figure 2.2 summarises the performance and flexibility of these different design approaches. Application Specific Integrated Circuits have very high performance with very little flexibility, as this design approach is a hardware-only implementation. On the other hand, general purpose processors are a software-based solution, and thus have very high flexibility with low performance. In this section, each design approach is described in terms of its characteristics, advantages/disadvantages, and market trends.

2.2.1 Application Specific Integrated Circuits

Application Specific Integrated Circuits (ASICs) are integrated circuits hard-wired and hard-coded to run a specific application. ASICs are a hardware-only implementation and do not contain any software components, with flexibility being at most run-time configurable parameters. These circuits require a great deal of design expertise and are optimised manually by integrated circuit designers. These processes include logic mapping, delay analysis, function verification, and performance optimisation. Therefore, this kind of embedded system typically has high application performance and low hardware cost. However, long design turnaround time and high initial design and fabrication costs are the drawbacks of ASICs. Another disadvantage of this approach is its inflexibility.

Figure 2.1: An early history of embedded systems (a timeline from 1950 to 2005 marking the introduction of each design approach and landmark products, from the 1951 MIT Whirlwind computer and the 1958 SAGE air defence system through to the 2003 TCP/IP offload engine)

Figure 2.2: Performance and flexibility comparison of different design approaches (performance decreases and flexibility increases in the order: Application Specific Integrated Circuits, digital signal processors, application specific instruction-set processors, field programmable gate arrays, general purpose processors)

Before the 1990s, the design process for the majority of ASICs was based on a capture-and-simulate design methodology, which simulates the transistor-level design extensively. In the '90s, logic synthesis was recognised as an integral part of the design process, and so began an evolution of the capture-and-simulate methodology into a describe-and-synthesise methodology. This paradigm shift allows a design to be described at a higher abstraction level, with the final implementation being achieved through automatic synthesis rather than manual refinement, relieving designers from the tedious tasks of logic mapping and optimisation. Logic synthesis began to grow during the '60s and '70s, as computers became more complex. Although many theoretical advances were made, it was not until the mid-'80s that the first company, Synopsys, offered logic synthesis technology [74,90]. Synopsys successfully brought logic synthesis to the commercial market, closing the design productivity gap significantly. Logic synthesis includes combinational and sequential synthesis [76,167]; two-level and multi-level optimisation methods [44,45,188]; redundancy and related ATPG methods [54]; technology mapping for area, power and delay [152]; delay analysis [53,77,155]; performance optimisation [75,125]; don't cares and other flexibilities [180-182]; and multi-valued synthesis [43,134]. One of the key improvements in logic synthesis technology was the development of the ROBDD [49] and other improved BDD packages [41,190]. In recent years, another significant improvement has been the development of more efficient SAT solvers [153,156]. Both BDD and SAT solvers have had a major impact on the way that logic expressions are represented and analysed, which is the very basis for efficient logic synthesis and verification methods in integrated circuit designs.
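To give a flavour of the equivalence checking that underpins these verification methods, the following C sketch compares two implementations of the same boolean function in the style of a miter circuit. It is a minimal illustration only: the function names are invented for this example, and the exhaustive enumeration it uses is exactly what BDD- and SAT-based checkers are designed to avoid for functions with many inputs.

#include <stdio.h>
#include <stdint.h>

/* Two candidate implementations of the same 4-input boolean function.
 * spec() is the reference; impl() is a NAND-NAND restructuring of it. */
static int spec(int a, int b, int c, int d) {
    return (a & b) | (c & d);
}
static int impl(int a, int b, int c, int d) {
    return ~(~(a & b) & ~(c & d)) & 1;   /* same function, different structure */
}

/* Miter-style check: the two are equivalent iff the XOR of their outputs
 * is 0 for every input assignment. With n inputs this loop is O(2^n);
 * BDD- and SAT-based checkers avoid this explicit enumeration. */
int main(void) {
    for (uint32_t v = 0; v < 16; v++) {
        int a = v & 1, b = (v >> 1) & 1, c = (v >> 2) & 1, d = (v >> 3) & 1;
        if ((spec(a, b, c, d) ^ impl(a, b, c, d)) & 1) {
            printf("counterexample: a=%d b=%d c=%d d=%d\n", a, b, c, d);
            return 1;
        }
    }
    printf("equivalent on all 16 input assignments\n");
    return 0;
}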

The number of new ASICs being designed has been decreasing since the mid-’90s due to increasing product complexity, high manufacturing costs and high cost of manpower.

Although design numbers are down, the market consumption of ASICs is forecast to grow from $6.8 billion in 2003 to $10.7 billion in 2008, translating to a forecast compound annual growth rate of 9.7% between 2003 and 2008 (InStat/MDR Inc.). Also contributing to this growth rate is the fact that manufacturers only produce ASICs when the expected consumption is large enough to overcome the massive costs. For example, Intel still uses ASICs for their ARM processors. However, in the embedded systems world, fewer designs are fully customised using ASICs, as a result of shrinking design time.

2.2.2 General Purpose Processors

The second type of design approach utilises General Purpose Processors (GPPs), which helps to decrease design time. A GPP is a microprocessor that executes specific functions in software. As this design approach is a software-based solution, a GPP is able to run a wide range of applications. An application is first compiled into machine code (a sequence of instructions), and the machine code is then stored in the instruction memory to be fed to the microprocessor, executing the application. GPPs have two types of architecture: complex instruction set computer (CISC) and reduced instruction set computer (RISC). Each architecture is associated with a different type of instruction set. This design approach has a short design turnaround time and is highly flexible. The specification of an application can still be changed at a later stage of the design cycle without adversely affecting the manufacturing process. In comparison to ASICs, this approach requires very low design effort. However, this approach has a number of drawbacks: high hardware area costs, high power consumption and low performance.

The design process for the majority of GPPs is automatically assisted by the use of compilers and linkers. However, the optimisation techniques of compilers and linkers were not significantly advanced in the past, and designers often hand-optimised the application's assembly code in order to satisfy design constraints. Compiler algorithms are improving in areas such as loop optimisations, data-flow optimisations, and back-end compiler optimisations. Loop optimisations are used to improve the efficiency of the executable output for repetitive (loop) code in terms of running time or resource usage. There are many optimisation techniques designed to operate on a loop, including inner and outer loop interchange [30,94,110,202]; loop unrolling [73,178,208] (illustrated below); loop tiling [24,162,200]; loop fission [121]; and loop fusion [151,159,185]. Data-flow optimisations are used to improve the propagation of data, and are often conducted by data-flow analysis [93,140]. Back-end compiler optimisations are performed at the machine language level in order to increase application performance and to optimise code size. Examples of back-end compiler optimisations are register allocation [130,131,205], instruction selection [100,144], and instruction scheduling [29,160,187].
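As a concrete illustration of one of these transformations, the following C fragment sketches a four-way loop unrolling of a simple accumulation loop. The function names are invented for this example; in practice the transformation is applied automatically by the compiler.

/* Original loop: one accumulation per iteration. */
int sum_rolled(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four-way unrolled version: the loop body is replicated so that four
 * elements are consumed per iteration, reducing branch overhead and
 * exposing independent operations that can be scheduled in parallel.
 * The epilogue handles the n % 4 leftover elements. */
int sum_unrolled(const int *a, int n) {
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* epilogue */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}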

GPPs are still widely used in embedded systems, ranging from high-end 32-bit microprocessors to low-end 8-bit microprocessors. The 32-bit embedded architecture dominates the customer-specific and cell-based worldwide product market, such as mobile phones, DVD players and portable music players. The worldwide revenue from GPPs is expected to increase from $2.9 billion in 2002 (up 5.6% from 2001) to $5.2 billion by the year 2007. This translates to a compound annual growth rate of 12.0% between 2002 and 2007 (InStat/MDR Inc.). Although this growth rate is relatively good, it is not as strong as the growth rate of specialised processors such as digital signal processors.

2.2.3 Digital Signal Processors

Digital Signal Processors (DSPs) are off-the-shelf, specialised microprocessors that contain hardware architecture specifically designed for high-speed algorithmic and numerical computations on discrete number sequences. Dedicated hardware architecture examples include multipliers, floating-point units, multiply-and-accumulate operation units, etc. The fact that DSPs have dedicated hardware makes them significantly different from GPPs. Although both have, for example, a multiply instruction, the operation is executed as microcode in the GPP (and is thus considerably slower) but executed in dedicated multipliers in the DSP. Furthermore, DSPs have an instruction set optimised for the task of digital signal processing, so that a wide range of digital signal processing applications can be executed efficiently. Therefore, the advantages of DSPs include short design turnaround time, high flexibility, and low design effort. DSPs also offer better performance than GPPs, especially in digital signal processing domains such as multimedia, security, and networking. However, additional hardware area cost and high power consumption are the drawbacks of DSPs. This is because although executing multiple hardware units in parallel can increase application performance, it also increases power consumption.
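The kernel below makes the distinction concrete: the multiply-accumulate inside a FIR filter loop is the operation that a DSP dispatches to a dedicated MAC unit, typically one tap per cycle, whereas a GPP without such a unit executes it as separate multiply and add steps. This is a minimal sketch with an invented function name, assuming Q15 fixed-point data and an arithmetic right shift.

/* FIR filter inner loop: the multiply-accumulate in the loop body is the
 * operation a DSP executes in a dedicated MAC unit in a single cycle. */
int fir(const short *x, const short *h, int ntaps) {
    int acc = 0;                       /* wide accumulator, as on most DSPs */
    for (int k = 0; k < ntaps; k++)
        acc += (int)x[k] * (int)h[k];  /* one MAC per tap */
    return acc >> 15;                  /* rescale the Q15 result */
}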

The design process of using DSPs for embedded systems is very similar to that of GPPs, where software applications are optimised using compilers and linkers. However, the compiler is often specifically designed for DSPs in order to optimise for the available parallelism. Recent research has shown that specific compilers provide better optimisations in the software application [89,104,171,209], thus achieving large reductions in execution time [143] and energy consumption [85,139,169]. In addition, there are several variations on DSP architecture in terms of the number of dedicated hardware units that can be executed in parallel. There is a wide range of DSPs with different off-the-shelf configurations on the market, from which designers can choose for their applications.

In the early '80s, NEC and Texas Instruments enjoyed success with the µPD7710 and TMS32010 DSPs. Today, there are hundreds of different types of DSPs on the market, provided by various manufacturers. In addition, the demand for personal computing in multimedia and communications is continuously increasing, and thus the market value is increasing at a rapid pace. The market for DSP chips is set to grow by 25% in 2004, on top of a 24% growth in 2003 that saw it touch the US$6 billion mark (Forward Concepts, Inc.). Texas Instruments is the single largest DSP manufacturer, with more than 50% of the market share in this domain. The DSP market is predicted to continue to outpace the semiconductor market, with a compound annual growth rate of 23.5% to $14 billion in 2007.

2.2.4 Field Programmable Gate Arrays

The fourth design approach employs Field Programmable Gate Arrays (FPGAs), as opposed to off-the-shelf processors. FPGAs are pre-fabricated circuit modules that are electrically configured by the designer to implement specific design functions. FPGA architecture consists of programmable logic blocks, programmable interconnects and switches between the blocks. Programmable logic blocks can implement any type of logic gate but are limited to a finite number of logic gates. Programmable interconnects and switches serve as wires between programmable logic blocks to configure the FPGA to perform a specific function.

There are four design implementation phases in designing a FPGA. These are synthesis, placement, routing, and bitstream generation, all of which affect the physical characteristics of the final embedded system. In the synthesis phase, a specification in a hardware description language (HDL) is synthesised and mapped onto logic elements such as lookup tables (LUTs), flip-flops and multiplexors, which are the basic building blocks of the target FPGA architecture. The logic elements in the design netlist are then placed on the FPGA, which dictates the configuration of the logic element sites. Next, the logic elements are connected together in the routing phase. Routing determines the configuration of the routing fabric for achieving the connections of the design. Together, the configuration of the logic sites and the routing fabric constitutes the bitstream for the FPGA, which can be loaded on the device to implement the user design on the FPGA. Figure 2.3 shows the design flow for a FPGA. FPGAs are mainly used for processing drivers and prototyping semiconductor devices for verification and testing. This approach has the advantages of flexibility with programmability at the logic level, fast time-to-market, and low fixed costs. In the past few years, FPGAs have become comparable with leading edge technologies in terms of area and performance.


Figure 2.3: A design flow of field programmable gate arrays
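To make the role of the LUTs mentioned above concrete, the following C sketch models a 4-input lookup table behaviourally. The function name and field layout are inventions for this illustration: the point is that the 16-bit configuration word written by the bitstream is simply a truth table, so any 4-input boolean function can be realised purely by choosing that word.

#include <stdint.h>

/* Behavioural model of a 4-input lookup table, the basic logic element of
 * the synthesis phase described above. The 16-bit 'config' word is the
 * truth table loaded into the LUT by the bitstream. */
static int lut4(uint16_t config, int a, int b, int c, int d) {
    int addr = (d << 3) | (c << 2) | (b << 1) | a;  /* inputs select a row */
    return (config >> addr) & 1;
}

/* Example configurations: 0x8000 is '1' only at address 15, i.e. a 4-input
 * AND; 0xFFFE is '0' only at address 0, i.e. a 4-input OR. */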

However, power consumption is still the main drawback of this design approach.

When FPGAs were invented in the mid-'80s, designers had to manually implement their design functions on a FPGA. As automation came in the form of Computer Aided Design (CAD) tools, designers could enter their design in a standard schematic editor. The netlist generated by the schematic editor was processed by automatic tools that placed and routed these designs, and generated the corresponding bitstream. Furthermore, high-level synthesis has moved the abstraction level of the design to the hardware description language, allowing designers to use languages like VHDL and Verilog to design FPGAs. In recent years, many research teams have focused on CAD tools for synthesis, placement and routing to advance the automation and design process. In early FPGAs, synthesis and technology mapping were heavily influenced by ASIC approaches such as ESPRESSO [175-177] and MIS [45]. It was not until the '90s that Francis, Rose and Chung [83] proposed the first technology mapping algorithm for FPGA architectures, called Chortle. A combination of decomposition and covering allowed the number of LUTs (area) on FPGAs to be minimised. Later, the authors enhanced the algorithm to minimise delay by reducing the number of logic levels. Cong and Ding [69] presented a polynomial time algorithm for the LUT-based FPGA technology mapping problem using depth minimisation on Boolean networks. This is a significant breakthrough, since technology mapping for general ASICs using directed acyclic graphs is NP-hard [126]. In terms of placement, the problem is to place logic elements onto the FPGA sites, which is very similar to the placement problem for ASICs. Therefore, existing placement algorithms are deployed, such as the min-cut approach [122], simulated annealing [127,183], and quadratic placement optimisations [112,128]. Betz and Rose [36] presented a FPGA research tool for placement and routing. In contrast to placement, FPGA routing is very different from ASIC routing. FPGA routing algorithms were initially based on the basic maze routing methods devised by Lee [136]. As time progressed, Brown [48] developed the first routers for FPGAs. Ebeling, McMurchie, Hauck and Burns [80] presented a pathfinder algorithm for FPGA routing, which is one of the most popular FPGA routing algorithms. Nam, Aloul, Sakallah and Rutenbar [158] formulated the FPGA routing problem as a Boolean Satisfiability problem and used the GRASP SAT solver [153] to solve the routing problem.

Design starts on FPGAs are forecast to decline through 2007 (Gartner, Inc.), which has made FPGA vendors such as Xilinx, Inc. [7] and Altera, Inc. [17] look for new opportunities. These include platform-based FPGAs, such as FPGA-based embedded processors and FPGA-based co-processors, which are expected to increase over time. This type of platform FPGA covers hardware customisation and software flexibility, which is a kind of hardware/software co-design. The worldwide market for embedded FPGAs is forecast to increase from $2.9 million in 2001 to $603.1 million by 2006. This translates to a forecast compound annual growth rate, over the 2001-2006 time frame, of 191.6%.

2.2.5 Application Specific Instruction-set Processors

Application Specific Instruction-set Processors (ASIPs) are designed for specific applications or application domains in embedded systems. ASIPs typically consist of a base processor and a base instruction set, plus the capability to extend this instruction set through new specific instructions. Specific instructions are hardware modules in the execution stage that replace computationally intensive code segments in the application. Thus, code segments of the application are executed in the specific instructions rather than in the arithmetic logic unit as microcode. Such execution can improve performance and reduce energy consumption. In addition, ASIP tool suites often contain synthesis tools for specific instruction creation, such as synthesisers, hardware compilers and verification tools, as well as software tools such as compilers, linkers, instruction-set simulators, etc. An ASIP solution is a hardware/software co-design approach, which combines hardware customisability (by using specific instructions) and software flexibility (by featuring a base processor). The main advantage of designing specific instructions and application optimisation is that hardware design and software programs can be developed in parallel, shortening the design cycle significantly. In addition, ASIPs can minimise the energy consumption of embedded systems by decreasing the execution time significantly with only an incremental increase in power consumption.

Research and development into design approaches for ASIPs has been carried out for approximately ten years. (For a detailed survey of ASIPs, see Jain, Balakrishnan and Kumar [114].) Early design approaches for ASIPs can be divided into three main categories: architecture description languages [2,31,38,105,106,201], compilers [63,95,161,206], and design methodologies for ASIP design [92,111,129,132]. The first category (architecture description languages for ASIPs) is further classified into three sub-categories based on a different primary focus: the structure of the processor (such as the MIMOLA system [142]); the instruction set of the processor (as given in nML [81] and ISDL [98]); and a combination of both structure and instruction set (as in HMDES [97], EXPRESSION [99], LISAtek [106], ASIP-Meister [2], and FlexWare [164]). The architectural description language approach generates a retargetable environment from an input processor architecture description. This retargetable environment includes retargetable compilers, Instruction Set Simulators (ISS) of the target architecture, and synthesisable HDL models. The generated tools allow valid assembly code generation and performance estimation for an application on the architecture described (i.e. "retargetable"). In the second category, the compiler is the main focus of the design process, using exploratory techniques such as data flow graphs, control flow graphs, etc. The process takes an application written in a high-level description language such as C/C++, and produces a customised architecture for ASIPs. Based on the characteristics of the application, a processor for that particular application can be constructed. Zhao, Mesman and Basten [206] used static resource models to explore possible functional units that can be added to the data path to enhance performance. Onion, Nicolau and Dutt [161] proposed a feedback methodology for an optimising compiler in the design of an ASIP, so that more information is provided at the compile stage of the design cycle, producing a better hardware processor model.

In the third category, researchers proposed design methodologies for ASIPs that solved various problems, such as identifying functionalities to speed up the application and introducing the hardware resources for those functionalities [92,111,129,132]. Gschwind [92] described the ASIP selection problem as a hardware/software co-design methodology, allowing early evaluation of ASIP options in rapid prototyping techniques. Imai et al. [111,129] proposed the PEAS series ASIP environment to customise ASIPs by defining and selecting various instructions. Küçükçakar [132] proposed a methodology to design ASIPs by customising an existing processor instruction set and architecture rather than creating a new ASIP. Leupers and Marwedel [141] proposed an instruction set modelling technique for retargetable code generation in ASIPs, and explored a range of instruction formats and inter-instruction restrictions. Gong, Gajski and Nicolau [86] proposed a parameterised model and retargetable scheduler for performance evaluation of application specific architectures. Sudarsanam and Malik [191] proposed a memory bank and register allocation scheme for ASIPs, maximising the benefit of the application architectural features. In the mid-'90s, various architectures were proposed for ASIPs.

The characteristics of the different design approaches, such as hardware cost, power consumption, performance, flexibility, and design turnaround time, are summarised in Table 2.1. The ASIP approach is utilised in various application domains such as network applications [91], wireless security processing platforms [172], and multimedia applications [57].

Characteristic \ Design Approach    ASIC   DSP   ASIP   FPGA   GPP
Hardware Cost                       XXX    XX    XX     X      ×
Power Consumption                   XXX    XX    XX     X      ×
Performance                         XXX    XX    XX     X      ×
Flexibility                         ×      X     XX     XX     XXX
Design Turnaround Time              ×      X     XX     XX     XXX

Table 2.1: Summary of different design approaches for embedded systems

2.3 Architecture of Application Specific Processors

ASIPs have proven to be a design approach that combines hardware customisability and software flexibility, satisfies design constraints such as performance, power consumption and hardware cost, and relieves time-to-market pressure by allowing embedded systems to be designed within a short design turnaround time. Customisable processors with the capacity to include specific instructions are becoming increasingly popular in academia [2,97,99,106] and with commercial vendors [10,12,13,18,21,23]. Several different architectures have been proposed, such as Very Long Instruction Word processors (containing multiple execution units, which have the ability to execute multiple operations simultaneously); reconfigurable processors (which use reconfigurable logic such as FPGAs as the platform for specific instructions and have the ability to reconfigure different instructions for different situations); and extensible processors (which use integrated circuits for executing specific instructions, and often have better area and power characteristics). Figure 2.4 demonstrates the different types of ASIPs and the characteristics of each.

Figure 2.4: Different types of application specific instruction-set processors (VLIW processors use multiple execution units for instruction extension, benefiting performance and code size; reconfigurable processors use reconfigurable logic, enabling instructions to be swapped in and out of the FPGA, benefiting performance and area; extensible processors use integrated circuits for instruction extension, increasing performance with minimal power consumption)

2.3.1 Very Long Instruction Word Processors

Very Long Instruction Word (VLIW) processors are processors that contain multiple execution units with the ability to issue multiple operations simultaneously in a single instruction. The advantage of this design approach is that it achieves very high performance by improving the instruction-level parallelism of the application, and reduces energy consumption by minimising the code size. Figure 2.5 shows the VLIW architecture in the execution path. The figure shows that two types of execution units are located in the execution path. These are: i) general execution units - which can execute any operation; and ii) specific hardware units - which perform a dedicated function. Since instruction-level parallelism needs to be exploited, instructions within the application are reordered and scheduled at compile time. During compile time, all data dependencies are checked, including independent instructions and subsequent scheduling. Multiple operations are grouped together to form a complex instruction for executing in parallel. Hence, the instruction format is different from the base instruction set. Figure 2.6 shows a generic 128-bit VLIW instruction, which combines four 32-bit base instructions. After compiling and scheduling of the application, parallelised code is generated. Therefore, VLIW processors require greater compiler support than GPPs. This reliance on the compiler has two advantages: i) the compiler has the ability to look at a much larger window of instructions, thus yielding better results to improve parallelism; and ii) the compiler has specific knowledge of the program's source code, such as branches and register usage, allowing further optimisation.

Since VLIW architectures rely on compile-time scheduling, the generic design methodology revolves around techniques such as instruction encoding and optimisation. Instruction encoding involves encoding the sequential code of the application into parallelised VLIW code that satisfies design constraints.

Figure 2.5: A VLIW architecture (the execution path contains two types of execution unit - general execution units, which can execute any operation, and specific hardware units, which perform a dedicated function - fed by the instruction cache and register file and executed in parallel under a single very long instruction word)

Figure 2.6: A generic VLIW instruction format (a 128-bit instruction containing four individual 32-bit instruction slots - each with an opcode field, a destination register and two source registers - which can drive four execution units in parallel)

There are two processes involved in instruction encoding: instruction scheduling and resource binding. Instruction scheduling places concurrent, yet independent, operations in the same instruction. The number of operations executed in parallel may differ between processor architectures. For example, Philips TriMedia issues five operations at a time [108], while IBM DAISY executes eight operations [79]. Once instructions are grouped, execution units need to be bound to a particular instruction at a certain time. This process is known as resource binding. During instruction scheduling and resource binding, optimisation techniques such as loop unrolling and prediction techniques can be applied to achieve better results. Loop unrolling refers to the unrolling of loops in the application, so that multiple execution units can be utilised in the VLIW architecture. Prediction techniques exploit the locality of data to predict the results of operations [147,157].
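The sketch below shows, in C, how a scheduler's output might be packed into the 128-bit bundle of Figure 2.6. The field widths and names are assumptions made for this illustration, not the encoding of any real VLIW machine.

#include <stdint.h>

/* Illustrative 32-bit slot encoding (field widths are assumptions): an
 * 8-bit opcode, an 8-bit destination register and two 8-bit source
 * registers, matching the generic format of Figure 2.6. */
static uint32_t encode_slot(uint8_t opcode, uint8_t dst,
                            uint8_t srcA, uint8_t srcB) {
    return ((uint32_t)opcode << 24) | ((uint32_t)dst << 16) |
           ((uint32_t)srcA << 8) | srcB;
}

/* A 128-bit VLIW bundle as four independent 32-bit slots. The compiler's
 * instruction scheduler fills the slots with operations that have no data
 * dependences on each other, and resource binding assigns each slot to one
 * of the four execution units; a slot with no available operation holds a
 * NOP. */
typedef struct {
    uint32_t slot[4];
} vliw_bundle_t;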

Several commercial vendors develop tool suites for VLIW processors. PICO technology, founded by Synfora Inc. [18], is based on research previously carried out in the Hewlett-Packard Labs [8]. PICO technology enables algorithm-to-tapeout synthesis by combining accelerators and a VLIW processor to create a highly efficient application engine directly from an algorithmic description. The Jazz DSP processor is a configurable VLIW processor architecture from Improv Systems, Inc. [10]. It comes with a comprehensive tool chain including a compiler, debugger, profiler, and Instruction Set Simulator, allowing designers to create custom RTL blocks and instructions to build a designer-defined DSP core. The Media embedded Processor (MeP) is a platform for digital media SoCs developed by Toshiba, Inc. [13]. The MeP processor consists of the MeP core, extension VLIW units, and a bus interface that controls a local bus and connects the MeP core and VLIW execution units to a global bus.

Pozzi [170] proposed an application specific reconfigurable VLIW processor, which consists of a VLIW base processor and a reconfigurable FPGA co-processor. The reconfigurable FPGA co-processor, or Reprogrammable Functional Units (RFU), contains specific instructions to customise the application. Pozzi's contributions include a design methodology for application analysis, extraction for the RFU, and static and dynamic selection for the RFU. Lodi, Toma, Campi, Cappelli, Canegallo and Guerrieri [148] proposed a novel architecture for a reconfigurable embedded system based on a VLIW processor with a run-time configurable datapath.

2.3.2 Reconfigurable Processors

A reconfigurable processor is one that combines a general purpose processor and reconfigurable devices such as FPGAs, enabling the configuration of different hardware logic units. Research and development work on reconfigurable processors can be divided into two categories: reconfigurable specific instruction-set processors (RISPs) and reconfigurable co-processors. RISPs consist of reconfigurable devices in the execution path of the processor, where specific instructions are executed in the reconfigurable devices. Figure 2.7 shows the architecture of a reconfigurable specific instruction-set processor. One of the distinguishing characteristics of RISPs is that these specific instructions can be swapped in and out of the FPGAs, reducing the area cost of the instructions. Thus, the area of the reconfigurable processor is minimised. The second approach is the reconfigurable co-processor, which has a loose coupling between the core processor and the reconfigurable logic. The reconfigurable co-processor can be seen as a slave computational unit located on the same die as the processor or off-chip. With a reconfigurable co-processor, the granularity of the function to be implemented in the reconfigurable logic is much higher than in the first approach. This is because the instructions do not need to satisfy the timing constraint of the execution stage of the processor.

Research into reconfigurable processors has been undertaken for the past ten years.

(In the architecture of Figure 2.7, a reconfigurable logic block and a reconfigurable register file sit alongside the standard execution unit, instruction cache, register file, data cache and control logic. Specific instructions are pre-designed and pre-synthesised for dedicated functions; their bitstreams are stored in memory and, when the processor schedules a dedicated function, the corresponding bitstream is loaded into the reconfigurable logic and executed.)

Figure 2.7: A reconfigurable processor architecture

Athanas and Silverman [34] proposed PRISM, a reconfigurable processor with instruction set metamorphosis, consisting of a general purpose processor and a RAM-based logic device to allow fast processor reconfiguration. Razdan and Smith [173] described the PRISC system at Harvard, which extends the instruction set of a RISC processor through the implementation of particular functions on one or more Programmable Functional Units (PFUs). Wirthlin and Hutchings [198] proposed DISC, a dynamic instruction set computer, which uses partial reconfiguration to swap instructions in and out of FPGAs. Hauser and Wawrzynek [102] proposed the Garp system, which consists of a MIPS processor with a reconfigurable co-processor on the same die. The co-processor is activated by the processor when a reconfigurable function is called. In the same year, Hauck, Fry, Hosler and Kao [101] proposed a reconfigurable system, Chimaera, where the FPGA and the processor core are placed on the same chip. The primary focus of Chimaera is to minimise the reconfiguration overhead and eliminate the communication bottleneck between the FPGA and the processor. Clark, Kudlur, Park, Mahlke and Flautner [66] proposed a reconfigurable processor that allows instruction set customisation for embedded systems. This work uses dynamic subgraph identification methods that identify common subgraphs in the application. Vuletić, Pozzi and Ienne [195] introduced a virtualisation layer that utilises an operating system extension and a hardware component, reducing the complexity of interfacing and data transfers between the processor and co-processors.

Many commercial vendors provide services for reconfigurable processors [5,17,21,22]. Stretch processors are reconfigurable processors based on a core processor, the Stretch S5 engine, and the Stretch Instruction Set Extension Fabric (ISEF) [21]. The ISEF is a software-configurable datapath based on programmable logic. Tarari processors (provided by Tarari, Inc.) are processors consisting of reconfigurable logic [22]. These processors are based on dynamically reconfigurable hardware that targets specific computationally intensive tasks, and decreases the processing time required to perform those operations. The DAPDNA-2 dynamically reconfigurable processor is a dual-core processor comprised of a high performance RISC core paired with a two-dimensional processing matrix, provided by IPFlex, Inc. [5]. Finally, NIOS II / NIOS reconfigurable processors, provided by Altera, Inc. [17], feature a general purpose RISC CPU architecture with an instruction set, plus the capability to extend this instruction set through new specific instructions. In addition, designers can select different peripherals and interfaces to satisfy the needs of an application. (For a comprehensive survey of reconfigurable instruction set processors, see [35].)

(In the architecture of Figure 2.8, specific instructions sit alongside the general execution unit, fed by the instruction cache, the register file and a specific register file. Specific instructions are pre-designed for dedicated functions, and are pre-synthesised and pre-fabricated in the processor. The processor can execute either the execution unit or one specific instruction at a time.)

Figure 2.8: An extensible processor architecture

2.3.3 Extensible Processors

An extensible processor combines a general purpose processor and application specific integrated circuits to implement specific instructions. Their customisation typically addresses three architectural levels of the base processor: i) instruction extension - the designer can define customised instructions by specifying their functionality; ii) inclusion/exclusion of predefined blocks - the designer can choose to include or exclude predefined blocks as part of the extensible processor (including a floating-point unit, a digital signal processing unit, special function registers, a multiply-and-accumulate operations block, etc.); and iii) parameterisation - the designer can set extensible processor parameters such as instruction and data cache sizes. Through these architectural customisations, extensible processors are able to achieve high performance, low power consumption and compact area for a particular application. Extensible processors represent the state-of-the-art in application specific instruction-set processors. Figure 2.8 shows the execution path of an extensible processor architecture, where specific instructions are application specific hardware units.

Figure 2.9: A simplified generic design flow of an extensible processor (the application, written in C/C++, is compiled, analysed and profiled; computationally intensive code segments are identified; extensible instructions are generated for the code segments, and instructions, predefined blocks and parameters are selected while exploring the extensible processor design space; the performance and design constraints (power, area, performance) of the processor are evaluated; once a design satisfies the constraints, synthesisable RTL of the base processor, predefined blocks, extensible instructions and parameter settings is generated for synthesis and prototyping or tape-out)

Figure 2.9 shows a simplified generic design flow of an extensible processor platform.

The goal of designing an extensible processor is typically to maximise the performance of an embedded application while satisfying design constraints. The designer often begins by profiling the application using an Instruction-Set Simulator (ISS) of the target processor. The profiling reveals computationally intensive code segments for which possible instruction extensions, inclusions or exclusions of predefined blocks, or parameterisations might improve performance and energy characteristics. After identifying a set of possible extensible instructions, predefined blocks, or parameter settings, the designer defines these customisations in the extensible processor. To evaluate these customisations, designers can use retargeted tools to determine whether the application meets design constraints. In addition, designers can iterate this step during design space exploration. Once the application meets design constraints, the platform uses the base processor configuration, predefined blocks and extensible instructions to generate the extensible processor's synthesisable RTL. This synthesisable RTL is then ready to be taped out or used for prototyping.
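As a small illustration of the instruction extension step in this flow, consider a profiled hot loop performing saturating 16-bit addition. The sketch below models the customisation in plain C: SATADD16 is a hypothetical designer-defined instruction (the name and interface are inventions for this example), shown here as a software model that a retargeted compiler would map onto the single-cycle hardware instruction.

/* Software model of a hypothetical designer-defined instruction SATADD16:
 * add two 16-bit values with saturation. In the customised tool chain this
 * call would be replaced by a single-cycle extensible instruction in the
 * processor's execution stage. */
static inline short SATADD16(short x, short y) {
    int t = (int)x + (int)y;
    if (t > 32767)  t = 32767;   /* saturate high */
    if (t < -32768) t = -32768;  /* saturate low  */
    return (short)t;
}

/* The profiled hot loop after customisation: each iteration now maps to one
 * custom instruction instead of an add followed by two compare-and-clamp
 * steps executed in the arithmetic logic unit. */
void saturating_add(const short *a, const short *b, short *z, int n) {
    for (int i = 0; i < n; i++)
        z[i] = SATADD16(a[i], b[i]);
}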

Several commercial vendors provide extensible processors for designing embedded systems. ASIPmeister is an ASIP system provided by Osaka University, Japan [2]. It is an RTL description processor system that allows the designer to customise specific instructions in VHDL. The ARC processors (ARCtangent-A4, ARCtangent-A5, ARC 600 and ARC 700) are a set of 32-bit user-customisable RISC processors [1]. ARC processors typically have the architecture of a 32-bit RISC processor, with the ability to add optional DSP instructions in order to create processors with low power, high performance and low area overhead. Lexra processors consist of a 32-bit RISC core processor and DSP cores for the embedded market, similar to the ARC processors [11]. LISATek is an automated embedded processor design and optimisation environment provided by CoWare, Inc. [12]; using an architecture description language, it supports the automated design of both custom and standard processors and generates software development tools for the application. Finally, Xtensa processors, provided by Tensilica, Inc. [23], consist of a base processor core with a base instruction set and the capacity to extend this instruction set with new specific instructions (using Tensilica Instruction Extension (TIE)). In addition, the designer is able to customise the co-processor as well as processor parameters such as the instruction cache, data cache, etc. (For more detailed literature on the Xtensa processor, see [87,197].) Recently, Tensilica released their automation environment, the Xpres compiler, which is a synthesis tool that creates tailored processor descriptions for the Xtensa LX processor from an application written in C/C++ code. In addition, this system also allows the designer to fine-tune the processor manually in order to optimise the embedded system's specification. An overview of extensible processors, their benefits and problems is given in [78,103].

2.4 Problems in Designing Extensible Processors

From the design flow of extensible processors shown in Figure 2.9, it is evident that designers require a great deal of expertise to design extensible processors. This is particularly true for processes such as code segment identification, extensible instruction generation, architectural customisation selection, and processor evaluation, which are usually conducted manually (shown in blue boxes in Figure 2.9). Hence, recent research has largely revolved around these design processes, and has sought to optimise and automate different aspects of them. This section describes the problems associated with these design processes and some of the solutions proposed in the literature.

2.4.1 Code Segment Identification

Code segments are performance-critical sections of the application in which the base processor spends significant execution time. In order to increase the performance of the application, these code segments need to be sped up by generating extensible instructions, including/excluding predefined blocks, and setting parameterisations. However, these performance-critical code segments (hereafter referred to as code segments) first need to be identified from the application. While identifying code segments is somewhat simplified by profiling tools, it is still a daunting task for large applications, and further complicated when additional constraints (e.g., area and power) must be optimised as well as performance. Furthermore, the number of code segments for a given application grows exponentially with the program size. It is very common for a function with fewer than one hundred operations to contain several hundred possible code segments. Figure 2.10 shows a function named cosine, which, as its name suggests, computes the cosine of a floating-point number. Although the function is quite small, it contains hundreds of code segments. For example, each line within the loop body (lines 12-18) can be implemented as one or more instructions. Alternatively, line 12 can be one instruction, line 13 a second instruction, and lines 14-18 a third instruction. Also, the loop (lines 11-18) can be unrolled and executed in parallel. Even counting only contiguous groupings of the seven lines 12-18, there are 7 x 8 / 2 = 28 candidates; non-contiguous combinations and unrolled variants quickly push the total into the hundreds. Most real-world applications contain a function hierarchy with a large number of functions. In fact, it is often the case that there are several performance-critical functions, i.e. no single or even small set of functions is responsible for a large fraction of the total application execution time [123]. Two architectural customisations that seem to have the same speedup for a single code segment may result in hardware that impacts area overhead, latency and power consumption differently. When design constraints on hardware are present and associated with code segments, the tradeoffs involved are complex and can be difficult to identify manually, making code segment identification (for customisation) one of the most difficult problems to solve. Recent research into code segment identification can be classified into three categories: i) retargetable code generation using matching and covering algorithms; ii) finding patterns in the graph representation (control dataflow graph) of the profiled application; and iii) high-level extraction from the application.

1  /*
2   * This function computes cosine of x (x in radians)
3   * by an expansion
4   */
5  float cosine (float x) {
6      int i;
7      int factor = 1;
8      float result = 1.0;
9      float power = x;
10
11     for (i = 2; i <= 10; i++) {
12         factor = factor * i;
13         power = power * x;
14         if ((i & 1) == 0) {
15             if ((i & 3) == 0)
16                 result = result + power / factor;
17             else
18                 result = result - power / factor;
19         }
20     }
21     return (result);
22 }

Figure 2.10: An example function for demonstrating the complexity of code segment identification

These are described in detail next.

• Code Generation using Matching Algorithm. Code generation using a matching algorithm is a well-known problem, particularly in the fields of technology mapping in logic synthesis [69,83,124] and code generation in compilers [27,71,142,145,186]. Matching finds all possible instantiations of identical patterns in a structured representation, and was often the early approach to ASIP design. There are two main approaches to pattern matching: boolean and structural matching. Boolean matching is often applied to networks of boolean functions, and includes checking the equivalence of functional representations between patterns in the application. This kind of equivalence checking often uses Binary Decision Diagrams (BDDs), which are unsuitable for non-boolean functions. Structural matching focuses on a graph representation, where nodes represent functions. This approach often identifies common patterns with structural, rather than functional, equivalence. In structural matching, the type of graph representation can also vary the complexity of the graph. In the early '90s, most matching algorithms revolved around patterns with a single acyclic output [31,71,107,145]. However, Arnold [32] proposed a matching algorithm to identify patterns with multiple outputs, which expanded the search space for possible patterns significantly. Leupers and Marwedel [141] described an instruction set model to support retargetable compilation and code generation. However, the drawback of this approach is the lack of application input characteristics (e.g., simulation and profiling information), which often lowers the chance to optimise designs for a specific application. In fact, application input provides data-specific information such as estimates for indeterminate loop counts, branch taken/not-taken percentages, the range of input data, etc., which is impossible to ascertain without the use of simulation and an input data set. However, an input data set suffers from a coverage problem - that is, how many input vectors are enough to efficiently represent the application and enable optimal design of the embedded system? It is very important to select a good coverage of input data sets for an application.

• Profiled Graph Representation. The second category for identifying code segments uses profiling analysis through simulations with a set of input data. This approach first compiles the application into an executable and then simulates the executable using the input data set to obtain application specific profiling information. Using the additional profiling information, a graph representation of the application is created, with base instructions as nodes and data dependences between instructions as edges. Code analysis is then performed to identify code segments. This profiling analysis technique is applied after the source code is compiled (sometimes referred to as post-compiler optimisation). Sun, Ravi, Raghunathan and Jha [193] proposed an approach for identifying suitable code segments to implement as extensible instructions in a connected subgraph. First, the application is compiled and profiled using the input data set. Then the program dependence graph is constructed using the profiling information and the application, with the base instructions as nodes and the dependences between assembly codes as edges. All patterns (e.g., code segments) are identified using a template matching technique (a simplified sketch of such template matching on a dataflow graph is given after this list). The designer then ranks the patterns from the most frequently executed to the least frequently executed in the application using a priority function. The highly ranked patterns are selected and implemented as extensible instructions. A problem with this approach is that the template patterns need to be pre-defined (and a pre-defined template may not be best for every application) and well-constructed in order to maximise the speedup and reduce energy consumption. Clark, Zhong and Mahlke [67] proposed a similar pruning technique, a "guide function", to prune the design space of code segments searched in the connected subgraph. Rather than pruning the number of patterns, the authors proposed to prune the search direction in the graph, thus allowing the possibility that initially low-ranked patterns would amount to a useful pattern at a later stage of the design space exploration. Sun, Ravi, Raghunathan and Jha [192] proposed a scalable approach to extend the existing technique, where the matching patterns were not limited to templates. After the patterns are identified, functions can be added to or removed from the patterns in order to be well-suited to the application. These steps are performed using a cost function for area and performance tradeoffs. However, the main drawback of these two approaches is that the number of inputs and outputs to the code segments is limited. Atasu, Pozzi and Ienne [33] described a binary tree search algorithm that identifies patterns with multiple inputs and outputs in an application dataflow graph, covering an exhaustive design space. This technique achieves maximum speedup and satisfies micro-architectural constraints. This algorithm was originally described in Pozzi's doctoral thesis work on reconfigurable processors [170]. Pozzi proposed a generic approach for searching and extracting code segments from an application, where the patterns have multiple inputs and outputs. It is a tree-based searching and extracting algorithm: a binary tree graph is created from the profiling information and the application program; the search begins at the top of the graph and extends to the bottom of the tree, eliminating useless branches in order to reduce the search space. Yu and Mitra [204] proposed a scalable custom instruction identification method that extracts all possible candidate instructions in a given graph. However, the major drawback of profiling using traces and assembly code is that the graph representation is often limited to the sequence of code. In order to further reduce the time-to-market pressure, there is a need to move the identification hierarchy to a higher level of abstraction, such as a C/C++ application.

• High-level Extraction. High-level extraction identifies code segments from an application written in a high-level language (e.g., C/C++). This approach usually begins with simulation to obtain profiling information. From the profiling information, the designers identify frequently executed sections of the application. Semeria, Seawright, Mehra, Ng, Ekanayake and Pangrle [184] developed a tool to extract code segments from C code and generate a functional equivalent in RTL-C and an HDL. However, the C code is limited to a subset of C which is very close to the hardware description (RTL code). In other words, the C code needs to be written in a very similar way to the RTL code. Recently, Clarke, Kroening and Yorav [68] presented equivalent behaviour between C and Verilog HDL, which can be used to perform high-level extraction for C. Yu and Mitra [203] described the identification of code segments using the characteristics of the embedded systems application by relaxing constraints to a reasonable level. The drawback of this approach is that the granularity of an application written in C/C++, in terms of coding style, can vary between programmers, meaning that the complexity of code segment identification increases. As described in later chapters, this thesis proposes a high-level identification technique that identifies code segments using the application and profiling information, covering coarse-grained searches over functions or subroutines down to a fine-grained, line-by-line approach. Our identification scheme combines the advantages of profiled graph representation and high-level extraction, shortening the exploration time over the otherwise unmanageable number of code segments in the application.

2.4.2 Extensible Instruction Generation

Instruction generation involves designing extensible instructions to replace computationally intensive code segments by specifying new hardware resources and the operations they perform. The typical goal of generating extensible instructions is to maximise performance while satisfying design constraints such as area, power and energy. As mentioned previously, extensible instructions are designed in the execution stage of the processor. If the addition of extensible instructions causes a violation of the base processor's clock period, the designer is required to i) reduce the amount of computation performed in the instruction; ii) split it into multiple instructions; iii) multi-cycle the execution of the instruction; and/or iv) reduce the clock period of the base processor.

Figure 2.11: A computationally intensive code segment for demonstrating the complexity of instruction generation (nine alternative extensible instructions for the loop z[i] = a[i] + b[i] + c[i] + d[i], each with a different throughput, area and power)

Although the instruction generation step is somewhat simplified by specifying extensible instructions at a high level of abstraction, it is still a tedious task for large code segments, and is further complicated because a particular instruction can be designed in numerous ways. Figure 2.11 shows nine extensible instructions that can be designed for a single code segment. Each instruction has different characteristics, covering a large design space (performance ranges from 2cc to 3cc; area ranges from 19,323 to 72,347 grids; and power ranges from 8.85mW to 42.65mW). Designing extensible instructions is therefore very complex due to the large design space it occupies. Research into instruction generation began in the mid-'90s, when the entire instruction set was designed specifically for a particular application, an approach often referred to as instruction set synthesis. As time-to-market pressure mounts, recent research has focused on generating only specific instructions on top of the base instruction set, which is usually referred to as instruction generation or instruction set extension.
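To make this design space concrete, the C sketch below rewrites the code segment of Figure 2.11 against two of its possible instruction variants. The intrinsics add4 and add4x4 are hypothetical stand-ins of ours, not instructions from any real platform; on an actual extensible processor each would be a single custom instruction in the execution stage rather than a C function.

    #include <stdint.h>

    #define N 1000

    /* Variant A: a fused four-input add -- one datum per issue
     * (smaller datapath, lower area and power). */
    static inline int16_t add4(int16_t a, int16_t b, int16_t c, int16_t d)
    {
        return (int16_t)(a + b + c + d);
    }

    /* Variant B: a 4-way SIMD fused add -- four data per issue,
     * at the cost of a wider datapath (more area, more power). */
    static inline void add4x4(int16_t *z, const int16_t *a, const int16_t *b,
                              const int16_t *c, const int16_t *d)
    {
        for (int k = 0; k < 4; k++)
            z[k] = (int16_t)(a[k] + b[k] + c[k] + d[k]);
    }

    void kernel_a(int16_t *z, const int16_t *a, const int16_t *b,
                  const int16_t *c, const int16_t *d)
    {
        for (int i = 0; i < N; i++)
            z[i] = add4(a[i], b[i], c[i], d[i]);
    }

    void kernel_b(int16_t *z, const int16_t *a, const int16_t *b,
                  const int16_t *c, const int16_t *d)
    {
        for (int i = 0; i < N; i += 4)   /* assumes N divisible by 4 */
            add4x4(&z[i], &a[i], &b[i], &c[i], &d[i]);
    }

The source-level change is small in each case, yet the two variants land at very different points of the performance/area/power space, which is precisely what makes instruction generation complex.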

Early research in instruction generation focused on completely custom instruction sets that satisfy design constraints [62,107,109]. In 1994, Holmer described a methodology to find and construct the best instruction set on a predefined architecture for an application domain [107]. His method found code segments that execute in three or four cycles and recompiled them into new complex instructions. Huang and Despain [109] presented an instruction set synthesis for an application on a parameterised, pipelined micro-architecture. This system was one of the first hardware/software systems to be designed for an application with a customised instruction set; the generated instructions are single-cycle instructions. Several years later, Choi, Kim, Yoon, Park, Hwang and Kyung [62] proposed an approach to generate multi-cycle complex instructions as well as single-cycle instructions for DSP applications. The authors combined regularly executed single-cycle instructions into multi-cycle complex instructions. As pressure mounts in the consumer market for quick turnaround, it has become infeasible to perform instruction set synthesis and design the entire instruction set from scratch.

As a result, recent research has revolved around instruction set extension [39,46,67,70] and extensible instruction generation [33,88,137,192,193]. Instruction set extension is the term often used with reconfigurable processors (which combine ASIPs with reconfigurable hardware), while extensible instruction generation is the term often used with extensible processors. The major difference between the two is the need to satisfy the hard latency constraint of the base processor's clock period. For instruction set extension, Cong, Fan, Han and Zhang [70] proposed a performance-driven approach to generate instructions that maximise application performance. In addition, they allow operation duplication while searching for patterns in the matching phase. The duplication is performed on operations with multiple inputs that are on the critical path of the frequently executed code segment. When operations are duplicated, the parallelism of the code segments may increase, improving the performance of the application and enhancing design space exploration. The work was evaluated on the NIOS platform, provided by

Altera Inc. [17], which is a VHDL reconfigurable embedded processor. This approach does not perform any design space exploration (tradeoffs between performance and area, power, etc.). Brisk, Kaplan and Sarrafzadeh [46] described an instruction synthesis that uses resource sharing to minimise area efficiently. Their approach groups a set of extensible instructions into a co-processor in which common hardware blocks are shared during synthesis; the area savings are up to 40% when compared to the original extensible instructions. Biswas, Choudhary, Atasu, Pozzi, Ienne and Dutt [39] introduced an instruction set extension that includes access to local memory elements. This approach used a hardware unit to enable direct memory access in the execution stage of the processor. In order to enable local memory access, memory accesses need to be carefully scheduled, or multiple read/write ports are needed. In addition, accessing memory elements in the execution stage potentially increases pipeline hazards, thus increasing the complexity of code optimisation. Although this approach increases the performance of the application, the hardware overhead and the probability of increased pipeline hazards are relatively high. Clark, Zhong and Mahlke [67] described a compiler approach to generate instructions in a VLIW architecture without constraining their size or shape. These approaches have largely revolved around maximising the speedup of the application while minimising the area of the processor; none of them focuses on the energy consumption of the application.

Extensible instruction generation focuses on generating instructions that satisfy the latency of the base processor while maximising performance and satisfying other constraints

[33,88,137,192,193]. Lee, Choi and Dutt [137] proposed an instruction encoding scheme for generating complex instructions. The encoding scheme enables tradeoffs between the size of opcodes and operands in the instructions to enhance performance and reduce power dissipation. In addition, it contains a flexible approach for creating complex instructions as combinations of basic instructions that appear regularly in the application, exploring a greater design space and achieving improved performance.

Atasu, Pozzi and Ienne [33] described a generic method to generate extensible instructions by grouping frequently executed code segments using a tree-based searching and matching approach. The method enables the generation of extensible instructions with multiple inputs and outputs. Sun, Ravi, Raghunathan and Jha [193] described a methodology to generate custom instructions by matching operation patterns against a template pattern library. The generated instructions increase application performance by up to 2-5× with a minimal increase in area. Sun, Ravi, Raghunathan and Jha [192] described a scalable instruction synthesis in which custom instructions can be adapted by adding and removing operations, further ensuring that the given latency constraint is satisfied. This approach also optimised the area overhead of the instructions while maximising their performance for the application.

Goodwin and Petkov [88] described an automatic system to generate extensible instructions using three operation techniques: i) Very Long Instruction Word (VLIW) operations - grouping multiple instructions into a single instruction executed in parallel; ii) vector operations - parallelising data and increasing the instruction width; and iii) fused operations - combining sequential instructions into a single instruction. This system achieved significant speedup for the application while exploring millions of instruction combinations in several minutes. Tensilica Inc. later implemented this system as the Xpress system [23]. Sun, Ravi, Raghunathan and

Jha [194] also recently proposed a heterogeneous multiprocessor instruction set synthesis using extensible processors to speed up the application. Although these approaches have shown energy reductions, the reductions are achieved by combining computationally intensive code segments into extensible instructions, which shortens execution time significantly (with an incremental increase in power dissipation).

2.4.3 Architectural Customisation Selection

Architectural customisation selection involves selecting extensible instructions, predefined blocks, and parameter settings in the extensible processor to maximise application performance while satisfying design constraints. This process is often referred to as design space exploration. The selection problem can be simplified and formulated as the well-known knapsack problem, with single or multiple constraints. The single-constraint knapsack problem is defined where an item i has a value v_i and a weight w_i. The goal is to find a subset of the n items such that the total value is maximised and the weight constraint is satisfied. In our case, an item is an architectural customisation, AC, such as an extensible instruction, predefined block or parameter setting. Each customisation has a speedup factor, s_ac, compared with the software code segment it replaces, and a single design constraint such as area a_ac or power p_ac. In the single-constraint case, this simplified form of the problem is not strongly NP-hard, and effective approximation algorithms have been proposed for obtaining near-optimal solutions; a comprehensive review of the single-constraint knapsack problem and its associated exact and heuristic algorithms is given by Martello and Toth [154].
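In this notation, the single-constraint selection can be written as a standard 0-1 knapsack program (restated here for concreteness; the area budget A and selection variables x_i are our symbols, not the thesis's):

\[
\max_{x \in \{0,1\}^n} \; \sum_{i=1}^{n} s_{ac_i}\, x_i
\qquad \text{subject to} \qquad \sum_{i=1}^{n} a_{ac_i}\, x_i \le A ,
\]

where x_i = 1 if and only if architectural customisation AC_i is selected, s_{ac_i} is its speedup factor, a_{ac_i} its area, and A the area constraint.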

On the other hand, the multiple-constraints knapsack problem, defined where an item i has a value v_i and multiple weights w_ij, is strongly NP-hard. Other names for this problem in the literature are the multidimensional knapsack problem, the multi-knapsack problem and the multiple knapsack problem. A practical problem can be formulated as a multiple-constraints knapsack problem; for example, a capital budgeting problem where project j has profit p_j and consumes r_ij units of resource i. The goal is to find a subset of the n projects such that the total profit is maximised and all resource constraints are satisfied. Exact and heuristic algorithms have been proposed in the past, such as branch and bound algorithms, dynamic programming based algorithms, tabu search based heuristics, analysed heuristics, etc. (for a review of the multiple-constraints knapsack problem, refer to [65]). This section discusses the literature related to design space exploration and architectural customisation selection in extensible processors with single or multiple architectural customisations under single or multiple constraints.
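The capital budgeting instance above takes the same form with one constraint per resource (the resource budgets b_i are our symbols):

\[
\max_{x \in \{0,1\}^n} \; \sum_{j=1}^{n} p_j\, x_j
\qquad \text{subject to} \qquad \sum_{j=1}^{n} r_{ij}\, x_j \le b_i , \quad i = 1, \dots, m .
\]

With m = 1 this reduces to the single-constraint knapsack problem; it is the m >= 2 case that makes the selection of customisations under both area and power constraints strongly NP-hard.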

Research in extensible processor platform exploration has largely revolved around a single architectural customisation (either predefined blocks, extensible instructions or parameterisations) under a single constraint. A number of researchers have described methods to include/exclude predefined blocks to customise very long instruction word (VLIW) and explicitly parallel instruction computing (EPIC) processors [26,28,120]. Choi, Yi, Lee, Park and Kyung [64] presented a method to select intellectual properties to increase the performance of an application.

Gupta, Ko and Barua [95] described an automatic method to select among processor options under an area constraint. For extending instructions, Lee, Choi and Dutt [138] proposed an instruction set synthesis for reducing the energy-delay product of application specific processors through optimal instruction encoding. Various methods to generate extensible instructions automatically from basic, frequently occurring operation patterns have also been devised [33, 47, 62, 192].

Figure 2.12: An example demonstrating the complexity of instruction selection (given an application and design constraints of Area < 62,000 gates and Power < 170 mW, select among five extensible instructions with characteristics (Area, Power, Speedup) - Inst1: 30,000 gates, 85 mW, 6x; Inst2: 20,000 gates, 60 mW, 5x; Inst3: 10,000 gates, 30 mW, 2x; Inst4: 12,000 gates, 25 mW, 3x; Inst5: 13,000 gates, 20 mW, 2x - so as to maximise the performance of the application)

Parameterisation of the extensible processor platform involves setting the register file size, the instruction and data cache sizes, and the memory configuration. Jain, Wehmeyer, Steinke, Marwedel and Balakrishnan [117] proposed a method to evaluate the register file size under an area constraint. Methods to optimise the memory size of the embedded software in a processor have also been proposed [51, 199]. Finally, Abraham and Rau [25] described a scheme to select instruction and data cache sizes.

Abraham and Rau [25] and Lee, Choi and Dutt [137] presented examples of design exploration with multiple architectural customisations under a single constraint. The PICO system was proposed to explore the design space of non-programmable hardware accelerators (NPAs) and of memory and cache parameters for VLIW processors [25]. The PICO system produces a set of sub-optimal solutions using a divide-and-conquer approach and defers the final constraint tradeoffs to the designer; it is the cornerstone of the system provided by IPflex, Inc. Lee, Choi and Dutt [137] proposed a heuristic design space exploration for encoded instructions and parameter settings, trading off area overhead against performance. By introducing multiple architectural customisation selections, the design space exploration is extended in such a way that the choice can be between generating/selecting extensible instructions and including entirely different predefined blocks. For example, if an application has a floating-point (fp) multiplication that takes up significant execution time, there are at least two possible solutions to speed up the application: i) include an fp predefined block (which has fp registers and is able to execute fp addition, fp multiplication, etc.) in the processor; or ii) generate/select an fp multiplication instruction in the processor. The first solution may be overkill if the application only has one fp operation. On the other hand, since the fp multiplication instruction in the second solution is generated by designers, its speedup and area may not be ideal when compared to the vendor's fp predefined block. These kinds of situations arise when predefined blocks are included, extending the existing design space beyond simply identifying/selecting extensible instructions.

There is very little work on extensible processor platform exploration under multiple constraints when predefined blocks and extensible instructions are involved. Often, research in extensible processor platform exploration focuses only on the area constraint while the energy constraint is neglected [137, 193]. This naive assumption rests on the fact that improving performance usually reduces the energy consumption of the program running on the custom processor. However, when multiple design constraints are given, the difficulty of the tradeoffs increases exponentially. Despite this fact, the selection problem under multiple constraints has been studied for more than a decade in other research areas and is often formalised as a multidimensional knapsack problem. Chu and Beasley [65] proposed a genetic algorithm to solve the multidimensional knapsack problem, introducing a heuristic operator with problem-specific knowledge.

Chekuri and Khanna [52] described a polynomial time approximation scheme based on guessing sub-optimal items for a multi-dimensional knapsack problem.

2.4.4 Processor Evaluation and Estimation

Processor evaluation and estimation involves verifying the newly configured extensible processor (consisting of predefined blocks, extensible instructions and parameter settings) to determine whether it meets the desired design constraints. Evaluation and estimation approaches range from gate-level simulation, RTL-level estimation and instruction set simulation to abstract high-level estimation, verifying area, power dissipation and timing latency, and measuring application performance. Significant expertise and verification effort are essential in this step, and they significantly affect the accuracy of the processor's evaluation. The tradeoff between accuracy and speed of evaluation is the main difference between the various evaluation and estimation methods. Figure 2.13 shows various evaluation and estimation methods in terms of accuracy and time.

In this section, the methodologies for evaluating and estimating extensible processor characteristics are described in detail.

Gate-level simulation involves simulating the physical characteristics of the extensible processor with application data inputs, and is often used to verify power consumption and functionality. The inputs of this simulation are the gate-level description of the extensible processor, the silicon technology library, and the data inputs that capture the characteristics of the application, while the output is the target simulation results. Extensible processor platform vendors often only provide the RTL description of the extensible processor, leaving the designers to generate the gate-level description from the RTL description with their silicon technology library and synthesis tools.

Figure 2.13: Accuracy and time tradeoffs between different approaches (from high-level abstraction estimation, which is the fastest but least accurate, through instruction set simulation and RTL-level estimation/synthesis, to gate-level simulation, which is the slowest but most accurate)

These synthesis tools are often provided by logic synthesis companies such as Synopsys, Inc. [6], Cadence, Inc. [4], and Magma, Inc. [3]. A simulation from the gate-level description is one of the most accurate simulations for power consumption and timing.

However, significant simulation time may be consumed, depending on the amount of data inputs.

RTL-level estimation serves to evaluate the area, power and delay of the RTL description of the extensible processor, which is provided by the extensible processor platform vendors. Since predefined blocks and base processors are predesigned, RTL-level synthesis is often applied only to the synthesis of the extensible instructions. This synthesis determines whether the latency and delay of extensible instructions satisfy the clock period of the base processor, and whether the extensible instructions fit well inside the execution stage of the processor. The length of RTL synthesis time depends on the complexity of the extensible instructions as well as the constraints set by the designer. This approach therefore requires a great deal of expertise and is a relatively time consuming process (taking hours and possibly days). As time-to-market pressure mounts, RTL-level estimation has been introduced to estimate the area, power and delay of the RTL description. These estimation methods largely focus on estimating the power dissipation of the extensible processor. Vendors that provide such tools include Sequence Inc. (PowerTheatre) [19] and Synopsys, Inc. (Prime Power) [6]. This estimation process is relatively fast (it can be done in minutes), and the accuracy of these methods is often within 20% of the gate-level simulation results.

Instruction set simulation simulates the performance of the application on the newly configured extensible processor (consisting of the base instruction set, extensible instructions, and instructions associated with predefined blocks). This simulation requires input data sets to simulate the application performance. It is often conducted with a cycle-accurate instruction set simulator, which is specifically suited to the target extensible processor architecture. Instruction set simulators can be either generated for the target extensible processor from its architecture description language (such as in LISATek [12]) or predesigned and provided by vendors such as Tensilica, Inc. [23]. The predesigned simulators require the setting of the extensible processor configuration in order to run the application with its data sets. The runtime of these simulations can vary from minutes to days, depending on the amount of input data and the complexity of the application. In addition, input data sets must be large enough to sufficiently capture the application characteristics, so as to truly reflect the application's performance.

In order to further reduce time-to-market pressure, research into abstract high-level estimation for extensible processors has been carried out. Gupta, Sharma, Balakrishnan and Malik [96] proposed a processor evaluation methodology to quickly estimate the performance improvement when architectural modifications are made. Jain, Balakrishnan and Kumar [115, 116] proposed methodologies to evaluate the register file size, register windows and cache configuration in an extensible processor design. By selecting an optimum register file size, they were able to reduce area and energy consumption significantly. Bhatt, Balakrishnan and Kumar [37] also proposed a methodology to evaluate the number of register windows needed in processor synthesis. Fei, Ravi, Raghunathan and Jha [82] described a hybrid methodology for estimating the energy consumption of extensible processors; however, the proposed energy model does not include the schedule of operations or instructions. Jacome, Veciana and Lapinskii [113] proposed an algorithm to evaluate performance tradeoffs in VLIW processors with clustered datapaths. Jha and Dutt [118] proposed a rapid estimation scheme for area and power utilising parameterised components in high-level synthesis. Sanghavi and Wang [179] proposed a method to estimate the speed, area, and power consumption of software intellectual property at the architectural level. Bona, Sami, Sciuto, Silvano, Zaccaria and Zafalon [40] described a method for processor energy estimation based on instruction clustering. While several methods for estimating speed, area and power are presented in the high-level synthesis literature, they often relate to estimation for a fully customised circuit, whereas an extensible instruction is a partially customised circuit, surrounded by the built-in control logic of the extensible processor, storage areas, and busses connecting the instruction to the storage areas.

This chapter has described a wide range of design approaches for embedded systems and various architectures of application specific instruction-set processors. It introduced reasons for using extensible processor platforms, and showed that the extensible processor platform is the state-of-the-art design approach for today's embedded systems. This chapter also introduced the design problems related to the extensible processor platform, namely code segment identification, instruction generation, architectural customisation selection, and processor evaluation and estimation, and described the state-of-the-art work addressing these issues. In the next chapter, we present our proposed design methodologies to further address these design problems and show how our methodologies advance the existing work.

Chapter 3

Methodology Overview

This chapter presents an overview of the suite of design automation methodologies we propose for the extensible processor platform. We first review the existing design flow for extensible processors and its current problems, and then summarise the state-of-the-art research addressing these problems (detailed in the previous chapter).

Our proposed design methodologies are then presented with a description of how our methodologies differ from previous work and how they fit into the existing design

flow. Each design methodology is described individually and then presented together as a single design system. The complete system significantly improves upon the state-of-the-art research. The contributions of the thesis are presented at the end of this chapter.

3.1 Existing Design Flow

Figure 3.1 shows an existing design flow for the extensible processor platform. The design goal for the extensible processor is to maximise the performance of an embedded application while meeting design constraints such as area overhead and power consumption. The designer compiles, analyses and profiles the application using an Instruction Set Simulator of the target processor. The profiling reveals computationally intensive code segments for which possible instruction extensions, inclusion/exclusion of predefined blocks or parameterisations might improve performance and design characteristics. After identifying a set of computationally intensive code segments, the designer first generates extensible instructions for these code segments. Next, the designer selects a set of possible extensible instructions, predefined blocks, or parameter settings to speed up the code segments, by defining these customisations in the extensible processor. To evaluate these customisations, the designer can use the available

(retargetable) tools to determine whether the application can meet design constraints.

In addition, the designer can iterate this step during the design space exploration.

Once the application meets the specified design constraints, the platform uses the base processor configurations, predefined blocks and extensible instructions to generate the extensible processor’s synthesisable RTL. This synthesisable RTL can be taped out or prototyped.

As discussed in the previous chapter, there are four problems in the extensible processor design flow:

1. Code Segment Identification - Code segments are groups of computationally intensive primitive instructions that take up considerable execution time in the application. By replacing code segments with possible architectural customisations (including instruction extensions, inclusion/exclusion of predefined blocks and parameterisations), the application performance can be significantly boosted in exchange for an incremental increase in area and power consumption. The code segment identification step involves identifying computationally intensive code segments in the application. Although a variety of methods for code segment identification have been proposed (such as matching algorithms [27,32,124], profiled graph representation [33,67,170,192,193], and high-level abstraction [68,184,203]), these methods have their own disadvantages, as described in Chapter 2. In fact, the primary problem encountered during code segment identification is that there is an enormous number of candidates within each code segment that are implementable as extensible instructions. Furthermore, it is not until after synthesis that the physical characteristics of extensible instructions (such as area overhead, power consumption and latency) are known; thus, the suitability of extensible instructions is unknown until after implementation. Therefore, identifying suitable code segments is a critically important step to increase the performance of an application.

Figure 3.1: A generic existing design flow of the extensible processor platform

2. Instruction Generation - After computationally intensive code segments are identified, extensible instructions are generated by specifying new hardware resources and the operations that the code segments perform. Designing extensible instructions for extensible processors is a computationally complex task because of the large design space to which extensible instructions are exposed. There are numerous ways to generate extensible instructions, such as selecting different components, parallelism techniques and diverse schedules. Approaches to instruction generation range from instruction set synthesis [62, 107, 109] and instruction set extension [39, 46, 67, 70] to extensible instruction generation [33, 88, 137, 192, 193]. The majority of these approaches focus on combining a large group of primitive instructions into a single extensible instruction to maximise performance. However, this often leads to large power dissipation and discharge current, posing a challenge for battery-powered products. Therefore, it is vital to automatically generate instructions that explore all possible designs to satisfy the power dissipation constraint.

3. Architectural Customisation Selection - After identifying extensible instructions, predefined blocks, and parameterisations for code segments, the designer then selects amongst these architectural customisations to satisfy design constraints. This simplified form of the selection problem can be formulated as the well-known knapsack problem, with single or multiple constraints. Research into extensible processor platform exploration has largely revolved around a single architectural customisation under a single constraint [26, 28, 33, 47, 62, 120, 192]. Design exploration with multiple architectural customisations under a single constraint in embedded systems is presented in [25, 137, 193]. The design space with multiple architectural customisations is extremely large; therefore, efficient and effective algorithms must be developed to select multiple architectural customisations in the extensible processor platform, ensuring design constraints are satisfied.

4. Processor Evaluation and Estimation - After selecting architectural customisations in the extensible processor, the designer evaluates the processor's design characteristics. Evaluation and estimation methods range from gate-level simulation [4,6], RTL-level synthesis [15,19] and instruction set simulation [1,12,23] to high-level abstraction estimation [40, 82, 96, 117, 179]. However, significant expertise and verification effort are necessary for each extensible processor, which significantly affects the accuracy of evaluation. Furthermore, there are hundreds (even thousands) of extensible processor configurations for an application, meaning that the evaluation process can be extremely time consuming. Therefore, it is essential to evaluate the design characteristics of a newly configured extensible processor accurately and quickly.

3.2 Overview of Our Automation Methodologies

Our design automation methodologies aim to solve the design problems discussed, explore a greater design space in a shorter amount of time, and achieve improved embedded systems for an application. Our suite of design automation methodologies includes:

1. An Identification Scheme;

2. An Instructions Estimation Model;

3. An Instruction Generation Method;

4. A Tool To Match Instructions and Code Segments;

5. A Two-level Hierarchy Selection Algorithm; and

6. A Novel Estimation Function.

1. An Identification Scheme - An identification scheme is used to identify computationally intensive code segments. This begins with a profiled application in which the computationally intensive functions/subroutines (or lines) are identified using Instruction Set Simulation. All combinations of computationally intensive lines within each function are then grouped to form a list of code segments. This scheme uses a fitting function to quantify the characteristics of a code segment and match them to the hardware qualities (i.e., speedup and power saving) of the extensible instruction to be implemented. In other words, this function interprets the high-level characteristics of a code segment in an application to predict the physical characteristics of the implemented extensible instruction. If a code segment has a low speedup/area ratio, it is pruned from the design space to reduce the design and verification time for the extensible instruction (a minimal sketch of this pruning step is given after this list). The advantage of this fitting function is its ability to identify thousands of code segments in an application quickly. By searching within individual computationally intensive C functions, the exponential blowup problem is reduced, as each individual line is considered (rather than each operation). In addition, the boundary of searching connected subgraphs is extended (which is the limitation of the work described in [67, 193]). Our method also searches between connected subgraphs in the DFG, which advances the state-of-the-art research in this area.

2. Instructions Estimation Model - This is a fast and accurate estimation model to predict the area overhead, power consumption, and latency of instructions. The model is derived using system decomposition theory and regression analysis. The inputs of the estimation model are a code segment and the design constraints, and the output is a proposed extensible instruction, with components and parallelism techniques chosen to satisfy the design constraints. The design space of extensible instructions is enormous due to the wide range of customisations, such as selecting different components and applying various parallelism techniques. By applying the estimation model, designers are able to rapidly explore the design space for extensible instructions.

3. An Instruction Generation Method - Instruction generation automatically generates extensible instructions for a given code segment. Previous approaches to instruction generation often combined a large group of primitive instructions into a single extensible instruction to maximise performance. Our instruction generation method instead proposes the separation of instructions and the utilisation of the slack of the instructions in a fine-grain approach. This method not only achieves performance enhancement (of an order of magnitude), but also minimises the energy consumption of extensible instructions. Furthermore, the method uses an estimation model to evaluate the performance and power consumption of the instructions.

4. A Tool To Match Instructions and Code Segments - This instruction matching tool finds a functionally equivalent extensible instruction for an identified code segment using a combinational equivalence approach. The pre-designed extensible instructions are stored in a library, which has often been designed previously for an application. The purpose of this tool is to find as many pre-designed extensible instructions as possible to reuse in the processor, thus minimising the design and verification time for new extensible instructions. We propose a novel matching tool as part of our suite of design tools.

5. A Two-level Hierarchy Selection Algorithm - This two-level hierarchy selection algorithm selects extensible instructions, predefined blocks and parameterisations for an extensible processor, to maximise the performance of the application while satisfying a given area constraint. The algorithm first selects a pre-configured processor that combines predefined blocks and parameterisations with a base processor, and then selects a set of extensible instructions. The pre-configured processor uses designer inputs of predefined blocks and parameterisations to prune the design space. Our two-level hierarchical approach solves the problem of selecting multiple architectural customisations and differs from approaches that only consider instruction identification/selection [33,67,70,193,204], thereby advancing the state-of-the-art research in this area.

6. A Novel Estimation Function - An estimation function evaluates the performance of the newly created extensible processor using the profiling information and the latency of the selected extensible instructions and the pre-configured processor. The latency of the selected extensible instructions and the pre-configured processor directly affects the processor clock speed. By obtaining the latency and profiling information, this function can predict the execution time accurately and quickly.
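As a minimal sketch of the pruning step referred to in item 1 above (in C, with illustrative names of ours; the actual fitting function is defined later in the thesis), candidate code segments carrying a precomputed fitting value are ranked, and everything at or below the threshold α is discarded:

    #include <stdlib.h>

    /* Hypothetical record of a candidate code segment extracted from
     * the profiled application. */
    struct code_segment {
        const char *name;      /* e.g. function name plus line range     */
        double      exec_frac; /* fraction of execution time (profiling) */
        double      fit;       /* fitting value: predicted speedup/area  */
    };

    /* Sort in descending order of fitting value. */
    static int by_fit_desc(const void *x, const void *y)
    {
        const struct code_segment *p = x, *q = y;
        return (p->fit < q->fit) - (p->fit > q->fit);
    }

    /* Rank candidates and prune those not above alpha (0.001 in this
     * thesis); the survivors go on to instruction design. */
    size_t rank_and_prune(struct code_segment *cs, size_t n, double alpha)
    {
        qsort(cs, n, sizeof cs[0], by_fit_desc);
        size_t kept = 0;
        while (kept < n && cs[kept].fit > alpha)
            kept++;
        return kept;           /* cs[0..kept) survive the pruning step */
    }

The point of the scheme is that fit is computed from high-level features alone, so thousands of candidates can be ranked without synthesising any of them.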

3.3 Modified Design Flow for Extensible Processors

Figure 3.2 shows the modified design flow for the extensible processor platform.

Our methodologies are displayed in different colours to show how they fit into the existing design flow. They can be divided into four parts: i) a semi-automatic design system (purple boxes in Figure 3.2); ii) an instruction matching tool (orange box); iii) an instructions estimation model (blue box); and iv) an instructions generation method (green box). Each part of our design automation methodologies is described in the following chapters.

In this modified design flow, the designer begins in the same way, with compilation, analysis and profiling of the application using an Instruction Set Simulator of the target processor. The profiling reveals computationally intensive code segments for which possible instruction extensions, inclusion/exclusion of predefined blocks or parameterisations might improve performance and design characteristics. In the first phase of the semi-automatic design system, a fitting function maps the characteristics of the code segments to hardware qualities such as speedup and power saving, thus identifying suitable code segments that can be implemented as extensible instructions. After a set of computationally intensive code segments is identified, the designer has two choices: i) generate new extensible instructions; and/or ii) find pre-designed extensible instructions. If the designer chooses to generate new extensible instructions, a fast and accurate instruction estimation model explores the design space in order to generate extensible instructions for each code segment. The model estimates the area overhead, power consumption, and latency of the possible extensible instructions that can be implemented for a code segment, thereby recommending a set of extensible instructions for each code segment. Next, extensible instructions are generated using the instruction generation methodology, which satisfies the performance, area overhead, delay, and energy consumption constraints. On the other hand, if the designer chooses to look for pre-designed instructions, an automated instruction matching tool is used to find a pre-designed, functionally equivalent extensible instruction for the identified code segment. Utilising these two choices, a set of new and pre-designed extensible instructions is defined for all identified code segments. The designer then uses a two-level hierarchy selection algorithm to select extensible instructions, predefined blocks and parameterisations for the possible code segments in an extensible processor, in order

to satisfy the design constraints.

Figure 3.2: Our design methodologies for an extensible processor platform

To evaluate these architectural customisations, the design system uses the novel estimation function, combined with the latency of the selected extensible instructions and the pre-configured processor, to provide fast and accurate performance analysis. In addition, the designer can iterate this step during the design space exploration. Once the application meets the design constraints, the platform uses the base processor configurations, predefined blocks and extensible instructions to generate the extensible processor's synthesisable RTL for tape-out or prototyping.

3.4 Contributions

The main contribution of this thesis is to automate the design flow of the extensible processor platform, which significantly shortens the design time and explores a larger design space, achieving better design metric tradeoffs. This is achieved by a suite of design automation methodologies that includes a semi-automatic design system; an instruction matching tool; an instructions estimation model; and an instructions generation method. The other contributions of this thesis are as follows:

1. The semi-automatic design system maximises application performance (on average 4.74× (up to 15.71×)) while satisfying a given area constraint in a short design time (2.5% of the full simulation time), with the majority of Pareto points obtained (91% on average), by specifying:

• An identification scheme that is able to automatically identify suitable code segments within an application (as opposed to an error-prone manual process), so that these code segments can be translated to instructions within the processor. The fitting function can predict the speedup/area ratio of the extensible instruction to be implemented for a code segment.

• A two-level hierarchy selection algorithm to first select a pre-defined processor, and then to select the right instruction set for this extensible processor, so that design constraints such as performance are satisfied. By using a set of pre-configured processors and a pre-designed library of extensible instructions to prune the design space of the extensible processor, design turnaround time is reduced significantly.

• A performance estimator to estimate an application's performance (rather than running each configuration repeatedly through an Instruction Set Simulator), which minimises evaluation time.

2. An instruction matching tool automates the instruction matching step and is superior to computationally intensive and error-prone simulation approaches. The use of functional equivalence checking ensures that the results are independent of the programming style of the application. This tool enables a reduction in verification time and enhances the reusability of extensible instructions. Our instruction matching tool is 7.3× faster on average compared to the best known approaches to the problem (partial simulations).

3. A fast and accurate estimation model (for area overhead, latency, and power consumption) of extensible instructions is derived. This model simplifies the process of modelling extensible instructions by using system decomposition and regression analysis. Both parallelism techniques and schedule alternatives for instruction models are taken into account. This model enhances design exploration for extensible instructions, and is fast and accurate: it has a mean absolute error as small as 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption, compared to estimation through the time consuming synthesis and simulation steps using commercial tools.

4. An instruction generation tool reduces the power dissipation of extensible instructions by separating instructions and utilising the slack of the instruction, and explores fine-grain granularity in instruction generation. For the first time, battery lifetime (a battery behaviour model) is taken into account in generating extensible instructions, as opposed to simply shortening the execution time, which leads to energy reduction. The instruction generation tool reduces energy consumption by a further 5.8% on average (up to 17.7%) compared to extensible instructions generated by previous approaches.

We have evaluated our design methodologies through experiments in the context of a commercial design flow (using Tensilica's Xtensa processor), indicating that these methodologies can work co-operatively with existing extensible processor platforms.

Chapter 4

Semi-automatic Design System

This chapter presents a semi-automatic design system for configuring an extensible processor, which maximises the performance of an application while satisfying the area constraint. The design system consists of a methodology for identifying suitable code segments to implement as extensible instructions; a two-level hierarchy selection algorithm for selecting a pre-configured processor (with predefined blocks included and parameters configured) and extensible instructions to generate an extensible processor; and an estimation function to rapidly estimate the performance of the application on the newly configured extensible processor.

4.1 Motivations

The motivation for the work described in this chapter is in four parts. These are: i) identifying code segments; ii) generating extensible instructions; iii) exploring the design space using predefined blocks, extensible instructions, and parameters; and iv) estimating application performance.

Identifying: Although profiling identifies frequently occurring code segments, there is an enormous number of suitable candidates within each code segment that can be implemented as extensible instructions. Furthermore, the suitability of a code segment for conversion to an instruction is not known until after synthesis. Figure 4.1 shows an example of a frequently occurring code segment in an application.


1  static int fmult (int an, int srn) {
2      short anmag, anexp, anmant, wanexp, wanmant, retval;
3      anmag = (an > 0) ? an : ((-an) & 0x1FFFF);
4      anexp = quan(anmag, power2, 15) - 6;
5      anmant = (anmag == 0) ? 32 : (anexp >= 0) ? anmag >> anexp : anmag << -anexp;
6      wanexp = anexp + ((srn >> 6) & 0xF) - 13;
7      wanmant = (anmant * (srn & 077) + 0x30) >> 4;
8      retval = (wanexp >= 0) ? ((wanmant << wanexp) & 0x7FFF) : (wanmant >> -wanexp);
9      return (((an ^ srn) < 0) ? -retval : retval);
10 }

Figure 4.1: Motivation example

Each line within the code segment (lines 3-9) can be implemented as one or more extensible instructions. Alternatively, lines 3-4 can form one instruction, lines 4-6 a separate instruction, and lines 6-9 a third instruction. Even for this simple code segment, there are hundreds of candidates to be implemented as instructions. Each candidate can have different characteristics such as speedup, area overhead, latency, and power consumption. Thus, rapidly identifying code segments to implement as instructions is necessary to speed up the design process.

Generating: The process of creating an extensible instruction is error-prone and time consuming (it usually takes a number of days). Even for the simple code segment shown above, the design time for implementing the hundreds of combinations as extensible instructions is intractable. Therefore, it is essential to create extensible instructions in a reusable form.

Exploring: The extensible processor design space, with predefined blocks, extensible instructions, and additional customisable parameters, is large and complex. In addition, selecting predefined blocks, extensible instructions, and parameters for an extensible processor to maximise the performance of an application while satisfying the constraints is an NP-hard problem [84].

Evaluating: The evaluation of the application performance using an Instruction Set

Simulator of the target processor in a large design space is a time-consuming process.

4.2 System Overview

This section first presents an overview of the entire design flow, and then describes the important phases and steps in detail. Figure 4.2 shows the design flow of the semi-automatic design system. The inputs of the semi-automatic design system consist of: an application written in C/C++, a set of pre-configured processors, a library of extensible instructions, and an area constraint. The set of pre-configured processors contains different processor configurations; e.g., processor 1 contains the base processor only; processor 2 contains the base processor with a multiplier; and processor 3 is the base processor with a floating point unit. The extensible instruction library contains a set of extensible instructions and their associated characteristics such as area, latency and speedup.

The output is an extensible processor (with predefined blocks, extensible instructions, and parameters). The goal of this design system is to configure an extensible processor by selecting a pre-configured processor and extensible instructions, in order to maximise application performance while satisfying the given area constraint. The design flow of the design system involves 11 individual steps separated into four phases. These are detailed below.

The first step (step 1) in the design system compiles an application using the compiler of the target processor. The application is then simulated for each pre-configured processor1 using an Instruction Set Simulator to obtain the execution trace and to find

1Note that this step implies that at least one of the set of pre-configured processors will meet the constraints of a specific application. This first step is a major designer's input that allows the designer to provide the system with domain-specific architectural features without fixing the processor core.

Figure 4.2: Our semi-automatic system design flow for configuring an extensible processor (double square box: commercial tools; grey box: our contributions)

execution characteristics for all functions/subroutines2 within the application. Computationally intensive functions within the application are referred to as critical functions

(cf) of the application, and are considered as possible candidates for implementation as instructions.

In phase I (steps 2-3), the speedup/area ratio, EP, and the area, AE, are computed for each pre-configured processor. This phase then selects the pre-configured processor with the highest speedup/area ratio that also has an area less than the given area constraint. The reason for selecting the pre-configured processor with the highest speedup/area ratio, rather than the pre-configured processor with the highest speedup, is to take into account the additional area needed for extensible instructions to further enhance the performance of the application.

In phase II (steps 4-8), the system exhaustively searches the list of critical functions for all possible combinations of code segments consisting of consecutive lines of code, as sketched below. The code segments are then ranked according to our cost function, Fitting, and code segments with a value greater than α (in our case, α is 0.001) are selected for implementation; the choice of 0.001 is based solely on the designer's expertise. For each "selected code segment", we manually check whether one or more functionally equivalent implementations exist in the extensible instruction library. If no such implementation exists, the extensible instruction is designed and synthesised manually and inserted into the instruction library.
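To make the enumeration concrete, the following is a minimal sketch (not the system's actual implementation) of step 4: every candidate code segment is a contiguous range of lines within a critical function, so a function of n lines yields n(n+1)/2 candidates, each of which is then scored by the fitting function in step 5.

    #include <stdio.h>

    /* A minimal sketch (not the system's implementation) of step 4:
     * enumerate every code segment made of consecutive lines within a
     * hypothetical critical function of n lines. */
    int main(void) {
        int n = 7;   /* number of lines in the critical function */
        int candidates = 0;

        for (int start = 1; start <= n; start++)
            for (int end = start; end <= n; end++) {
                printf("candidate segment: lines %d-%d\n", start, end);
                candidates++;
            }
        printf("%d candidates (= n*(n+1)/2)\n", candidates);
        return 0;
    }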

In phase III (steps 9-10), the instruction selection algorithm is executed to greedily select a set of extensible instructions using the speedup/area ratio, PSAR, and the area, AE. The extensible instruction with the highest speedup/area ratio is repeatedly selected until either all extensible instructions are selected or the given area constraint is reached.

Finally, in phase IV (step 11), our performance estimation model, ETE, is applied to evaluate the execution time of the application on the newly created extensible processor.

4.2.1 Phase I: Pre-configured Processor Selection

Phase I of the design system selects a pre-configured processor with a high speedup/area ratio (outlined in steps 2-3 in Figure 4.2). The inputs of phase I are the characteristics of each pre-configured processor, such as area, clock rate, etc., and the profiling results of the application obtained from step 1. This phase first ranks each pre-configured processor using the cost function EP_i, and then selects the pre-configured processor with the highest value of the cost function. The cost function EP_i of pre-configured processor i for an application is defined as:

EP_i = 1 / (CC_i × Clk_PD_i × Area_Proc_i)    (4.1)

where CC_i is the total cycle count of the application run on pre-configured processor i, Clk_PD_i is the clock period of the processor, and Area_Proc_i is the area of the processor. This function is inversely proportional to the area-delay product.
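As an illustration, the minimal sketch below applies equation 4.1 together with the area check of step 3. The cycle counts and clock periods are illustrative only; the areas are those of P1-P3 from Table 4.1, used here as a stand-in for the full AE computation.

    #include <stdio.h>

    /* A minimal sketch of the phase I selection: compute EP_i for each
     * pre-configured processor and pick the one with the highest EP_i
     * whose area fits the constraint.  Values are illustrative. */
    int main(void) {
        double cc[]   = { 9.0e8, 6.5e8, 5.0e8 };    /* cycle counts      */
        double clk[]  = { 4.5e-9, 4.6e-9, 5.0e-9 }; /* clock periods [s] */
        double area[] = { 69680, 77670, 105200 };   /* areas [gates]     */
        double constraint = 100000;                 /* area constraint   */
        int best = -1;
        double best_ep = 0.0;

        for (int i = 0; i < 3; i++) {
            double ep = 1.0 / (cc[i] * clk[i] * area[i]);  /* eq. 4.1 */
            if (area[i] < constraint && ep > best_ep) {
                best_ep = ep;
                best = i;
            }
        }
        printf("selected pre-configured processor: P%d\n", best + 1);
        return 0;
    }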

4.2.2 Phase II: Instruction Identification Model

The inputs for the instruction identification model are the list of critical functions from the profiling step and the pre-configured processor selected during phase I. A critical function is one that contributes more than θ% of the total execution time, where θ is the designer's input (in our case, θ is 5). The identification consists of five steps:

1. Exhaustively search for the list of critical code segments within the critical functions (step 4 in Figure 4.2);

2. Identify and rank the list of critical code segments using a fitting function (step 5). The fitting function is described in the next section;

3. Check whether an equivalent implementation is part of our extensible instruction library (step 6);

4. If there is no instruction in the library equivalent to the code segment, implement the code segment as an extensible instruction and characterise the instruction using the Xtensa development tools from Tensilica, Inc. [23] and Design Compiler from Synopsys, Inc. [6] with associated scripts (steps 7a-7d);

5. If there is an equivalent instruction that matches the code segment, then move down to the next item in the list of code segments (step 8). Note that matching code segments against the instructions in the library is currently performed manually. This methodology outputs a set of extensible instructions, each with its area, latency and speedup. These extensible instructions are added to the extensible instruction library for reuse and can be selected in later phases of the design flow.

This instruction identification model is somewhat similar to the work by Sun, Ravi, Raghunathan and Jha [193] and Clark, Zhong and Mahlke [67]. Sun et al. proposed to rank computationally intensive functions and lines (identified by profiling) using a priority function, and then select the high-priority functions and lines, pruning the number of functions and lines. Clark et al. proposed to use a guide function to prune the search direction (as opposed to just pruning functions and lines), allowing lowly ranked code segments to grow into more useful code segments; however, Clark et al. only considered connected subgraphs. Our model extends Sun's work by searching all possible combinations within the computationally intensive functions/subroutines identified by profiling, exploring a larger design space to identify more potential code segments. In addition, our model breaks the boundary of connected subgraphs by searching code segments within computationally intensive functions, as opposed to only considering connected subgraphs. Furthermore, the use of a fitting function to rank code segments by speedup and area constraint, pruning code segments which do not reach a certain threshold, is most similar to Sun's priority function [193].

Fitting Function

The aim of this phase is to extract the characteristics of the code segment using a cost function, namely the fitting function, in order to predict the speedup/area ratio of each extensible instruction. The fitting function, Fitting, is derived from studying manually performed extensible processor designs and extracts four characteristics of the code segment:

1. The frequency of use, FU_x, indicates how often a code segment is executed in the application. FU_x is obtained from the execution traces of the application program. Moving frequently executed segments to extensible instructions is likely to have a great impact upon the performance of the application.

2. If the number of operands to be implemented as an instruction is three or fewer (less than or equal to two source operands and one destination operand), then the instruction can be implemented fairly easily, as processors typically have two source busses and one destination bus going to the ALU. When the number of operands exceeds this, multiple cycles are needed to ferry the operands to the newly created functional unit, increasing the latency of the operation. This is reflected as NO_x in the cost function.

3. A large proportion of bit operations in a segment favours its implementation in hardware, since such an instruction requires a small cycle count and yields a high performance gain for the application program. The amount of bit operations in a segment is reflected in the cost function as BO_x.

4. The type of operands, TO_x, in an instruction relates to the type of register file and the manipulation of the operands. If the types of operands differ, the processor needs extra registers or even custom-designed registers, which increase the area of an extensible instruction. If manipulation of the operands is needed, the latency of the instruction also increases. The increase in latency and area is reflected as TO_x in the cost function.

The fitting function, Fitting_x, is defined as:

Fitting_x = FU_x × (1 / ⌈NO_x / α⌉) × TO_x × BO_x    (4.2)

where FU_x is the frequency of use of code segment x; NO_x is the number of operands in code segment x; TO_x is the percentage of integer (short) type operands among all the operands (char is counted as an integer); BO_x is the percentage of bit operations among all the operations; and α is the ideal number of operands in a code segment (in our case, less than or equal to two inputs and one output).

Figure 4.3 gives an example of how the fitting function is used. The example consists of two segments, fmult and quan, where fmult uses up 22% of the execution time. The number of operands of this function is 3 (namely an, srn, retval). The operation types in the function are mostly bit operations (i.e. and, left shift, right shift, etc.), so BO_fmult = 0.8. The types of operands are integers, and therefore TO_fmult = 1. Thus the value of the fitting function is 0.176, whereas quan yields 0.28, indicating that there is a higher benefit in quan than in fmult (a cross-check of this arithmetic is sketched after Figure 4.3).

    static short power2[15] = {1, 2, 4, 8, 0x10, 0x20, 0x40,
                               0x80, 0x100, 0x200, 0x400,
                               0x800, 0x1000, 0x2000, 0x4000};

    static int fmult (int an, int srn) {
        short anmag, anexp, anmant, wanexp, wanmant, retval;
        anmag = (an > 0) ? an : ((-an) & 0x1FFFF);
        anexp = quan(anmag, power2, 15) - 6;
        anmant = (anmag == 0) ? 32 : (anexp >= 0)
                 ? anmag >> anexp : anmag << -anexp;
        wanexp = anexp + ((srn >> 6) & 0xF) - 13;
        wanmant = (anmant * (srn & 077) + 0x30) >> 4;
        retval = (wanexp >= 0) ? ((wanmant << wanexp)
                 & 0x7FFF) : (wanmant >> -wanexp);
        return (((an ^ srn) < 0) ? -retval : retval);
    }

    static int quan (int val, short *table, int size) {
        int i;
        for (i = 0; i < size; i++)
            if (val < table[i])
                break;
        return i;
    }

Figure 4.3: An example of a code segment to demonstrate how the fitting function works
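As a cross-check of the worked example above, a minimal sketch of equation 4.2 (assuming α = 3, i.e. two source operands and one destination) reproduces the fmult value of 0.176:

    #include <math.h>
    #include <stdio.h>

    /* A minimal sketch of equation 4.2; the fmult characteristics
     * (FU = 0.22, NO = 3, TO = 1.0, BO = 0.8) are taken from the
     * worked example in the text. */
    double fitting(double fu, int no, double to, double bo, int alpha) {
        return fu * (1.0 / ceil((double)no / alpha)) * to * bo;
    }

    int main(void) {
        printf("Fitting(fmult) = %.3f\n", fitting(0.22, 3, 1.0, 0.8, 3));
        return 0;
    }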

4.2.3 Phase III: Extensible Instruction Selection

Phase III of the design system selects a set of extensible instructions to maximise the performance of an application while satisfying the remaining area constraint. The inputs are the extensible instruction library, the profiling results, and the remaining area constraint. The extensible instruction selection model is based on the speedup/area ratio of each instruction, using the percentage of the total cycle count (%_jk), the area (Area_Inst_j), the speedup (Sp_Inst_jk), and the latency (Latency_j) of each selected extensible instruction, and is defined as:

PSAR_jk = (%_jk × Sp_Inst_jk) / (Area_Inst_j × Max(Latency_j, Clk_PD_k))    (4.3)

PSAR_jk indicates the performance gain per unit area of an application when the extensible instruction is implemented.
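The greedy selection of steps 9-10 can be sketched as follows; all instruction characteristics and the area budget below are illustrative, not taken from the instruction library.

    #include <stdio.h>

    /* A minimal sketch of the phase III greedy loop: compute PSAR for each
     * candidate instruction (equation 4.3) and select instructions in
     * decreasing PSAR order until the remaining area budget is exhausted. */
    #define N 4

    int main(void) {
        double pct[N]   = { 0.22, 0.15, 0.08, 0.05 };  /* %_jk           */
        double sp[N]    = { 3.5, 4.5, 2.0, 1.3 };      /* Sp_Inst_jk     */
        double area[N]  = { 2740, 5810, 950, 180 };    /* Area_Inst_j    */
        double lat[N]   = { 6.0, 7.5, 6.0, 4.33 };     /* Latency_j [ns] */
        double clk_pd   = 4.5;                         /* Clk_PD_k [ns]  */
        double remain   = 8000;                        /* gates left     */
        int selected[N] = { 0 };

        for (;;) {
            int best = -1;
            double best_psar = 0.0;
            for (int j = 0; j < N; j++) {
                if (selected[j] || area[j] > remain) continue;
                double l = lat[j] > clk_pd ? lat[j] : clk_pd;
                double psar = pct[j] * sp[j] / (area[j] * l);  /* eq. 4.3 */
                if (psar > best_psar) { best_psar = psar; best = j; }
            }
            if (best < 0) break;            /* nothing fits any more */
            selected[best] = 1;
            remain -= area[best];
            printf("select instruction %d (remaining area %.0f gates)\n",
                   best, remain);
        }
        return 0;
    }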

4.2.4 Phase IV: Performance Estimation Model

Phase IV estimates the performance of an application executing on the newly created extensible processor. Our performance estimation model is based on the profiling results of the application, the selected pre-configured processor, and the speedups of the selected extensible instructions. The execution time estimation, ETE_jk, for an extensible processor k with a set of selected extensible instructions (from 1 to j) is defined as:

ETE_jk = { CC_k × (1 − Σ_j %_jk) + Σ_j (CC_k × %_jk / Sp_Inst_jk) } × Latency_max    (4.4)

where CC_k is the original total cycle count of the application running on pre-configured processor k.
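A minimal sketch of equation 4.4, with illustrative values for CC_k, %_jk, Sp_Inst_jk and Latency_max:

    #include <stdio.h>

    /* A minimal sketch of equation 4.4: estimate the execution time on the
     * customised processor from the original cycle count CC_k, the cycle
     * fraction %_jk and speedup of each selected instruction, and the
     * longest latency in the design.  Values are illustrative. */
    int main(void) {
        double cc_k    = 9.0e8;              /* original cycle count    */
        double pct[]   = { 0.22, 0.15 };     /* %_jk of selected insts  */
        double sp[]    = { 3.5, 4.5 };       /* Sp_Inst_jk              */
        double lat_max = 6.0e-9;             /* Latency_max [s]         */
        double covered = 0.0, accelerated = 0.0;

        for (int j = 0; j < 2; j++) {
            covered     += pct[j];
            accelerated += cc_k * pct[j] / sp[j];
        }
        double ete = (cc_k * (1.0 - covered) + accelerated) * lat_max;
        printf("estimated execution time: %.3f s\n", ete);
        return 0;
    }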

In addition, the area estimation used in the selection of the pre-configured processor and extensible instructions is defined as:

AE_jk = Area_base + Σ Area_copr + Σ Area_inst    (4.5)

where Area_base is the number of gates used by the base processor, Area_copr is the additional gates incurred by a selected co-processor, and Area_inst is the number of gates that a selected instruction occupies. The first two terms of equation 4.5 are estimated using the Xtensa generator [23] and the final term is estimated using Design Compiler from Synopsys, Inc. [6]. Since the busses are not changed significantly by the addition of co-processors, the gate count still gives a good indication of the area. The custom register files and any extra tristates inserted for increased bus lengths are reflected by the last term.

4.2.5 Overall Design Flow Algorithm

The overall algorithm brings together the various steps in the semi-automatic design system. We first compile, simulate and profile the application program to obtain a list of critical functions, cf. We then calculate the cost functions for area and speedup/area ratio, AE and EP, of each pre-configured processor. Next, we select the processor with the highest EP value whose AE is less than the area constraint. From the list of critical functions, we exhaustively search all possible combinations of code segments that are consecutive lines of code. We rank the code segments according to our fitting function, CodeSegment, and select code segments that have a CodeSegment value greater than 0.001. Then, if any of the "selected code segments", scs, do not exist in the instruction library, we create the instructions manually and add them to the extensible instruction library; we continue to implement instructions until all the selected code segments are available in the library. Next, we run the instruction selection algorithm to select a set of extensible instructions using the cost functions PSAR (potential speedup/area ratio) and AE. Finally, we perform an estimation, ETE, to check the performance of the created extensible processor. Figure 4.4 shows the overall design flow algorithm for the design system.

4.3 Experimental Results

In this section, we describe our experimental setup and results. We first describe the libraries and applications used, followed by a discussion of our results.

    Overall Design Flow Algorithm() {
        /* Compile, simulate and profile the application */
        Compile the application, then simulate the application program using the ISS;
        Profile the application program (obtain the list of critical functions, cf_i);

        /* Select a pre-configured processor */
        for (i = 1 to w pre-configured processors) do {                    (step 2)
            EP_i = 1 / (CC_i × Clk_PD_i × Area_Proc_i);
            AE_i = Area_base_i + Σ Area_copr_i;
        }
        for (i = 1 to w pre-configured processors) do {                    (step 3)
            if (AE_i < Area_Constraint)
                Select the processor with the highest value of EP_i;
        }

        /* Identify code segments */
        for (i = 1 to x critical functions, cf_i) do {
            Search exhaustively for all code segments, cs_ij;              (step 4)
            for (j = 1 to y code segments, cs_ij, in function i) do {      (step 5)
                CodeSegment_j = FU_j × (1 / ⌈NO_j / α⌉) × TO_j × BO_j;
                if (CodeSegment_j > 0.001)
                    Insert cs_ij into the list of selected code segments, scs_j;
            }
        }

        /* Manually check whether a code segment matches an instruction */
        for (j = 1 to z selected code segments, scs_j) do {                (step 6)
            if (selected code segment scs_j is not in the library) {
                Manually implement the code segment as an instruction;     (step 7a)
                Characterise the instruction;                              (steps 7b-7c)
                Insert the extensible instruction into the library;        (step 7d)
            } else
                Continue the search;                                       (step 8)
        }

        /* Select a set of extensible instructions */
        for (all extensible instructions in the instruction library) do    (step 9)
            PSAR_jk = (%_jk × Sp_Inst_jk) / (Area_Inst_j × Max(Latency_j, Clk_PD_k));
        for (j = the highest PSAR to the lowest PSAR) do {                 (step 10)
            if (Area_Remain > Area_Inst_j) {
                Select Inst_j;
                AE_jk = Area_base_k + Σ Area_copr_k + Σ Area_inst_j;
                Area_Remain = Area_Constraint − AE_jk;
            }
        }

        /* Estimate the execution time */                                  (step 11)
        ETE_jk = {CC_k × (1 − Σ_j %_jk) + Σ_j (CC_k × %_jk / Sp_Inst_jk)} × Latency_max;
    }

Figure 4.4: Overall algorithm of the semi-automatic design system

4.3.1 Experimental Setup

We have set up our design flow (described in Section 4.2) using tools and scripts to design extensible processors for ten real-world applications. The target extensible processor used in our experiments is the Xtensa processor from Tensilica, Inc. [23]. Two libraries have also been created: a pre-configured processor library and a library of pre-designed extensible instructions, which store a set of pre-configured processors and all of the extensible instructions generated through our methodology, respectively. The experiments for synthesising and simulating the instructions were conducted on a dual Sun UltraSPARC III running at 900MHz with 4GB of RAM, while the experiments for simulating performance and evaluating the design system were conducted on an Intel Pentium IV running at 1.5GHz with 512MB of RAM.

We have pre-configured twelve extensible processors in the first library, namely P1, P2, P3, ..., P12. Table 4.1 shows the parameters of these pre-configured processors, such as the clock rate and area, as well as the additional predefined blocks included in each pre-configured processor. We used four different predefined blocks in this experiment: i) a 32-bit multiplier (32b MUL); ii) a floating-point predefined block (FPU); iii) a digital signal processing predefined block (DSP V0810-8), where the memory width is eight bits, the register width is ten bits, and the SIMD width is eight bits; and iv) a digital signal processing predefined block (DSP V1620-8), where the memory width is 16 bits, the register width is 20 bits, and the SIMD width is eight bits. These predefined blocks are the designer's input in order to prune the design space; they were chosen because the benchmarks are in the multimedia domain and the predefined blocks are closely related to multimedia algorithms. In Table 4.1, pre-configured processor P1 is the base processor with no additional predefined blocks; pre-configured processor P2 has a 32-bit multiplier as the predefined block; and pre-configured processor P3 is the base processor with the floating-point predefined block, etc. All pre-configured processors are set up with direct-mapped 1KB instruction and data caches, a 128-bit wide system bus and a generic register file with 64 32-bit registers. In addition, these processors are configured from the T1050.2 version of the Xtensa processor in 0.18µm technology.

    Pre-configured   Core Area   Core Power   Clock Rate   Configurations
    Processor        [Gates]     [mW]         [MHz]
    P1                69,680     113          222          –
    P2                77,670     119          217          32b MUL
    P3               105,200     141          199          FPU
    P4               110,900     146          196          32b MUL, FPU
    P5               130,600     162          185          DSP(V0810-8)
    P6               131,900     163          184          32b MUL, DSP(V0810-8)
    P7               161,200     186          170          FPU, DSP(V0810-8)
    P8               163,900     188          169          32b MUL, FPU, DSP(V0810-8)
    P9               186,700     203          160          DSP(V1620-8)
    P10              192,400     206          158          32b MUL, DSP(V1620-8)
    P11              217,400     218          149          FPU, DSP(V1620-8)
    P12              224,400     221          147          32b MUL, FPU, DSP(V1620-8)

Table 4.1: Characteristics of pre-configured processors

The extensible instruction library contains 45 extensible instructions. Table 4.2 shows the information available to the designer from the extensible instruction library.

The first column is the extensible instruction name. The second column lists the applications that use the extensible instruction; the application shown in bold is the one from which the instruction was derived. The next 14 columns indicate the area, the speedup of the instruction when associated with processors P1, P2, P3, etc., and the latency of the instruction, respectively. These characteristics of the instruction are obtained using an ISS of the target processor [23] and Design Compiler from Synopsys, Inc. [6]. The last column is the fitting function's value of the corresponding code segment under pre-configured processor P1. It should be noted that the fitting function is only comparable within the application from which the instruction was derived: while the cost functions of GSMS and CAL_1 are directly comparable, those of GSMS and DC3 are not.

[Table 4.2: A subset of the extensible instruction library. For each instruction (FMULT, QUAN, ADD14, MYSAT, RECONS, GSMS, CAL_1, GSMMR, GSMLM, DC1, DC2, DC3, DC4, ADD8, CC, SSIZE), the table lists the applications that use it, its area (from 50 to 23,400 gates), its speedup under each pre-configured processor P1-P12, its latency (from 4.33 to 8.15 ns), and the fitting function value of the corresponding code segment.]

We performed experiments on the design system using ten benchmarks: adpcm encoder, g721 encoder, g721 decoder, gsm encoder, gsm decoder, mpeg2 decoder, epi encoder, epi decoder, epwi decoder, and voice recognition. The first eight applications are multimedia applications obtained from MediaBench [135]. The ninth application, obtained from the GRASP Laboratory [50], is an embedded predictive wavelet image coder, an enhancement of the efficient pyramid image coder (epi coder). The final application, obtained from [57], provides user voice control over Unix commands within a Linux shell environment. For verification purposes, we simulated all possible combinations of extensible processors with pre-configured processors and extensible instructions on each benchmark, so that the entire design space (including the Pareto points) of each benchmark is obtained. In addition, we started with a tight area constraint and relaxed it during our experiments in order to obtain all possible Pareto points in the design space.

4.3.2 Evaluation Results

In this section, we discuss the evaluation of the design system. First, we discuss the efficacy of the fitting function. Second, we demonstrate the efficiency of the heuristic algorithm for selecting the pre-configured processor and extensible instructions. Third, we discuss the accuracy of the execution time estimation. Finally, we discuss the effectiveness of the overall design flow in the design system.

Table 4.3 summarises the efficacy of the fitting function, and Figure 4.5 shows the relationship between the normalised value of the fitting function and the speedup/area ratio of the corresponding extensible instruction for the ten applications. These results show that the methodology for identifying instructions suggests useful instructions to extract from an application program.

    Application         Code segments before   Code segments after
    Adpcm encoder        10                     3
    Gsm encoder          15                     4
    Gsm decoder          15                     4
    G721 encoder         20                     4
    G721 decoder         20                     4
    Mpeg2 decoder       152                    12
    Epi encoder         247                    15
    Epi decoder         362                    20
    Epwi decoder        153                    12
    Voice recognition   106                    10

Table 4.3: The efficacy of the fitting function

Table 4.4 summarises the efficiency of the heuristic algorithm for selecting the pre-configured processor and extensible instructions. The first column in Table 4.4 displays the application name. The second column shows the number of Pareto points obtained using the design system, while the third column shows the total number of Pareto points in the design space. The final column indicates the total number of configurations for the application. The heuristic algorithm (parts I and II, for selecting the pre-configured processor and extensible instructions when a tight area constraint is given and then progressively relaxed) is able to obtain, on average, 91% of the Pareto points for all the benchmarks. Although our algorithm does not obtain all of the Pareto points, the performance of the configurations it finds is on average within 7.1% of the performance of the Pareto points that were not obtained. The reason our heuristic algorithm fails to deliver all Pareto points is that it searches on ratios rather than absolute values.

Third, in order to demonstrate the efficiency and accuracy of the execution time estimation of the system, we estimated the execution time for all the obtained Pareto points for extensible processors on each benchmark and compared this with the execution time obtained using an ISS of the target processor. The estimation of the execution time is on average within 5.7% of the real execution time of an application program.

[Figure 4.5: The relationship between the normalised fitting function value and the speedup/area ratio of the corresponding instructions for each benchmark: (a) GSMenc and GSMdec; (b) ADPCMenc; (c) MPEG2dec; (d) G721enc and G721dec; (e) VOICE; (f) EPIenc and EPIdec; (g) EPIWenc and EPIWdec.]

    Application         Pareto points   Total number of   Total number of
                        obtained        Pareto points     configurations
    Adpcm encoder         4               6                   96
    Gsm encoder           6               6                  768
    Gsm decoder           6               7                  768
    G721 encoder          8               8                  192
    G721 decoder          7               7                  192
    Mpeg2 decoder        15              18                  768
    Epi encoder          36              42                 1536
    Epi decoder         215             235                12288
    Epwi decoder         35              38                 1536
    Voice recognition    19              19                  768

Table 4.4: The efficiency of the heuristic algorithm

Finally, Table 4.5 shows a summary of the semi-automatic design system. The first column displays the application name. The second and third columns represent the original solution, which is the solution on the base processor with no additional predefined blocks and extensible instructions. The fourth and fifth columns show the solution that the system obtained. The sixth column displays the speedup of each application achieved by the design system. The seventh column shows the accuracy of the performance estimation. The final two columns compare the design exploration time of the exhaustive simulation methodology and of the design system, respectively.

[Table 4.5: Summary of the semi-automatic design system results. For each of the ten applications, the table lists the original solution (base processor area of 69,680 gates; execution times from 0.042 to 7.54 seconds), our best solution (areas from 78,850 to 183,200 gates), the speedup of the application (from 1.14× to 15.71×), the error rate of the performance estimation on Pareto points (from 3.8% to 7.1%), and the exploration times of exhaustive simulation (58 to 15,059 minutes) versus our system (4 to 329 minutes).]

Table 4.5 indicates that the design system achieved, on average, a 4.74× speedup (up to 15.71×) of the application, while the performance estimation is within 5.7% of the ISS result. In addition, the design space exploration time for our design flow is on average 2.5% of the design space exploration time using the exhaustive simulation methodology. Figures 4.6 and 4.7 show the design space of the gsm decoder and the mpeg2 decoder benchmarks respectively, both with 768 configurations, and the Pareto-point walk through the design space using the design flow of the design system. For these benchmarks, exploring the entire design space takes approximately 3963 minutes (66.1 hours) and 187 minutes (3.11 hours) respectively, whereas our design flow took only 105 minutes (1.75 hours) and 4 minutes respectively to obtain these Pareto points in the design space. Furthermore, the second column of Table 4.2 also shows that, in our design system, an extensible instruction can be reused in more than one application within the same domain, eliminating repeated effort in the creation of extensible instructions. These results clearly indicate that the design system shortens the design turnaround time for an extensible processor.

[Figure 4.6: GSM decoder's design space and Pareto points: (a) full design space; (b) Pareto points.]

[Figure 4.7: MPEG2 decoder's design space and Pareto points: (a) full design space; (b) Pareto points.]

4.4 Conclusions and Future Work

This chapter has described the design system for configuring an extensible processor, which maximises the performance of an application and satisfies the area constraint, as well as significantly reducing the design turnaround time. These results are achieved through three major improvements over existing design systems:

• Our method first identifies code segments within individual computationally intensive C functions, and then uses a fitting function to rank the identified code segments according to the potential speedup and the area constraint. By considering each line (rather than each operation) within individual computationally intensive C functions, the exponential blowup problem is reduced. In addition, the boundary of searching connected subgraphs is extended, which is a limitation of previous work [67,193]. Our method also searches between connected subgraphs in the DFG, which advances the state-of-the-art research in this area.

• A two-stage hierarchical approach to exploring the design space is implemented: first, select a pre-configured processor; second, select the right extensible instruction set for that processor. The designer selects a set of predefined blocks for the application to prune the design space of extensible processors. Cost functions and a heuristic algorithm are developed to guide the two-stage selection process, ensuring that a significant portion of the design space is explored in a short time. By introducing pre-configured processor selection, the design space exploration is extended in such a way that the set of extensible instructions generated or selected can differ entirely between pre-configured processors. For example, if an application has a floating-point (fp) multiplication that takes up significant execution time, there are at least two possible ways to speed up the application: i) include an fp predefined block (which has fp registers and is able to execute fp addition, fp multiplication, etc.) in the processor; or ii) generate/select an fp multiplication instruction in the processor. The first solution may be overkill if the application only has one fp operation. On the other hand, since the fp multiplication instruction in the second solution is generated by designers, its speedup and area may not be ideal when compared to the vendor's fp predefined block. These kinds of situations arise when predefined blocks are included, extending the existing design space beyond just identifying/selecting instructions. Our two-stage hierarchical approach addresses this kind of situation in a unique way, differentiating it from the problem described in [120] and advancing the state-of-the-art research in this area.

• An estimation function is implemented to estimate an application's performance, rather than executing each configuration repeatedly through an Instruction Set Simulator; thus the evaluation time is significantly reduced. In addition, this estimation function takes into account the effect that added extensible instructions have on the clock speed of the base processor, thereby providing an accurate measure of the application performance.

One of the main ideas behind this semi-automatic design system is the hierarchical approach to designing an extensible processor: i) first limiting the design space through selection of a pre-configured processor core; and ii) then selecting an appropriate set of extensible instructions for that specific processor. This approach enables efficient searching of the design space while still allowing for designer input. This is the first system of its kind that achieves high speedups at reasonable cost without exhaustive (time-consuming) design space exploration.

This chapter has demonstrated how, using our design system, ten different real-world benchmarks can be designed within an extensible processor environment. The design space exploration time of the design system is on average 2.5% of the exploration time using full simulation for the given set of benchmarks. The fitting function for identifying appropriate code segments correlates with the speedup/area ratio of the resulting instruction. In addition, our heuristic algorithm was able to locate on average 91% of all Pareto points from the entire design space across all benchmarks. The execution time estimation for the proposed extensible processor is on average within 5.68% of results obtained with an ISS, and is generated typically in less than a second. Finally, the application program execution time is reduced by up to 15.71× (4.74× on average), with an average area overhead of 65% on the benchmarks.

Although matching code segments with instructions in the library and generating instructions are both performed manually in the design system, the system is still useful in many real-world applications.

Chapter 5

Matching Instructions Tool

This chapter builds upon the insights developed in the previous chapter to develop an automation tool for matching pre-designed extensible instructions with computationally intensive code segments. This is one of the most challenging and as yet unsolved steps in the design flow of the extensible processor platform. Given a library of pre-designed extensible instructions, each of which may or may not be included (depending on the application and its constraints) in the final design of the extensible processor, and an application written in C/C++, the goal of matching is to automatically match instructions in the library with code segments in the application, in order to automatically judge whether a specific code segment (software) of the application should be replaced by an extensible instruction. This is an inherently complex task. Figure 5.1 shows a motivation example: matching an extensible instruction with a code segment and determining whether they are functionally equivalent.

The traditional approach to instruction matching consists of instruction simulation [189] and data control graph matching techniques [71,119,145,186]. In the simulation approach, a code segment and the equivalent hand-designed instruction are simulated with the same set of input vectors, while comparing output vectors. The drawback of this approach is the need to simulate a complete set of data vectors in order to ensure that the extensible instruction and the software code segment are functionally equivalent. This makes the process not only time and computation intensive but also potentially error-prone unless 100% data set coverage is guaranteed. Another technique, data control graph matching, enables the matching of extensible instructions with a structurally equivalent representation of the appropriate code segment. Since the same segment can be represented graphically in many different ways, such a method will often result in a false negative. The differences in the graphical representation can arise from the level of granularity and the method of decomposition in a function.

    // Application in C
    int main() {
        ...
        // Code segment
        for (x = 0; x < 100000; x++) {
            tmp = a[x] * b[x] << 4;
            total += tmp;
        }
        ...
    }

            (functionally equivalent to)

    // Pre-designed specific instruction
    state total 32
    iclass ei EI {out arr, in art, in ars} {in state}
    reference EI {
        wire [31:0] tmp;
        assign tmp = TIEmul(art, ars, 1'b0) << 4;
        assign arr = tmp + state;
    }

Figure 5.1: A motivation example: matching a pre-designed extensible instruction with a code segment of the application

To overcome the shortcomings of the simulation and pattern matching techniques, a matching instructions tool, MINCE (Matching INstructions using Combinational Equivalence), is proposed. MINCE consists of a translator, a filtering algorithm and a combinational equivalence checking tool. The translator converts a code segment described in a high-level language (typically C/C++) into a combinational Verilog representation. The filtering algorithm rapidly prunes pre-designed extensible instructions that cannot match the code segment. Finally, the combinational equivalence checking tool is used to ensure that the functionality of the code segment and the extensible instruction is equivalent. The advantages of the MINCE tool are:

• It automates the step of instruction matching and is superior to computation-intensive and error-prone simulation approaches; and

• The use of functional equivalence checking ensures that the results (i.e. the found candidates for extensible instructions) are largely independent of the programming style of the application that is to be accelerated.

MINCE is an automated tool for matching extensible instructions to functionally equivalent code segments in an extensible processor platform. This new, key and hitherto missing step complements the existing extensible processor design flow, shown in the highlighted (orange) section of Figure 5.2.

5.1 Background

This section provides the necessary background on combinational equivalence checking. It first describes some basics of binary decision diagrams (BDDs) and the advantages and disadvantages of using BDDs for functional equivalence checking. Finally, an example of the BDD representation of a code segment and an extensible instruction (which are functionally equivalent) is given.

A Reduced Ordered Binary Decision Diagram (ROBDD, but often simply referred to as a "BDD") is a canonical data structure that uniquely represents a boolean function with maximal sharing of substructure [49]. Because the BDD data structure is based on the maximal sharing of substructure in a boolean function, a BDD is not as prone to exponential resource blowup as other representations. In order to further minimise the memory requirements of BDDs, Rudell [174] introduced dynamic variable ordering, which continuously changes the order of the variables (without changing the original function being represented) while the BDD application is running. There are many derivatives of BDDs, such as the multi-valued BDD (MDD), which has more than two branches and potentially a better ordering, and the free BDD (FDD), which has a different variable ordering and is not canonical. Madre and Billion [149,150] proposed the use of BDDs to solve combinational equivalence checking problems; they showed that BDDs can be used to verify the equivalence of combinational circuits and to determine where two circuits are not equivalent.

[Figure 5.2: A generic design flow for designing an extensible processor and how the MINCE tool fits into it. The application written in C/C++ is compiled, analysed and profiled; computationally intensive code segments are identified; the MINCE tool finds functionally equivalent pre-designed instructions for the identified code segments, while extensible instructions are generated for the remaining code segments; extensible instructions, predefined blocks and parameters are selected and the design space is explored against performance, power and area; and once the design satisfies its constraints, the extensible processor (synthesizable RTL of the base processor, predefined blocks, extensible instructions and parameter settings) is generated, synthesised and taped out.]
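To illustrate the canonicity property that equivalence checking relies on, the following toy sketch (our own illustration in C, not the VIS implementation) builds BDD nodes through a unique table: identical (variable, low, high) triples are shared and redundant tests are never created, so with a fixed variable order two equivalent functions end up with the same root pointer.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy ROBDD construction: reduction rule 1 removes redundant tests,
     * reduction rule 2 (the unique table) shares identical subgraphs. */
    typedef struct node { int var; struct node *lo, *hi; } node;

    static node *table[256];
    static int n_nodes;

    node *mk(int var, node *lo, node *hi) {
        if (lo == hi) return lo;                       /* rule 1 */
        for (int i = 0; i < n_nodes; i++)              /* rule 2 */
            if (table[i]->var == var &&
                table[i]->lo == lo && table[i]->hi == hi)
                return table[i];
        node *n = malloc(sizeof *n);
        n->var = var; n->lo = lo; n->hi = hi;
        table[n_nodes++] = n;
        return n;
    }

    int main(void) {
        node zero = { -1, 0, 0 }, one = { -2, 0, 0 };  /* terminals */
        /* f = x1 AND x2, built twice; sharing makes the roots identical */
        node *f = mk(1, &zero, mk(2, &zero, &one));
        node *g = mk(1, &zero, mk(2, &zero, &one));
        printf("f %s g\n", f == g ? "==" : "!=");
        return 0;
    }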

The advantage of using BDDs for combinational equivalence checking is that if two functions have the same functionality but different circuit representations, their BDDs will still be identical. On the other hand, the disadvantage of BDDs is that, during verification, the memory requirement of complex modules such as multipliers is extremely large, and the verification time may slow down significantly as a result. In addition, the verification time for checking whether two functions are functionally equivalent consists of the BDD creation time, the dynamic variable ordering time, and the equivalence checking time. Figure 5.3 shows the verification time distribution for three extensible instructions with three code segments; the equivalence checking time is often less than 50% of the total verification time.

[Figure 5.3: Verification time distribution for three extensible instructions, broken down into creation and flattening time of BDDs, dynamic variable ordering time, and equivalence checking time (0-80 minutes).]

For example, Figure 5.4a shows the high-level language representation of a code segment (S = (a + b) * 16) and an extensible instruction (S = (a + b) << 4), which are functionally equivalent. Figure 5.4b shows the BDD representations of the code segment and the extensible instruction. Since there are 32 bits in each variable (a, b, S), only the BDDs of variable S from bit 11 down to bit 4 are shown, with bit 5 of variable S expanded for clarity. Note that ci in the BDDs in Figure 5.4b is the carry-in of each bit. The BDD representation of the extensible instruction is identical to the BDD representation of the code segment, which indicates that the code segment and the extensible instruction are functionally equivalent. Figure 5.4c shows a BDD representation of a 4-bit addition.

[Figure 5.4: A code segment and an extensible instruction and their BDD representations: (a) the high-level code segment S = (a + b) * 16 and the extensible instruction S = (a + b) << 4; (b) the identical BDD representations of the two, shown for bits 11 down to 4 of S; (c) the BDD of a 4-bit addition.]
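For this particular pair, a simulation-based check would have to sweep the input space. The short C program below (our own illustration) does so exhaustively for 8-bit operands, which already takes 65,536 vectors; for the full 32-bit operands the space is 2^64, which is why the canonical BDD comparison above is preferable.

    #include <stdint.h>
    #include <stdio.h>

    /* Exhaustively compare the two expressions from Figure 5.4a over all
     * 8-bit operand pairs.  A simulation-based check must enumerate (or
     * sample) the input space like this. */
    int main(void) {
        for (uint32_t a = 0; a < 256; a++)
            for (uint32_t b = 0; b < 256; b++) {
                uint32_t seg  = (a + b) * 16;   /* code segment          */
                uint32_t inst = (a + b) << 4;   /* extensible instruction */
                if (seg != inst) {
                    printf("mismatch at a=%u b=%u\n", a, b);
                    return 1;
                }
            }
        printf("equivalent on all 8-bit operand pairs\n");
        return 0;
    }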

5.2 Related Work

This section divides the related work into two parts. First, it describes previous work on automating one or more steps of an extensible processor design flow with extensible instruction capabilities. Second, it discusses work related to automatically matching/identifying software language constructs to equivalent hardware descriptions.

Starting with the first group, Lee, Choi and Dutt [137] proposed a design flow with instruction encoding, complex instruction generation, and a heuristic design space exploration, in order to reduce the design turnaround time for extensible processors. The speedup of their complex instructions is mainly achieved through reducing the size of op-codes and operands, and shortening the instruction fetch/decode time. Cheung, Henkel and Parameswaran [57] produced a design flow that includes a methodology for rapidly selecting extensible instructions from a pre-designed instruction library; there, the extensible instructions are optimally but manually designed. Sun, Ravi, Raghunathan and Jha [193] proposed a design flow that automatically generates instructions, inserts instructions, and performs a heuristic design space exploration. Automatic instruction generation locates the regular templates derived from program dependence graphs and implements the most suitable ones as extensible instructions; it is based on matching regular templates in the graph only, and then selecting a combination of the regular templates using a graph representation and algorithm.

Matching hardware to a software code segment has been attempted in various forms during the last decade, and the work can be categorised into three research disciplines: graph matching approaches [71,119,145,186], extensive simulation [189] and equivalence verification [68,168,184].

Graph matching approaches can be further divided into template/pattern matching [71,119] and instruction-set matching [145,186]. These approaches are based on a graph representation (such as control/data flow graphs (CDFGs)) and the application of heuristic algorithms to search for equivalent pre-defined template instructions in the graph representation of the software application. The limitation of this approach is that only instructions with structurally equivalent templates/patterns can be matched. Since extensible instructions often contain special modules to meet design constraints, it is practically infeasible to find a structural match.

Extensive simulation using an Instruction Set Simulator enables the matching of functionally equivalent instructions with corresponding software code segments [1,23]. However, this approach requires the designer to locate the corresponding software code segment manually. Stadler, Rower, Kaeslin, Felber, Fichtner and Thalmann [189] proposed a simulation-based solution for verifying intellectual properties. These techniques require simulation of a large data set in order to ensure the functional equivalence of an instruction. Hence, the simulation approach is a very time-consuming process.

Several tools for verifying the combinational equivalence between C/C++ code and an HDL description have recently appeared [68,168,184]. In 1998, Pnueli, Siegel and Shtrichman [168] introduced the idea of verifying the equivalence (safety-critical) of a software implementation in C against a small BDD transition model; however, the C program is restricted to a subset of C. Semeria, Seawright, Mehra, Ng, Ekanayake and Pangrle [184] developed a tool for verifying the combinational equivalence of RTL-C and an HDL. Once again, the C code is limited to a subset of C that is very close to the hardware description (RTL code); in other words, the C code needs to be written in a very similar way to the RTL code. Recently, Clarke, Kroening and Yorav [68] presented a tool for verifying the behavioural consistency of C and Verilog HDL programs. This tool translates both C and Verilog HDL to bit-vector equations, which in turn are translated to SAT instances and used to verify equivalence using a bounded model checker. Our MINCE tool extends this approach to verify an extensible instruction against a C software code segment, but does not require the insertion of extra functions into the C program.

5.3 Overview of the MINCE Tool

This section describes the automated matching instructions tool, MINCE. It first provides an overview of the whole tool and then describes its important components in detail: the translator, the filtering algorithm, and the combinational equivalence checking model. Figure 5.5a shows the MINCE tool, which consists of a translator, a filtering algorithm, and a combinational equivalence checking model. The inputs of the MINCE tool are the extensible instruction library (in Verilog HDL) and the application (in C/C++). The goal of the MINCE tool is to automatically match instructions in the library with code segments of the application, in order to automatically judge whether a specific code segment of the application might be replaced by an extensible instruction.

[Figure 5.5: MINCE: an automated tool for matching extensible instructions. (a) The design flow of the MINCE approach: the application software in C/C++ and the extensible instruction library in Verilog HDL feed phase I (the translator), phase II (the filtering algorithm), and phase III (the combinational equivalence checking tool), which outputs functionally equivalent implementations. (b) The translator flow: separate the application into code segments, compile to assembly code, convert to a register transfer list, and map to Verilog code using the assembler instruction hardware library (Verilog HDL).]

The first phase of the tool converts a code segment in C/C++ to Verilog HDL using the translator. There are three reasons for converting a code segment to Verilog HDL:

1. The extensible instructions are designed in Verilog HDL, and no manipulation is required if the combinational equivalence checking model uses Verilog HDL files as input;

2. There is a well-developed combinational equivalence checking model that uses Verilog HDL files as input and performs functional equivalence checking; and

3. The granularity of a code segment in C/C++ is typically high, so its apparent hardware complexity may be greater than it actually is; higher complexity would slow down the verification time significantly. It is therefore advantageous to systematically convert the code segment to Verilog HDL in order to control the granularity and hardware complexity of the code segment.

Next, the filtering algorithm is applied to eliminate instructions that cannot match the code segment. The instructions that pass through the filter are then compared one by one with the code segment using a combinational equivalence checking model. The model used is VIS (Verification Interacting with Synthesis), jointly developed by the University of California at Berkeley and the University of Colorado at Boulder [42].

5.3.1 The Translator

Figure 5.5b illustrates the translator flow. The input of the translator is the application written in C/C++. The goal of the translator is to convert the application written in C/C++ into a set of code segments in Verilog HDL using a systematic approach. The translator consists of four steps: separate, compile, convert, and map. In addition, the translator contains an assembler instruction hardware library written in Verilog HDL, a self-made library for the target processor. These hardware implementations of the assembler instructions are referred to as "base hardware modules", and are used for technology mapping in the translator.

The application written in C/C++ is first separated into a set of frequently used code segments written in C/C++. In other words, the complete application written in C/C++ is first profiled and then segmented according to a ranking criterion (described in [58]).

The C/C++ code segment is then translated into assembly, which achieves the following objectives:

• Uses all of the optimisation methods available to the compiler to reduce the size of the compiled code;

• Converts the translated code into the same data types as the instructions in the library; and

• Unrolls loops with deterministic loop counts in order to convert the code segment to a combinational implementation.

[Figure 5.6: An example of translating a code segment to Verilog HDL in a form that allows matching through the combinational equivalence checking model. The application is separated (step I) into high-level code segments such as:

    int example (int sum, int input1, int input2) {
        total = sum + (input1 * input2) >> 4;
        return total;
    }

which is compiled (step II) into assembly code:

    mult R6, R1, R2
    mov  R4, R6
    sar  R4, $4
    add  R5, R4, R3

then converted (step III) into a register transfer list:

    R6 = R1 * R2;
    R4 = R6;
    R4_1 = R4 >> 4;
    R5 = R3 + R4_1;

and finally mapped (step IV) onto the base hardware modules (mult, add, sfr, sfl, cmpl, etc.) of the assembler instruction hardware library, yielding Verilog code:

    module example (total, sum, input1, input2);
        output [31:0] total;
        input  [31:0] sum, input1, input2;
        wire   [31:0] r1, r2, r3, r4, r4_1, r5;
        wire   [63:0] r6;

        assign r1 = input1;
        assign r2 = input2;
        assign r3 = sum;
        mult m0 (r6, r1, r2);
        assign r4 = r6[31:0];
        sfr  s0 (r4_1, r4, 4);
        add  a0 (r5, r3, r4_1);
        assign total = r5;
    endmodule  ]

An example of this step (code segment to assembler) is shown in Figure 5.6 (step II). The software code segment in the example contains addition, multiplication and shift-right operations (mult - multiplication, mov - move register, sar - shift right, and add - addition). The reason the assembly code contains a move instruction is that mult produces a 64-bit product, and the move instruction is used to reduce the product to 32-bit data.

The assembler code is then transformed into a list of register transfer operations. The translator converts each assembly instruction into a series of register transfers. The main goal of this conversion step is to turn any non-register transfer type operations, such as pop and push instructions, into explicit register transfer operations. In this step MINCE automatically renames variables in order to remove duplicate name assignments. Duplicate names are avoided because Verilog HDL is a static single assignment form language [72]. In Figure 5.6, this is shown as step III. In this example, the translator converts each assembly instruction into a single register transfer. The register transfer operations show the single assignment statement of each register, R4, R4_1, R5 and R6, where R4_1 is the variable renamed by our tool.
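A minimal sketch of such a renaming pass is given below in C++. The Transfer record and the versioned-name scheme are assumptions made for illustration; they are not the actual MINCE data structures.

```cpp
#include <map>
#include <string>
#include <vector>

// A register transfer: dst = op(srcs). Illustrative representation only.
struct Transfer {
    std::string dst;
    std::string op;
    std::vector<std::string> srcs;
};

// Rewrite a transfer list into single-assignment form: each time a register
// is written again, give it a fresh versioned name (R4 -> R4_1 -> R4_2 ...)
// and make later reads refer to the latest version.
void rename_to_single_assignment(std::vector<Transfer>& transfers) {
    std::map<std::string, int> version;          // writes seen per register
    std::map<std::string, std::string> current;  // register -> latest name
    for (auto& t : transfers) {
        for (auto& s : t.srcs)                   // reads use latest version
            if (current.count(s)) s = current[s];
        int v = version[t.dst]++;                // 0 on the first write
        std::string name = v == 0 ? t.dst : t.dst + "_" + std::to_string(v);
        current[t.dst] = name;
        t.dst = name;
    }
}
```

Run on the example of Figure 5.6, the second write to R4 (the shift) becomes R4_1, and the subsequent add reads R4_1, exactly as in step III.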

After the assembly code is converted to register transfer operations, the next step is technology mapping (step IV of Figure 5.6). In this step the register transfer operations are mapped to the base hardware modules given in the pre-designed assembler instruction hardware library. This library is manually designed to minimise the verification time of the functions. Once each register transfer has been mapped to a base hardware module, the translator creates a top-level Verilog HDL description interconnecting all the base hardware modules. The Verilog HDL shown in Figure 5.6 is based upon the code segment and the register transfer operations. In this example, there are three input variables (sum, input1 and input2), one output variable (total), seven temporary connection variables (r1, r2, etc.) and three hardware modules (addition, multiplication and shift right). The top-level Verilog HDL declares the corresponding number of variables and contains the mapped code of the register transfer operations.

The technology mapping step provides a system-level approach to converting register transfer operations to a combinational hardware module. One of the drawbacks of this approach is that control flow operations such as branch and jump instructions may not directly map into a single base hardware module. Those instructions map to more complex hardware modules.

5.3.2 Filtering Algorithm in MINCE

The second phase of the MINCE tool is the filtering algorithm. The inputs of the filtering algorithm are two Verilog HDL files: the extensible instruction (written in Verilog HDL and given as the input of the MINCE tool) and the code segment (written in C/C++ and translated to Verilog HDL). The goal of the filtering algorithm is to eliminate unnecessary and overly complex Verilog HDL files before they enter the combinational equivalence checking model. Because the filter itself is of low complexity, it also reduces the time spent manipulating Verilog HDL into BDDs in the combinational equivalence checking model.

Verilog HDL files can be pruned as non-matching due to:

• A differing number of ports (the code segment might have two inputs, while the extensible instruction has only one);

• Differing port sizes; and

• An insufficient number of base hardware modules to represent a complex module (for example, if the code segment only contained an XOR gate and an AND gate, while the extensible instruction contained a multiplier (a complex module), a match would be impossible).

There are two reasons to check whether the code segment has an insufficient number of base hardware modules to represent the complex modules in the extensible instruction:

1. A complex module in a Verilog HDL file requires extremely large BDDs (using on the order of 1Gb of RAM) to represent in the combinational equivalence checking model. In addition, the manipulation time (from Verilog to BDDs) for such Verilog HDL files is very large.

2. The number of ways to implement the complex modules from base hardware modules is limited in the code segment, which controls the granularity and complexity of the code segment and reduces the possibility of failure in verification.

A subset of complex modules with limited implementations is shown in Table 5.1.

Complex Module        | Implementation (Hardware Modules)
----------------------|------------------------------------------
Multiplier (32-bit)   | Add, Shift
Multiplier (32-bit)   | Multiplier (16-bit), Adder, Multiplexor
Division (32-bit)     | Multiplier (32-bit), Reciprocal
Division (32-bit)     | Subtract, Shift
Square Root (32-bit)  | Multiplier (32-bit), Add, Subtract
Sine (32-bit)         | Multiplier (32-bit), Add, Subtract
Cosine (32-bit)       | Multiplier (32-bit), Add, Subtract

Table 5.1: A subset of complex modules with limited implementations

Figure 5.7 presents the pseudo code of the filtering algorithm.

Algorithm Filtering (v1, v2) {
    if (Σ input(v1)  != Σ input(v2))  return filtered;
    if (Σ output(v1) != Σ output(v2)) return filtered;
    if (Σ |input(v1)|  != Σ |input(v2)|)  return filtered;
    if (Σ |output(v1)| != Σ |output(v2)|) return filtered;
    for all modules in v2 do {
        if (module(v2) == complex module)
            cm_list = implement(module(v2));
    }
    for all elements i in cm_list do {
        if (cm_list_i ⊆ Σ modules(v1)) return potentially equal;
    }
    return filtered;
}

Figure 5.7: The filtering algorithm, which reduces the number of extensible instructions passed into the combinational equivalence checking model
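To make the filter's logic concrete, here is a minimal C++ sketch. It assumes each Verilog HDL file has been summarised into port counts, total port bitwidths, and the multiset of base hardware modules it instantiates; the types and names are invented for illustration and are not the MINCE implementation.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of a Verilog HDL file (not a MINCE data structure).
struct ModuleSummary {
    int num_inputs = 0, num_outputs = 0;      // number of ports
    int input_bits = 0, output_bits = 0;      // total port bitwidths
    std::multiset<std::string> base_modules;  // e.g. {"add32", "sfr32"}
    // Known base-module expansions of each complex module (cf. Table 5.1).
    std::vector<std::multiset<std::string>> complex_impls;
};

// Returns true if the pair may still be equivalent and should be passed on
// to the combinational equivalence checker; false means "filtered".
bool may_match(const ModuleSummary& code_seg, const ModuleSummary& instr) {
    // Cheap structural checks first: port counts and total port widths.
    if (code_seg.num_inputs != instr.num_inputs) return false;
    if (code_seg.num_outputs != instr.num_outputs) return false;
    if (code_seg.input_bits != instr.input_bits) return false;
    if (code_seg.output_bits != instr.output_bits) return false;

    // If the instruction contains complex modules, the code segment must
    // contain enough base modules to realise at least one known
    // implementation of them.
    for (const auto& impl : instr.complex_impls) {
        bool covered = true;
        for (const auto& m : impl)
            if (impl.count(m) > code_seg.base_modules.count(m)) {
                covered = false;
                break;
            }
        if (covered) return true;  // potentially equal
    }
    return instr.complex_impls.empty();  // no complex modules: keep the pair
}
```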

5.3.3 Combinational Equivalence Checking Model

After filtering out the instructions in the library that are unrelated to the given code segment, MINCE checks whether the Verilog HDL converted from the software code segment is functionally equivalent to an instruction written in Verilog HDL. The checking is performed using Verification Interacting with Synthesis (VIS) [42]. This part of the work could have been carried out with any similar verification tool.

Using a stand-alone compiler (VL2MV), this model first converts both Verilog HDL files into an intermediate format (BLIF-MV) upon which VIS operates [55]. The hierarchical BLIF-MV modules are then flattened to a gate-level description. Note that VIS uses both BDDs and their extension, MDDs, to represent Boolean and discrete functions. VIS is also able to apply dynamic variable ordering [174] to improve the possibility of convergence.

The two flattened combinational gate-level descriptions are declared combinationally equivalent if they produce the same outputs for all combinations of inputs; in that case, MINCE declares the code segment and the extensible instruction to be functionally equivalent.

[Figure: the experimental and verification platform. Code segments from the application software in C/C++ are translated to Verilog HDL by the MINCE translator, passed through the filtering algorithm, and checked against the extensible instruction library (Verilog HDL) by the combinational equivalence checking tool. In parallel, a simulation-based approach compiles each code segment, simulates it with an input dataset (100 million input vectors), and compares the simulated results with pre-computed results for the instructions. Both flows report the number of code segments matched and the verification time.]

Figure 5.8: Experimental and verification platform
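The equivalence criterion itself can be made concrete with a small sketch. VIS establishes it symbolically with BDDs; the brute-force enumeration below is only an illustration of the definition, feasible for narrow inputs, and the Circuit interface is an assumption made for this example.

```cpp
#include <cstdint>
#include <functional>

// A combinational circuit as a pure function of its (packed) input bits.
using Circuit = std::function<std::uint64_t(std::uint64_t)>;

// Two circuits are combinationally equivalent iff they agree on every
// input combination. Only feasible here for narrow inputs, since the loop
// performs 2^input_bits evaluations.
bool combinationally_equivalent(const Circuit& f, const Circuit& g,
                                unsigned input_bits) {
    const std::uint64_t combos = 1ULL << input_bits;
    for (std::uint64_t x = 0; x < combos; ++x)
        if (f(x) != g(x)) return false;
    return true;
}
```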

5.4 Experimental Results

This section describes the experimental setup and results. The target extensible processor compiler and profiler used in our experiments are the Xtensa processor’s compiler and profiler from Tensilica, Inc. [23]. Our extensible instruction library and assembler instruction library are written in Verilog HDL (see Figure 5.5). Figure 5.8 shows the experimental and verification platform for our experiments.

To evaluate the MINCE tool we conducted two separate sets of experiments. In the first, we created arbitrarily diverse instructions and matched them against artificially generated C code segments. These segments either: a) matched exactly (i.e. they were structurally identical); b) were only functionally equivalent; c) matched on I/O ports only (i.e. the code segment passes through the filter algorithm but is not functionally equivalent); or d) did not match at all. This set of experiments was conducted to show the efficiency of functional matching as opposed to finding a match through a simulation-based approach. In the simulation-based approach, the C code segment is compiled and simulated with input vectors to obtain output vectors. The output vectors are compared with the pre-computed output results of the extensible instruction.

The simulation was conducted with 100 million data sets each (approximately 5×10⁻¹⁰% of the full input space for a code segment with two 32-bit input variables). The reason for choosing 100 million as the size of the data set is the physical limit of the hard drive: each data set together with the pre-simulated result of the instruction requires approximately 1Gb of storage. If more than n differences occur in the simulation results (n = 1 million, i.e. 1% of the data set, in our experiments), computation is terminated, and we state that a match is non-existent.
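A sketch of this early-terminating comparison loop is shown below in C++; the Behaviour interface and the way the input vectors are produced are assumptions for illustration.

```cpp
#include <cstdint>
#include <functional>

// The input/output behaviour of a code segment or instruction with two
// 32-bit inputs and one 32-bit output (assumed interface).
using Behaviour = std::function<std::uint32_t(std::uint32_t, std::uint32_t)>;

bool simulation_match(const Behaviour& code_segment,
                      const Behaviour& instruction,
                      std::uint64_t num_vectors,     // e.g. 100 million
                      std::uint64_t max_mismatches)  // e.g. 1 million
{
    std::uint64_t mismatches = 0;
    // A real run would draw the input vectors from a stored dataset; a
    // simple counter-derived pattern stands in for that here.
    for (std::uint64_t i = 0; i < num_vectors; ++i) {
        std::uint32_t a = static_cast<std::uint32_t>(i * 2654435761u);
        std::uint32_t b = static_cast<std::uint32_t>(i ^ (i >> 7));
        if (code_segment(a, b) != instruction(a, b))
            if (++mismatches > max_mismatches)
                return false;  // declare non-match and terminate early
    }
    return mismatches == 0;    // a necessary condition, not a proof
}
```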

The second set of experiments used real-life C/C++ applications from Mediabench and automatically matched code segments to our pre-designed library of extensible instructions. We examined the effectiveness of the filtering algorithm by comparing the complete matching time including and excluding the filtering step. We selected the following applications: adpcm encoder, g721 encoder, g721 decoder, gsm encoder, gsm decoder and mpeg2 decoder from Mediabench [135], and a complete voice recognition system [57]. All experiments were conducted on a Sun UltraSPARC III running at 900MHz (dual) with 4Gb of RAM.

5.4.1 Evaluation Results

Table 5.2 and Table 5.3 summarise the results of our first experiment. The first column indicates the type of instruction and the hardware modules it contains. The second column displays the type of software code segment (as compared to the instruction being matched), while the third column shows the number of corresponding code segments used in the experiment. The fourth column reports the average matching time of the simulation-based approach for determining whether the extensible instruction is functionally equivalent to the corresponding software code segment. The last column displays the average matching time of MINCE. In this first experiment, we show that MINCE successfully matches various, quite diverse (since artificially generated) software code segments. In both experiments the correct result was obtained for all software code segments. Our tool performed on average 7.1× (up to 39.5×) faster than the simulation-based approach. The time reduction achieved by our tool over the simulation-based approach is shown in Figure 5.9. The negative time reduction for the Do Not Match (I/O match only) code segments is due to the large BDD creation, flattening and dynamic variable ordering times. Despite this observation, MINCE by far outperformed the simulation approach. Figures 5.10 and 5.11 summarise the matching times of simulation vs. MINCE. It should be noted that simulation does not guarantee a match and is only a necessary condition, whereas the MINCE tool guarantees a match.

Table 5.4 summarises the results for matching instructions from the library to code segments in six different, real-life multimedia applications and a voice recognition system. We compare the number of instructions matched and the time taken in matching extensible instructions between a reasonably experienced human extensible processor designer using the simulation-based approach and our tool. The extensible processor designer selects the code segments manually and simulates them using 100 million data sets. The first column of Table 5.4 indicates the application. The second column shows the speedup achieved (identically) by the extensible processor designer and our MINCE tool. The third and fourth columns represent the number of instructions matched and the matching time used by the extensible processor designer with the simulation-based approach. The next two columns show the number of instructions matched and the time used by MINCE without the filtering algorithm, and the last two columns display the same characteristics for the complete MINCE tool. Our automated tool is, on average, 7.3× (up to 9.375×) faster than manually matching extensible instructions. We also show the effectiveness of the filtering algorithm, which reduces the equivalence checking time by more than half (compare columns six and eight). In addition, we show the speedup of the embedded application that can be achieved through the automatic matching, which is 2.47× on average (up to 6.8×). Note also that both the human designer and our MINCE system made identical matches.

Instruction (Hardware modules)                    | Software Code Segment | No. of Code Segments | Simulation Time [min.] | MINCE Time [min.]
Instruction 1 (Add, logical AND)                  | Exact Match           | 1 | 79  | 2
                                                  | Functional Equ.       | 3 | 82  | 3
                                                  | I/O Match only        | 3 | < 1 | < 1
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 2 (Shift right, logical XOR)          | Exact Match           | 1 | 46  | 2
                                                  | Functional Equ.       | 3 | 46  | 2
                                                  | I/O Match only        | 3 | < 1 | < 1
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 3 (Add, Rotate shift right)           | Exact Match           | 1 | 65  | 2
                                                  | Functional Equ.       | 3 | 65  | 3
                                                  | I/O Match only        | 3 | < 1 | < 1
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 4 (Add, Shift left)                   | Exact Match           | 1 | 86  | 2
                                                  | Functional Equ.       | 3 | 87  | 3
                                                  | I/O Match only        | 3 | < 1 | 2
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 5 (Add, Shift right, logical AND)     | Exact Match           | 1 | 41  | 2
                                                  | Functional Equ.       | 3 | 42  | 3
                                                  | I/O Match only        | 3 | < 1 | 2
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 6 (Add, shift, extra register)        | Exact Match           | 1 | 49  | 10
                                                  | Functional Equ.       | 3 | 55  | 20
                                                  | I/O Match only        | 3 | < 1 | 12
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 7 (Shift right, multiplier)           | Exact Match           | 1 | 85  | 60
                                                  | Functional Equ.       | 3 | 90  | 85
                                                  | I/O Match only        | 3 | < 1 | 15
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 8 (Add, multiplier)                   | Exact Match           | 1 | 102 | 70
                                                  | Functional Equ.       | 3 | 105 | 75
                                                  | I/O Match only        | 3 | < 1 | 20
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 9 (Comparator, Shift left)            | Exact Match           | 1 | 64  | 2
                                                  | Functional Equ.       | 3 | 65  | 10
                                                  | I/O Match only        | 3 | < 1 | 7
                                                  | Do Not Match          | 3 | < 1 | < 1
Instruction 10 (Combine, logical XOR, logical OR) | Exact Match           | 1 | 35  | 5
                                                  | Functional Equ.       | 3 | 45  | 9
                                                  | I/O Match only        | 3 | < 1 | 6
                                                  | Do Not Match          | 3 | < 1 | < 1

Table 5.2: Experimental results on hardware instructions on different kinds of software code segments

Instruction (Hardware modules)                    | Software Code Segment | No. of Code Segments | Simulation Time [min.] | MINCE Time [min.]
Instruction 11 (8-bit multiplier)                 | Exact Match           | 1 | 39  | 21
                                                  | Functional Equ.       | 3 | 42  | 31
                                                  | I/O Match only        | 3 | 1   | 1
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 12 (16-bit multiplier)                | Exact Match           | 1 | 68  | 48
                                                  | Functional Equ.       | 3 | 72  | 52
                                                  | I/O Match only        | 3 | 1   | 2
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 13 (32-bit multiplier)                | Exact Match           | 1 | 92  | 56
                                                  | Functional Equ.       | 3 | 101 | 81
                                                  | I/O Match only        | 3 | 1   | 7
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 14 (MAC unit, Rotate shift left)      | Exact Match           | 1 | 105 | 72
                                                  | Functional Equ.       | 3 | 112 | 78
                                                  | I/O Match only        | 3 | 1   | 7
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 15 (Selector, Subtract, Add)          | Exact Match           | 1 | 81  | 5
                                                  | Functional Equ.       | 3 | 85  | 8
                                                  | I/O Match only        | 3 | 1   | 1
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 16 (Comparator, Selector, Add)        | Exact Match           | 1 | 85  | 3
                                                  | Functional Equ.       | 3 | 89  | 5
                                                  | I/O Match only        | 3 | 1   | 3
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 17 (Comparator, Selector, Multiplier) | Exact Match           | 1 | 88  | 68
                                                  | Functional Equ.       | 3 | 95  | 78
                                                  | I/O Match only        | 3 | 1   | 3
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 18 (Subtract constant, Selector, Add) | Exact Match           | 1 | 72  | 5
                                                  | Functional Equ.       | 3 | 85  | 8
                                                  | I/O Match only        | 3 | 1   | 3
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 19 (Add, Logical XOR, Shift left)     | Exact Match           | 1 | 72  | 3
                                                  | Functional Equ.       | 3 | 79  | 3
                                                  | I/O Match only        | 3 | 1   | 1
                                                  | Do Not Match          | 3 | 1   | 1
Instruction 20 (Comparator, Logical INV)          | Exact Match           | 1 | 85  | 3
                                                  | Functional Equ.       | 3 | 89  | 9
                                                  | I/O Match only        | 3 | 1   | 2
                                                  | Do Not Match          | 3 | 1   | 1

Table 5.3: Experimental results on hardware instructions on different kinds of software code segments (part 2)

[Figure: the time reduction achieved by our matching tool (MINCE) relative to simulation for instructions 1-20, by code segment type (Exact Match, Functional Equivalent, Do Not Match (I/O only), Totally Wrong (Do Not Match)); values range from about -25 to 100 minutes.]

Figure 5.9: Time reduction: the comparison between simulation and our matching tool

[Figure: matching time (in minutes, 0-120) of simulation vs. our matching tool (MINCE) for hardware instructions 1-10, for each kind of software code segment (EM - Exact Match; FE - Functional Equivalent; DM - Do Not Match (I/O only); TW - Totally Wrong (Do Not Match)).]

Figure 5.10: Results in terms of computation time for the instruction matching step: simulation vs. MINCE (part 1)

[Figure: matching time (in minutes, 0-120) of simulation vs. our matching tool (MINCE) for hardware instructions 11-20, for each kind of software code segment (EM - Exact Match; FE - Functional Equivalent; DM - Do Not Match (I/O only); TW - Totally Wrong (Do Not Match)).]

Figure 5.11: Results in terms of computation time for the instruction matching step: simulation vs. MINCE (part 2)

Software Application | Speedup [×] | No. matched (Designer and Simulation) | Time [hour] | No. matched (MINCE without filtering) | Time [hour] | No. matched (MINCE tool) | Time [hour]
ADPCM enc            | 2.2         | 3                                     | 80          | 3                                     | 25          | 3                        | 9
G721 enc             | 2.5         | 4                                     | 75          | 4                                     | 20          | 4                        | 8
G721 dec             | 2.3         | 4                                     | 74          | 4                                     | 20          | 4                        | 10
GSM enc              | 1.1         | 4                                     | 95          | 4                                     | 40          | 4                        | 25
GSM dec              | 1.1         | 4                                     | 105         | 4                                     | 35          | 4                        | 18
MPEG2 dec            | 1.3         | 4                                     | 115         | 4                                     | 21          | 4                        | 15
VOICE                | 6.8         | 9                                     | 205         | 9                                     | 40          | 9                        | 25

Table 5.4: Number of instructions matched, matching time used and speedup gained by the different systems

5.5 Conclusions and Future Work

This chapter has presented the MINCE tool as part of an extensible processor design framework. MINCE translates selected code segments of an embedded application to a hardware description, filters out code segments that would not match, and eventually matches code segments to a pre-defined library of extensible instructions using functional equivalence checking. Using the Mediabench suite of applications, we have shown that our approach is feasible, as the tool was able to automatically match application code segments to extensible instructions in the library. The time for matching was on average 7.3× faster than a simulation-based approach, which has been the state-of-the-art in extensible processor design so far. We have also evaluated the speedup of the embedded application that can be achieved through the automatic matching. This speedup was 2.47× on average and therefore identical to a hand-optimised design (optimum solution).

Ours is therefore the first computationally feasible approach to fully automating an extensible processor design flow, filling the gap of instruction matching.

What remains unresolved in our system is the matching of complex code segments, which include both data operations and control statements. This will form part of our future work.

Chapter 6

Instruction Estimation Models

The previous two chapters focused on automating code segment identification, architectural customisation selection, processor evaluation and the matching of pre-designed instructions. The rest of the thesis is devoted to the analysis and generation of extensible instructions, which can then be selected for the application on the extensible processor.

Since the design space for extensible instructions is infeasibly large, there is a need for accurate system-level models of instruction characteristics, so that all proposed extensible instruction structures can be explored before the extensible instruction for a code segment is generated.

Since analysis is the prerequisite for generating extensible instructions, this chapter first presents techniques for estimating the physical characteristics of extensible instructions, such as area overhead, latency (delay), and power consumption. Previous work on estimating the physical characteristics of extensible instructions has ignored parallelism techniques and scheduling alternatives in instruction estimation models. This chapter demonstrates that parallelism techniques and scheduling alternatives provide significant information regarding extensible instructions. In particular, when estimating latency (delay) and power consumption, parallelism techniques and scheduling alternatives affect the layout and connectivity of the extensible instructions. This chapter presents estimation models of extensible instructions for area overhead, latency and power consumption using system decomposition [196] and regression analysis [20]. These estimation models achieve high accuracy, which enables designers to control our previously presented techniques for semi-automatic instruction selection for extensible processors.

6.1 Motivation

Extensible instructions can be customised in numerous ways, such as by selecting and parameterising components like arithmetic operators. Designers can judiciously select from the available components and parameterise them for specific functionality. Parallelism techniques can be deployed to achieve a further speedup. There are three well-known techniques: a) Very Long Instruction Words (VLIW); b) vectorisation; and c) hard-wired operation, each with varying tradeoffs in performance, power, etc. [26, 166, 207]. These techniques can be used in conjunction with one another. Designers can also schedule the extensible instruction to run over multiple cycles. Thus, the design space of extensible instructions is almost infeasibly large.

Figure 6.1 shows four instructions (sequences) that can be designed to replace a single code segment in the original software-based application. In the code segment, four vectors a, b, c and d are summed to produce a vector z, in a loop with an iteration count of 1000. If an extensible instruction is to replace the code segment, a summation in series or a summation in parallel using 8-bit adders can be defined; these are shown in Figures 6.1b and 6.1d respectively. Designers can also group four sets of 8-bit data together and perform a 32-bit summation in parallel (shown in Figure 6.1e); this implementation loops only 250 (1000/4) times. In Figure 6.1c, an instruction using a 64-bit adder and a 32-bit adder can also be implemented, requiring just 125 (1000/8) loop iterations. Furthermore, each of these designs, while functionally equivalent, will have differing characteristics in power, performance, etc.

(a) Code segment:

short *a, *b, *c, *d, *z;
for (int i = 0; i < 1000; i++)
    z[i] = a[i] + b[i] + c[i] + d[i];

[Figure: four instruction designs for the code segment. (b) a series of 8-bit adders producing one z[i] per iteration; (c) a 64-bit and a 32-bit adder operating on eight packed elements at a time; (d) a tree of 8-bit adders summing in parallel; (e) 32-bit adders summing four packed 8-bit elements in parallel.]

Figure 6.1: A motivation example: four varieties of instruction designed to replace one code segment

To verify the area overhead, latency, and power consumption of each instruction is a time-consuming task, and not a tractable method for exploring the instruction design space. To ensure a good design, it is crucial to explore as many of the design points as possible. This requires fast and accurate estimation techniques.

6.2 Background and Theory

The Xtensa processor from Tensilica Inc. [23] was used as the platform for this work.

The processor consists of a RISC base core with approximately 80 base instructions, plus the capacity to define specific functionality through extensible instructions (using the Tensilica Instruction Extension (TIE) language), which coexist with the base instructions. An extensible instruction (in Xtensa) decomposes into five parts: the decoder, which uses data from the instruction decoding stage and assigns internal signals for the execution stage; clock gating hardware, such that the instruction can be turned on and off as needed; control logic (also known as top-logic), to schedule the operations within the instruction; customised registers, to store any additional variables (there are two types of customised register: register file and instruction register); and combinational operations, such as arithmetic and logic operations.

This estimation model uses methods described in system decomposition theory to decompose the embedded processor hardware into independent subsystems that can be analysed separately [196]. System decomposition theory originated from the ontolog- ical model of information system decomposition. The following basic definitions and theorems are obtained (for a detailed description, see Wand and Weber [196]).

Basic definitions of system decomposition theory:

• A system, σ, comprises a set of parameters.

• A parameter, c, is a discrete variable from an ordered and finite set.

• A parameter space, S, is a multidimensional discrete space, where each dimension corresponds to a parameter and each point corresponds to an extensible instruction.

• A function, f, over a parameter space, S, is a function that corresponds to a model of the extensible instruction.

• A system σ′ is a subsystem of σ, and σ is a supersystem of σ′, if and only if the composition of σ′ is a subset of the composition of σ.

• A decomposition of a system σ is a set of subsystems D(σ) = {σ_i}_{i∈I}, such that each parameter in the system is included in at least one of the subsystems.

Theorem 3.1: Let D(σ) = {σ_i}_{i∈I} be a decomposition of σ. The parameter space of the decomposition is:

    S(D) \equiv S(D(\sigma)) = \otimes_{i \in I} S(\sigma_i)    (6.1)

The parameter space S(D) will be called the parameter space of the decomposition.

Let c_i be a parameter in an ordered and finite set S_i, and let C be an n-tuple of parameters, C = {c_1, c_2, ..., c_n}. Let C_1 be an x-tuple of parameters {c_1, c_2, ..., c_x} and C_2 be an (n−x)-tuple of parameters {c_{x+1}, c_{x+2}, ..., c_n}, such that the n-tuple C can be expressed as {C_1, C_2}. In addition, a function f(C_1, C_2) is independent of C_2 if f(C_1, C_2) = f(C_1), i.e. if it can be completely represented by the parameters in C_1.
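As a small worked instance of these definitions (the parameters and the split are invented for illustration):

```latex
% A system sigma with parameters C = {c1, c2, c3}, decomposed into two
% subsystems: sigma_1 over C_1 = {c1, c2} and sigma_2 over C_2 = {c3}.
D(\sigma) = \{\sigma_1, \sigma_2\}, \qquad
S(D) = S(\sigma_1) \otimes S(\sigma_2) \quad \text{(by eqn. 6.1)}
% If a model f of sigma_1 does not change when c3 changes, then
f(C_1, C_2) = f(C_1),
% i.e. f is independent of C_2 and sigma_1 can be analysed separately.
```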

There are three requirements to ensure a valid system decomposition: i) the system must have a well-defined structure; ii) the system must be representable by a known set of parameters; and iii) a change in a parameter that belongs to a subsystem must result in a change in the function of that subsystem. System decomposition theory is applicable to modelling extensible instructions for two reasons: i) extensible instructions are well structured into five architectural parts (as described above); and ii) extensible instructions are represented using a set of customisation parameters. These parameters cover a wide range of instruction customisations, modelling differing components, dissimilar parallelism techniques, and diverse scheduling.

Regression analysis is an analysis method that expresses a model as a function of parameters. For extensible instructions, each of the area overhead, latency, and power consumption is a model that is expressed as a function of customisation parameters. For example, a model of a system, M(σ), expressed as a linear function of c_1, c_2, ..., c_n, where each c_i is a parameter of the system, can be represented as follows:

    M(\sigma) = m_0 + m_1 c_1 + m_2 c_2 + ... + m_n c_n    (6.2)

where m_0, m_1, ..., m_n are coefficients of the parameters. The function can also take other forms of expression, such as quadratic or polynomial. The coefficients of the parameters and the relationship of the model (i.e. linear, quadratic, polynomial, etc.) can be determined (if such a relationship exists) by commercial tools, when a sample dataset and the parameters are given.
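As an illustration of what such a tool computes, the following C++ sketch fits a one-parameter linear model M = m_0 + m_1 c_1 by ordinary least squares on invented sample data; a commercial statistics package (S-Plus is used later in this chapter) handles the full multi-parameter case.

```cpp
#include <cstdio>
#include <vector>

// Fit M = m0 + m1*c1 by ordinary least squares. The (c1, M) pairs below are
// invented, e.g. number of 32-bit adders vs. measured area in grids.
int main() {
    std::vector<double> c = {1, 2, 3, 4, 5};
    std::vector<double> M = {650, 1150, 1690, 2180, 2700};

    double n = c.size(), sc = 0, sM = 0, scc = 0, scM = 0;
    for (size_t i = 0; i < c.size(); ++i) {
        sc += c[i]; sM += M[i]; scc += c[i] * c[i]; scM += c[i] * M[i];
    }
    // Closed-form OLS estimates for slope and intercept.
    double m1 = (n * scM - sc * sM) / (n * scc - sc * sc);
    double m0 = (sM - m1 * sc) / n;
    std::printf("M ~= %.1f + %.1f * c1\n", m0, m1);
    return 0;
}
```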

[Figure: overview of model derivation. Inputs are the extensible instructions with a set of customisation parameters P (component, parallelism technique and schedule parameters), together with synthesis and simulation results (area overhead, latency and power dissipation). Steps I/II: the extensible instructions are decomposed into subsystems (decoder, clock gated hardware, top-logic, customised register, combinational operations) using system decomposition theory, each with a subset P1-P5 of the customisation parameters. Step III: the coefficients of each subsystem model are obtained individually using regression analysis, yielding the estimation models of the extensible instructions (area overhead, latency, power dissipation).]

Figure 6.2: An overview for characterising and estimating the models of the extensible instructions

6.3 Extensible Instructions Model

This section presents an overview of the derivation methodology, then describes how extensible instructions are represented by customisation parameters, and how the extensible instructions are divided into subsystems, each with a subset of customisation parameters, using system decomposition. Finally, this section explains how the estimation models are derived in terms of customisation parameters using regression analysis.

6.3.1 Overview

An overview of the method used to derive the estimation models is shown in Figure 6.2. The inputs are the extensible instructions with a set of customisation parameters, together with synthesis and simulation results (including the results for each subsystem, such as the decoder, top-logic, etc.). The outputs are the estimation models of the extensible instructions. An extensible instruction represented by a large set of customisation parameters is complex and therefore hard to analyse. Hence, system decomposition theory is applied to decompose an instruction into its independent structural subsystems: decoder, clock gating hardware, top-logic, customised register, and combinational operations. Each such subsystem is represented by a subset of customisation parameters. A customisation parameter belongs to a subsystem if and only if a change in the customisation parameter would affect the synthesis and simulation results of the subsystem. In addition, one and the same customisation parameter can be contained in multiple subsystems. Regression analysis is then used in each subsystem to determine: i) the relationship between the synthesis and simulation results and the subset of customisation parameters; and ii) the coefficients of the customisation parameters in the estimation models. The decomposition into subsystems is refined until the subsystem’s estimation model is satisfactory. The estimation models for the subsystems are then combined to model the extensible instruction for the purpose of estimating its characteristics. This procedure is applied separately for area overhead, latency, and power consumption.

6.3.2 Customisation Parameters

Customisation parameters are properties of the instruction that designers can customise when designing extensible instructions. They can be divided into three categories: a) component parameters, b) parallelism technique parameters, and c) schedule parameters.

Component parameters characterise the primitive operators of an instruction. They can be classified on the basis of structural similarity, as follows: i) adder and subtractor (+/−); ii) multiplier (∗); iii) conditional operators and multiplexers (<, >, ?:); iv) bitwise and reduction logic (&, |, ...); v) shifter (<<, >>); vi) built-in adder and subtractor from the library (LIB_add) (these custom-built components are used to show the versatility of the approach); vii) built-in multiplier (LIB_mul); viii) built-in selector (LIB_csa); ix) built-in mac (LIB_mac); x) register file; and xi) instruction register. The bitwidths of all primitive operators can also be altered.

Parallelism parameters characterise the various levels of parallelism during instruction execution. There are three parallelism techniques: i) VLIW - allows a single instruction to execute multiple independent operators in parallel; ii) vectorisation - increases throughput by operating on multiple data elements at a time; and iii) hard-wired operation - takes a set of single instructions with constants and composes them into one new custom complex instruction. The parallelism technique parameters include: i) the width of the instruction under the different parallelism techniques, which models the additional hardware and wider busses required to parallelise the instruction; ii) the connectivity of the components (register file, instruction register, operations, etc.), which represents the components that are commonly shared; iii) the number of operations in series; iv) the number of operations in parallel; and v) the total number of operations in the instruction.

Schedule parameters represent the scheduling of instruction execution, such as multi-cycling. The schedule parameters are: i) the number of clock cycles required to execute an instruction; ii) the maximum number of instructions that may reside in the processor; and iii) the maximum number of registers that may be used by an instruction.

Category          | Customisation parameter | Description
Components        | Num_add/sub_i  | Number of i-bit addition/subtraction operators
                  | Num_mul_i      | Number of i-bit multiplication operators
                  | Num_cond       | Number of conditional operators and multiplexors
                  | Num_logic      | Number of bitwise and reduction logics
                  | Num_shift_i    | Number of i-bit shifters
                  | Num_LIB_add_i  | Number of i-bit built-in adders
                  | Num_LIB_csa_i  | Number of i-bit built-in selectors
                  | Num_LIB_mul_i  | Number of i-bit built-in multipliers
                  | Num_LIB_mac_i  | Number of i-bit built-in macs
                  | Num_regf_j     | Number of j-bit width register files
                  | Num_ireg       | Number of instruction registers
Parallelism tech. | Wid_vliw       | Width of the VLIW instructions
                  | Wid_vector     | Width of the vectorisation instructions
                  | Wid_hwired     | Width of the hard-wired instructions
                  | Con_regf_j     | Connectivity of j-bit register files
                  | Con_ireg       | Connectivity of instruction registers
                  | Con_oper_i     | Connectivity of operations
                  | Num_oper_i     | Number of i-bit operations in total
                  | Num_ser        | Number of operations in serial
                  | Num_para       | Number of operations in parallel
Schedule          | Num_mcyc       | Number of cycles scheduled
                  | Num_minst      | Number of instructions included
                  | Use_reg_j      | Usage of the j-bit register files

Table 6.1: Customisation parameters of extensible instructions

Table 6.1 shows the notations and descriptions of the customisation parameters¹; these notations are used to refer to customisation parameters in the remainder of this chapter.

¹To handle scalability and to limit the design space, the estimation models only consider the following bitwidths: 8/16/32/64/128 for the operators with suffix i, and 32/64/128 for register files with suffix j in the table.
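For concreteness, a subset of these parameters can be captured in a small C++ record; the field names follow Table 6.1, but the type itself is an illustrative assumption rather than part of the thesis tool chain.

```cpp
#include <array>

// A cut-down customisation-parameter record for one extensible instruction.
// Indices 0..4 of the operator arrays correspond to 8/16/32/64/128 bits, and
// indices 0..2 of the register-file array to 32/64/128 bits, per Table 6.1.
struct CustomisationParams {
    // Component parameters (counts of primitive operators).
    std::array<int, 5> num_add_sub{};   // Num_add/sub_i
    std::array<int, 5> num_mul{};       // Num_mul_i
    std::array<int, 5> num_shift{};     // Num_shift_i
    int num_cond = 0;                   // Num_cond
    int num_logic = 0;                  // Num_logic
    std::array<int, 3> num_regf{};      // Num_regf_j
    int num_ireg = 0;                   // Num_ireg

    // Parallelism technique parameters.
    int wid_vliw = 0, wid_vector = 0, wid_hwired = 0;
    int num_ser = 0, num_para = 0;

    // Schedule parameters.
    int num_mcyc = 1;                   // cycles scheduled
    int num_minst = 1;                  // instructions included
};
```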

6.3.3 Characterisation for Various Constraints

Area Overhead Characterisation

Unless the subsystems share common hardware, the area overhead of an extensible instruction can be defined as the summation of the individual subsystems’ area overheads.

The decoder, the clock gating hardware, and the top-logic are built in, and are actually shared amongst extensible instructions, which must be taken into consideration.

The customisation parameters for these subsystems are: i) Con_oper; ii) Con_regf; iii) Con_ireg; iv) Num_mcyc; and v) Num_minst.

A customised register can also be shared amongst the extensible instructions in the processor. The area overhead of the customised register depends on the size and the width of the registers. The corresponding customisation parameters are: i) Num_regf; and ii) Num_ireg.

The combinational operations’ area overhead is not shared with other instructions and depends only upon the operations within the instruction. The customisation parameters for the combinational operations are: i) Num_add/sub; ii) Num_mul; iii) Num_cond; iv) Num_logic; v) Num_shift; vi) Num_LIB_add; vii) Num_LIB_mul; viii) Num_LIB_csa; and ix) Num_LIB_mac.

Latency Characterisation

The latency of extensible instructions can be defined as the maximum delay among the subsystems on the critical path when that specific extensible instruction is executed. The major part of the critical path is contributed by the combinational operations; the other subsystems either have very little effect on the latency or do not lie on the critical path.

The customisation parameters for the latency of the decoder, clock gating hardware, top-logic and customised register are similar to those of the area overhead characterisation. The reason is that these subsystems mainly revolve around the connectivity between one another (i.e. fan-ins/fan-outs), while the internal latency is relatively constant within these subsystems.

In the combinational operations, the latency depends not only on the structural components, but also on the parallelism technique parameters and schedule parameters. Component parameters represent the latency of independent operators; parallelism technique parameters describe the latency of the internal connectivity, the number of stages in the instruction, and the level of parallelism; and schedule parameters represent multi-cycle instructions.

Power Consumption Characterisation

The characterisation of power consumption is similar to that of the constraints described above.

The customisation parameters of the decoder and top-logic relate to the connectivity between the subsystems, and are therefore: i) Con_oper; ii) Con_regf; iii) Con_ireg; iv) Num_minst; and v) Num_mcyc.

For the clock gating hardware, the customisation parameters capture the connectivity and complexity of the operations as well as the scheduling: i) Num_oper; ii) Num_regf; iii) Num_minst; iv) Num_mcyc; v) Num_ser; and vi) Num_para. The last two parameters capture the power consumption of the clock tree in the extensible instruction.

For the customised register, the power consumption relates to the number of customised registers used by the instruction. The customisation parameters are: i) Num_regf; ii) Num_ireg; and iii) Use_reg.

For the combinational operations, the power consumption characterisation is further categorised by the number of stages in the instruction and the level of parallelism in each stage. The reason for capturing power consumption when operations execute in parallel, and when multi-cycle instructions are present, is that stalling increases energy dissipation significantly.

6.3.4 Estimating Characteristics of Extensible Instructions

Area Overhead Estimation

As discussed previously, the area of an extensible instruction, A(inst), can be defined using system decomposition (eqn. 6.1):

    A(inst) = \otimes_{i \in \{dec, clk, top, reg, opea\}} A(i)    (6.3)

or as:

    A(inst) = \sum_{i \in \{dec, clk, top, reg, opea\}} A(i)    (6.4)

where A(i) is the area overhead estimate of each affected subsystem. Applying regression analysis to each subsystem and its customisation parameter subset, the area overhead estimates of the subsystems are derived as follows.

The decoder has five customisation parameters (according to Table 6.1). Using regression analysis, the relationship of the estimation model is seen to be linear, and the area overhead estimate, A(dec), is hence defined as:

    A(dec) = \sum_{i \in \{32,64,128\}} A_{regf_i} Con_{regf_i} + A_{ireg} Con_{ireg} + \sum_{i \in \{8,16,32,64,128\}} A_{oper_i} Con_{oper_i} + A_{mcyc} Num_{mcyc} + A_{minst} Num_{minst}    (6.5)

where A_{regf_i}, A_{ireg}, A_{oper_i}, A_{mcyc} and A_{minst} are the respective coefficients. For the clock gating hardware, the area overhead estimate A(clk) is defined as:

    A(clk) = \sum_{i \in \{32,64,128\}} A_{regf_i} Con_{regf_i} + A_{ireg} Con_{ireg} + A_{minst} Num_{minst}    (6.6)

A(top) is the area overhead estimate of the top-logic and is defined as:

    A(top) = \sum_{i \in \{8,16,32,64,128\}} A_{oper_i} Con_{oper_i} + \sum_{i \in \{32,64,128\}} A_{regf_i} Con_{regf_i} + A_{ireg} Con_{ireg} + A_{mcyc} Num_{mcyc} + A_{minst} Num_{minst}    (6.7)

The area overhead estimate of the customised register, A(reg), is defined as:

    A(reg) = \sum_{i \in \{32,64,128\}} A_{regf_i} Num_{regf_i} + A_{ireg} Num_{ireg}    (6.8)

A(opea) is the area overhead estimate of the combinational operations and is defined as:

    A(opea) = \sum_{i \in \{8,16,32,64,128\}} \{ A_{add/sub_i} Num_{add/sub_i} + A_{LIB\_mul_i} Num_{LIB\_mul_i} + A_{LIB\_mac_i} Num_{LIB\_mac_i} + A_{LIB\_add_i} Num_{LIB\_add_i} + A_{LIB\_csa_i} Num_{LIB\_csa_i} + A_{mul_i} Num_{mul_i} + A_{shift_i} Num_{shift_i} \} + A_{cond} Num_{cond} + A_{logic} Num_{logic}    (6.9)

Latency Estimation

As described in Section 6.3.3, the latency of an extensible instruction is the maximum delay among the subsystems on the critical path of the extensible instruction. Therefore, the latency estimate, T(inst), is defined as:

    T(inst) = \max_{i \in \{dec, clk, top, reg, opea\}} T(i)    (6.10)

where T(dec) is the latency estimate of the decoder, defined as follows:

    T(dec) = \sum_{i \in \{32,64,128\}} T_{regf_i} Con_{regf_i} + T_{ireg} Con_{ireg} + \sum_{i \in \{8,16,32,64,128\}} T_{oper_i} Con_{oper_i} + T_{mcyc} Num_{mcyc} + T_{minst} Num_{minst}    (6.11)

The latency estimate of the clock gating hardware, T(clk), is:

    T(clk) = \sum_{i \in \{32,64,128\}} T_{regf_i} Con_{regf_i} + T_{ireg} Con_{ireg} + T_{mcyc} Num_{mcyc} + T_{minst} Num_{minst}    (6.12)

T(top) is the latency estimate of the top-logic:

    T(top) = \sum_{i \in \{32,64,128\}} T_{regf_i} Con_{regf_i} + T_{ireg} Con_{ireg} + \sum_{i \in \{8,16,32,64,128\}} T_{oper_i} Num_{oper_i} + T_{mcyc} Num_{mcyc} + T_{minst} Num_{minst}    (6.13)

T(reg) is the latency estimate of the customised register:

    T(reg) = \frac{1}{Num_{regf} + Num_{ireg}} \Big( \sum_{i \in \{32,64,128\}} T_{regf_i} Num_{regf_i} + T_{ireg} Num_{ireg} \Big)    (6.14)

T(opea) is the latency estimate of the combinational operations:

    T(opea) = \frac{Num_{ser}}{Num_{mcyc} \times Num_{oper}} \sum_{i \in \{8,16,32,64,128\}} \{ T_{add/sub_i} Num_{add/sub_i} + T_{LIB\_add_i} Num_{LIB\_add_i} + T_{LIB\_csa_i} Num_{LIB\_csa_i} + T_{LIB\_mul_i} Num_{LIB\_mul_i} + T_{LIB\_mac_i} Num_{LIB\_mac_i} + T_{mul_i} Num_{mul_i} + T_{shift_i} Num_{shift_i} \} + T_{cond} Num_{cond} + T_{logic} Num_{logic} + T_{VLIW} Wid_{VLIW} + T_{vector} Wid_{vector} + T_{hwired} Wid_{hwired} + T_{ser} Num_{ser} + T_{para} Num_{para} + T_{mcyc} Num_{mcyc} + T_{minst} Num_{minst}    (6.15)
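For example, with invented counts of one 32-bit register file and one instruction register, eqn. (6.14) reduces to:

```latex
% Worked instance of eqn. (6.14): Num_regf = Num_ireg = 1.
T(reg) = \frac{1}{1 + 1}\,\big( T_{regf_{32}} \cdot 1 + T_{ireg} \cdot 1 \big)
       = \tfrac{1}{2}\,\big( T_{regf_{32}} + T_{ireg} \big)
```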

Power Consumption Estimation

Similar to the previous arguments, the power consumption can be modelled as:

    P(inst) = \sum_{i \in \{dec, clk, top, reg, opea\}} P(i)    (6.16)

where

    P(dec) = \sum_{i \in \{32,64,128\}} P_{regf_i} Con_{regf_i} + P_{ireg} Con_{ireg} + \sum_{i \in \{8,16,32,64,128\}} P_{oper_i} Con_{oper_i} + P_{mcyc} Num_{mcyc} + P_{minst} Num_{minst}    (6.17)

    P(clk) = \sum_{i \in \{32,64,128\}} P_{regf_i} Con_{regf_i} + P_{ireg} Con_{ireg} + P_{mcyc} Num_{mcyc} + P_{minst} Num_{minst}    (6.18)

    P(top) = \sum_{i \in \{32,64,128\}} P_{regf_i} Con_{regf_i} + P_{ireg} Con_{ireg} + \sum_{i \in \{8,16,32,64,128\}} P_{oper_i} Con_{oper_i} + P_{mcyc} Num_{mcyc} + P_{minst} Num_{minst}    (6.19)

    P(reg) = \sum_{i \in \{32,64,128\}} \Big\{ \frac{Use_{reg_i}}{Num_{regf_i} + Num_{ireg}} \times \big( P_{regf_i} Num_{regf_i} + P_{ireg} Num_{ireg} + P_{minst} Num_{minst} \big) \Big\}    (6.20)

    P(opea) = \sum_{i \in \{8,16,32,64,128\}} \{ P_{add/sub_i} Num_{add/sub_i} + P_{LIB\_mul_i} Num_{LIB\_mul_i} + P_{LIB\_mac_i} Num_{LIB\_mac_i} + P_{LIB\_add_i} Num_{LIB\_add_i} + P_{LIB\_csa_i} Num_{LIB\_csa_i} + P_{mul_i} Num_{mul_i} + P_{shift_i} Num_{shift_i} \} + P_{cond} Num_{cond} + P_{logic} Num_{logic} + P_{VLIW} Wid_{VLIW} + P_{vector} Wid_{vector} + P_{hwired} Wid_{hwired} + P_{ser} Num_{ser} + P_{para} Num_{para} + P_{mcyc} Num_{mcyc} + P_{minst} Num_{minst}    (6.21)
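Because every subsystem model above is linear in its customisation parameters, evaluating an estimate amounts to a dot product of fitted coefficients with parameter values, with area and power summed over subsystems (eqns. 6.4 and 6.16) and latency taken as a maximum (eqn. 6.10). The following C++ sketch assumes a simple map-based representation invented for illustration:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// One fitted linear subsystem model: parameter name -> coefficient.
using Model = std::map<std::string, double>;
// One instruction: parameter name -> value (counts, widths, connectivity).
using Params = std::map<std::string, double>;

// Evaluate a linear model: sum of coefficient * parameter value.
double evaluate(const Model& coeff, const Params& inst) {
    double total = 0.0;
    for (const auto& [name, value] : inst) {
        auto it = coeff.find(name);
        if (it != coeff.end()) total += it->second * value;
    }
    return total;
}

// Area and power estimates sum the subsystem models (eqns. 6.4 and 6.16).
double estimate_sum(const std::vector<Model>& subsystems, const Params& p) {
    double a = 0.0;
    for (const auto& m : subsystems) a += evaluate(m, p);
    return a;
}

// The latency estimate takes the maximum over subsystems (eqn. 6.10).
double estimate_latency(const std::vector<Model>& subsystems, const Params& p) {
    double t = 0.0;
    for (const auto& m : subsystems) t = std::max(t, evaluate(m, p));
    return t;
}
```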

6.4 Experimental Results

The purpose of the experiments was to evaluate the estimation models for the three constraints against “measured” values, i.e. against the case where the instructions have actually been synthesised and the power has been estimated at gate level.

For evaluation purposes, the T1040.0 version of the Xtensa processor (0.18µ technology) from Tensilica Inc. [23], with a clock speed of 166.7MHz, was used. All experiments were conducted on a Sun UltraSPARC III running at 900MHz (dual) with 4Gb of RAM. Although these models are based on the Xtensa processor and 0.18µ technology, the underlying method for deriving the models is general and can be applied to any extensible processor platform of similar capability (i.e. with the ability to design custom instructions).

6.4.1 Experimental Setup

The two series of experiments detailed in this chapter sought to: i) determine the coefficients and derive the estimation models (a one-time effort); and ii) verify the estimation models. Figure 6.3 shows the verification methodology. In the first experiment, about 5000 extensible instructions² (TIE) were automatically generated (dataset 1) with a wide range of customisation parameters; Figure 6.4 shows the possible design points of the extensible instructions around a latency of 6ns. These instructions are first compiled, and Verilog implementations are generated. The instructions are then synthesised to obtain area overhead and latency using Design Compiler from Synopsys, Inc. [6] with a 0.18µ cell library. The power consumption figures are obtained using PowerTheatre from Sequence Design, Inc. [19] with simulation data (testbenches) generated using Modelsim from Mentor Graphics, Inc. [15]. Next, dataset 1 is used to determine the coefficients of the estimation models using S-Plus from Insightful, Inc. [20]. It should be noted that determining the coefficients of the estimation models is a one-time effort. Table 6.2 shows all of the obtained coefficients of the extensible instructions. The mean absolute error between the estimation models and the synthesis results is computed over all automatically generated instructions³. In addition, the error rates are further discussed for: VLIW instructions; vectorisation instructions; hard-wired operation instructions; multi-cycle instructions; and the case when multiple instructions are part of the processor.

²The number 5000 results from an analysis of the design space. We made sure that all areas of the design space were reasonably well covered.

³The reasoning for automatically generating instructions is as follows: if we were to only generate those (few) extensible instructions that are actually useful for a certain application, then we would most likely not cover all cases needed to evaluate our estimation techniques. Even though these instructions are automatically generated, they are useful for the estimation evaluation, even though they may not speed up the application at all. However, we have also generated and evaluated extensible instructions for real-world applications, such that those applications profit from the extensible instructions.

The second set of experiments used 11 extensible instructions from three real-world applications: adpcm, gsm, and mpeg2 [135]. The synthesis and simulation results were first obtained using commercial tools (as shown in Figure 6.3). The estimation results were then computed using our estimation models. These 11 instructions were grouped into sets of instructions to evaluate the estimation models for the case when multiple instructions are present/selected in the processor. The accuracy of the estimation models for individual extensible instructions and for multiple instructions was then verified.

[Table: the fitted coefficients of the estimation models, grouped by subsystem (decoder, top-logic, clock, customised register, combinational operations). For each customisation parameter of Table 6.1, the table lists the area (grids), latency (ps) and power (µW) coefficients at 32-bit, 64-bit and 128-bit widths.]

Table 6.2: The coefficients of the extensible instructions for the purpose of calibrating through regression

[Figure: the experimental methodology. The automatically generated extensible instructions (dataset 1) and the extensible instructions used in real-world applications (dataset 2) follow two paths. To obtain the synthesis and simulation results, the instructions are compiled with the xt-xcc compiler (Tensilica Inc.); area overhead and latency are obtained with Design Compiler (Synopsys Inc.) using the 0.18µ cell library; power simulation traces are generated with Modelsim (Mentor Graphics Inc.); and power dissipation is simulated with PowerTheatre (Sequence Design Inc.). In parallel, estimation results are obtained from the estimation models, whose one-time derivation via system decomposition and regression analysis is described in Figure 6.2. Finally, the synthesis/simulation results and the estimation results for datasets 1 and 2 (area overhead, latency, power dissipation) are compared to verify the estimation models by computing the mean absolute error.]

Figure 6.3: Experimental methodology

[Figure: a scatter plot of area overhead (grids, 0-45000) versus latency (ns, 4.0-7.5) for the generated extensible instructions with latencies around 6ns.]

Figure 6.4: A design space example of the extensible instructions (around 6ns)

6.4.2 Evaluation Results

In our first experiment, we examined the accuracy of the estimation models under differing customisations: VLIW; vectorisation; hard-wired operation; multi-cycle; and sequences of (multiple) extensible instructions. Table 6.3 shows the mean absolute error for the area overhead, latency, and power consumption of the estimation models in these categories. The mean absolute error (for area overhead, latency, and power consumption) for hard-wired operation is lower than for instructions using the VLIW and vectorisation techniques. This is because the estimation for VLIW and vectorisation instructions depends on a larger number of customisation parameters, and hence higher error rates are observed. In terms of schedules, the mean absolute error is relatively close to the average mean error. The mean absolute error of the estimation models across all automatically generated instructions is only 2.5% for area overhead, 4.7% for latency, and 3.1% for power consumption.

Figure 6.5 shows the mean absolute error of the estimation models for sequences of (multiple) instructions in the real-world applications. The mean absolute error for previously unseen multiple instructions ranges between 3% and 6% for the three estimation models. Figure 6.6 summarises the accuracy of the estimation models for previously unseen individual extensible instructions from the real-world applications (dataset 2).

The maximum estimation error is 6.7%, 9.4%, and 7.2% for area overhead, latency and power consumption respectively, while the mean absolute error is only 3.4%, 5.9%, and 4.2%. The estimation errors are all far below the estimation errors of the commercial estimation tools (typically around 20% absolute error at gate level) against which our models were verified. As such, we can conclude that our models are accurate enough for the purpose of high-level design space exploration for extensible instructions.

Our estimation models are also by far faster than a complete synthesis and simulation process using commercial tools.

[Table: for each category of extensible instruction (VLIW, vectorisation, hard-wired, multi-cycle, multiple instructions, and overall), the mean absolute, maximum and minimum error (%) of the area overhead, latency and power consumption estimates. Overall, the mean absolute error is 2.5% for area overhead, 4.7% for latency and 3.1% for power consumption.]

Table 6.3: The mean absolute error of the estimation models in different types of extensible instructions

[Figure: bar chart of the mean absolute error (%) of the area overhead, latency and power dissipation estimates for multiple instructions in the real-world applications (Adpcm sets 1-3, Gsm sets 1-4, Mpeg2dec sets 1-4); errors range from about 1% to 8%.]

Figure 6.5: The accuracy of the estimation models for multiple instructions (sets of instructions: set 1 contains a single instruction, set 2 contains a group of two instructions, etc.)

[Figure 6.6: The accuracy of the estimation models in real-world applications. (a) Area overhead (grids): synthesised versus estimated; (b) latency (us): synthesised versus estimated; (c) power consumption (mW): simulated versus estimated, for each instruction (Inst 1-3 of Adpcm, Inst 1-4 of Gsm, Inst 1-4 of Mpeg2dec).]

Our estimation models are far faster than a complete synthesis and simulation process using commercial tools. The time taken by Design Compiler and

PowerTheater to determine the customisation of an extensible instruction can be up to several hours, while our estimation models require at most a few seconds. This speed is another prerequisite for extensive design space exploration.

6.5 Conclusions

This chapter has presented fast and accurate techniques for estimating the area overhead, latency, and power consumption of extensible instructions. As distinct from similar work in the field, our techniques also include models for instruction parallelism and multi-cycling, which are crucial for accurate and reliable estimation. Our models, calibrated through regression analysis and verified against commercial synthesis and estimation tools, make it possible to explore the large design space of extensible processors. The estimation techniques have been integrated into our extensible processor tool suite.

Our contributions to this instruction estimation model include:

• Derivation of fast and accurate estimation models (area overhead, latency, and power consumption) of extensible instructions;

• Simplification of the process of modelling extensible instructions by using system decomposition and regression analysis; and

• The use of both parallelism techniques and schedule alternatives in the instruction models.

A summary of the results is as follows: the mean absolute error for a set of instructions used in real-world applications is 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption. Our estimation models execute in a few seconds for an instruction, whereas synthesis and subsequent estimation would take hours.

Future work will involve extending our estimation techniques to allow them to estimate more complex extensible instructions (as they will soon be available in commercial extensible processor tool suites).

Chapter 7

Instructions Generation

This chapter presents an automatic extensible instructions generation tool with battery awareness, which minimises the power dissipation of the instructions while maximising speedup. As discussed previously, extensible instructions are customised to replace computationally intensive code segments (groups of primitive instructions) in the application, satisfying performance and power dissipation constraints. A typical approach to generating extensible instructions is to replace a computationally intensive code segment with a single extensible instruction that maximises the speedup. However, the drawback of this approach is that the energy consumption of the extensible instruction is not minimised, leading to unnecessary energy consumption in the extensible processor. This chapter proposes methodologies that extend the typical approach to also minimise the energy consumption of the extensible instructions. There are two proposed methods: i) separating instructions into multiple instructions; and ii) utilising the slack of the instructions.

7.1 Motivation

Previous approaches to this problem have largely focused on identifying large computationally intensive primitive instruction groups in the application and combining them into a single extensible instruction [33, 58, 67, 70, 193]. These approaches often maximise speedup and reduce execution time, and hence minimise the energy consumption of the application. However, the drawback of these approaches is that the power dissipation of the extensible instructions is large, often accounting for up to 20% of the total power dissipation when those instructions are executed. The variance of the power dissipation distribution (between the base processor and the base processor plus extensible instructions) is not minimised. Thus, the variance of the discharge current distribution is not minimised, which often leads to shortened battery lifetime [165].

Figure 7.1 shows two designs that can be generated in order to replace a single code segment in the original software-based application. The code segment has six inputs, namely a, b, c, d, e, and f. The sum of the first four inputs is multiplied by the sum of the last two inputs, and the sum of the last two inputs is then added to the product to produce an output, z; this code segment executes for 25% of the time in the application. The base processor is a five-stage pipeline processor that runs at 222MHz with an average power dissipation of 100mW. If an extensible instruction is to replace the code segment using previous state-of-the-art approaches [33, 58, 67, 70, 193], all of the operations are combined into a single instruction (shown in Figure 7.1a). The average power dissipation of the instruction is 31.65mW, which is consumed by the different operations (i.e. five adders and a multiplier) and the registers used (i.e. seven registers). On the other hand, the designer can separate the code segment into two instructions, as shown in Figure 7.1b. The average power dissipation of these instructions is 14.18mW and 15.68mW respectively. The reduction in average power dissipation is due to the fact that each of the instructions contains fewer operations and registers. Using the design in Figure 7.1b, the energy consumption of the application (computed using the battery behaviour model [146, 165], which is described in the Background) is reduced by 7% compared with the design in Figure 7.1a. The reduction in energy consumption arises because the instructions in Figure 7.1b are executed sequentially, and an extensible instruction only dissipates power when it is executed (see Background).

[Figure 7.1: A motivation example: separating an instruction to reduce energy consumption. The code segment, z = (a + b + c + d) * (e + f) + (e + f), occupies 25% of the execution time, and the application is assumed to run for 1,000,000 clock cycles. (a) Single instruction: 5 clock cycles, area 72,347 grids, average power 31.65 mW, application energy 103.27 uJ. (b) Multiple instructions: one of 2 clock cycles, 29,008 grids, 14.18 mW, and one of 3 clock cycles, 50,535 grids, 15.68 mW; application energy 95.94 uJ.]

Therefore, energy consumption may be reduced by separating a large computationally intensive code segment into multiple instructions. In the era of cost efficiency, high performance, and portability, it is critical to reduce power dissipation (to reduce energy consumption and extend battery lifetime) and to explore as many design points as possible. Therefore, an automatic, battery-aware tool for designing extensible instructions is needed.

7.2 Background

Our instructions generation tool uses the battery behaviour model [146, 165] to define the actual energy capacity that can be drawn from a battery. The battery lifetime, BL, can be defined as

$$BL = \frac{CAP}{P_{act}} \qquad (7.1)$$

where CAP is the ideal energy capacity of the battery and $P_{act}$ is the actual power consumption of the circuit. The energy capacity of a battery is the amount of energy stored in the battery, measured in ampere-hours or watt-hours. Normally, the capacity of a battery decreases as the discharge current increases. In the analytical model, the actual current, $I_{act}$, that is drawn from the battery is

$$I_{act} = \frac{I}{\mu}, \quad 0 \leq \mu \leq 1 \qquad (7.2)$$

where µ is the battery efficiency (or utilisation) factor. The actual energy capacity, $CAP_{act}$, is

$$CAP_{act} = CAP \cdot \mu, \quad 0 \leq \mu \leq 1 \qquad (7.3)$$

From Peukert’s formula [146], the relationship between the battery capacity and the discharge current is empirically defined as

$$CAP = \frac{k}{I^{\alpha}} \qquad (7.4)$$

where k is a constant determined by the chemical and physical design of the battery, and I is the discharge current. For an ideal battery, α equals 0; for a real battery, α ranges up to 0.7 for a typical load. Using eqn. 7.2 and eqn. 7.3, the battery efficiency factor can be defined in terms of the discharge current as

$$\mu = f(I) = \frac{1}{I^{\alpha}} \qquad (7.5)$$

where f is a monotonically decreasing function [165]. In our case, the tool uses α equal to 0.7. A comprehensive survey of battery modelling techniques is given in [133].

In [165], Pedram and Wu showed that battery efficiency is affected by the average discharge current as well as by the average current profile. The actual power drawn out of the battery is defined as

$$P_{act} = V \cdot \int \frac{I}{\mu(I)} \, p(I) \, dI \qquad (7.6)$$

where V is the voltage of the circuit; µ(I) is the battery efficiency factor; and p(I) is the probability density function of I. From eqn. 7.1 to eqn. 7.6, maximum battery lifetime is achieved when the variance of the discharge current distribution is minimised. If it is assumed that the voltage is relatively constant during operation, then maximum battery lifetime is achieved when the variance of the power dissipation distribution is minimised.
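To make the model concrete, the following Python sketch (all capacity and current values are hypothetical, and the integral of eqn. 7.6 is approximated by a sum over a discretised current profile) compares two load profiles with the same mean current; the spikier profile draws more actual power and therefore shortens the battery lifetime of eqn. 7.1, which is exactly the variance effect described above.

```python
# A minimal sketch of the battery behaviour model (eqns. 7.1-7.6).
# All numbers are hypothetical; the integral in eqn. 7.6 is
# approximated by a weighted sum over a discretised current profile.

ALPHA = 0.7      # Peukert exponent alpha used by the tool (eqn. 7.5)
VOLTAGE = 1.8    # circuit voltage V, assumed constant during operation

def efficiency(current):
    """Battery efficiency factor mu = f(I) = 1 / I^alpha (eqn. 7.5)."""
    return 1.0 / current ** ALPHA

def actual_power(profile):
    """P_act = V * sum over I of (I / mu(I)) * p(I) (eqn. 7.6, discretised).

    `profile` maps a discharge current (A) to its probability p(I).
    """
    return VOLTAGE * sum(i / efficiency(i) * p for i, p in profile.items())

def battery_lifetime(capacity_wh, profile):
    """BL = CAP / P_act (eqn. 7.1); CAP here in watt-hours, BL in hours."""
    return capacity_wh / actual_power(profile)

# Two profiles with the same mean current (0.5 A):
flat = {0.5: 1.0}
spiky = {0.1: 0.5, 0.9: 0.5}
print(battery_lifetime(1.0, flat))   # ~1.80 hours
print(battery_lifetime(1.0, spiky))  # ~1.30 hours: variance hurts lifetime
```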

The Xtensa processor from Tensilica Inc. [23] was used as the design platform for this work. It consists of a five-stage pipeline RISC base core with approximately 80 base instructions, plus the capacity to define specific functionality through extensible instructions (which coexist with the base instructions) using the Tensilica Instruction Extension (TIE) language. An extensible instruction (in Xtensa) has hardware to clock-gate the instruction, so that the instruction can be turned on and off as needed. Therefore, the instruction is assumed to dissipate power only when it is executed.

7.3 Problem Statements and Preliminaries

Our automatic tool was developed for the generation step in the design flow. The computationally intensive code segments have been identified in the identification step using simulation and profiling [58].

Problem 1: Given a computationally intensive primitive instruction group, generate extensible instruction(s) that maximise the speedup of the extensible instruction(s) while minimising the average power dissipation distribution of the extensible instruction(s).

A Control Dataflow Graph (CDFG), G(V,E), is a directed acyclic graph (DAG) in which the vertices, V, represent primitive instructions from the base instruction set of the target processor, and the edges, E, represent the data dependencies between instructions. The graph has two properties: i) execution time, et(G), the time required to execute the instruction in the execution stage of the processor, which includes reading registers from the previous stage, executing operations, and writing registers; and ii) average power dissipation, apd(G). In addition, the latency (clock period) of the target processor is denoted $latency_{proc}$. Without loss of generality, it is assumed that G(V,E) is a convex CDFG [33] and contains a maximum of ten input ports and ten output ports.

Using these preliminaries, Problem 1 is redefined as follows.

Problem 2: Given a convex control dataflow graph (CDFG), G(V,E), find subgraph(s) (G′ ⊆ G) that cover all the vertices (v ∈ V) and minimise the following quantities:

1. The execution time of the graph, the sum of the execution times of all subgraphs:

$$\sum_{G' \subseteq G} et(G')$$

2. The average power dissipation of the graph, the average of the subgraphs’ average power dissipation:

$$\frac{\sum_{G' \subseteq G} apd(G')}{|G|}$$

It should be noted that minimising the execution time drives the design towards as few extensible instructions as possible, because the execution overhead time (such as the register read/write time) may increase the overall delay. On the other hand, minimising the average power dissipation of the graph compels the design to contain multiple smaller instructions. As a result of these competing demands, this instruction generation problem is complex.
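As an illustration of the two competing objectives, the following Python sketch evaluates a candidate cover of a CDFG; the power figures echo the example of Figure 7.1, while the execution times and helper names are hypothetical.

```python
# A minimal sketch of the two objectives of Problem 2. Each subgraph
# G' carries its execution time et(G') and average power dissipation
# apd(G'); |G| is taken as the number of subgraphs in the cover.
from dataclasses import dataclass

@dataclass
class Subgraph:
    et: float    # et(G') in ns
    apd: float   # apd(G') in mW

def objectives(cover):
    """Return (sum of et(G'), average of apd(G')) for a cover of G."""
    total_et = sum(g.et for g in cover)
    avg_apd = sum(g.apd for g in cover) / len(cover)
    return total_et, avg_apd

# One large instruction versus the two-instruction split of Figure 7.1
# (5 cycles versus 2 + 3 cycles at a ~4.5 ns clock period):
single = [Subgraph(et=22.5, apd=31.65)]
split = [Subgraph(et=9.0, apd=14.18), Subgraph(et=13.5, apd=15.68)]
print(objectives(single))  # (22.5, 31.65)
print(objectives(split))   # (22.5, 14.93); real splits add register overhead
```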

7.4 Instruction Generation

Our automatic instruction generation tool contains two algorithms (five phases): an instruction generation algorithm and a battery-awareness algorithm. An overview of the automatic instruction generation tool is shown in Figure 7.2.

[Figure 7.2: An overview of the automatic instruction generation tool. The inputs are the latency of the target processor and a control dataflow graph (CDFG). Instruction generation algorithm: identify vertices or groups of vertices (patterns) in the CDFG; select patterns to minimise the critical path of the graph; and estimate the clock cycles of the instruction based on the latency of the target processor, producing an extensible instruction (single or multiple cycles). Battery-awareness algorithm: separate the instruction into multiple instructions if power dissipation is reduced; and utilise the slack of the extensible instruction(s) to minimise power dissipation. The output is the extensible instruction(s), scheduled in single or multiple cycles, that minimises power dissipation.]

7.4.1 Instruction Generation Algorithm

The goal of the instruction generation algorithm is to generate an extensible instruction that maximises the speedup and schedules the instruction into clock cycles based on the latency of the processor. The inputs of this algorithm are the latency of the target processor and a control dataflow graph that represents a computationally intensive code segment. The output is an extensible instruction. This algorithm consists of three phases: pattern identification, pattern selection, and clock cycle estimation.

Pattern identification is used to identify a vertex in the graph that can be replaced by well-defined operations (hereafter referred to as patterns), which could reduce the execution time of the graph. Additionally, if a group of vertices can be merged into a single pattern (for example, where a number of sequential additions could potentially be integrated into a single adder), these are identified during this phase. This phase begins the identification with each vertex, and expands to the connected vertices in order to find patterns until all the vertices have been searched. All of these patterns are passed on to the next phase, along with the control dataflow graph.

Pattern selection aims to select the patterns that minimise the execution time of the control dataflow graph in order to maximise speedup. A heuristic scheme is proposed that replaces vertices with identified patterns only along the critical path. This phase first estimates the reduction in execution time for each identified pattern along the critical path. The heuristic scheme then selects the patterns that shorten the critical path the most. The reason for searching for patterns only along the critical path is that the search space is complex (the number of possible patterns in the graph is $2^{|V|}$); minimising the critical path is sufficient to achieve maximum speedup in a short time, which reduces design turnaround time. However, after patterns are replaced, new critical paths may be formed. Therefore, our scheme continues to minimise the critical path until the execution time of the graph is minimised.

Clock Cycle Estimation: After the critical path of the control dataflow graph has been minimised, our tool estimates the number of clock cycles that the graph will take to execute, given the latency of the processor. The reason for estimating the clock cycles is to avoid violating the processor's latency. If the graph violates the latency and is not scheduled over multiple clock cycles, then the clock period of the base processor core must be increased, decreasing performance significantly. The clock cycles of the instruction are defined as:

$$Clockcycle_{inst} = \left\lceil \frac{et(G)}{latency_{proc}} \right\rceil \qquad (7.7)$$

where et(G) is the execution time of the graph and $latency_{proc}$ is the latency of the target processor. Figure 7.3 shows the instruction generation algorithm, InstGen.

7.4.2 Battery-Awareness Algorithm

The battery-awareness algorithm optimises the battery lifetime of the product by minimising power dissipation of extensible instructions. This occurs in two phases: instruction separation and slack utilisation.

Algorithm InstGen (G, latency_proc) {
    for all vertices v ∈ V do
        if (v == patterns)
            patterns_list = add_patterns(v);
    for all patterns p ∈ patterns_list do
        for all vertices v′ connected to p do
            G′ = build_subgraph(v′, p);
            if (G′ == patterns)
                patterns_list = add_patterns(G′);
    do
        criticalpath = find_criticalpath(G);
        for all vertices v ∈ criticalpath do
            if (delay(p) ≤ Σv∈p delay(v))
                replace_patterns(p, v, G);
        tmp_criticalpath = find_criticalpath(G);
    until (tmp_criticalpath == criticalpath);
    CC = estimate_clock_cycle(G);
    return G;
}

Figure 7.3: Algorithm InstGen for generating an extensible instruction that minimises the execution time
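The following Python sketch is one possible rendering of the InstGen loop of Figure 7.3, under simplifying assumptions: the CDFG is stored as per-vertex delays plus a predecessor map, and only single-vertex pattern substitutions are considered (the thesis also merges groups of vertices, which requires graph surgery omitted here). The helper names and the toy graph are hypothetical.

```python
# A sketch of the InstGen loop (Figure 7.3). The CDFG is a DAG with
# one delay per vertex; for brevity, only single-vertex pattern
# substitutions along the critical path are considered.
import math

def topo_order(preds):
    """Kahn's algorithm over a {vertex: set(predecessors)} mapping."""
    pending = {v: set(ps) for v, ps in preds.items()}
    order = []
    while pending:
        ready = [v for v, ps in pending.items() if not ps]
        order += ready
        for v in ready:
            del pending[v]
        for ps in pending.values():
            ps -= set(ready)
    return order

def critical_path(delay, preds):
    """Length and vertex list of the longest delay-weighted path."""
    finish, via = {}, {}
    for v in topo_order(preds):
        p = max(preds[v], key=lambda u: finish[u], default=None)
        finish[v] = (finish[p] if p is not None else 0.0) + delay[v]
        via[v] = p
    v = max(finish, key=finish.get)
    path = []
    while v is not None:
        path.append(v)
        v = via[v]
    return max(finish.values()), path[::-1]

def inst_gen(ops, delay, preds, patterns, latency_proc):
    """Shorten the critical path, then estimate cycles via eqn. 7.7."""
    while True:
        before, path = critical_path(delay, preds)
        for v in path:  # replace patterns only along the critical path
            delay[v] = min([delay[v]] + patterns.get(ops[v], []))
        after, _ = critical_path(delay, preds)
        if after >= before:  # critical path no longer shrinks
            return math.ceil(after / latency_proc)  # Clockcycle_inst

# Toy CDFG: two chained adds and one parallel add feeding a multiply;
# a 222 MHz processor gives a ~4.5 ns clock period.
ops = {1: "add", 2: "add", 3: "add", 4: "mul"}
delay = {1: 4.1, 2: 4.1, 3: 4.1, 4: 6.8}
preds = {1: set(), 2: {1}, 3: set(), 4: {2, 3}}
patterns = {"add": [3.3], "mul": [5.5]}
print(inst_gen(ops, delay, preds, patterns, latency_proc=4.5))  # -> 3
```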

Instruction separation aims to separate the instruction into multiple instructions in order to reduce the power dissipation, extending the battery life of a product. The input is a single extensible instruction generated by the instruction generation algorithm, while the output is one or multiple extensible instructions. This phase searches every separation point (a cut that separates a graph into two subgraphs) with multiple fan-ins and fan-outs, constructs two subgraphs using the separation points, and evaluates the power dissipation reduction. The power dissipation evaluation is conducted using a power estimator that runs in a few seconds for each instruction. This phase then selects the separation point with the minimum power dissipation. These steps are iterated on the newly separated subgraphs in order to minimise the power dissipation until no further separation point is found. Figure 7.4 shows an example of possible separation points. There are nine vertices (v1, v2, ..., v9) in this control dataflow graph, with ten inputs (a, b, ..., j) and two outputs (y, z). Eight possible separation points (C1, C2, ..., C8) separate the instruction differently. Assume all vertices consume the same amount of power. The peak power dissipation of the instruction occurs before vertex v5. The predecessor vertex, v4, must be scheduled in parallel with one of the vertices v1, v2, or v3 in order to maximise speedup. Peak power dissipation is doubled when vertices are overlapped, which often occurs when vertices have multiple fan-ins or fan-outs. However, separating an instruction into multiple instructions may reduce the speedup, which in turn may lead to higher energy consumption. The separation function therefore compares the power dissipation and the number of clock cycles of the original instruction and of the separated multiple instructions. The separation function is derived from the actual power equation (eqn. 7.6) and Amdahl's law [163], and is defined as

$$Separation_{Inst} = 1 - \sum_{x \in Inst} \eta_x \cdot \left( \frac{I_{proc} + I_x \theta_x}{I_{proc} + I_{orig\ inst}} \right)^{1+\alpha} \qquad (7.8)$$

where x ranges over the instructions separated from the original instruction, Inst; $\eta_x$ is the percentage of the probability density function of instruction x compared with the probability density function of the original instruction (the probability density function is related to the number of clock cycles in the instruction); and $\theta_x$ is the percentage of current dissipation reduction compared to the original instruction (current dissipation is related to the power dissipation, where the voltage is assumed constant).
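The following sketch shows how the separation function of eqn. 7.8 might score one candidate cut; the current values and the η/θ shares are hypothetical, and the power estimator that produces them in the real tool is not modelled.

```python
# A sketch of the separation function of eqn. 7.8. All currents (mA)
# and the eta/theta shares below are hypothetical; a larger score
# means a larger reduction, so InstSep keeps the best-scoring cut.
ALPHA = 0.7  # Peukert exponent, as in eqn. 7.5

def separation(i_proc, i_orig_inst, parts):
    """Score a cut; `parts` holds (eta_x, i_x, theta_x) per instruction x."""
    return 1.0 - sum(
        eta * ((i_proc + i_x * theta) / (i_proc + i_orig_inst)) ** (1.0 + ALPHA)
        for eta, i_x, theta in parts
    )

# A 5-cycle instruction cut into a 2-cycle and a 3-cycle instruction:
print(separation(
    i_proc=55.0,               # base processor current
    i_orig_inst=17.5,          # current of the original single instruction
    parts=[(0.4, 7.8, 0.9),    # (eta_x, i_x, theta_x) for instruction 1
           (0.6, 8.7, 0.9)],   # ... for instruction 2
))  # > 0 here, so this cut is predicted to help
```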

[Figure 7.4: An example of separating instructions to reduce power dissipation: a control dataflow graph with nine vertices (v1–v9), ten inputs (a–j), two outputs (y, z), and eight separation points (C1–C8).]

[Figure 7.5: An example of utilising the slack of the instruction. (a) A control dataflow graph with inputs a–f and output z, consisting of three adders and two multipliers. (b) Candidate patterns with their execution time and power: add_a 3.3 ns, 2.5 mW; add_b 4.1 ns, 2.2 mW; mul_a 5.5 ns, 10.5 mW; mul_b 6.8 ns, 9.8 mW.]

Slack utilisation aims to utilise the slack of the instruction. The clock cycles of the instruction are computed using eqn. 7.7, which yields an approximate value. Therefore, this phase utilises the slack within the clock cycle time of the graph to further minimise the power dissipation of the instruction. This phase involves searching every path of the graph (a path is a group of vertices from an input port to an output port), including the non-critical paths, and ranking them, with the critical path ranked highest. For each path, patterns that reduce the power dissipation while maintaining the clock cycle time are selected and replaced. Vertices are then marked and will not be replaced on successive paths, in order to maintain the clock cycles of the instruction. Figure 7.5 shows an example of utilising the slack of the instruction. The control dataflow graph has six inputs, namely a, b, c, d, e, and f, and an output, z; it consists of three adders and two multipliers, as shown in Figure 7.5a. Figure 7.5b shows four patterns that can be used to replace vertices, together with their delay and power dissipation.

Algorithm BattAware (G) {
    G′ = InstSep(G);
    path_list = find_allpath(G′);
    rankpath_list = rank(path_list);
    do
        for all ranked paths pa ∈ rankpath_list do
            slack = estimate_slack(pa);
            Utilise(G′, pa, slack);
            Mark_Update(rankpath_list);
    until (rankpath_list == {∅});
    return G′;
}

Algorithm InstSep (G) {
    cut_list = find_allcuts(G);
    for all cuts c ∈ cut_list do
        if (Separation(c, G) > maxpower_red)
            thecut = c;
            maxpower_red = Separation(c, G);
    G′ = separate(G, thecut);
    if (G′ != G)
        G′ = InstSep(G′ → left);
        G′ = InstSep(G′ → right);
    return G′;
}

Figure 7.6: Algorithm BattAware for optimising battery lifetime in the instruction

For this example, assume the clock speed of the target processor is 222MHz. Using the instruction generation algorithm, the patterns add_a and mul_a are mapped to the additions and multiplications respectively to minimise the execution time. The resulting power dissipation and execution time are 28.5mW and 20.9ns respectively, so the number of clock cycles required to execute this instruction is ⌈4.64⌉ = 5. To utilise the slack, one of the add_a patterns is replaced with add_b, reducing the power dissipation to 28.2mW while maintaining the number of clock cycles of the instruction. This phase is effective on the non-critical paths of the graph. This approach further explores the fine-grain granularity of instruction generation. The battery-awareness algorithm is shown in Figure 7.6.
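The arithmetic of this example can be replayed directly; the pattern data come from Figure 7.5b, the clock period of the 222MHz processor is about 4.5ns, and it is assumed, as the 20.9ns figure implies, that all five operations lie on the critical path.

```python
# Replaying the slack-utilisation arithmetic of Figure 7.5 in Python:
# three additions and two multiplications, pattern data from Fig. 7.5b.
import math

PERIOD_NS = 1000.0 / 222.0                  # 222 MHz => ~4.505 ns period
add_a, add_b = (3.3, 2.5), (4.1, 2.2)       # (delay ns, power mW)
mul_a = (5.5, 10.5)

delays = [add_a[0]] * 3 + [mul_a[0]] * 2    # fastest mapping from InstGen
powers = [add_a[1]] * 3 + [mul_a[1]] * 2
cycles = math.ceil(sum(delays) / PERIOD_NS)          # ceil(4.64) = 5 cc
print(round(sum(powers), 2), cycles)                 # 28.5 mW over 5 cc

# The 5-cycle budget leaves slack, so one add_a -> add_b swap fits:
slack = cycles * PERIOD_NS - sum(delays)             # ~1.6 ns of slack
if slack >= add_b[0] - add_a[0]:                     # the 0.8 ns swap fits
    delays[0], powers[0] = add_b
print(round(sum(powers), 2),
      math.ceil(sum(delays) / PERIOD_NS))            # 28.2 mW, still 5 cc
```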

[Figure 7.7: The experimental platform for verifying our automatic instruction generation tool. The first experiment examines the efficiency of the tool by comparing the characteristics of the instructions generated by the different algorithms in the tool: the generated instructions (set 1 and set 2) are synthesised using Design Compiler to obtain area, power dissipation, and execution time, and the energy consumption of the application is estimated using the equations. The second experiment tests the effectiveness of the instructions when applied in the application: the runtime of the application, written in C/C++, is evaluated using an instruction set simulator (ISS).]

7.5 Experimental Results

7.5.1 Experimental Setup

For evaluation purposes, the T1050.2 version of the Xtensa processor (0.18µm technology) from Tensilica Inc. [23] was used. This processor has a clock speed of 222MHz and consumes 100mW of power. All experiments were conducted on a quad Intel machine running at 2.4GHz, with 4GB of RAM. Although these models are based on the Xtensa processor, the underlying method is general and can be applied to any platform of similar capability (i.e. the ability to design extensible instructions). Figure 7.7 shows the experimental platform for verifying our automatic instruction generation tool.

This chapter details two separate experiments that were conducted to determine the efficiency of the automatic instruction generation tool. Two different sets of instructions generated by the tool are compared: i) instructions generated by the instruction generation algorithm alone (set 1); and ii) instructions generated by the tool with both algorithms (set 2). In the first experiment, the characteristics of the instructions are examined, including area, power dissipation, execution time, and energy consumption of an application. Fifty code segments covering a wide range of instructions are selected. These code segments are first applied to the automatic instruction generation tool. The generated instructions are then compiled, and Verilog implementations are generated. These instructions are synthesised to obtain area, power dissipation, and execution time using Design Compiler from Synopsys, Inc. [6] with a 0.18µm cell library. The energy consumption is computed using the battery behaviour model described in Section 7.2 and the characteristics obtained. Each instruction is assumed to occupy 25% of the runtime in an application that runs for 1,000,000 clock cycles.

The second set of experiments uses the identified code segments in five real-world applications: adpcm encoder (adpcm), g721 encoder (g721e), g721 decoder (g721d), epi encoder (epie), and epi decoder (epid) [135]. Our tool is applied to these code segments, and the two different sets of generated instructions are extracted from the automated tool. The code segments are replaced by the instructions, and the runtime and energy consumption of the application are evaluated.

7.5.2 Evaluation Results

In our first experiment, we examined the area, power dissipation, execution time, and energy consumption of the instructions. Table 7.1 shows the mean and maximum values of these characteristics (columns 2-9), together with the comparison of the characteristics of the instructions in set 1 and set 2 (row 5).

                   Area Overhead   Power Consumption   Execution Time   Energy Consumption
                   [grids]         [mW]                [ns]             [uJ]
Instruction Set    Ave.    Max.    Ave.    Max.        Ave.    Max.     Ave.    Max.
Set 1              15068   66348   21.98   61.51       10.52   26.97    76.48   123.01
Set 2              15357   63014   15.43   42.27       10.85   22.39    72.01   101.21
Reduction [%]      -1.92   5.03    29.80   31.28       -3.14   16.98    5.84    17.72

Table 7.1: The characteristics of the generated instructions (set 1 and set 2) for the fifty code segments

The area and execution time of the instructions in set 2 have increased by 2% and 3% respectively when compared to the instructions in set 1. However, the power dissipation in set 2 is reduced by 29.8% on average compared to the instructions in set 1. The reduction is achieved by using multiple instructions and reducing the number of registers. In addition, the energy consumption of an application is further reduced by 5.8% on average (up to 17.7%) when compared to the instructions in set 1. Our tool is unable to reduce the energy consumption for 32% of the code segments in this experiment, as those code segments are too small to separate. If these instructions were separated, their performance would be reduced significantly, leading to a large increase in energy consumption. Figures 7.8a and 7.8b show the energy reduction versus the complexity of the instruction (the number of primitive instructions in a code segment), and the energy reduction versus the original power dissipation, respectively. Both trend lines in these figures show that our tool is more effective when the code segment is large and complex. Thus, separating large computationally intensive code segments into multiple extensible instructions and utilising slack reduces the energy consumption of the application more effectively than combining a large code segment into a single instruction.

Table 7.2 shows the characteristics of the five real-world applications when the different generated instructions are implemented, compared to the original application (without any extensible instructions). The first column indicates the application name.

[Figure 7.8: Trendlines of energy reduction for extensible instructions: (a) energy reduction (%) versus the complexity of the instruction (the number of vertices in the instruction); (b) energy reduction (%) versus the power dissipation of the instruction (mW).]

The second column displays the different extensible instructions (see Figure 7.7) used by the application. The third column shows the average speedup of the instructions, while the fourth column shows their average power dissipation. The runtime and energy consumption of the application are shown in the next two columns. The last column displays the energy reduction comparison between the application using the instructions in set 1 and the application using the instructions in set 2. The average speedup of the instructions in set 1 and set 2 is within 5% of each other. The runtimes of these applications using the instructions in set 1 and set 2 are also within 5%. The runtime of the epi encoder, epie, is reduced significantly, by 15.7×. The average power dissipation of the instructions in set 2 is 32.66% lower than that in set 1. Thus, the energy consumption of the application using set 2 is on average 6.6% less (up to 16.53% less) than the energy consumption of the application using the instructions in set 1. For the epi encoder, the energy reduction comparison is only 0.11%, which is due to an increase in the application's runtime.

7.6 Conclusions

This chapter has presented an automatic tool for generating extensible instructions for the extensible processor platform. Unlike similar work in the field, our tool includes a battery-aware algorithm that contains instruction separation and slack utilisation.

Our tool achieves two major feats:

• It separates instructions and utilises the slack of the instruction to reduce the power dissipation of extensible instructions and to explore fine-grain granularity in instruction generation; and

• For the first time, battery lifetime (via the battery behaviour model) is taken into account in generating extensible instructions, rather than just shortening the execution time to produce an energy reduction.

                          Instruction                Application                  Energy Reduction
Application  Instruction  Average    Average         Execution   Energy          Comparison
             set          Speedup    Power           Time        Consumption     (Set 1 & Set 2)
                          [x]        [mW]            [second]    [mJ]            [%]
adpcm        Original     –          –               0.111       2.21
             Set 1        2.2        15.27           0.095       2.04
             Set 2        2.1        12.98           0.094       2.00            2.18
g721e        Original     –          –               1.58        31.53
             Set 1        7.09       44.52           0.64        16.13
             Set 2        7.22       28.45           0.61        14.12           12.43
g721d        Original     –          –               1.54        30.72
             Set 1        7.09       44.52           0.65        16.33
             Set 2        7.25       28.45           0.59        13.63           16.53
epie         Original     –          –               7.54        150.44
             Set 1        18.52      34.67           0.48        11.86
             Set 2        17.54      16.48           0.52        11.84           0.11
epid         Original     –          –               0.55        10.97
             Set 1        8.15       34.67           0.16        3.94
             Set 2        7.58       21.48           0.17        3.87            1.96

Table 7.2: The characteristics of the applications when the different instructions generated by the tool are applied

Our tool is able to generate large and complex extensible instructions with low power dissipation. The energy consumption of real-world applications that use the instructions generated by our tool is reduced by up to 16.53% compared to applications using instructions generated by previous methods. The tool generates an extensible instruction for a given code segment in a few seconds, and has been integrated into our extensible processor tool suite.

Chapter 8

Conclusions

Design turnaround time has been elevated to become one of the most important metrics in the extensible processor platform, for the variety of reasons discussed in Chapter 1. It is therefore essential to automate design approaches in order to meet the requirements of embedded systems. While significant improvements in the extensible processor platform do result in large reductions in design time, they by no means address the entire design flow. As a result, it is necessary to develop design automation methodologies for the various processes in the design flow. To date, most research and commercial development work on the extensible processor platform has focused on generating specific instructions that offer performance improvements in the application. This thesis has presented a suite of design automation methodologies for the extensible processor platform: automating the processes of identifying code segments, generating instructions, matching pre-designed instructions, selecting architectural customisations, and evaluating the processor in the design flow.

Chapter 4 described a semi-automatic design system (instruction generation and the matching of instructions to code segments were conducted manually in this design system, hence "semi-automatic") to design an extensible processor that maximises application performance while satisfying a given area constraint. An important problem addressed in Chapter 4 was to develop an understanding of how high-level code segment characteristics affect the design metrics of extensible instructions. This understanding is drawn from the designer's experience. Based on this understanding, Chapter 4 proposed a fitting function to identify code segments, which involves computing sufficient profile information and extracting high-level code segment characteristics to predict the design metrics of the extensible instruction to be implemented. Using this fitting function, computationally intensive code segments were identified and replaced with extensible instructions, predefined blocks, and/or parameter settings. Chapter 4 also presented a two-level hierarchy selection algorithm that involves selecting extensible instructions, predefined blocks, and parameter settings to maximise the application performance under given design constraints. This algorithm takes into account the designer's input, which involves combining predefined blocks and parameter settings into a set of pre-configured processors to prune the design space. The algorithm first selects a pre-configured processor, and then selects a set of extensible instructions to generate an extensible processor. Next, an estimation function is used to rapidly estimate the performance of the application on the newly configured extensible processor. This function significantly reduces verification time compared to time-consuming cycle-accurate instruction set simulation. Extensive experimentation demonstrated up to a 15.71× (on average 4.74×) improvement in application performance compared to the base processor configuration meeting the same area constraint. Furthermore, the design time of our semi-automatic design system occupied only 2.5% of the full simulation time, obtaining on average 91% of all Pareto points in the design space. The estimation function for the proposed extensible processor is on average within 5.68% of the results obtained with an instruction set simulator.

Building upon the insights of Chapter 4, Chapter 5 presented an automated tool that matches code segments to pre-designed extensible instructions to reduce the design and verification time for new instructions. This tool takes into account the functional equivalence between instructions and code segments, and uses combinational equivalence checking to ensure that the results (i.e. the found candidates for extensible instructions) are largely independent of the programming style of the application, resulting in high-quality matches. The tool first translates selected code segments of an application into a hardware description, filters out those code segments that would not match, and then matches the remaining code segments to a pre-defined library of extensible instructions using functional equivalence checking. Experimental results showed that the time for matching was on average 7.3× faster than the state-of-the-art simulation-based approach. The results demonstrated that the identical hand-optimised extensible instructions were matched by this tool.

Chapters 6 and 7 presented methodologies for instruction estimation analysis and optimisation, developed so that instructions can be generated automatically. Although previous work has attempted to address instruction generation, it was observed that the design space of extensible instructions is extremely complex and infeasibly large. To accurately explore the design space, Chapter 6 presented an instruction estimation model to estimate the area overhead, latency, and power consumption of all possible extensible instructions for a given code segment. Previous work on this topic has ignored parallelism techniques and schedule alternatives in the instructions. Chapter 6 demonstrated that parallelism techniques and schedule alternatives are critical to the instruction estimation model. One major problem facing instruction estimation is the lack of component information in an instruction. Chapter 6 showed that it is possible to derive a reasonable estimation model using system decomposition and regression analysis to simplify the process of modelling extensible instructions. Extensive experimentation showed that the mean absolute error for a set of instructions used in real-world applications is 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption. Our estimation model executes in a few seconds for an instruction, while synthesis and subsequent estimation would take hours.

Although the techniques presented in Chapter 6 make it possible to explore the design space of extensible instructions, it was observed that a single extensible instruction is often implemented for a large code segment to maximise performance. At the same time, however, such instructions require large power dissipation and current discharge, making them unsuitable for battery-powered products. To that end, Chapter 7 presented a battery-aware instruction generation tool that generates instructions to minimise power dissipation while maximising application performance. Unlike other techniques that combine large code segments into a single extensible instruction, our proposed technique separates the code segment into multiple instructions and utilises the slack to maximise performance while minimising power dissipation. Hence, it incurs a lower power dissipation distribution and can be used to reduce energy consumption, which suits battery-powered products. Experimental results demonstrated that the tool reduces energy consumption by a further 5.8% on average (up to 17.7%) compared to extensible instructions generated by previous approaches.

The methodologies and results presented in this thesis demonstrate that the various automation methodologies in the design flow have a significant impact on design turnaround time. Furthermore, it has been shown that efficient exploration of the design space leads to better design metric tradeoffs and application performance improvements that are far beyond those obtained through the existing design flow. In light of the increasing importance of design turnaround time as a design metric, the results presented in this thesis show that the availability and use of design automation methodologies will enable designers to meet shortening design times, and can lead to fewer and faster design iterations for efficient design exploration.

There are several related issues of interest that can be explored in the future.

Each of the chapters has outlined a number of specific improvements that could be made to the methodologies presented in this thesis. In the near future, there will also be a need to develop processor architectures that support various extensible instruction requirements, such as pipeline structure, decoding methodology, and memory accesses, to further optimise the design metrics of extensible processors.

Finally, it is my hope that the methodologies and insights developed in this thesis will be incorporated into the next generation of extensible processor platform design tools, in order to reduce design turnaround time and close the design productivity gap.

Bibliography

[1] ARCtangent Processor. ARC, Inc. (http://www.arc.com).

[2] ASIP-Meister. (http://www.eda-meister.org/asip-meister/).

[3] Blast RTL. Magma, Inc. (http://www.magma-da.com).

[4] Cadence. Cadence, Inc. (http://www.cadence.com).

[5] DAPDNP-2 Dynamically Reconfigurable Processor. IPFlex, Inc. (http://www.ipflex.com).

[6] Design Compiler. Synopsys, Inc. (http://www.synopsys.com).

[7] FPGA. Xilinx, Inc. (http://www.xilinx.com).

[8] HP Labs. Hewlett-Packard, Inc. (http://www.hpl.hp.com).

[9] Intel processor. Intel, Inc. (http://www.intel.com).

[10] Jazz DSP. Improv Systems, Inc. (http://www.improvsys.com).

[11] Lexra Processor. Lexra, Inc. (http://www.lexra.com).

[12] LisaTek. CoWare, Inc. (http://www.coware.com).

[13] Media Embedded Processor. Toshiba, Inc. (http://www.mepcore.com).

[14] MIPS Cores. MIPS, Inc. (http://www.mips.com).


[15] ModelSim. Model, Inc. (http://www.model.com).

[16] Motorola processor. Motorola, Inc. (http://www.motorola.com).

[17] NIOS II/NIOS Embedded Processors. Altera, Inc. (http://www.altera.com).

[18] PICO Technology. Synfora, Inc. (http://www.synfora.com).

[19] PowerTheater. Sequence, Inc. (http://www.sequencedesign.com).

[20] Splus. Insightful, Inc. (http://www.insightful.com).

[21] Stretch S5 engine. Stretch, Inc. (http://www.stretchinc.com).

[22] Tarari Processing Platform. Tarari, Inc. (http://www.tarari.com).

[23] Xtensa Processor. Tensilica, Inc. (http://www.tensilica.com).

[24] J. Abella, A. Gonzalez, J. Llosa, and X. Vera. Near-optimal Loop Tiling by Means of Cache Miss Equations and Genetic Algorithms. In International Conference on Parallel Processing Workshops, pages 568–577, August 2002.

[25] S. G. Abraham and B. R. Rau. Efficient Design Space Exploration in PICO. In International Conference on Compilers, Architectures, and Synthesis for Embedded Systems, pages 71–79, October 2000.

[26] S. Aditya, B. R. Rau, and V. Kathail. Automatic Architectural Synthesis of VLIW and EPIC Processors. In International Symposium on System Synthesis, pages 107–113, November 1999.

[27] A. Aho, M. Ganapathi, and S. Tjiang. Code Generation using Tree Matching and Dynamic Programming. ACM Transactions on Programming Languages and Systems, 11(4):491–561, October 1989.

[28] C. Alippi, W. Fornaciari, L. Pozzi, and M. Sami. A DAG-based Design Approach for Reconfigurable VLIW Processors. In Design, Automation and Test in Europe Conference and Exhibition, pages 778–780, March 1999.

[29] V. H. Allan, B. Su, P. Wijaya, and J. Wang. Foresighted Instruction Scheduling Under Timing Constraints. IEEE Transactions on Computers, 41(9):1169–1172, September 1992.

[30] J. R. Allen and K. Kennedy. Automatic Loop Interchange. In International Symposium on Compiler Construction, pages 233–246, 1984.

[31] A. Alomary, T. Nakata, Y. Honma, M. Imai, and N. Hikichi. An ASIP Instruction Set Optimization Algorithm with Functional Module Sharing Constraint. In IEEE International Conference on Computer Aided Design, pages 526–532, November 1993.

[32] M. Arnold. Instruction Set Extension for Embedded Processors. Ph.D. Thesis, Delft University of Technology, March 2001.

[33] K. Atasu, L. Pozzi, and P. Ienne. Automatic Application-Specific Instruction-Set Extensions Under Microarchitectural Constraints. In ACM/IEEE Design Automation Conference, pages 256–261, June 2003.

[34] P. M. Athanas and H. F. Silverman. Processor Reconfiguration Through Instruction-set Metamorphosis. Computer, 26(3):11–18, March 1993.

[35] F. Barat, R. Lauwereins, and G. Deconinck. Reconfigurable Instruction Set Processors from a Hardware/Software Perspective. IEEE Transactions on Software Engineering, 28(9):847–862, September 2002.

[36] V. Betz and J. Rose. VPR: A New Packing, Placement and Routing Tool for FPGA Research. In International Workshop on Field-Programmable Logic and Applications, pages 213–222, June 1997.

[37] V. Bhatt, M. Balakrishnan, and A. Kumar. Exploring the Number of Register Windows in ASIP Synthesis. In International Conference on VLSI Design, pages 233–238, January 2002.

[38] N. Binh, M. Imai, and Y. Takeuchi. A Performance Maximization Algorithm to Design ASIPs under the Constraint of Chip Area Including RAM and ROM Sizes. In Asia and South Pacific Design Automation Conference, pages 367–372, February 1998.

[39] P. Biswas, V. Choudhary, K. Atasu, L. Pozzi, P. Ienne, and N. Dutt. Introduction of Local Memory Elements in Instruction Set Extensions. In ACM/IEEE Design Automation Conference, pages 729–734, June 2004.

[40] A. Bona, M. Sami, D. Sciuto, C. Silvano, V. Zaccaria, and R. Zafalon. Energy Estimation and Optimization of Embedded VLIW Processors based on Instruction Clustering. In ACM/IEEE Design Automation Conference, pages 886–891, June 2002.

[41] K. S. Brace, R. L. Rudell, and R. E. Bryant. Efficient Implementation of a BDD Package. In ACM/IEEE Design Automation Conference, pages 40–45, June 1990.

[42] R. K. Brayton, G. Hachtel, A. Sangiovanni-Vincentelli, F. Somenzi, A. Aziz, S. Cheng, S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. Ranjan, S. Sarwary, T. Shiple, G. Swamy, and T. Villa. VIS: a System for Verification and Synthesis. In International Conference on Computer Aided Verification, pages 428–432, July 1996.

[43] R. K. Brayton and S. P. Khatri. Multi-valued Logic Synthesis. In International Conference on VLSI Design, pages 196–205, January 1999.

[44] R. K. Brayton and C. McMullen. The Decomposition and Factorization of Boolean Expressions. In IEEE International Symposium on Circuits and Systems, pages 49–54, November 1982.

[45] R. K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. Wang. Multi-Level Logic Optimization and the Rectangle Covering Problem. In IEEE International Conference on Computer Aided Design, pages 66–69, November 1987.

[46] P. Brisk, A. Kaplan, and M. Sarrafzadeh. Area-Efficient Instruction Set Synthesis for Reconfigurable System-on-Chip Designs. In ACM/IEEE Design Automation Conference, pages 395–400, June 2004.

[47] P. Brisk, A. Kaplan, R. Kastner, and M. Sarrafzadeh. Instruction Generation and Regularity Extraction for Reconfigurable Processors. In International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 262–269, October 2002.

[48] S. Brown, J. Rose, and Z. Vranesic. A Detailed Router for Field Programmable Gate Arrays. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11(5):620–628, May 1992.

[49] R. E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, 35(8):677–691, August 1986.

[50] R. Buccigrossi and E. Simoncelli. Progressive Wavelet Image Coding Based on a Conditional Probability Model. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2597–2600, April 1997.

[51] F. Catthoor, E. DeGreef, and S. Suytack. Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design. Kluwer Academic Publishers, Norwell, USA, 1998.

[52] C. Chekuri and S. Khanna. A PTAS for the Multiple Knapsack Problem. In International Symposium on Discrete Algorithms, pages 213–222, January 2000.

[53] H. C. Chen and D. Du. Path Sensitization in Critical Path Problem. In IEEE International Conference on Computer Aided Design, pages 208–211, November 1991.

[54] K. T. Cheng and L. A. Entrena. Multi-Level Logic Optimization by Redundancy Addition and Removal. In European Conference on Design Automation, pages 373–377, February 1993.

[55] S. Cheng, R. Brayton, G. York, K. Yelick, and A. Saldanha. Compiling Verilog Into Timed Finite State Machines. In International Conference on Verilog HDL, pages 32–39, March 1995.

[56] N. Cheung, J. Henkel, and S. Parameswaran. Embedded Software for SoC, chapter Rapid Configuration & Instruction Selection for an ASIP: A Case Study, pages 403–417. Kluwer Academic Publishers, 2003.

[57] N. Cheung, J. Henkel, and S. Parameswaran. Rapid Configuration & Instruction Selection for an ASIP: A Case Study. In Design, Automation and Test in Europe Conference and Exhibition, pages 802–807, March 2003.

[58] N. Cheung, S. Parameswaran, and J. Henkel. INSIDE: INstruction Selection/Identification & Design Exploration for Extensible Processors. In IEEE International Conference on Computer Aided Design, pages 291–297, November 2003.

[59] N. Cheung, S. Parameswaran, and J. Henkel. A Quantitative Study and Estimation Models for Extensible Instructions in Embedded Processors. In IEEE International Conference on Computer Aided Design, pages 183–189, November 2004.

[60] N. Cheung, S. Parameswaran, and J. Henkel. Battery-Aware Instruction Generation for Embedded Processors. In Asia and South Pacific Design Automation Conference, pages 553–556, January 2005.

[61] N. Cheung, S. Parameswaran, J. Henkel, and J. Chan. MINCE: Matching INstructions using Combinational Equivalence for Extensible Processor. In Design, Automation and Test in Europe Conference and Exhibition, pages 1020–1025, February 2004.

[62] H. Choi, J. Kim, C. Yoon, I. Park, S. Hwang, and C. Kyung. Synthesis of Application Specific Instructions for Embedded DSP Software. IEEE Transactions on Computers, 48(6):603–614, June 1999.

[63] H. Choi and I. Park. Coware Pipelining for Exploiting Intellectual Properties and Software Codes in Processor-based Design. In IEEE International Conference on Application Specific Integrated Circuits/System-On-Chips, pages 153–157, September 2000.

[64] H. Choi, J. Yi, J. Lee, I. Park, and C. Kyung. Exploiting Intellectual Properties in ASIP Designs for Embedded DSP Software. In ACM/IEEE Design Automation Conference, pages 939–944, June 1999.

[65] P. C. Chu and J. E. Beasley. A Genetic Algorithm for the Multidimensional Knapsack Problem. Journal of Heuristics, 4(1):63–86, June 1998.

[66] N. Clark, M. Kudlur, H. Park, S. Mahlke, and K. Flautner. Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization. In IEEE/ACM International Symposium on Microarchitecture, pages 30–40, December 2004.

[67] N. Clark, H. Zhong, and S. Mahlke. Processor Acceleration Through Automated Instruction Set Customization. In IEEE/ACM International Symposium on Microarchitecture, pages 129–140, December 2003.

[68] E. Clarke, D. Kroening, and K. Yorav. Behavioral Consistency of C and Verilog Programs Using Bounded Model Checking. In ACM/IEEE Design Automation Conference, pages 368–371, June 2003.

[69] J. Cong and Y. Ding. An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-table Based FPGA Designs. In IEEE International Conference on Computer Aided Design, pages 48–53, November 1992.

[70] J. Cong, Y. Fan, G. Han, and Z. Zhang. Application-Specific Instruction Generation for Configurable Processor Architectures. In International Symposium on Field Programmable Gate Array, pages 183–189, November 2004.

[71] M. Corazao, M. Khalaf, L. Guerra, M. Potkonjak, and J. Rabaey. Instruction Set Mapping for Performance Optimization. In IEEE International Conference on Computer Aided Design, pages 518–521, November 1993.

[72] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. An Efficient Method of Computing Static Single Assignment Form. In ACM Symposium on Principles of Programming Languages, pages 25–35, January 1989.

[73] J. W. Davidson and S. Jinturkar. Improving Instruction-level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation. In IEEE/ACM International Symposium on Microarchitecture, pages 125–132, November 1995.

[74] A. J. deGeus and W. Cohen. A Rule-Based System for Optimizing Combinational Logic. IEEE Design and Test of Computers, 2(4):22–32, August 1985.

[75] G. DeMicheli. Performance-Oriented Synthesis in the Yorktown Silicon Compiler. In IEEE International Conference on Computer Aided Design, pages 138–141, November 1989.

[76] G. DeMicheli, R. K. Brayton, and A. Sangiovanni-Vincentelli. KISS: A Program for Optimal State Assignment for Finite State Machines. In IEEE International Conference on Computer Aided Design, pages 209–211, November 1984.

[77] S. Devadas, K. Keutzer, and S. Malik. Delay Computation for Combinational Logic Circuits: Theory and Algorithms. In IEEE International Conference on Computer Aided Design, pages 176–179, November 1991.

[78] N. Dutt and K. Choi. Configurable Processors for Embedded Computing. IEEE Computer Magazine, 36(1):120–123, January 2003.

[79] K. Ebcioglu, J. Fritts, S. Kosonocky, M. Gschwind, E. Altman, K. Kailas, and A. T. Bright. An Eight Issue Tree-VLIW Processor for Dynamic Binary Translation. In IEEE International Conference on Computer Design, pages 488–495, October 1998.

[80] C. Ebeling, L. McMurchie, S. A. Hauck, and S. Burns. Placement and Routing Tools for the Triptych FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 3(4):473–482, December 1995.

[81] A. Fauth. Beyond Tool-specific Machine Descriptions. Code Generation for Embedded Processors, pages 138–152, December 1995.

[82] Y. Fei, S. Ravi, A. Raghunathan, and N. Jha. A Hybrid Energy Estimation Technique for Extensible Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(5):652–664, May 2004.

[83] R. J. Francis, J. Rose, and K. Chung. Chortle: a Technology Mapping Program for Lookup Table-based Field Programmable Gate Arrays. In ACM/IEEE Design Automation Conference, pages 613–619, June 1990.

[84] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, San Francisco, USA, 1979.

[85] C. H. Gebotys. Utilizing Memory Bandwidth in DSP Embedded Processors. In ACM/IEEE Design Automation Conference, pages 347–352, June 2001.

[86] J. Gong, D. Gajski, and A. Nicolau. Performance Evaluation for Application-Specific Architectures. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 3(4):483–490, December 1995.

[87] R. Gonzalez. Xtensa: A Configurable and Extensible Processor. IEEE Micro Magazine, 20(2):60–70, March 2000.

[88] D. Goodwin and D. Petkov. Automatic Generation of Application Specific Processors. In International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pages 137–147, October 2003.

[89] G. Goossens, J. Rabaey, J. Vandewalle, and H. DeMan. An Efficient Microcode Compiler for Application Specific DSP Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 9(9):925–937, September 1990.

[90] D. Gregory, K. Bartlett, and A. J. deGeus. Automatic Generation of Combinatorial Logic from a Functional Specification. In IEEE International Symposium on Circuits and Systems, pages 986–989, May 1984.

[91] M. Grünewald, D. Le, U. Kastens, J. Niemann, M. Porrmann, U. Rückert, A. Slowik, and M. Thies. Network Application Driven Instruction Set Extensions for Embedded Processing Clusters. In IEEE International Conference on Parallel Computing in Electrical Engineering, pages 209–214, September 2004.

[92] M. Gschwind. Instruction Set Selection for ASIP Design. In International Workshop on Hardware/Software Codesign, pages 7–11, May 1999.

[93] J. Gu and Z. Li. Efficient Interprocedural Array Data-flow Analysis for Automatic Program Parallelization. IEEE Transactions on Software Engineering, 26(3):244–261, March 2000.

[94] J. Guohua and C. Fujie. Hybrid Loop Interchange: Optimization for Parallel Programs. In International Symposium on Parallel Processing, pages 680–685, March 1992.

[95] T. Gupta, R. Ko, and R. Barua. Compiler-directed Customization of ASIP Cores. In International Symposium on Hardware/Software Co-Design, pages 97–102, May 2002.

[96] T. Gupta, P. Sharma, M. Balakrishnan, and S. Malik. Processor Evaluation in an Embedded Systems Design Environment. In International Conference on VLSI Design, pages 98–103, January 2000.

[97] J. Gyllenhaal, W. Hwu, and B. Rau. HMDES Version 2.0 Specification. Technical Report IMPACT-96-03, University of Illinois at Urbana-Champaign, March 1996.

[98] G. Hadjiyiannis, P. Russo, and S. Devadas. A Methodology for Accurate Performance Evaluation in Architecture Exploration. In ACM/IEEE Design Automation Conference, pages 927–932, June 1999.

[99] A. Halambi, P. Grun, V. Ganesh, A. Khare, N. Dutt, and A. Nicolau. EXPRESSION: A Language for Architecture Exploration Through Compiler/Simulator Retargetability. In Design, Automation and Test in Europe Conference and Exhibition, pages 485–490, March 1999.

[100] S. Hanono and S. Devadas. Instruction Selection, Resource Allocation, and Scheduling in the AVIV Retargetable Code Generator. In ACM/IEEE Design Automation Conference, pages 510–515, June 1998.

[101] S. Hauck, T. Fry, M. Hosler, and J. Kao. The Chimaera Reconfigurable Functional Unit. In IEEE Symposium on Field-Programmable Custom Computing Machines, pages 87–96, April 1997.

[102] J. R. Hauser and J. Wawrzynek. Garp: a MIPS Processor with a Reconfigurable Coprocessor. In IEEE Symposium on Field-Programmable Custom Computing Machines, pages 24–33, April 1997.

[103] J. Henkel. Closing the SoC Design Gap. IEEE Computer Magazine, 36(9):119–121, September 2003.

[104] P. Hoang and J. Rabaey. A Compiler for Multiprocessor DSP Implementation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 581–584, March 1992.

[105] A. Hoffmann, F. Fiedler, A. Nohl, and S. Parupalli. A Methodology and Tooling Enabling Application Specific Processor Design. In International Conference on VLSI Design, pages 399–404, January 2005.

[106] A. Hoffmann, T. Kogel, A. Nohl, G. Braun, O. Schliebusch, and O. Wahlen. A Novel Methodology for the Design of Application-Specific Instruction-set Processors (ASIPs) using a Machine Description Language. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(11):1338–1354, November 2001.

[107] B. Holmer. A Tool for Processor Instruction Set Design. In European Conference on Design Automation, pages 150–155, September 1994.

[108] J. Hoogerbrugge and L. Augusteijn. Instruction Scheduling for TriMedia. Journal of Instruction-Level Parallelism, 1:1–21, February 1999.

[109] I. Huang and A. Despain. Synthesis of Application Specific Instruction Sets. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(6):663–675, June 1995.

[110] T. C. Huang and C. M. Yang. Further Results for Improving Loop Interchange in Non-adjacent and Imperfectly Nested Loops. In International Workshop on High-Level Parallel Programming Models and Supportive Environments, pages 93–99, March 1998.

[111] M. Itoh, S. Higaki, J. Sato, A. Shiomi, Y. Takeuchi, A. Kitajima, and M. Imai. PEAS-III: An ASIP Design Environment. In IEEE International Conference on Computer Design, pages 430–436, September 2000.

[112] M. Jackson and E. S. Kuh. Performance-driven Placement of Cell Based IC's. In ACM/IEEE Design Automation Conference, pages 370–375, June 1989.

[113] M. Jacome, G. Veciana, and V. Lapinskii. Exploring Performance Tradeoffs for Clustered VLIW ASIPs. In IEEE International Conference on Computer Design, pages 504–510, November 2000.

[114] M. K. Jain, M. Balakrishnan, and A. Kumar. ASIP Design Methodologies: Survey and Issues. In International Conference on VLSI Design, pages 76–81, January 2001.

[115] M. K. Jain, M. Balakrishnan, and A. Kumar. Exploring Storage Organization in ASIP Synthesis. In Euromicro Symposium on Digital System Design, pages 120–127, September 2003.

[116] M. K. Jain, M. Balakrishnan, and A. Kumar. Integrated On-chip Storage Evaluation in ASIP Synthesis. In International Conference on VLSI Design, pages 274–279, January 2005.

[117] M. K. Jain, L. Wehmeyer, S. Steinke, P. Marwedel, and M. Balakrishnan. Evaluating Register File Size in ASIP Design. In International Symposium on Hardware/Software Codesign, pages 109–114, September 2001.

[118] P. K. Jha and N. D. Dutt. Rapid Estimation for Parameterized Components in High-level Synthesis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1(3):296–303, September 1993.

[119] K. Kang and K. Choe. On the Automatic Generation of Instruction Selector Using Bottom-Up Tree Pattern Matching. Technical Report CS-TR-95-93, Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), April 1995.

[120] V. Kathail, S. Aditya, R. Schreiber, B. Rau, D. Cronquist, and M. Sivaraman. PICO: Automatically Designing Custom Computers. IEEE Computer Magazine, 35(9):39–47, September 2002.

[121] M. Kaul, R. Vemuri, S. Govindarajan, and I. Ouaiss. An Automated Temporal Partitioning and Loop Fission Approach for FPGA Based Reconfigurable Synthesis of DSP Applications. In ACM/IEEE Design Automation Conference, pages 616–622, June 1999.

[122] B. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. The Bell System Technical Journal, 49:291–307, February 1970.

[123] B. Kernighan and R. Pike. The Practice of Programming. Addison-Wesley Professional, Menlo Park, CA, USA, 1999.

[124] K. Keutzer. DAGON: Technology Binding and Local Optimization by DAG Matching. In ACM/IEEE Design Automation Conference, pages 617–623, June 1987.

[125] K. Keutzer, S. Malik, and A. Saldanha. Is Redundancy Necessary to Reduce Delay? In ACM/IEEE Design Automation Conference, pages 228–234, June 1990.

[126] K. Keutzer and D. Richards. Computational Complexity of Logic Synthesis and Optimization. In International Workshop on Logic Synthesis, pages 1–15, May 1989.

[127] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by Simulated Annealing. Science, 220(4598):671–680, May 1983.

[128] J. M. Kleinhans, G. Sigl, and F. M. Johannes. GORDIAN: A New Global Optimization/Rectangle Dissection Method for Cell Placement. In IEEE International Conference on Computer Aided Design, pages 506–509, November 1988.

[129] S. Kobayashi, H. Mita, Y. Takeuchi, and M. Imai. Design Space Exploration for DSP Applications using the ASIP Development System PEAS-III. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 3168–3171, May 2002.

[130] D. J. Kolson, A. Nicolau, N. Dutt, and K. Kennedy. A Method for Register Allocation to Loops in Multiple Register File Architectures. In International Symposium on Parallel Processing, pages 28–33, April 1996.

[131] T. Kong and K. D. Wilken. Precise Register Allocation for Irregular Architectures. In IEEE/ACM International Symposium on Microarchitecture, pages 297–307, November 1998.

[132] K. Küçükçakar. An ASIP Design Methodology for Embedded Systems. In International Workshop on Hardware/Software Codesign, pages 17–21, May 1999.

[133] K. Lahiri, A. Raghunathan, S. Dey, and D. Panigrahi. Battery-driven System Design: A New Frontier in Low Power Design. In Asia and South Pacific Design Automation Conference/International Conference of VLSI Design, pages 261–267, January 2002.

[134] L. Lavagno, S. Malik, R. K. Brayton, and A. Sangiovanni-Vincentelli. MIS-MV: Optimization of Multi-Level Logic with Multiple-Valued Inputs. In IEEE International Conference on Computer Aided Design, pages 560–563, November 1990.

[135] C. Lee, M. Potkonjak, and W. H. Mangione-Smith. MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems. In IEEE/ACM International Symposium on Microarchitecture, pages 330–335, December 1997.

[136] C. Y. Lee. An Algorithm for Path Connections and its Applications. IRE Transactions on Electronic Computers, EC-10(2):346–365, 1961.

[137] J. Lee, K. Choi, and N. Dutt. Efficient Instruction Encoding for Automatic Instruction Set Design of Configurable ASIPs. In International Conference on Computer Aided Design, pages 649–654, November 2002.

[138] J. Lee, K. Choi, and N. Dutt. Energy-Efficient Instruction Set Synthesis for Application-Specific Processors. In International Symposium on Low Power Electronics and Design, pages 330–333, August 2003.

[139] M. T. Lee, V. Tiwari, S. Malik, and M. Fujita. Power Analysis and Minimization Techniques for Embedded DSP Software. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 5(1):123–135, March 1997.

[140] Y. F. Lee, B. G. Ryder, and M. E. Fiuczynski. Region Analysis: A Parallel Elimination Method for Data Flow Analysis. IEEE Transactions on Software Engineering, 21(11):913–926, March 1995.

[141] R. Leupers and P. Marwedel. Instruction-Set Modelling for ASIP Code Generation. In IEEE International Conference on VLSI Design, pages 77–80, January 1995.

[142] R. Leupers and P. Marwedel. Retargetable Code Generation Based on Structural Processor Descriptions. Design Automation for Embedded Systems, 3(1):1–36, January 1998.

[143] S. Liao, S. Devadas, and K. Keutzer. Code Density Optimization for Embedded DSP Processors Using Data Compression Techniques. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(7):601–608, July 1998.

[144] S. Liao, S. Devadas, K. Keutzer, and S. Tjiang. Instruction Selection using Binate Covering for Code Size Optimization. In IEEE International Conference on Computer Aided Design, pages 393–399, November 1995.

[145] C. Liem, T. May, and P. Paulin. Instruction-Set Matching and Selection for DSP and ASIP Code Generation. In European Conference on Design Automation, pages 31–37, February 1994.

[146] H. D. Linden and T. B. Reddy. Handbook of Batteries. McGraw-Hill, New York, NY, USA, 1995.

[147] M. Lipasti. Value Locality and Speculative Execution. Ph.D. Thesis, Carnegie Mellon University, April 1997.

[148] A. Lodi, M. Toma, F. Campi, A. Cappelli, R. Canegallo, and R. Guerrieri. A VLIW Processor With Reconfigurable Instruction Set for Embedded Applications. IEEE Journal of Solid-State Circuits, 38(11):1876–1886, November 2003.

[149] J. C. Madre and J. P. Billon. Proving Circuit Correctness Using Formal Comparison Between Expected And Extracted Behaviour. In ACM/IEEE Design Automation Conference, pages 205–210, June 1989.

[150] J. C. Madre, O. Coudert, and J. P. Billon. Automating The Diagnosis And The Rectification Of Design Errors With PRIAM. In IEEE International Conference on Computer Aided Design, pages 30–33, November 1989.

[151] N. Manjikian. Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors. In International Conference on Parallel Processing, pages 78–82, August 1997.

[152] R. Marculescu, D. Marculescu, and M. Pedram. Switching Activity Analysis Considering Spatiotemporal Correlations. In IEEE International Conference on Computer Aided Design, pages 294–299, November 1994.

[153] J. P. Marques-Silva and K. A. Sakallah. GRASP: A New Search Algorithm for Satisfiability. In IEEE International Conference on Computer Aided Design, pages 220–227, November 1996.

[154] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, Inc., New York, NY, USA, 1990.

[155] P. McGeer and R. K. Brayton. Efficient Algorithms for Computing the Longest Viable Path in a Combinational Network. In ACM/IEEE Design Automation Conference, pages 561–567, June 1989.

[156] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an Efficient SAT Solver. In ACM/IEEE Design Automation Conference, pages 530–535, June 2001.

[157] T. Nakra, R. Gupta, and M. Soffa. Value Prediction in VLIW Machines. In International Symposium on Computer Architecture, pages 258–269, June 1999.

[158] G. J. Nam, F. Aloul, K. A. Sakallah, and R. A. Rutenbar. A Comparative Study of Two Boolean Formulations of FPGA Detailed Routing Constraints. IEEE Transactions on Computers, 53(6):688–696, June 2004.

[159] J. Ng, D. Kulkarni, W. Li, R. Cox, and S. Bobholz. Inter-procedural Loop Fusion, Array Contraction and Rotation. In International Conference on Parallel Architectures and Compilation Techniques, pages 114–124, September 2003.

[160] C. Norris and L. L. Pollock. An Experimental Study of Several Cooperative Register Allocation and Instruction Scheduling Strategies. In IEEE/ACM International Symposium on Microarchitecture, pages 169–179, November 1995.

[161] F. Onion, A. Nicolau, and N. Dutt. Incorporating Compiler Feedback Into the Design of ASIPs. In European Design and Test Conference, pages 508–513, March 1995.

[162] P. R. Panda, H. Nakamura, N. D. Dutt, and A. Nicolau. Augmenting Loop Tiling with Data Alignment for Improved Cache Performance. IEEE Transactions on Computers, 48(2):142–149, February 1999.

[163] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Palo Alto, CA, USA, 1989.

[164] P. G. Paulin, C. Liem, T. C. May, and S. Sutawala. FlexWare: A Flexible Firmware Development Environment for Embedded Systems. Code Generation for Embedded Processors, pages 65–84, December 1995.

[165] M. Pedram and Q. Wu. Battery-powered Digital CMOS Design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 10(5):601–607, October 2002.

[166] A. Peymandoust, L. Pozzi, P. Ienne, and G. DeMicheli. Automatic Instruction-Set Extension And Utilization For Embedded Processors. In International Conference on Application-specific Systems, Architectures and Processors, pages 108–118, June 2003.

[167] C. Pixley and G. Beihl. Calculating Resetability and Reset Sequences. In IEEE International Conference on Computer Aided Design, pages 376–379, November 1991.

[168] A. Pnueli, O. Shtrichman, and M. Siegel. The Code Validation Tool CVT: Automatic Verification of a Compilation Process. International Journal on Software Tools for Technology Transfer, 2(2):192–201, July 1998.

[169] M. Potkonjak and J. Rabaey. Power Minimization in DSP Application Specific Systems Using Algorithm Selection. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 2639–2642, May 1995.

[170] L. Pozzi. Methodologies for the Design of Application-Specific Reconfigurable VLIW Processors. Ph.D. Thesis, Politecnico di Milano, January 2000.

[171] S. P. Rajan, M. Fujita, A. Sudarsanam, and S. Malik. Development of an Optimizing Compiler for a Fujitsu Fixed-point Digital Signal Processor. In International Workshop on Hardware/Software Codesign, pages 2–6, May 1999.

[172] S. Ravi, A. Raghunathan, N. Potlapally, and M. Sankaradass. System Design Methodologies for a Wireless Security Processing Platform. In ACM/IEEE Design Automation Conference, pages 777–782, June 2002.

[173] R. Razdan and M. D. Smith. A High-Performance Microarchitecture with Hardware-Programmable Functional Units. In IEEE/ACM International Symposium on Microarchitecture, pages 172–180, November 1994.

[174] R. Rudell. Dynamic Variable Ordering for Ordered Binary Decision Diagrams. In IEEE International Conference on Computer Aided Design, pages 42–47, November 1993.

[175] R. Rudell and A. Sangiovanni-Vincentelli. Espresso-MV: Algorithms for Multiple-Valued Logic Minimization. In IEEE International Conference on Custom Integrated Circuits, pages 230–234, May 1985.

[176] R. Rudell and A. Sangiovanni-Vincentelli. Exact Minimization of Multiple-Valued Functions. In IEEE International Conference on Computer Aided Design, pages 352–355, November 1986.

[177] R. Rudell and A. Sangiovanni-Vincentelli. Multiple-Valued Minimization for PLA Optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 6(5):727–750, September 1987.

[178] J. Sanchez and A. Gonzalez. The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures. In International Conference on Parallel Processing, pages 555–562, August 2000.

[179] J. Sanghavi and A. Wang. Estimation of Speed, Area, and Power of Parameterizable Soft IP. In ACM/IEEE Design Automation Conference, pages 31–34, June 2001.

[180] H. Savoj and R. K. Brayton. The Use of Observability and External Don’t Cares for the Simplification of Multi-Level Networks. In ACM/IEEE Design Automation Conference, pages 297–301, June 1990.

[181] H. Savoj and R. K. Brayton. Observability Relations and Observability Don’t Cares. In IEEE International Conference on Computer Aided Design, pages 518–521, November 1991.

[182] H. Savoj, R. K. Brayton, and H. J. Touati. Extracting Local Don’t Cares for Network Optimization. In IEEE International Conference on Computer Aided Design, pages 514–517, November 1991.

[183] C. Sechen. Chip-planning, Placement, and Global Routing of Macro/Custom Cell Integrated Circuits Using Simulated Annealing. In ACM/IEEE Design Automation Conference, pages 73–80, June 1988.

[184] L. Semeria, A. Seawright, R. Mehra, D. Ng, A. Ekanayake, and B. Pangrle. RTL C-Based Methodology for Designing and Verifying a Multi-Threaded Processor. In ACM/IEEE Design Automation Conference, pages 123–128, June 2002.

[185] E. Sha, C. Lang, and N. L. Passos. Polynomial-time Nested Loop Fusion with Full Parallelism. In International Conference on Parallel Processing, volume 3, pages 9–16, August 1996.

[186] J. Shu, T. C. Wilson, and D. K. Banerji. Instruction-Set Matching and GA-based Selection for Embedded-Processor Code Generation. In International Conference on VLSI Design, pages 73–76, January 1996.

[187] M. Smotherman, S. Chawla, S. Cox, and B. Malloy. Instruction Scheduling for the Motorola 88110. In IEEE/ACM International Symposium on Microarchitecture, pages 257–262, November 1993.

[188] S. Søe and K. Karplus. Logic Minimization Using Two-column Rectangle Replacement. In ACM/IEEE Design Automation Conference, pages 470–474, June 1991.

[189] M. Stadler, T. Rower, H. Kaeslin, N. Felber, W. Fichtner, and M. Thalmann. Functional Verification of Intellectual Properties (IP): a Simulation-Based Solution for an Application-Specific Instruction-set Processor. In International Test Conference, pages 414–420, September 1999.

[190] T. Stornetta and F. Brewer. Implementation of an Efficient Parallel BDD Package. In ACM/IEEE Design Automation Conference, pages 641–644, June 1996.

[191] A. Sudarsanam and S. Malik. Memory Bank and Register Allocation in Software Synthesis for ASIPs. In International Conference on Computer Aided Design, pages 388–392, November 1995.

[192] F. Sun, A. Raghunathan, S. Ravi, and N. Jha. A Scalable Application Specific Processor Synthesis Methodology. In International Conference on Computer Aided Design, pages 283–290, November 2003.

[193] F. Sun, S. Ravi, A. Raghunathan, and N. Jha. Custom-Instruction Synthesis for Extensible-Processor Platforms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(2):216–228, February 2004.

[194] F. Sun, S. Ravi, A. Raghunathan, and N. Jha. Synthesis of Application-Specific Heterogeneous Multiprocessor Architectures Using Extensible Processors. In International Conference on VLSI Design, pages 551–556, January 2005.

[195] M. Vuletić, L. Pozzi, and P. Ienne. Virtual Memory Window for Application-Specific Reconfigurable Coprocessors. In ACM/IEEE Design Automation Conference, pages 948–953, June 2004.

[196] Y. Wand and R. Weber. An Ontological Model of an Information System. IEEE Transactions on Software Engineering, 16(11):1282–1292, November 1990.

[197] A. Wang, E. Killian, D. Maydan, and C. Rowen. Hardware/Software Instruction Set Configurability for System-on-Chip Processors. In ACM/IEEE Design Automation Conference, pages 184–188, June 2001.

[198] M. J. Wirthlin and B. L. Hutchings. A Dynamic Instruction Set Computer. In IEEE Symposium on Field-Programmable Custom Computing Machines, pages 99–107, April 1995.

[199] W. Wolf and M. Kandemir. Memory System Optimization of Embedded Software. Proceedings of the IEEE, 91(1):165–182, January 2003.

[200] J. Xue. Loop Tiling for Parallelism. Kluwer Academic Publishers, Boston, 2000.

[201] J. Yang, B. Kim, S. Nam, Y. Kwon, D. Lee, J. Lee, C. Hwang, Y. Lee, S. Hwang, I. Park, and C. Kyung. MetaCore: An Application Specific DSP Development System. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 8(2):173–183, April 2000.

[202] Q. Yi and K. Kennedy. Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion. International Journal of High Performance Computing Applications, 18(2):237–253, August 2004.

[203] P. Yu and T. Mitra. Characterizing Embedded Applications for Instruction-Set Extensible Processors. In ACM/IEEE Design Automation Conference, pages 723–728, June 2004.

[204] P. Yu and T. Mitra. Scalable Custom Instructions Identification for Instruction-Set Extensible Processors. In International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 69–78, October 2004.

[205] Y. Zhang, X. Hu, and D. Z. Chen. Global Register Allocation for Minimizing Energy Consumption. In International Symposium on Low Power Electronics and Design, pages 100–102, August 1999.

[206] Q. Zhao, B. Mesman, and T. Basten. Practical Instruction Set Design and Compiler Retargetability Using Static Resource Models. In Design, Automation and Test in Europe Conference and Exhibition, pages 1021–1026, March 2002.

[207] H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, New York, NY, USA, 1991.

[208] N. Zingirian and M. Maresca. External Loop Unrolling of Image Processing Programs: Optimal Register Allocation for RISC Architectures. In International Workshop on Computer Architecture for Machine Perception, pages 61–65, October 1997.

[209] V. Zivojnovic, S. Pees, C. Schlager, M. Willems, R. Schoenen, and H. Meyr. DSP Processor/Compiler Co-design: a Quantitative Approach. In International Symposium on System Synthesis, pages 108–113, November 1996.