
Optimisation Techniques for Stack Based Architectures

Christopher Bailey

A thesis submitted in partial fulfilment of the requirements of the University of Teesside for the degree of Doctor of Philosophy.

The research was conducted within the Research Unit of the Division of Electronic and Computer Engineering, in collaboration with the Science and Engineering Research Council (SERC) and Microprocessor Engineering Ltd.

July 1996

Acknowledgements

This research thesis is a result of a SERC CASE award (August 1992-August 1995), in conjunction with Microprocessor Engineering Ltd, Southampton, England.

In submitting this thesis for examination and future reference, I wish to acknowledge the support received during my research studentship at the University of Teesside. My thanks go in particular to Stephen and Linda Pelc, and Microprocessor Engineering Ltd, who supported the CASE studentship and made this research possible. Their encouragement and financial support have been of great value throughout the research period.

I would also like to express thanks to Professor Reza Sotudeh, for his guidance as my primary research supervisor, and the support he has offered throughout my studentship. Professor L. J. Herbst should also be thanked as second supervisor, and for his diligent assistance throughout the research period. The C-compiler referred to throughout this thesis, and utilised during the research programme, was developed by Damien Kelly, to whom I am grateful. Finally, Bill Stoddart of the School of Computing and Mathematics has also provided me with much encouragement in our research meetings as a second supervisor, and provided many welcome insights into the research conducted.

Author’s Declaration

In declaration of prior publication of the contents of this thesis by the author it should be noted, in accordance with the guidelines of BS 4281[1], that the following chapters are partially based upon previous conference and journal publications: Chapters 4 and 6 contain material published in (Bailey 1993a, 1994a, 1995a, and 1995b). Chapter 7 contains results previously published in (Bailey 1994b, 1995a, and 1995b). Several results from Chapter 8 may be found in Bailey (1994a), whilst Chapters 1 to 5, and 9 to 11 are generally of previously unpublished content. All of the papers published as a result of this research are to be found in Appendix-A of this thesis.

[1] BS 4281 : 1990, British Standard Recommendations for the Presentation of theses and dissertations.

———————— Abstract ————————

Recent research into computer architecture and processing methods has been significantly influenced by the development of RISC paradigms, and the continuing debate over RISC versus CISC. The emergence of such wholly ’new’ paradigms, which divorce themselves from instruction complexity in order to optimise instruction efficiency, has resulted in significant revisions in computer architecture and design. Yet the popular image of architectural philosophy, as a simple issue of RISC versus CISC, over-simplifies matters when in truth alternative architectures have pre-dated and co-existed with many mainstream processor concepts.

The thesis concentrates upon one class of alternative architecture: the stack processor, which although compatible within general processor classifications is perhaps more precisely viewed as a third class of processor technology. After assessing the historical perspective, through which stack processor technology may be viewed, an introduction to the fundamentals of stack-based computation is given.

A review of the current state of relevant research covers key areas. Current stack processor design concepts are assessed and compared, and High-Level-Language issues such as local variable support are examined in hardware and software terms. Performance issues such as instruction fetch efficiency and stack management are also presented. In each case techniques for improved performance are identifiable, and reference to current research is highlighted.

Following a review of stack processor architecture, the thesis presents new and original work in a number of areas. An initial discussion of stack processor behaviour introduces several quantitative metrics, with data presented in both FORTH and C contexts.

A number of techniques for improved performance are examined, including software, hardware, and instruction set enhancements. An analysis of factors influencing bandwidth reveals the significance of several key areas which are then addressed in terms of current optimisation techniques. The results indicate trade-offs that were not previously recognised, or were not beneficial when applied to mainstream architectures. The individual effects and trade-offs of these optimisations are quantified in the environment of a 32-bit stack-based processor model, confirming previous research findings.

iv Mathematical formalisation of stack-processor behaviour has received limited attention in previous research publications. Hence an attempt is made to represent the behaviour of major system components as mathematical models which, when combined, permit an overall model for stack-processor performance to be presented. Since the model reflects the absolute and relative trade-offs of hardware and software optimisation features studied, a range of processor configurations can be compared on a quantitative basis, unlike previous empirically-bound work.

Finally a revised model for stack processor hardware is specified, with enhancements for high-level-language support and general performance improvement. With the use of VHDL and logic synthesis techniques, major system components are implemented at the gate level. Hence many optimisation effects are resolved to the point that gate-level trade-offs are included in the final analysis.

The thesis shows that, within the context of 32-bit processor architecture, stack processors can effectively support key features of high-level-language execution without compromising stack processor philosophy, and increase relative performance with respect to mainstream processor technology.

———————— Contents ————————

Acknowledgements and declaration
Abstract
List of Figures
List of Tables
List of Symbols
List of Equations
List of Abbreviations

1 Introduction

1.0 Introduction
1.1 Background to stack processors
1.2 Structure and content
1.3 Dynamic machine-stack behaviour
1.4 The UTSA experimental stack processor platform
1.5 Stack buffers and the stack-memory bottleneck
1.6 Local variables: hardware, software, and instruction sets
1.7 Instruction fetch bandwidth reduction
1.8 VHDL and hardware synthesis of the UTSA model
1.9 Overall performance, assessment and comparison

2 Stack Processors: Technology and Trends

2.0 Stack processors - technology and trends
2.1 Stack processors - the alternative RISC
2.1.1 The paradigm
2.1.2 RISC: register windows and context preservation
2.1.3 The stack processor paradigm
2.2 Stack processors - a brief historical perspective
2.2.1 ALGOL - the first era of stack machines
2.2.2 The impact of high-level-language developments
2.2.3 PASCAL, C, and the arrival of RISC
2.2.4 FORTH and FORTH engines
2.2.5 Stack processors - the present view
2.3 The case against stack processors
2.4 Modern stack processor technology
2.5 Stack buffering strategies
2.6 Instruction encoding strategies
2.7 High level language support
2.7.1 Local variables, and frame stacks
2.7.2 Local variable optimisation and stack buffer behaviour
2.8 Quantitative measurements and mathematical models

3 Research Objectives and Research Tool Developments

3.0 Research objectives
3.1 Development of a research tool suite
3.1.1 The UTSA C compiler
3.1.2 Software optimisation: local variable scheduling
3.1.3 Software optimisation: peephole optimiser
3.1.4 UTSA binary assembler: an investigative tool
3.1.5 The UTSA simulator: virtual machine and simulation platform
3.1.6 Stack buffer simulator: an investigative tool
3.1.7 VHDL models and logic synthesis
3.1.8 FORTH tracer

4 Quantitative Assessment of Behaviour

4.0 Introduction to chapter
4.1 Stack behaviour, measurement and modelling
4.1.1 Introducing some terminology for stack behaviour
4.2 The stack characteristics of FORTH programs
4.2.1 Stack depth probability of FORTH programs
4.2.2 Stack-depth modulation for FORTH programs
4.2.3 Limited depth change and the Cut-Back-K controversy
4.3 The stack characteristics of compiled C-code
4.4 FORTH and C-code, behavioural comparison
4.5 Baseline stack traffic for FORTH and C-code
4.6 C-code and bus bandwidth utilisation
4.7 A memory traffic model for stack processor systems

5 University of Teesside Stack Architecture (UTSA)

5.0 Preamble to Chapter 5
5.1 The UTSA concept
5.2 The local variable question
5.2.1 The UTSA local variable implementation
5.3 Stack manipulation - generalisation and scaleability
5.4 Call, branch, and operand size
5.4.1 UTSA branch operations
5.4.2 Branch prediction strategies
5.4.3 Call operations
5.5 UTSA instruction packing scheme

6 Stack Buffering, Traffic Behaviour, and Performance Comparisons

6.0 Preamble to Chapter 6
6.1 The stack-buffer concept
6.2 Automatically managed stack buffering algorithms
6.2.1 Demand-fed algorithm
6.2.2 Cut-back-k buffering
6.2.3 Wedig’s single & double pointer algorithms
6.2.4 A new algorithm - zero-pointer with dual tagging
6.2.5 Flynn’s ’stack architecture’
6.3 Buffering characteristics of FORTH code
6.3.1 Data and return stack differences
6.3.2 The applicability of hardware buffers to Modula-2 platforms
6.3.3 A relationship between stack modulation and buffer size
6.3.4 Comparison with Wedig’s algorithms
6.4 C-code buffering characteristics
6.5 A mathematical approximation of stack buffer behaviour

7 Local Variable Support, Optimisation Strategies, and Trade-offs

7.0 Introduction
7.1 Local variables and intra-block scheduling
7.1.1 Short term invariance and fetch/store ratios
7.1.2 Static and dynamic variable reduction
7.2 Instruction count versus variable reduction
7.3 Trade-offs for instruction set complexity
7.3.1 Generalisation of variable scheduling
7.4 Variable scheduling and stack behaviour
7.5 Variable scheduling and buffer performance degradation
7.6 The impact of scheduling on overall performance
7.7 The implications for previous research studies

8 Instruction Fetch Bandwidth and Instruction Packing Techniques

8.0 Preamble to Chapter 8
8.1 and deterministic system behaviour
8.1.1 The memory wall
8.2 Instruction packing
8.2.2 UTSA and stack-processor instruction packing
8.3 C-code performance of UTSA instruction packing
8.3.1 Static packing density and operand field reduction
8.4 Branch prediction and dynamic packing density
8.5 Word-alignment of call/branch target addresses
8.6 Hardware considerations of packing schemes
8.6.1 VHDL synthesis and timing analysis of instruction packing
8.6.2 Trade-offs between CPU cycle time and memory bandwidth

9 VHDL Modelling, Hardware Synthesis, and Timing Analysis

9.0 Preamble
9.1 VHDL modelling of a UTSA prototype
9.1.1 Prototype logic synthesis, and assessment of area cost
9.2 Instruction packing versus cache - the silicon trade-off
9.3 Timing analysis and determination of clock frequencies
9.3.1 Technology specific timing measurements
9.4 Estimating power consumption

10 Models, Projections, and Performance

10.0 Models, projections, and performance
10.1 The local variable issue
10.2 Memory traffic distribution

11 Conclusions and Future Research

11.0 Conclusions, and future research
11.1 On stack behaviour and buffering
11.2 Optimisation of instruction traffic
11.3 Local variables and memory traffic optimisation
11.4 Interaction, optimisation, and the new view
11.5 Directions for future research

References

Bibliography

Appendices

List of Appendices

A  Published Research
B  UTSA architecture specification document
C  UTSA simulator guide
D  UTSA assembler guide
E  Optimisation tools guide
F  VHDL simulator waveforms
G  FORTH source code
H  C source code
I  VHDL source files
J  Assembler code
K  Simulation data (buffer behaviour)

———————— List of Figures ————————

Fig. 2.1 An alternative classification of processor models
Fig. 2.2 A traditional register file computational model
Fig. 2.3 An example of a RISC computational model
Fig. 2.4 Stack processor computational model
Fig. 2.5(a) Language developments, 1950s to 1980s
Fig. 2.5(b) Stack machine research imperatives to date
Fig. 2.6 Hamblin’s addressless computing model
Fig. 3.1 Investigative research tools
Figs. 4.1(a) to 4.1(d) Stack-depth probabilities for FORTH programs
Fig. 4.2 Long term trends and short term ’dynamics’ in FORTH execution
Figs. 4.3(a) and 4.3(b) Composite models for cumulative stack-depth modulation
Figs. 4.4(a) and 4.4(b) Composite data for atom stack-depth modulation
Figs. 4.5(a) to 4.5(h) Data stack depth probabilities for ’C’ programs
Fig. 4.6 Data stack depth probability for C-code
Figs. 4.7(a) and 4.7(b) Data-stack depth modulations for C-code
Figs. 4.8(a) and 4.8(b) Data-stack depth modulation, FORTH and C compared
Figs. 4.8(c) and 4.8(d) Data-stack depth modulation, FORTH and C compared
Figs. 4.9(a) to 4.9(j) Bus bandwidth components for C-code execution
Fig. 5.1 68000 coding of ’ z = sum(x, y); ’
Fig. 5.2 UTSA coding of ’ z = sum( x, y ); ’
Fig. 5.3 A functional classification of traditional stack manipulators
Fig. 5.4 New scaleable stack manipulator classification
Fig. 5.5 UTSA instruction formats
Fig. 5.6 UTSA encoding formats and functions
Fig. 5.7 A simple decode scheme suitable for UTSA progressive decoding
Figs. 6.1(a) and 6.1(b) Composite buffer profiles for FORTH program set
Figs. 6.2(a) and 6.2(b) Composite models for relative spill contributions of FORTH code
Fig. 6.3 A comparison of the FORTH study with Wedig’s algorithms
Fig. 6.4 Stack buffer characteristics for C-code benchmark suite

Figs. 6.5(a) and 6.5(b) FORTH vs. C buffer performance comparison
Fig. 6.6 Stack traffic spill-components for C-code test suite
Figs. 6.7(a) and 6.7(b) C-code & FORTH buffer performance plotted on a log scale
Fig. 7.1 Memory cycles attributed to locals after buffering the stacks
Figs. 7.2(a) and 7.2(b) Reduction in locals for static and dynamic code analysis
Fig. 7.3 Stack based code to calculate surface area of a cuboid
Fig. 7.4 Stack-cell accessibility vs. local references
Fig. 7.5 The relative gain of increased degrees of stack access
Fig. 7.6 UTSA code: constant scheduling for the operation ’( x + c ) / c ’
Fig. 7.7 UTSA code: scheduling of global operation x[5]=x[5]+1;
Fig. 7.8 Stack-depth probabilities of pre- and post-optimised C-code
Figs. 7.9(a) and 7.9(b) Atom and cumulative stack-depth change comparisons
Figs. 7.10(a) and 7.10(b) FORTH, raw C-code, and optimised C-code behavioural comparisons
Figs. 7.11(a) to 7.11(d) Normalised stack depth profiles
Figs. 7.12(a) to 7.12(d) Buffer characteristics before and after optimisation
Figs. 7.13(a) and 7.13(b) Zero-pointer algorithm becomes more desirable after optimisation
Fig. 7.14 Relative execution time as a function of instruction set complexity, assuming UTSA instruction density
Fig. 7.15 Execution time vs. degree of accessibility for 1-fetch & compact-fetch schemes
Fig. 7.16(a) Flynn’s comparison of stack and register-file architectures, after Flynn (1992)
Fig. 7.16(b) Revision of Flynn’s comparison, with the new stack models
Fig. 7.16(c) Flynn’s stack model vs. optimised stack model

Fig. 7.17 Comparison of Flynn’s mrs, srs, and unoptimised stack models with the new optimised stack processor model
Fig. 8.2 Static packing densities
Fig. 8.3 Dynamic packing densities
Fig. 8.4 Static packing densities before and after operand field reduction
Fig. 8.5 Dynamic packing densities after operand field reduction
Fig. 8.6 Dynamic effects of word alignment
Fig. 8.7 Possible implementation of UTSA operand-field decode buffer
Fig. 8.8 Single and multi-fetch performance under various co-optimisation conditions
Fig. 8.9 Gain in execution time achieved by UTSA instruction packing, with various co-optimisations applied
Fig. 9.1 The modular implementation of the UTSA prototype
Fig. 9.2 Synthesis report for core modules
Fig. 9.3 Breakdown of component utilisation in UTSA prototype design
Fig. 9.4 Equivalent transistor counts for system modules
Fig. 9.5 UTSA timing model
Fig. 9.6 UTSA instruction format decode propagation
Fig. 9.7 Logic timings for 1µm CMOS, 32-bit ALU operation
Fig. 10.1 Average number of memory cycles (oave), as a function of buffer size, for various system configurations
Figs. 10.2(a) to 10.2(e) Absolute effects of optimisation on memory traffic, and relative effects on distribution of associated components

———————— List of Tables ————————

Table 2.1 Myers’ original analysis
Table 2.2 Revised analysis of Myers’ work
Table 4.1 Absolute baseline stack traffic
Table 4.2 Absolute traffic contributions (units are memory references per instruction)
Table 5.1 Stack manipulator functions and FORTH-engine equivalents (if any)
Table 5.2 Scaleable stack-manipulator set for degrees 1 to 4
Table 6.1 Damping factors for buffer strategies presented
Table 7.1 Effects of optimisation on variable and instruction traffic
Table 7.2 Comparison of buffer damping efficiency and baseline stack traffic
Table 8.1 Performance estimate for UTSA instruction format, without other optimisations
Table 9.1 UTSA timing measurements
Table 10.1 Parameters before and after local variable optimisation

———————— Symbols ————————

Stack CPU Performance Model Parameters:

St    Total memory cost per instruction executed
if    Instruction fetch overhead per instruction executed
sd    Baseline data stack spill traffic per instruction executed
sr    Baseline return stack spill traffic per instruction executed
ml    Average local variable access overhead per instruction executed
me    Explicit memory accesses, as an average of instructions executed

Sd    Data stack spill traffic after buffering
Sr    Return stack spill traffic after buffering

t     Stack buffer damping efficiency
b     Stack buffer capacity
bd    Data stack buffer capacity
br    Return stack buffer capacity

ocode   Number of memory cycles or latency of program memory
odata   Number of memory cycles or latency of data memory
ostack  Number of memory cycles or latency of stack memory (incl. locals)
oave    Average memory cycles/latency of memory system

Gate-level timing characteristics:

treg    Time taken for new register contents to become valid at outputs
tdec    Time taken to issue a packed instruction
treq    Time taken to request a bus resource after instruction issue
trel    Time taken to release the bus request line
tarb    Time taken to arbitrate between (internal) contending bus requests
talu    Time taken for ALU to resolve worst-case operation
tprop   Time taken to propagate result to register
thigh   Required clock-high period
tlow    Required clock-low period

Semiconductor Power Consumption parameters:

P     Power consumption of a semiconductor device (Watts)

n Number of transistors

Lc Die edge size (micrometres)

Vdd Device operating voltage (volts)

f Operating frequency of device (MHz)

M Average fraction of transistors active in a given clock cycle (0.0 to 1.0)

Cw Wire capacitance

———————— Equations ————————

Eqn(4.1) Memory traffic overhead for a stack processor system:

St = if + Sd + Sr + ml + me

Eqn(6.1) Approximation formula for stack spill traffic generated by a stack buffer:

S = s × e^(−t·b)

Eqn(6.2) Memory traffic in a stack processor system (revised to include the stack buffer behaviour approximation formula of Eqn 6.1):

St = if + [ sd × e^(−t·bd) ] + [ sr × e^(−t·br) ] + ml + me

Eqn(9.1) Clock high period of the UTSA architecture:

thigh = tdec + treq + tarb

Eqn(9.2) Clock low period of arithmetic UTSA operation:

tlow + thigh = tdec + talu + tprop

Eqn(9.3) Clock low period of non-arithmetic UTSA operation.

tlow + thigh = tdec + treq + tarb + trel

Eqn(10.1) Eqn 6.2, with weighted memory components.

Mt = ocode·(1/if) + odata·me + ostack·(ml + Sd + Sr)

———————— List of Abbreviations ————————

Abbreviation  Definition

ALU     Arithmetic Logic Unit
CISC    Complex Instruction Set Computers
DRAM    Dynamic Random Access Memory
MIPS    Millions of Instructions Per Second
RISC    Reduced Instruction Set Computers
SRAM    Static Random Access Memory
UTSA    University of Teesside Stack Architecture
VHDL    VHSIC Hardware Description Language
VHSIC   Very High Speed Integrated Circuit

———————— Chapter 1 ———————— Introduction ————————

1.0 Introduction

For some time modern computer architecture has been dominated by register based processing technology. Indeed, register file architectures can be traced back to the earliest years of electronic computers. However, despite the RISC versus CISC debate of recent years[2], it would be unfair to present modern day computing in such simple terms. Whilst the register-file concept is undoubtedly considered the norm, it is not a singular solution to the ever-present disparities between CPU operating frequencies and main memory latency. Amongst the alternatives to register-file computation are storage-to-storage architectures, favoured by proponents such as Myers (1977), and more radically, the stack models stemming from Hamblin’s proposals of the late 1950s (Hamblin 1957a & 1957b).

The stack processor is the focus of this thesis, and is an architecture that has continued to grow and develop in specialised areas, in spite of claims to the effect that stack processors have ’effectively disappeared’ (Patterson 1990a). In this thesis we examine stack processors as a technology, presenting a generalised model with enhancements for mainstream High-Level-Language (HLL) environments. Within this framework we investigate optimisation strategies, in hardware, software, and in instruction set terms, that address the ill-regard which this unique technology has suffered. It will be shown that stack processor technology can provide a credible platform for general computing if given the same attention that other architectural models have had in recent years.

1.1 Background to stack processors

Much of the research conducted in the 1960s included storage-to-storage architectures and stack based processors on an equal basis. At that time register-files were not necessarily established as the de-facto standard in computer design. Since those early days of computing, the development of high-level languages, and advancements such as Very-Large-Scale-Integration (VLSI) in memory systems, have changed and revised design imperatives throughout the intervening decades.

In the 1970s, when CISC was in vogue, memory speeds rapidly increased, whilst processor cycle times became tied to the penalties of having over-specified and under-utilised instruction sets. The evolutionary progression from machine-level programming

[2] RISC: Reduced Instruction Set Computer, CISC: Complex Instruction Set Computer.

to compiled HLLs meant that such processor architectures became increasingly mismatched with the critical requirements of the programs being executed. With the radical simplifications imposed by RISC methodology, this problem was addressed by providing only the minimal compiler-oriented instruction set identified. Meanwhile stack processor technology became highly specialised, and was ’adopted’ by the FORTH language community as an ideal platform for FORTH execution. As a result, stack processor technology was largely ignored from the viewpoint of optimisation techniques for HLL execution, and is consequently viewed as a poor performer in this domain. This not only limits stack processor technology in terms of its desirability in mainstream applications but, now that FORTH has moved on (incorporating such features as local variables), the ’science’ of stack processors is lagging behind somewhat.

Future design imperatives are hard to predict, but one issue is increasingly coming to bear - memory speed cannot maintain its relationship with CPU cycle time. A ’memory wall’ is perceived to be rapidly approaching (Wulf 1995), and it is now acknowledged that latency vs. capacity issues will make it increasingly difficult for on-chip cache to deliver acceptable miss rates as clock rates reach 200-300 MHz and beyond. The low code density and poor utilisation of memory bandwidth inherent in RISC architecture, and the complexity of CISC instruction sets, may ultimately prove a handicap to continued upwardly scalable performance. Architectures that provide high code densities, low memory bandwidth requirements, and streamlined instruction sets may well offer a way forward. Stack processors are potentially in this class, but must be assessed in terms of modern techniques and technology rather than being judged by past (outdated) capabilities.

1.2 Structure and content

In this thesis the stack processor architecture is re-examined in light of new or recent optimisation techniques that promise to improve the performance of stack-based processor architectures. In doing so, it is shown that stack-based processors are capable of high performance within HLL environments such as ’C’. Hardware, firmware, and software optimisation issues are investigated, but not in the isolated manner of previous research. Instead, the interaction and inter-dependence of optimisation techniques are given fair consideration, showing that results previously reported may have presented over-simplified analyses of their costs and benefits.

Chapters 1, 2, and 3 introduce the basic details of stack processor technology, the terminology, key issues, and design imperatives. The research objectives are outlined in

Chapter 3, which also discusses some of the software tools developed to produce experimental results used in the later chapters. Chapters 4 to 9 consider key issues for stack processor design and performance, presenting fundamental measurements of stack behaviour as a reference point. Then, an examination of stack buffering techniques, local variable support and optimisation, instruction fetch bandwidth reduction, and VHDL[3]/VLSI modelling of the UTSA stack processor design is presented. Hardware timing measurements allow trade-offs to be resolved to the point where gate-level latencies are included in some analyses. In Chapter 10 the mathematical models proposed in earlier chapters are used to project and contrast performance for various system configurations, including the effects of the optimisation strategies examined. This leads to a revised view of stack processor technology. The final chapter presents conclusions and a view of future research in the field.

In later stages of the thesis it is shown that many preconceived arguments against stack processor architecture are no longer credible in the light of new findings. Major concerns are the effects of stack-oriented memory references, the limitations of stack-management, and the overall memory bandwidth requirements of stack processor architectures. The following sections summarise the content of the thesis chapter by chapter.

1.3 Dynamic machine-stack behaviour

Chapter 4 presents a statistical view of stack processor behaviour, establishing terms of reference and quantitative measures by which optimisation techniques may be assessed. Previous research makes little reference to the fundamental characteristics of stack behaviour that ultimately lead to the high-level behavioural attributes of such systems. Initial consideration is given to the behaviour of FORTH code, the current norm for modern stack processor families, followed by an examination of the uncharted area of C-code behaviour in a stack processor environment. An analysis of bus bandwidth components is presented in concluding sections, highlighting key bottlenecks for stack processor platforms with a C-code workload, an area previously given little attention. Mathematical models are proposed to represent these bus bandwidth characteristics, and permit performance to be projected for various conditions.
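The notion of a stack-depth probability distribution can be illustrated with a small sketch (purely illustrative, and not part of the thesis tool suite): sample the stack depth after each executed instruction, then normalise the counts into a probability for each depth.

```python
from collections import Counter

def depth_probabilities(depth_trace):
    """Given a trace of stack depths (one sample per executed
    instruction), return the probability of observing each depth."""
    counts = Counter(depth_trace)
    total = len(depth_trace)
    return {depth: n / total for depth, n in counts.items()}

# A toy trace of data-stack depths; real traces would come from
# an instruction-level simulator.
trace = [0, 1, 2, 1, 2, 3, 2, 1, 0, 1]
probabilities = depth_probabilities(trace)
```

The resulting distribution is the kind of quantitative measure against which buffering and scheduling optimisations can later be compared.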

[3] VHDL: Very-high-speed-integrated-circuit Hardware Description Language

1.4 The UTSA experimental stack processor platform

Evaluation of optimisation techniques requires a definition of a machine architecture which supports the investigation and enhancement of processor performance. The University of Teesside Stack Architecture (UTSA) is a machine model based on a 32-bit stack processor design, originated to serve as a research platform. UTSA is presented in Chapter 5, and its main features are discussed. It will be seen in later chapters that much of the finer detail is a result of the statistical research conducted, or is justified as a consequence.

1.5 Stack buffers and the stack-memory bottleneck

The issue of stack-oriented memory traffic is explored in Chapter 6, in which existing buffering techniques are examined in the new environment of C-code execution, and contrasted with new stack buffering algorithms. It is found that buffering is effective at eliminating the stack traffic generated by C-code benchmarks, although FORTH code is more effectively buffered than C.

The basic assessment of the stack buffering is expanded by relating buffer behaviour to the underlying characteristics of stack behaviour, as established in Chapter 4. This illustrates the reason behind differing performance of C and FORTH buffered systems, namely the poor utilisation of the stacks that results from ’naive’ compilers.

Mathematical models for the general approximation of stack buffer performance characteristics are also proposed in this chapter. They are utilised to establish a figure of merit, the ’damping factor’ for each buffering technique considered. The mathematical models will be shown to be of benefit in evaluating overall memory traffic. In later sections they assist in quantifying the effects and subtle trade-offs resulting from the interaction of stack behaviour with other optimisation techniques.
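As a sketch of how such an approximation behaves, the exponential model of Eqn 6.1, S = s × e^(−t·b), can be evaluated directly; the parameter values below are arbitrary illustrations, not measured thesis data.

```python
import math

def spill_traffic(s, t, b):
    """Eqn 6.1: approximate buffered spill traffic S = s * e^(-t*b),
    where s is the baseline spill traffic per instruction, t the
    buffer's damping factor, and b the buffer capacity in cells."""
    return s * math.exp(-t * b)

def total_traffic(i_f, s_d, s_r, m_l, m_e, t, b_d, b_r):
    """Eqn 6.2: total memory cost per instruction once both stacks are
    buffered (fetch overhead, buffered spills, locals, explicit accesses)."""
    return (i_f + spill_traffic(s_d, t, b_d)
                + spill_traffic(s_r, t, b_r) + m_l + m_e)
```

With no buffer (b = 0) the model collapses to the unbuffered baseline, and spill traffic then decays exponentially as buffer capacity grows, which is how the damping factor acts as a single figure of merit.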

1.6 Local variables: hardware, software, and instruction sets

The issue of local variable support is a long-standing problem in stack processor design. Although FORTH previously had no concept of named local variables, the language is now adopting limited forms of them. Hence recent work, oriented toward improving C-code efficiency in stack processor environments, will be all the more timely, not only

expanding the applicability of stack processor technology, but maintaining it in existing fields.

Chapter 7 examines the latest local variable optimisation techniques, as proposed by Koopman (1992). Those software optimisation methods are evaluated in the context of the UTSA architecture, which has basic hardware features to support local variable access in an efficient manner (hence allowing the techniques to be fully exploited). Whilst previous work is acknowledged to be of a preliminary nature, and not an in-depth study, Chapter 7 goes on to examine basic effects of local variable optimisation upon stack behaviour, and thus stack buffer performance. It is shown that substantial changes in behaviour are observed, and this implies that larger buffers may be required to deliver equal stack traffic reductions.

In addition to a basic assessment of local variable elimination, the role of instruction set complexity is examined. A model is proposed in which fundamental stack manipulations may be classified by type, and degree, of stack access offered. It is shown that a scalable and symmetric model for stack manipulations can be produced, which generalises the traditional set of stack manipulations that have evolved.

Through the use of the scalable stack manipulation scheme a series of ’degrees’ of complexity can be created which relate to most stack processor designs. It is shown that the role of instruction set complexity has a significant effect upon local variable optimisation, and that the scheme for scalable stack manipulation is justifiable from the point of view of supporting such optimisation techniques. The results show that some stack-based designs would not perform well with local-variable scheduling, whilst others would exploit the technique fully.

Examination of certain key publications shows that application of Koopman’s techniques to a modern stack processor platform would produce very different results from those established previously. As a consequence, it is possible to indicate that previous research actually supports the hypothesis that data traffic in a stack system is potentially superior to that of other architectures, reversing the findings as previously presented.

1.7 Instruction fetch bandwidth reduction

Lately, general trends in computer architecture have tended toward adopting very simple decoding schemes in order to reduce decode latency. RISC is a prime example of this technique, but examples exist in stack processor technology also. This technique of unencoded instruction formats is only preferable as long as the latencies associated with increased memory bandwidth are offset substantially by the decreased CPU cycle times that result from simpler decoding. However, decode latencies scale down with VLSI technology advances, whilst memory latencies increasingly lag behind. Even with multi-level cache hierarchies, decode latencies may become insignificant in comparison to average memory latency, making encoded instruction schemes more attractive.

Chapter 8 investigates the issue of implementing a compact instruction format in a 32-bit stack processor architecture. Related RISC-oriented studies are alluded to where appropriate, and a series of quantitative results are presented to indicate the performance of the proposed scheme with compiler generated C-code. Issues such as word alignment, branch prediction, and dynamic instruction fetch density are investigated, and findings presented. It is found that a code density of at least 2.2 instructions per 32-bit word is possible even in the absence of suitable software optimisation techniques. The trade-off between CPU cycle time penalties, and the reduction in average memory latency, is explored in detail, with hardware timings for 1 µm CMOS[4] logic synthesis included in the analysis. The logic timings imply that the UTSA core processor incurs approximately a 10% increase in CPU cycle time, but this is exchanged for a 55% reduction in instruction fetch latency.
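The arithmetic behind a dynamic fetch density figure can be sketched as follows. The opcode mix used here is purely hypothetical, chosen only to show how densities above two instructions per 32-bit word arise from byte-sized opcodes; the thesis's measured figure of 2.2 comes from compiler-generated C code, not from this model.

```python
# Illustrative calculation of dynamic instruction fetch density for a
# byte-coded 32-bit instruction word. The operand fraction below is a
# hypothetical example, not a measured workload characteristic.

WORD_BYTES = 4

def fetch_density(fraction_with_operand, operand_bytes=2):
    """Average instructions per 32-bit word, assuming 1-byte opcodes
    and that a given fraction of instructions carry an in-line operand."""
    avg_bytes = 1 + fraction_with_operand * operand_bytes
    return WORD_BYTES / avg_bytes

# e.g. if 30% of instructions carry a 2-byte literal or branch offset:
print(round(fetch_density(0.30), 2))   # -> 2.5 instructions per word
```

Word alignment and branch targets reduce the achievable figure in practice, which is why measured densities sit below this idealised packing limit.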

In many application areas cache is not used due to problems of deterministic system behaviour (Koopman 1993). As a consequence, there are several imperatives that make compact instruction encoding schemes attractive. Stack processors can fully exploit this concept without the penalties incurred in register-based architectures.

1.8 VHDL and hardware synthesis of the UTSA model

Evaluation of processor performance is not limited to simply quantifying matters within software simulations and examination of high-level issues. The construction of a VHDL model for the UTSA design, and its subsequent conversion to an optimised netlist using logic synthesis tools, has allowed realistic gate level timing analysis to be produced. The simulation of the logic model permits actual instruction codes and operands to be executed on the processor circuitry with a technology-specific gate-level timing analysis.

Measurement of timing parameters within the UTSA design can be broken down to the point where specific factors such as decode latency can be assessed as a relative portion of overall CPU cycle time. Absolute assessment of CPU operating frequencies can be projected from the results, and suggests that an operating frequency of 50 MHz may reasonably be expected even with 1 µm CMOS technology.

[4] CMOS: Complementary Metal Oxide Semiconductor

The timing of synthesised logic allows a detailed trade-off between instruction coding methods, instruction fetch bandwidth, and gate-level timing constraints to be evaluated. Chapter 8 calls upon the results presented in Chapter 9 for this purpose, whilst Chapter 9 presents the complete timing analysis in overall performance terms. As an aside to the main objectives of Chapter 9, a breakdown of gate counts and contributions from specific architectural features is given. This allows an informed approximation of power consumption to be made, based upon gate counts and operable clock frequencies.

1.9 Overall performance, assessment and comparison

Within the thesis, a number of related issues are examined in some detail. Attempts are consistently made to introduce mathematical models to assist in the approximation of stack processor behaviour and its projection under chosen conditions. In Chapter 10 the overall results of the presented work are assessed in a number of comparative cases. It is shown that stack processor performance can be projected for a wide range of conditions, and that various optimisation techniques may be represented in the mathematical projection.

———————— Chapter 2 ———————— Stack Processors: Technology and Trends

————————

2.0 Stack processors - technology and trends

Modern computer architects are keen to promote the latest idea, be it reduced instruction sets, super-scalar architectures, or pipelining. Consequently, it is common to find that the concepts of evaluation-stack mechanisms are at best overlooked and, at worst, dismissed on the basis of unfair assumptions. Often this failing is simply because the development and application of new and advanced techniques for stack processor technology are not widely appreciated. Even when ’stack’ models of computation are included in comparative research in an attempt to be thorough, they are often based upon out-dated technology and result in unintended negative bias.

It is of course possible that today’s mainstream RISC and CISC technologies would fare unfavourably with the computing environments of decades past, so it is equally unfair to expect yesterday’s stack models to perform well with today’s computing demands.

In this chapter a historical perspective is provided as an important résumé of the past life of stack based computation. This section will provide an indication of what developments were made, and why. From this starting point, Chapter 2 goes on to present the modern stack processor in terms of architecture and optimisation techniques, with emphasis upon establishing important terminology and performance issues.

By examining some of the key objections to stack architectures, the ’case against stack processors’ is reviewed. It will be seen throughout this thesis that many of the conceptions and statements made therein are potentially overturned by the application of new techniques and technology to an old idea.

A review of the recent developments in stack processor design and practice will be presented. It will become clear that hardware optimisations such as stack buffers, and software optimisations such as local-variable-scheduling might offer the opportunity to rebuff some of the objections made against stack processor design.

2.1 Stack processors - the alternative RISC

Within the realms of processor architecture and classification, it is typical to ’pigeon-hole’ various machines into either RISC or CISC domains, which often results in alternative designs such as stack-based processors being regarded as unusual maverick concepts. Whilst we can present stack processor technology as a wholly alternative processing paradigm, it is also useful to emphasise the relationship between major processor families which includes stack processor systems in an objective manner. This is a useful basis from which to introduce a comparison between the various issues at play in these architectures, and forms a reference point upon which trade-offs may be contrasted.

In Fig.2.1, processor classification is organised in terms of their instruction set complexity and the explicitness of their operand addressability (with primary concern to register addressing). It is apparent that stack processors fit comfortably within a region of computer architecture that is excluded from the mainstream RISC and CISC philosophies of processor design.

[Figure: a two-axis classification plotting operand addressability (explicit to implicit) against instruction complexity (simple to complex), with RISC, CISC, stack processors, and NDPs each occupying one quadrant.]

Fig. 2.1 An alternative classification of processor models

The CISC model is typified by the combination of an explicit mode of operand addressing and a level of instruction set content that can be said to be complex. Register addressing typically involves two register references, and instruction sets are generally microcoded.

RISC technology can be defined as a combination of explicit operand addressing and simple instruction set content. RISC architectures tend to employ three register references per instruction, making them more operand-explicit than typical CISC designs.

This instruction-set simplicity permits hardwired control logic instead of microcode, with attendant benefits for CPU cycle times.

Moving toward the territory of implicitly addressed operands brings us nearer to stack based processing engines. Numerical-Data-Processors (NDPs) often employ a one or zero operand approach to processing, and may also utilise highly specialised (and hence complex) instruction sets. These are not intended for generalised computation in their own right, but normally act as co-processors to supplement the limitations of a primary CPU.

One combination remains in the proposed classification. An implicit mode of operand addressing in combination with a relatively simple instruction set is the area in which stack processors can be placed in our 2-dimensional classification scheme. Direct operations on "register" contents are entirely implicit in nature[5]. There is no symmetry in the instruction set - we cannot add items three and four of the stack together directly. Only the top and next-on-stack cells (TOS and NOS) are directly accessible to the ALU. Instruction set complexity is reduced in terms of addressing modes rather than by functionality alone. Operations that would increase CPU cycle times are normally avoided unless they offer a significant advantage.

2.1.1 The register file paradigm

Whilst RISC and CISC technology have clearly defined differences, they both share a common feature - the register-file approach to computation. Historically, the register file evolved from the realisation that slow main-memory was a bottleneck to processor performance. Architectures such as EDSAC were hence limited by their storage-to-storage approach to computation (Wilkes 1956). Had this not been the case, then storage-to-storage architectures may well have predominated. Speeding up the processing of data could only be achieved by duplicating a small amount of memory within the CPU itself, where machine cycle times and register-storage could be matched.

Since main memory was randomly addressable, it may have seemed natural to adopt the same approach in addressing the register-storage unit, and as a result the concept of an explicitly addressed register-file was formulated. Register contents were referenced by use of a register address field. Typical instructions required two operands, leading to the traditional CISC model of a two-register coding scheme that is familiar today.

[5] In stack processor nomenclature, it is usual to refer to the machine’s registers as ’stack cells’.

The CISC scheme calls for a single static register file, in which one or more registers may be selected as source or destination registers, for computation in connection with the ALU. The register file is an array of on-chip storage locations, which requires multi-port addressing capabilities at the logic level. Figure 2.2 illustrates such a traditional register file model.

[Figure: a register file of eight registers (R0-R7) connected to the ALU, addressed by a two-register instruction word of the form OPC Rx Ry.]

Fig. 2.2 A traditional register file computational model

A key trade-off in register-file design is that of operand addressability. With three-register addressing, the predominantly dyadic operations can be expedited without destructive transformation of the operands. Any reduction in the degree of register addressing will result in an increasing degree of operand destruction during computation. The consequence would be to increase repeated references to operands in main memory with lower-degree architectures (i.e. data traffic increases).

Register file capacity is determined by the number of bits available in the instruction field for register address information. Whilst it has been found that memory references and instruction counts are reduced when register-file capacity is increased (Alpert 1993, Bunda et al. 1993), this is at the expense of increased instruction bandwidth. Bunda et al. (1993) show that a reduced register address range increases instruction counts but can actually reduce instruction bandwidth. This implies potentially better performance if code densities can be maximised without introducing excessive decode latencies.
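The underlying arithmetic is straightforward: each doubling of register-file capacity costs one additional bit per register field, multiplied by the number of fields in the format. A rough sketch, with illustrative field widths rather than those of any particular architecture:

```python
# Instruction-width cost of register-file capacity in a fixed-format
# encoding. Opcode width and field counts are illustrative only.

import math

def instruction_bits(opcode_bits, n_registers, fields=3):
    """Width of a fixed-format instruction with 'fields' register fields,
    each wide enough to address n_registers."""
    return opcode_bits + fields * math.ceil(math.log2(n_registers))

for regs in (8, 16, 32):
    # 8 -> 17 bits, 16 -> 20 bits, 32 -> 23 bits per instruction
    print(regs, "registers ->", instruction_bits(8, regs), "bits")
```

This is the sense in which a larger register file trades instruction bandwidth for fewer memory references: the saving in data traffic must outweigh the extra bits fetched per instruction.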

When the register-file capacity is inadequate for the number of program variables being frequently utilised, register-spilling allows register contents to be preserved in memory whilst the registers are reused for further machine activity. The storage mechanism used to compensate for this limited register capacity is typically a stack. Hence, arguments that computation stacks generate memory traffic are not as straightforward as might be suggested, since some of that traffic is also present in a register-file architecture, disguised as register-spilling.

2.1.2 RISC: register windows and context preservation

Whereas CISC architectures tend to rely on a single static register file, RISC architectures utilise a dynamically allocated register file, based upon a small ’window’ of registers selected from within a larger register bank (Patterson 1985, Hennessy 1984). Typical schemes include something of the order of 128 registers, of which 16 or 32 are windowed at any one time, as illustrated in Fig. 2.3. Some intermediate architectures exist, such as the Zilog Z80 with its shadow register set, but these are intended more for interrupt servicing than procedure management.

[Figure: a register window of four registers (R0-R3) selected from a larger register block, connected to the ALU and addressed by a three-register instruction word of the form OPC Rx Ry Rz.]

Fig. 2.3 An example of a RISC computational model

Because a new register window can be allocated with minimal latency, procedure-management overheads are far lower than those of a CISC design, which must explicitly preserve the register file in main memory before using it in a called procedure. The large register file of the RISC processor family is usually adequate for most program execution. However, in the event of a full register file condition, a register window must be spilled to main-memory stack-space in order to permit further nested utilisation of register-file elements.
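The overflow mechanism can be caricatured with a toy model that counts spill and fill events over a call/return trace. The window count and the trace below are illustrative assumptions, not measurements from any RISC implementation:

```python
# A toy model of register-window overflow: a call past the bank's depth
# spills the oldest window to the memory stack, and the matching return
# fills it back. The bank size of 8 windows is an assumption.

N_WINDOWS = 8

def window_traffic(call_trace):
    """Count spill/fill events for a trace of +1 (call) / -1 (return)."""
    depth, spills, fills = 0, 0, 0
    for event in call_trace:
        depth += event
        if event == +1 and depth > N_WINDOWS:
            spills += 1            # oldest window pushed to memory stack
        elif event == -1 and depth >= N_WINDOWS:
            fills += 1             # previously spilled window restored
    return spills, fills

# A call chain nesting 12 levels down and returning back up:
trace = [+1] * 12 + [-1] * 12
print(window_traffic(trace))       # -> (4, 4) spills and fills
```

Even this crude model shows the essential point: window traffic only appears when nesting depth exceeds the bank capacity, which is why typical programs incur it rarely.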

As it has been shown that the majority of procedures require a maximum of eight to sixteen variables (Stallings 1986), one might think that the RISC register window is an ideal mechanism. However, a large proportion of procedures require fewer than eight variables, perhaps as few as four in many cases. The result is that many procedures could theoretically be implemented with a register range addressed by just two bits. This implies that the 4-bit or 5-bit register fields employed in RISC instruction formats often contain redundant information, especially when the disparity between register field size and the average utilised register range is multiplied by three (for three-register coding schemes).

The waste of memory bandwidth that results from having explicit register addressing can only be ignored because RISC systems are heavily dependent upon cache memory to improve memory throughput. However, as we have already highlighted the debate on the growing gulf between processor and memory cycle times (Wulf 1995), it should be apparent that trade-offs which are currently negligible will become significant considerations if such trends are sustained.

Adopting a three-address register referencing system allows RISC to take advantage of code features that cannot be so easily exploited by CISC or stack processor technology. The non-destructive opportunities for computation that three-register code permits can be beneficial in terms of reducing repeated references to main memory operands, and hence in reducing code length. Stack processors are disadvantaged in this respect, since their default computational mode is destructive. However, new techniques have been proposed which attempt to efficiently support selective non-destructive computation in order to improve code performance.

2.1.3 The stack processor paradigm

We have seen that CISC and RISC designs rely upon explicit register addressing for computation, and also use stacks to store register contents and pass parameters during procedure calls. The RISC register-window adopts a more pre-emptive use of stack principles to improve procedure call-return latency, whilst CISC designs use the stack to compensate on-demand for the dynamic load placed on a register-file of static size.

Stack processors utilise a rather different approach to typical CISC and RISC designs. Operands are held on a stack by default, and are not normally referenced explicitly during ALU operations. Instead, an implicit instruction mode leads to the top two stack items being used for every dyadic computation, with the result being placed back upon the same stack for each operation. Monadic operations replace only the top of stack item with the result of an operation applied to its previous content. This scheme is illustrated in Fig. 2.4.

[Figure: the stack processor computational model, with the top-of-stack (TOS) and next-on-stack (NOS) cells feeding the ALU, the stack extending into main memory, and an instruction word consisting solely of an opcode (OPC).]

Fig. 2.4 Stack processor computational model

Because of this stack based approach, operands and results are already held within a structure that permits procedure call-exit without program intervention, fulfilling the same objective as RISC register windows. The number of items that may be held on the stack is effectively unlimited if the stack extends automatically into main memory, although most stack processors permit only the top three or four items to be manipulated or accessed directly by the instruction set. Most studies of HLL code show that typical statements usually involve one or two operands on the right-hand side of the expression (Tanenbaum 1978), whilst most code blocks require only three or four variables. Hence, the limited ability to manipulate only three or four stack items should be more than adequate for most computations.
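The implicit scheme of Fig. 2.4 can be illustrated with a minimal evaluation-stack interpreter. The opcode names below are hypothetical, chosen for readability rather than taken from UTSA or any other stack processor:

```python
# A minimal evaluation-stack interpreter illustrating the implicit
# TOS/NOS scheme: dyadic operations consume NOS and TOS and push the
# result; monadic operations replace TOS only. Opcode names are
# illustrative, not those of any particular machine.

def run(program):
    stack = []
    for op, *arg in program:
        if op == "LIT":            # push an in-line literal
            stack.append(arg[0])
        elif op == "NEG":          # monadic: replaces TOS only
            stack.append(-stack.pop())
        else:                      # dyadic: NOS and TOS consumed,
            nos, tos = stack.pop(-2), stack.pop()   # result replaces both
            stack.append({"ADD": nos + tos,
                          "SUB": nos - tos,
                          "MUL": nos * tos}[op])
    return stack

# (2 + 3) * 4, in reverse-polish order:
print(run([("LIT", 2), ("LIT", 3), ("ADD",), ("LIT", 4), ("MUL",)]))  # -> [20]
```

Note that no instruction names a register: operand selection is entirely positional, which is precisely what removes the register address fields from the instruction word.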

As a result of adopting an implicit addressing technique for computational operands, the stack processor gains the advantage of much shorter instruction fields, having no register address field requirements. Thus stack processor instructions are typically reduced to a simple 8-bit operation code, wholly devoted to selecting the appropriate internal function for the next machine cycle. Hardware simplification is achieved in several ways. First, the removal of a multi-port register file allows reduction in hardware latencies. Second, the simpler approach to instruction set architecture (avoiding complex addressing modes and so on) allows a hardwired control unit to be easily engineered.

Modern stack processor technology is largely represented by the simple model presented here. However, a series of advancements have been made which allow stack processor technology to overcome many of the apparent bottlenecks that may be assigned to the presented model. In later chapters of this thesis such optimisations will be examined in detail, and new findings presented in each case.

2.2 Stack processors - a brief historical perspective

To understand some of the developments that have shaped current stack processor technology, it is necessary to first examine some of the key developments in computer architecture and programming. Hence we may portray the evolving stack processor model against the background of wider developments in hardware and software practice.

Of approximately seventy widely read publications on stack based processors (Koopman 1989a), about forty were ever actually implemented in hardware. Many of those existed only as prototypes, built using (now) obsolete fabrication techniques, or were in truth register-file architectures with stack-oriented instruction set augmentations. The remainder were what could be termed ’concept machines’, ideas that showed some interesting potential on paper or in simulation, but were never pursued further. When the underlying motives for these processor designs are examined, it becomes clear that the development of programming languages has had a significant influence on the direction of stack processor evolution.

Figures 2.5(a) and 2.5(b) illustrate this co-developmental relationship over a period spanning forty years. Now let us consider the early history of stack processor technology, and then pursue its course to the present day.

[Figure: timeline of programming language developments - machine code, Alpha Code, ALGOL-58, FORTRAN, COBOL-60, FORTRAN 77, Pascal, Modula-2, C, and FORTH - spanning the 1950s to the 1980s.]

Fig. 2.5(a) Language developments, 1950s to 1980s

[Figure: stack machine research projects by year, 1956 to 1993, classified by motivating language: Algol/assembler, Pascal, Forth, and others.]

Fig. 2.5(b) Stack machine research imperatives to date

Relying upon very simple architectures with explicitly addressable storage arrays, the earliest electronic computing machines formed the basis from which register-file concepts subsequently developed. What is often unappreciated, however, is that even in those early years a wholly alternative paradigm for automated computation was well established and rapidly became a contender for commercial machine designs of the time.

During the late 1950s, whilst many worked on computers based on simple accumulator and register models, a number of researchers attempted to tackle the issues of stack based computing methods. The roots of this stack processor sub-culture reach back to the earliest years of electronic computing. Indeed there is a strong case to argue that some fundamental concepts originated in very early mathematical expositions by Jan Łukasiewicz, as published in 1929, but possibly originating as early as 1920 (Duncan 1977).

One of the most important early works on stack processor design was the paper ’Computer Languages’ (Hamblin 1957b), which deserves recognition for its contribution to stack processor design. In this short paper Hamblin not only originates the concept of ’reverse-polish notation’ (revising and adapting earlier work by Łukasiewicz), but describes the concept and use of a ’running accumulator’ for computation, and a ’nesting store’ for procedural nesting. Mathematical predicates for conditional execution (of the form If..Then..), and the operation of ’interpretative’ programming in the stack based scheme are also notable inclusions.

There is some evidence that work presented by Samelson and Bauer, with a "cellar" principle for data storage and manipulation, can lay some claim to independent invention of stack computation (Duncan 1977). However, Hamblin’s paper is likely to retain its place as the formative root of modern stack based processors, and has a clear basis in mathematical formulae and logic. Whilst computer scientists have spent the last forty years creating formal proofs for computer systems, Hamblin started with the formal aspect and progressed to the hardware. Figure 2.6 illustrates the general form of Hamblin’s stack processor model.

[Figure: Hamblin’s addressless computing model - TOS (Top Of Stack) and NOS (Next On Stack) cells feed the ALU within the CPU, whilst the ’running accumulator’ (data stack), ’nesting store’ (return stack), and program counter (PC) extend into main memory.]

Fig. 2.6 Hamblin’s addressless computing model

The formative period of stack processor research fell largely into the context of assembler code programming environments, since higher-level programming methodologies were still pre-emergent. The major issues of the time centred upon debates such as prerequisite stack size for expression evaluation, instruction coding schemes, code density, and so on.

Early attempts to implement Hamblin’s concepts include the KDF-9 (Davis 1960, Haley 1962, Allmark 1962), which failed to follow Hamblin’s recommendation for stacks extending into main memory, and instead utilised a scant 16-cell running accumulator (data stack) and nesting store (return stack) within the CPU. This proved adequate for machine code expression evaluations, but could not accommodate the large stack requirements of the newly developed concept of the ’compiler’. Since higher-level languages such as ALGOL were just arriving, this proved to be a fatal mistake, and the KDF-9 did not survive long after its developer, English Electric, was taken over.

2.2.1 ALGOL - the first era of stack machines

In the early 1960s, when the first electronic computers were being developed for commercial use, the first stack machines also began to emerge. Machine code programming in these early machines was difficult and laborious, so high level language development was a major step in improving computing applications. ALGOL was one of the first languages to make any impact and, relying upon stacks for procedure management and simple code processing, the stack machine was a natural choice. A number of ALGOL-oriented solutions soon followed, notably those published by Haley (1962), Carlson (1963), and Anderson (1961). Other early stack processor technology includes the Burroughs B5000 machines, which had two discrete top-of-stack cells (Earnest 1980). However, as the focus of language development started to move away from ALGOL, the stack processor concept lost its established development impetus, and failed to make much impact thereafter in mainstream computing.

2.2.2 The impact of high-level-language developments

With the development of faster computers, the integrated circuit, and cheaper memory systems, from 1970 onward newer, more complex HLLs rapidly developed. There was a marked diversification of research for various languages on stack machines and, as ALGOL became obsolete, microcode allowed increasingly complex instruction sets to support the growing demands of the new generation of programming languages.

During the period of rapid high-level-language diversification of the 1970s, attempts were often made to migrate stack processor technology to the latest computing language, but often with short-lived success. This development included a number of now marginalised languages, such as APL, MODULA-2, FORTRAN, and LISP.

The increasing dominance of register-based CISC technology meant that language development effectively outgrew the early stack architectures. The only concession to Hamblin’s stack processor vision in these designs was the inclusion of a single generalised system stack, for auxiliary storage of register-file spilling data and program return addresses. No method of efficient stack-based computation was supported: stacks were purely a convenience and not a true platform for computation.

2.2.3 PASCAL, C, and the arrival of RISC

In the late 1970s it became clear that language simplification would be worthwhile. Existing languages were seen as over-sized and not effectively supported by hardware. New languages were being developed that attempted to go back to a simpler machine-level model, a design philosophy in common with FORTH (Moore 1980). Both PASCAL and C underwent development during the 1970s which made them into key languages of modern computing. Again, stack processors were investigated. A brief flirtation was made with PASCAL execution engines in the early eighties, including Tanabe (1980), Lor (1981), and O’Niell (1979). These shared some common features with stack machines, but did not achieve any long term impact.

Meanwhile, a number of studies into high-level-language execution and the characteristics of program code by Knuth (1971 and 1974), Tanenbaum (1978), and others, led to new directions in mainstream processor architecture. It was increasingly acknowledged that compiled languages rarely utilised the full instruction set of any given architecture. The RISC initiatives answered this claim by turning established directions on their head. Instead of making more complex instructions, the RISC architectures opted for reduced instruction set complexity. The benefit of reduced cycle times and low procedure call overheads was weighed against the problem of increased memory bandwidth requirements due to reduced code density.

2.2.4 FORTH and FORTH engines

From 1980 onward, the RISC developments increasingly starved stack machines of research attention for the mainstream languages. The inappropriateness of C for stack machines ensured that such hardware became increasingly specialised toward an interpretative language called FORTH, which became popular during the 1970s - see (Moore 1974 and 1980). This trend can be observed clearly in Figs 2.5(a) and 2.5(b).

Because FORTH was a stack oriented language in the fullest sense, stack machines were already close to the ideal, and with a few specialisations the ’FORTH engine’ was born. From 1980 to date, FORTH has largely dominated stack machine applications: an unfortunate turn of events, since FORTH itself has become increasingly marginalised and compiled high level languages have moved into areas such as embedded systems and real-time control.

2.2.5 Stack processors - the present view

The present day view of stack processors is often clouded by past perceptions, and obsolete architectures are often quoted in illustration of the inferiority of the stack processor paradigm. The majority of current texts and publications on computer architecture fail to acknowledge the existence of stack based processors. Even those texts that do cite examples typically base them upon stereotypes akin to the Burroughs machines - for example: Flynn et al. (1992). In failing to recognise modern stack processor developments, there is a danger that modern stack processors such as the FRISC-3 (Hayes 1988), the Novix (Golden et al. 1985), and the Harris RTX2000 (Koopman 1989a) will be judged by the characteristics of their ancestors rather than current generations. With new design imperatives, such as the adoption of HLL code execution, this will become an important issue for future debate.

2.3 The case against stack processors

Today, stack processors are at best regarded as curiosities and, at worst, presented as a technology that has had its day and is no longer relevant to modern computing. In order to restore a balance, it is necessary to examine the work of the past 20 years, and particularly the advances made in the last decade. This re-evaluation should include hardware, software, and firmware issues. Following sections will present such a re-evaluation, but first one should understand the case against stack processors.

Much of the work stressing stack processor technology in a negative fashion has emanated from relatively few pieces of published research, mostly from the 1970s. For example, in a paper entitled ’The case against stack-oriented instruction sets’ (Myers 1977), an attempt is made to show that storage-to-storage architectures are superior to both register and evaluation-stack schemes. Patterson makes the claim that "stack processors fell out of fashion in the 1970s", and "essentially disappeared thereafter" (Patterson 1990). It is a rather ungenerous comment, given that the number of commercial stack processors on the market was probably greater in the 1980s than in the 1970s.

Patterson and Hennessy quote Bell et al. (1970) and Amdahl (1964) in support of their anti-stack commentary, repeating three major arguments against stacks, which are presented below as written in their quotations:

[1] Performance is derived from fast registers, not the way they are used.

[2] The stack organisation is too limiting, and requires many swap and copy operations.

[3] The stack has a bottom, and when placed in main memory there is a performance loss.

Statement [1] is based upon the published work of Amdahl et al. (1964), which discusses the design of the IBM System/360 architecture. The authors consider stack processor technology, but raise certain objections which lead to Patterson and Hennessy's quotes. However, Amdahl actually stated that "the performance advantage of stack processors derives from the presence of several fast registers, not from the way they are used". This is clearly not the same statement quoted in (Patterson 1990); indeed, it might be fair to say that Patterson and Hennessy have taken the quote out of context and hence distorted its meaning. It may well be argued that statement [1] actually acknowledges the advantage of reduced latency in the operand access scheme adopted in stack processors, where stack cells (registers) are connected directly to the ALU.

Amdahl's discussion is far more informative in its original context. For example, the observation that about 50% of operands end up on top of the stack when they happen to be needed implies that the other 50% require some form of stack manipulation to order the operands on the stack appropriately. This tends to bear out statement [2], but Amdahl states that "In the final analysis, the stack organisation would have been about even for a system intended for scientific computing". Evaluation stacks for general purpose computing are given a less favourable response, with register addressing being seen as more flexible.

Statement [3] highlights one well recognised problem with evaluation stacks. Whilst some architectures have a fixed length stack held within the CPU (itself a disadvantage - see section 1.0), many more have a stack that extends into main memory, leading to a potential bottleneck. However, it will be shown in later chapters that more recent work effectively eliminates this problem as a major performance concern -technology has moved on substantially since 1964.

It is unclear how Myers can claim special advantages for storage-to-storage architectures, although an attempt is made (Myers 1977). Myers bases his conclusions upon a coding analysis of three expressions, A=B, A=A+B, and A=B+C, and quotes evidence that these high-level-language expressions represent a large fraction of computing activity. The analysis goes on to enumerate instruction bit requirements for each expression and each architecture, showing storage-to-storage models to have a clear advantage for code density over register and stack designs. Interestingly, though, stack and register models appear almost identical in Myers' analysis, as Table 2.1 shows.

Table 2.1 Myers original analysis

              A=B    A=A+B   A=B+C   Total bits
  Stack        64     100     100       264
  Register     64      96      96       256
  Storage      48      48      96       192

The analysis of Myers assumes 'A', 'B', and 'C' are constants when appearing on the right-hand side of the expression, a convenient assumption for memory-intensive architectures. In truth, however, it would be better to assume that right-hand-side entities are variables, since the expression A=B+C otherwise has no relevance in view of modern compiler optimisations (i.e. constant folding). The corresponding results shown in Table 2.2 show that the storage-to-storage model does not have any significant advantage if a 32-bit data path is assumed for variables residing in main memory.

Table 2.2 Revised analysis of Myers' work

              A=B    A=A+B   A=B+C   Instr. bits   Data bits   Total
  Stack        72     104     116        292          256       548
  Register     64     112     112        288          256       544
  Storage      48      48      96        192          320       512
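The revised totals follow from simple arithmetic; a short sketch (with the per-expression instruction-bit figures and data-path totals transcribed from Table 2.2, and a purely illustrative dictionary layout) reproduces them:

```python
# Bit-count arithmetic behind the revised analysis of Myers' expressions.
# Figures are transcribed from Table 2.2; layout is illustrative.

revised = {
    # architecture: instruction bits for A=B, A=A+B, A=B+C, plus data bits
    "stack":    {"instr": [72, 104, 116], "data": 256},
    "register": {"instr": [64, 112, 112], "data": 256},
    "storage":  {"instr": [48, 48, 96],   "data": 320},
}

def totals(entry):
    """Return (instruction bits, data bits, combined total) for one row."""
    instr = sum(entry["instr"])
    return instr, entry["data"], instr + entry["data"]

for arch, entry in revised.items():
    instr, data, total = totals(entry)
    print(f"{arch:8s}  instr={instr:3d}  data={data:3d}  total={total:3d}")
```

On these figures the storage-to-storage lead shrinks from 72 bits in Table 2.1 to 36 bits against the stack model and 32 against the register model once 32-bit operand traffic is counted.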

Further arguments, such as relative addressing of variables rather than absolute addressing (which favours storage-to-storage architectures), could be applied to the original work with similar degradation of Myers' perceived advantages. However, without a more reasonable set of coding examples, the analysis must be regarded as fairly superficial as it stands, and would be best taken to cast doubt rather than prove the point.

Ultimately, Myers' analysis shows that register and stack architectures have little to set them apart, and therefore gives good reason to investigate improved stack processor technology - an ironic twist for a paper entitled 'The case against stack-oriented instruction sets'.

2.4 Modern stack processor technology

Most current stack machines follow the dual stack model originally presented in (Hamblin 1957a), summarised in Fig. 2.6. The data stack provides one or two operands to the ALU and writes the result back to the top-of-stack register 'TOS'. The second stack, a 'nesting store' or return stack, is responsible for temporary storage of the program counter's contents during procedure calls. Pushing the PC onto the return stack allows it to be loaded with a new vector for a procedure call, and popping the return stack restores the PC's previous value on procedure exit.
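The dual-stack call/return mechanism described above can be sketched in a few lines of simulator code; the opcode names and the (opcode, operand) program encoding are illustrative, not drawn from any of the machines cited:

```python
# A minimal sketch of the dual-stack model: a data stack feeding the ALU and
# a return stack holding the PC across procedure calls.

class DualStackMachine:
    def __init__(self, program):
        self.program = program   # list of (opcode, operand) pairs
        self.pc = 0
        self.data = []           # data (evaluation) stack; top is data[-1]
        self.rstack = []         # return stack ('nesting store')
        self.halted = False

    def step(self):
        op, arg = self.program[self.pc]
        self.pc += 1
        if op == "PUSH":
            self.data.append(arg)
        elif op == "ADD":        # operands come implicitly from the stack top
            b, a = self.data.pop(), self.data.pop()
            self.data.append(a + b)
        elif op == "CALL":       # push the PC to the return stack, load vector
            self.rstack.append(self.pc)
            self.pc = arg
        elif op == "RET":        # pop the return stack to restore the PC
            self.pc = self.rstack.pop()
        elif op == "HALT":
            self.halted = True

    def run(self):
        while not self.halted:
            self.step()
        return self.data
```

A procedure call thus costs only a push of the PC, and a return only a pop.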

Examples of dual stack architectures are numerous: (Golden et al. 1985), (Hayes1 1988), (Vickery 1984), (Schleisiek-Kern 1992), and (Hand 1990). Earlier stack machines were typified by the Burroughs machines of the 1960s (Carlson 1963). Some machines effectively have three stacks: the RTX 4000 (Koopman 1989b) has an extra frame stack pointer for C-style procedure activation records, and supports offset addressing of frame stack contents. The experimental SF1 (Dixon 1987) has up to eight stack-like structures, each of which may be selected as an operand source and used for computation.

Stack machines offer one of the best performances of any processor class in terms of instruction throughput versus transistor count. Typical CISC designs such as the Intel 80486 and Pentium™ have transistor counts in the millions, whilst RISC designs are now typically multi-million transistor designs (Horten 1993). Often 40% or more of the usable chip area is sacrificed to the implementation of on-chip cache, which is essential to maintain a suitable level of decoupling between processor performance and external memory cost/performance limits.

One recent development, the MµP21, offers a theoretical peak instruction throughput of 100 Mips yet has under 7000 transistors (Ting 1995). Stack processors are particularly attractive for applications involving high levels of integration, such as embedded systems micro-controllers, single-chip parallel processing arrays, and systems requiring highly deterministic code behaviour.

2.5 Stack buffering strategies

Stack machines typically have some stack elements on-chip in order to maximise speed and minimise memory access. But, in order to maintain usefully large stacks, it is necessary to maintain the remainder of the stack space in main memory. As a consequence of computation, stack items are pushed or popped from the top of at least one stack for almost every instruction executed, and stack data would have to be transferred to and from memory every time this occurs. Methods of reducing this traffic are clearly essential and yet must not conflict with the minimal logic approach to stack machine design, or its ability to deliver satisfactory deterministic properties.

Several stack machines have separate busses for the user and stack spaces, effectively a variation on the Harvard architecture. Stack spilling takes place without tying up the resources of main memory, which would otherwise hinder instruction and operand access very frequently. For 16-bit architectures this is a reasonable solution. The Novix NC4016, a 16-bit processor, can afford to have two additional busses, one for the data stack and one for the return stack, in addition to the main system bus (Golden et al. 1985). The FRP1600 shares one extra memory bus for its dual stacks, which reduces the number of external connections required (Schleisiek-Kern 1992), and exploits the relatively lower level of activity found on the return stack.

Multiple busses are somewhat impractical for 32-bit or larger architectures. The packaging costs and complexity would rise rapidly and outweigh the low cost of simple stack processors which offer high die yields. Alternatives which are much more suitable for larger word-length processors include status bits to keep track of stack element content changes, and non-linear buffering algorithms.

The simplest method tried was to have 'valid' and 'dirty' bits to indicate coherency between main memory contents and the top-of-stack cells, as in the Burroughs B6700 systems (Carlson 1963). Since many items are read into the stack only to be pushed out again unchanged, the dirty status bit or 'write-back tag' allowed redundant stores to be eliminated by identifying items which had not changed since previously being read from memory. No modern stack machines have adopted this approach, even though the implementation cost is minimal in comparison to the performance gains achievable.
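The write-back-tag idea can be sketched as follows; the buffer interface and store counter here are illustrative, not the B6700's actual mechanism:

```python
# Each on-chip stack cell carries a dirty bit recording whether it changed
# since being read from memory, so spilling an unchanged cell needs no store.

class WriteBackBuffer:
    def __init__(self):
        self.mem = []       # logical image of the memory-resident stack
        self.cells = []     # on-chip cells, bottom first: [value, dirty]
        self.stores = 0     # memory writes actually performed

    def push(self, value):
        self.cells.append([value, True])   # freshly computed: no memory copy

    def pop(self):
        return self.cells.pop()[0]

    def spill(self):
        """Evict the bottom on-chip cell to memory."""
        value, dirty = self.cells.pop(0)
        self.mem.append(value)
        if dirty:
            self.stores += 1   # a clean cell costs nothing: the memory word
                               # below the stack pointer still holds its value

    def fill(self):
        """Reload the topmost memory-resident item, marked clean."""
        self.cells.insert(0, [self.mem.pop(), False])
```

Items that travel out to memory and back unchanged are thereby spilled again for free, which is exactly the redundant-store elimination described above.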

Flynn (1990, 1992) and Wedig (1987) recognise the impact of using valid/dirty bits, but do not use a true dual stack model for their experiments. Their approach is to keep locals and return addresses on a single procedural stack, whilst relying upon a three-deep evaluation stack. This is not an efficient model in terms of modern stack processors, and presents an unfairly negative view of stack processor technology.

Stack buffering, which effectively introduces a non-linearity into the relationship between stack depth modulation and memory traffic, offers a way of extending stacks into main memory whilst eliminating the vast majority of associated memory transfers. By means of one of many algorithms, a hysteresis is set up in the buffer so that typical stack growth and contraction can take place without immediately filling or emptying buffer cells. Only when unusually sustained growth or contraction is encountered will any 'buffer spills' be suffered. Several algorithms have been investigated in the context of FORTH execution (Hayes1 et al. 1987), (Koopman 1989a), and some in terms of C execution (Wedig 1987). It is recognised that buffer designs are significantly simpler and yet more efficient than cache structures.
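One simple member of this family of algorithms can be sketched as a buffer that moves a burst of cells to or from memory only when completely full or completely empty, so that ordinary oscillation of stack depth between those extremes costs no memory traffic. Parameter values and counter names are illustrative, not taken from any of the cited studies:

```python
# Sketch of a spill/fill buffering algorithm with hysteresis.

class WatermarkBuffer:
    def __init__(self, size=16, burst=4):
        self.size = size        # on-chip buffer cells
        self.burst = burst      # cells moved per spill or fill event
        self.buffered = 0       # stack items currently held on chip
        self.in_memory = 0      # stack items spilled to main memory
        self.transfers = 0      # memory words moved: the cost to minimise

    def push(self):
        if self.buffered == self.size:             # sustained growth: spill
            self.buffered -= self.burst
            self.in_memory += self.burst
            self.transfers += self.burst
        self.buffered += 1

    def pop(self):
        if self.buffered == 0 and self.in_memory:  # sustained contraction: fill
            moved = min(self.burst, self.in_memory)
            self.buffered += moved
            self.in_memory -= moved
            self.transfers += moved
        self.buffered -= 1
```

Twenty pushes followed by twenty pops through a 16-cell buffer with a burst of four cost only 8 memory transfers, where a purely memory-resident stack would cost 40.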

Most studies agree that 16- or 32-element buffers substantially reduce stack traffic. None of the available studies seems to have considered the problem at source, by examining the behaviour of stacks during execution and hence identifying the problem in fundamental terms; (Koopman 1989a), (Wedig 1987), and (Hayes1 et al. 1987) present only empirical comparisons.

There is no theoretical or behavioural context to results published so far, and no serious studies are available which attempt to investigate these issues. Without a general model of stack behaviour it seems injudicious to attempt to find solutions based upon assumptions that may or may not be found in real life systems. Mathematical representations of stack buffer behaviour would be an important contribution toward an overall performance model for stack processors.

Buffering is a complex issue, and unless such issues are resolved and stack buffers fully understood, realistic comparisons cannot be made with contemporary architectures in the register-based RISC and CISC processor families, let alone between competing stack machine designs. The bulk of the research has also been concentrated on FORTH execution models, and cannot be assumed to be equally applicable to HLLs such as compiled C-code. A comprehensive study of buffering algorithms and techniques would provide a more definitive evaluation of relative buffer performance for a number of figures of merit, and would allow a wider view of language behaviour in a stack system to be presented.

2.6 Instruction encoding strategies

Several instruction coding strategies exist. Instruction formats may be encoded or unencoded (as explained in the following paragraphs), and may also be packed into memory locations singly or in multiples. Most stack machines are 16-bit or less, and hence use the unencoded single-instruction format due to the limitations of instruction length. Multiple-instruction techniques have not been explored deeply, although some have been applied experimentally in other processor classes.

Unencoded instruction words are, taken to extremes, simply a form of external microcode: the bit patterns of each instruction drive internal control functions directly. By using unencoded instructions, decoding circuits and latencies virtually disappear, minimising on-chip cycle times. Examples of this approach may be found readily (Golden et al. 1985, Hayes1 et al. 1987, Schleisiek-Kern 1992).

It is argued that many unencoded instruction words can perform two or more useful activities in a single cycle, because of parallelism in the unencoded instruction field. Several unencoded designs quote or imply average figures in the region of 1.5 to 1.6 operations executed per cycle for both compiled C and hand-coded FORTH (Miller 1987), (Hayes1 1988), (Schleisiek-Kern 1992).

Unencoded schemes have the disadvantage that only a small fraction of the 2^n instruction-word permutations are useful, especially when n increases beyond 16 bits; hence code density is poor. Encoded instruction schemes store only an op-code in main memory, reducing program size but necessitating microcode or decoder circuitry to generate appropriate control line values.

It is recognised that making instructions small enough for several to fit in a single instruction word may offer benefits for bus traffic reduction, since less than one instruction fetch would be required for each instruction executed. The objective of this approach is to reduce the reliance of the processor upon fast external memory hierarchies, and so eliminate the need for cache without incurring the vastly increased system cost of a fast memory array. Several examples of multiple-instruction encoded instruction schemes exist in stack processor architecture, notably the RTX-4000 (Koopman 1989b), the EM-1 (Tanenbaum 1978), and the MµP21 (Ting 1995).
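Using the MµP21's published format (four 5-bit op-codes in a 20-bit word; Ting 1995) purely as an illustration, packing and unpacking amount to simple shifts and masks; the functions below are a sketch, not any machine's actual encoder:

```python
# Illustrative multiple-instructions-per-word packing: four 5-bit op-codes
# per 20-bit word, first op-code in the most significant slot.

def pack(opcodes):
    """Pack up to four 5-bit op-codes into one 20-bit word."""
    assert len(opcodes) <= 4 and all(0 <= op < 32 for op in opcodes)
    word = 0
    for slot, op in enumerate(opcodes):
        word |= op << (15 - 5 * slot)   # slot 0 is fetched/executed first
    return word

def unpack(word):
    """Recover the four op-code slots from a 20-bit word."""
    return [(word >> shift) & 0x1F for shift in (15, 10, 5, 0)]
```

A single 20-bit memory fetch thus delivers up to four operations to the execution unit.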

The experimental RTX4000 (Koopman 1989b) employed a 2-instruction word format, as did Von Neumann's IAS architecture (Hayes2 1988) and the IBM 7094 (Hayes2 1988).

RISC designs have also been used to investigate this concept experimentally. Patterson and Hennessy considered a 2-instruction format originally applied to an early incarnation of the MIPS architecture, but rejected it after finding only a marginal trade-off of decode latency against the instruction-bandwidth bottleneck (Patterson 1985). However, Bunda et al. (1993) show that a 2-instruction RISC architecture with 16-bit instructions out-performs a single-instruction 32-bit RISC scheme, despite longer code sequences and reduced register addressability. Most recently, the UTSA design (Bailey 1994a) has been proposed in this thesis (see Chapter 5). The minimalist MµP21 design (Ting 1995), with four 5-bit instructions per 20-bit word, has been fabricated and operates at 100MHz, fetching up to 4 instructions per memory cycle (although in practice this is rarely achieved).

Stack machines adopting multiple-instruction word formats would not increase coding sequences significantly, since register addressing trade-offs are not applicable to an implicit evaluation stack. As yet there seems to be little work showing the performance of such a scheme on stack processors running a high-level-language model. The trade-offs for instruction compactness and memory traffic for compiler-generated C-code may be quite different from those of a FORTH-based, hand-coded (and possibly interpreter-based) system.

A study of instruction packing density, code size, and dynamic performance of a multiple-instruction word scheme for stack machines would allow this concept to be investigated further in exactly the area where it offers the greatest potential. Previous results provide context here: for example, Bunda et al. (1993) find that static packing densities in the region of 1.5 instructions per word are typical for compiled C-code on RISC, whilst results projected from comparable research (Koopman 1989a) suggest a figure of 1.7 instructions/word or less for hand-coded FORTH on a stack processor.

2.7 High level language support

In terms of high level languages, stack machines have been dominated by a small number of paradigms, mainly ALGOL in the early 1960s and more recently FORTH in the 1980s. However, much of the software community has moved toward languages such as C and PASCAL. Even in embedded systems, where stack processors have had most success, the use of HLLs is becoming more frequent, partly because large memory systems no longer represent such a high cost penalty, and vast libraries of ready-made routines are available. Recently, even the FORTH language itself has undergone some rationalisation, and is moving toward a local-variable model, something previously unsupported in software or hardware.

There have been a number of notable studies of high level language behaviour, much of which reinforced the RISC initiatives of the 1980s. Key research (Tanenbaum 1978, Patterson 1985, Knuth 1971 and 1974, Lunde 1977, and Alexander 1975) represents a significant pool of results for the computer architect. The implications of this work have allowed RISC proponents to define behaviour which best represents expected software demands. This work has identified procedure call frequencies, typical variable and parameter requirements, dependence upon local variable access, limited procedure nesting depths, and so on.

The results discussed above do not relate directly to stack machines. The dual stack nature of stack machines has many advantages which parallel those presented for RISC, such as low procedure call overheads and minimal register file management (since there is no register file in a traditional sense). Indeed, Tanenbaum’s study led him to propose a machine model, the ’EM-1’ (Tanenbaum 1978), which was strongly stack based and favoured compact instruction formats to maximise bus bandwidth utilisation. Most computer architects seized upon the concepts flowing from the Berkeley RISC project however, and opted for simple but lengthy instruction words, using cache to compensate for increased memory bandwidth demands.

The frequencies of FORTH primitives executing on stack machines and interpreters have been measured by Koopman (1989a), and imply high procedure call frequencies, as did Tanenbaum (1978) and Patterson (1985). Koopman states that a high proportion of stack machine time is spent managing stack variables, keeping them in the correct order and so on (Koopman 1992), a point also made by Amdahl (1964). However, a coherent model for stack manipulation has never been established from first principles.

Rather, an evolving set of operators has come into common use in the FORTH community, and has been adopted as normal stack processor design practice.

2.7.1 Local variables, and frame stacks

One traditional shortcoming of stack processors is the perception that support for local variables is inefficient. The concept of a third stack, referred to as a 'local stack' or 'frame stack', has often been proposed for local variable storage, but raises a number of questions. Main memory storage of the third stack implies a performance penalty, whilst any attempt to reserve space on-chip would result in increased silicon area and poor context latencies. These are issues where stack processors currently have a considerable advantage over alternative architectures.

Local variable analysis and optimisation techniques were proposed by Koopman (1992) in an attempt to generate compiled C code for a stack machine efficiently. To succeed in this area, efficient management of local variables on the data stack is essential to minimise superfluous memory access to any memory-resident third stack and associated instruction fetch overheads. Koopman’s study was limited to an examination of the effectiveness of the algorithms themselves, and not the impact of those algorithms upon overall performance.

Studies show that local variable scheduling is a practical technique for maintaining local variable contents on the data stack for multiple references at later points in the program code. It is implied by Koopman (1992) that modest extensions to the general FORTH engine philosophy are desirable for best results. However, no quantitative results are presented to show which extensions are required (or why). Nor is it known how instruction set complexity feeds into this relationship. Many architectures have different levels of stack accessibility and more or less restrictive sets of stack manipulation operators; hence any trade-offs must be identified and explored in order to give a true assessment of variable optimisation on the evaluation stack.
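One half of the transformation, dead-store elimination within a basic block, is simple enough to sketch. The instruction tuples and the use of a DROP to preserve stack balance are illustrative assumptions of this sketch; the DUP-based fetch-elimination half, which needs stack-depth tracking, is omitted:

```python
# Remove a STORE to a local that is overwritten by a later STORE to the same
# local with no intervening FETCH of it, within one basic block.

def eliminate_dead_stores(block):
    """block: list of (opcode, variable) tuples for one basic block."""
    out = []
    for i, (op, var) in enumerate(block):
        if op == "STORE":
            dead = False
            for later_op, later_var in block[i + 1:]:
                if later_var != var:
                    continue
                if later_op == "FETCH":
                    break          # value is read later: the store is live
                if later_op == "STORE":
                    dead = True    # overwritten before any read
                    break
            if dead:
                # the memory write is eliminated; a DROP keeps the stack
                # balanced, since the value to be stored is already on it
                out.append(("DROP", None))
                continue
        out.append((op, var))
    return out
```

The memory reference disappears even though a cheap stack operation remains, which is precisely the trade the peephole stage (section 3.1.3) then cleans up.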

2.7.2 Local variable optimisation and stack buffer behaviour

A further question which overshadows the variable scheduling technique is the effect that variable scheduling has upon stack behaviour. Since stack buffer behaviour is a function of dynamic stack behaviour and its interaction with the buffer algorithm itself, any changes in dynamic program (and hence stack) behaviour will affect that buffer's overall characteristics. Comparative studies of buffer algorithms may need to be revised in view of trade-offs which may exist, and accepted parameters for optimal buffering modified in the light of a more detailed investigation.

Comparison with register file architectures, where heavy dependence is placed upon register optimisations (of local variables) such as graph colouring (Chow 1984), (Chaitin 1982), cannot be made until a better understanding of stack-scheduling techniques has been gained, and their impact on system performance defined in detail.

A fuller investigation of variable scheduling might allow stack processors to be compared properly with architectures where analogous optimisation strategies exist (RISC and register allocation, for instance), but this must be approached in a way that encompasses the 'global' effects of optimisation on the overall performance of the architecture. Isolated study of one optimisation cannot present a true picture of what may happen as a consequence in other parts of the design equation.

2.8 Quantitative measurements and mathematical models

Mathematical modelling of stack processors has received little attention. Even at the level of individual system components, such as stack buffers, little formal mathematical representation has been attempted. Instead, comparisons and design studies have relied upon empirical measurements or upon contrasts of specific cases, which makes wider comparison of possible configurations difficult.

Some limited attempts have been made to quantify stack behaviour rather than empirically gauging its consequences. Measurements of Modula-2 code behaviour are presented by Debaere (1989), for example. However, there appears to be little or no work involving C program behaviour in the context of evaluation stacks, making it difficult to gauge the improvements offered by new optimisation strategies.

A quantitative assessment of stack machines should include measurements of stack behaviour in fundamental and general terms. This would then permit the effects of any applicable optimisation strategy to be gauged and understood from first principles.

Most of the previous studies of optimisation techniques have emphasised one particular issue, concentrating upon stack buffers or local variable optimisation for example. But it may not be valid simply to superpose one set of performance models upon another. In truth, the trade-offs and performance effects of individual optimisations may well interact in a way that makes individual components of a system perform quite differently than might be expected from previous work. The following areas would be worthy of serious attention:

1. Stack depth probability;
2. Stack depth variation probability;
3. Instruction execution traces, including singles and multiples;
4. Stack traffic analysis;
5. Local variable traffic analysis;
6. Functional differentiation of stacks in all measurements;
7. Basic block analysis;
8. Branch behaviour;
9. Static and dynamic code density.
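The first two measurements in the list can be derived directly from an execution trace; a minimal sketch follows, assuming (purely for illustration) that the trace has been reduced to a sequence of net stack-depth changes, one integer per instruction:

```python
# Gather depth and depth-change histograms from a trace of per-instruction
# net stack-depth deltas.

from collections import Counter

def depth_statistics(deltas):
    """Return (depth probability, depth-change probability) histograms."""
    depth = 0
    depths, changes = Counter(), Counter()
    for d in deltas:
        depth += d
        assert depth >= 0, "stack underflow in trace"
        depths[depth] += 1
        changes[d] += 1
    total = len(deltas)
    return ({k: v / total for k, v in depths.items()},
            {k: v / total for k, v in changes.items()})
```

The same trace, differentiated by stack (measurement 6), also yields the traffic analyses of measurements 4 and 5.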

Complex behaviour and interaction of hardware/software optimisation techniques would be better understood through the assistance of mathematical models for various aspects of stack processor behaviour. These models may approximate or represent individual effects, but when composed into an appropriate 'universal' model, would permit representation of stack processor performance in overall terms. Hence, variation of one or more factors can be imposed on the processor to represent operating conditions or architectural design decisions.

In mathematical terms, an overall model for stack processor behaviour would have to take into account memory bandwidth components, stack buffering hardware, and the impact of software optimisations on machine behaviour. Development of such a model would permit a range of theoretical stack processor configurations to be assessed by mathematical projections, hence permitting a consistent methodology for gauging the optimisation techniques themselves, as well as their mutual interactions, and ultimate performance trade-offs.

———————— Chapter 3 ———————— Research Objectives and Research Tool Development

————————

3.0 Research objectives

After the in-depth background research described in Chapter 2, key objectives of the research programme were identified. The overall aim of the research can be broken down into areas where specific optimisations and issues are applicable.

In general terms ...

• Establish an understanding of the behaviour of compiler-generated C-code on a stack based processing platform, contrast it with (traditionally) hand-written FORTH code, and identify areas of potential inefficiency for HLL support.

• Develop a generalised conceptual stack processor architecture with an instruction set that supports C-code effectively, whilst preserving stack-based computing principles. The model is intended to support research investigations.

• Define a mathematical model for stack processor behaviour and the individual aspects of optimisation and machine architecture. This would permit a more formal representation of stack processor behaviour(s) than previous empirically-based work.

On instruction bandwidth ...

• Evaluation of the concepts of instruction packing in the context of implicit addressing mechanisms and 32-bit machine architecture.

• Investigation of the impact of branch-target alignment on a packed instruction scheme.

• Consideration of the trade-off between improved memory bandwidth, obtained through the application of instruction-packing, against that of increased critical path latencies imposed by the additional logic required.

On stack behaviour and buffering ...

• Examination of the stack behaviour in a stack machine executing C-code, in contrast with the more semantically-aligned FORTH.

• Give consideration to previous studies on stack buffer performance within a C-code context, rather than the traditional FORTH.

• Identification of new buffering algorithm(s), and comparison with those already known.

• Develop mathematically based models for the behaviour of stack buffers and provide quantitative and qualitative measurements of performance.

On local variable support ...

• Implementation and evaluation of Koopman’s Local Variable Optimisation Algorithm.

• Quantification of the role played by instruction set complexity, and expressiveness, in the support of efficient local variable elimination.

• Determination of the performance impact of local variable optimisation, rather than a simple measure of its ability to remove variable references, as was the case for Koopman (1992).

• Investigation of the impact of variable optimisation upon data stack behaviour, and thus buffering of stack traffic, thereby exploring the trade-off between local variable traffic and stack buffer interaction with main memory.

Although modularity has advantages in conducting quantitative research, it was felt important that a 'holistic' viewpoint be maintained throughout. Thus the impact, at the system level, of interaction between individual optimisation strategies can be fully accounted for.

Research studies often concentrate too deeply upon one aspect of processor or system architecture without giving due consideration to the wider implications of that specific optimisation upon the system as a whole. Optimisation typically exchanges one characteristic of system behaviour for another, just as energy is always conserved in the principles of physics. The question that must be answered is whether the new system performs more efficiently as a result of the 'optimisation'.

3.1 Development of a research tool suite

The stated research objectives are wide ranging but inter-related. In order to investigate the issues properly, a number of research tools were developed, supporting investigations in the context of the conceptual UTSA machine architecture (University of Teesside Stack Architecture), which was defined to meet the research programme's objectives. These tools permit a wide range of investigations to be conducted, either in specific areas, or in terms of a complete system. The various tools and their relationships are illustrated in Fig. 3.1.

[Fig. 3.1 depicts the tool chain: the C compiler and original FORTH code feed a modified FORTH interpreter and the local variable scheduler; stack traces drive the UTSA simulator and the stack buffer simulator; the peephole optimiser, binary assembler, and a VHDL model complete the suite. The information produced includes memory references, stack traffic, instruction counts, local variable optimisation efficiency, stack depth and modulation, static and dynamic packing density, branch behaviour, gate counts, silicon area, and logic timings.]

Fig.3.1 Investigative research tools

In Fig. 3.1, software tools are indicated by shaded boxes whilst the specific information provided by each research tool is indicated by circled labels. The role and use of each tool is outlined briefly in the following sub-sections.

3.1.1 The UTSA C compiler

This compiler was developed by a third party engaged in related research fields. The compiler was provided as a basic compilation tool for the generation of UTSA machine code from C source code, but is rather limited and has certain incorrect coding quirks that must be dealt with by back-end filters. These are generally operational concerns, but significant limitations are the absence of source code libraries, no floating point support, and no direct support of multi-dimensional arrays.

Compiler limitations led to some difficulty in providing a range of C code benchmarks which are directly comparable to more detailed studies. The majority of results presented in this thesis show clear trade-offs and effects upon performance. However, there are cases where benchmarks of more substantial form would have been preferred in order to clarify an issue with more reliability. The issue of branch target alignment is one such area, and it is hence noted in the relevant chapter.

3.1.2 Software optimisation: local variable scheduling

This tool was written to implement the intra-block scheduling technique proposed by Koopman (1992). The technique attempts to perform ’fetch-elimination’ by placing duplicates of local variable contents on the data stack when they are first fetched to the stack, in the expectation of being able to use them later instead of making further memory references to those variables. Dead store elimination is also applied in order to eliminate any unnecessary updates of the memory-resident local variables.

The tool permits the instruction set flexibility of the target machine to be deliberately restricted by a specific degree, in order to support investigation of the effect of instruction set design upon the effectiveness of the algorithm.

3.1.3 Software optimisation: peephole optimiser

The intra-block scheduling algorithm depends heavily upon the knowledge that instructions added to the original code, in order to perform variable scheduling, will usually be peephole-optimised with adjacent operations at a later stage. The beneficial result is that no significant expansion of instruction sequences will be suffered.

Again, this tool has a facility which allows instruction set complexity to be artificially limited by known degrees, in order that instruction set architecture can be examined as a function of optimisation efficiency. By performing static and dynamic program analysis of selected benchmarks, both before and after application of the optimisation chain, results can be produced which show the direct performance gains of applying the optimisation techniques. Additionally, indirect effects such as instruction set influences and changes in stack behaviour can be quantified.
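A minimal model of such a peephole pass is sketched below. The rewrite rules and mnemonics (e.g. ’lit1’, ’inc’) are hypothetical stand-ins rather than UTSA opcodes; the point illustrated is that adjacent operations fuse, so stack manipulations inserted by the scheduler need not expand the final instruction sequence.

```python
# Hypothetical rewrite rules: an adjacent pair of stack operations is
# fused into a shorter sequence (possibly nothing at all).
RULES = {
    ("swap", "swap"): [],              # self-cancelling pair
    ("dup", "drop"): [],               # copy immediately discarded
    ("lit0", "add"): [],               # adding zero has no effect
    ("lit1", "add"): ["inc"],          # fuse into a single increment
}

def peephole(code):
    """Repeatedly fuse adjacent instruction pairs until no rule fires."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(code):
            pair = tuple(code[i:i + 2])
            if len(pair) == 2 and pair in RULES:
                out.extend(RULES[pair])
                i += 2
                changed = True
            else:
                out.append(code[i])
                i += 1
        code = out
    return code
```

Restricting the rule table is the mechanism by which instruction set complexity can be artificially limited for the experiments described above.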

3.1.4 UTSA binary assembler: an investigative tool

Examination of instruction packing techniques demands that the assembler code be packed effectively into the format determined. This is not handled by the compiler, but performed by an assembler tool capable of identifying operations which may pack together in a single memory location. However, no instruction re-ordering is performed; instead, the code is simply padded out with ’NOP’ operations when further packing is not possible in a given memory word.

The tool’s statistical outputs are largely related to static packing density, which is measured as the average number of useful instructions (i.e. excluding such things as ’nop’) which are packed in each memory word. This measurement is dependent upon factors such as branch target alignment policies, the significance of which will be seen in Chapter 8. The assembler tool can be forced to perform both non-aligned, and aligned branch target coding, in order to support investigations into those issues.
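The packing policy and the static packing density measure can be sketched as follows. The word width of four instruction slots is an illustrative assumption, not the UTSA format.

```python
def pack(instructions, slots_per_word=4, align_targets=True):
    """Pack a linear instruction list into fixed-width memory words.

    `instructions` is a list of (opcode, is_branch_target) pairs.
    When align_targets is True, a branch target must begin a new word,
    so the preceding word is padded out with 'nop' operations."""
    words, current = [], []
    for op, is_target in instructions:
        if current and align_targets and is_target:
            current += ["nop"] * (slots_per_word - len(current))
            words.append(current)
            current = []
        current.append(op)
        if len(current) == slots_per_word:
            words.append(current)
            current = []
    if current:  # flush the final, possibly partial, word
        current += ["nop"] * (slots_per_word - len(current))
        words.append(current)
    return words

def static_packing_density(words):
    """Average number of useful (non-nop) instructions per memory word."""
    useful = sum(1 for word in words for op in word if op != "nop")
    return useful / len(words)
```

Running the same instruction stream through both alignment policies exposes the density penalty of aligned branch targets, which is exactly the comparison the assembler tool supports.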

3.1.5 The UTSA simulator: virtual machine and simulation platform

The UTSA simulator is a substantial research tool, written to simulate a virtual machine model for the UTSA processor concept. The simulator permits a wide range of quantitative measurements to be made and, in doing so, permits many trade-offs, optimisations, and performance issues to be evaluated.

The simulator emulates a memory space in which stacks, local variable frames, and heap space may be allocated by the program code loaded for ’execution’. Each machine operation in the assembler code file is performed in an interpreted sequence that exactly follows the flow of execution expected on a real processor. Bus traffic, memory access contentions, and timings are emulated to reflect realistic system behaviour.

The simulator can be configured in a number of ways to determine the memory timing(s) present in the desired system, or to enable branch prediction, stack buffering, and so on. Comprehensive measurements and trace files can be generated in order to investigate performance factors directly, or to feed other simulation tools such as the buffer simulator.
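The interpreted fetch-execute sequence described above can be sketched in miniature. The instruction vocabulary and the one-reference-per-transaction cost model below are illustrative simplifications, not the UTSA instruction set or its timing model.

```python
def run(program, memory):
    """Interpret a tiny stack-machine program.

    Each step mimics the fetch-decode-execute sequence of a simulated
    processor and counts the bus transactions it would generate."""
    stack, pc, bus_cycles = [], 0, 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        bus_cycles += 1                 # instruction fetch
        if op == "lit":
            stack.append(arg)
        elif op == "load":              # explicit memory read
            stack.append(memory[arg])
            bus_cycles += 1
        elif op == "store":             # explicit memory write
            memory[arg] = stack.pop()
            bus_cycles += 1
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "halt":
            break
    return memory, bus_cycles
```

A bus-cycle counter of this kind is the basis of the bandwidth measurements reported in Chapter 4.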

3.1.6 Stack buffer simulator: an investigative tool

Simulation of buffering algorithms allows relative and absolute comparison of buffering strategies under various conditions. The buffer simulator implements three basic buffering strategies:

1. Demand-fed, as documented by Koopman (1989a) and Hayes et al. (1987);
2. Cut-Back-K, as originated by Hasegawa (1985);
3. Zero-Pointer Tagged Buffer, a concept introduced in this thesis.

In addition to being able to select the basic buffer strategy, it is possible to individually enable write and read tagging for some of the algorithms listed above (in fact the zero-pointer buffer is only useful when used with tagging). Buffer size and initial conditions can also be specified, so that a range of buffer strategies can be simulated with initially full or empty buffers of any capacity (within reasonable limits).

The buffer simulator performs simulation by reading a trace-file generated by the UTSA simulator (see previous sections). This file is a trace of the stack depth variations that took place during program execution. The modulations of stack-depth drive the buffer precisely as they would in a true hardware implementation. Since one has full control over the state of the program that generated the trace-file, one can examine buffer behaviour for a number of conditions, such as those before and after application of local variable scheduling. Hence one may explore indirect trade-offs between related system components which have not previously been established.
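The core of such a trace-driven simulation can be sketched for the demand-fed policy alone: a push that finds the buffer full spills one element to memory, and a pop that finds the buffer empty fills one back. This is an illustrative reconstruction, not the buffer simulator itself, and the depth trace used in testing is invented.

```python
def demand_fed_spills(depth_trace, capacity):
    """Replay a stack-depth trace through a demand-fed buffer.

    A push that finds the buffer full spills one element to memory;
    a pop that finds the buffer empty (while spilled elements exist)
    fills one element back.  Returns (spills, fills)."""
    buffered = in_memory = spills = fills = 0
    prev = depth_trace[0]
    for depth in depth_trace[1:]:
        for _ in range(abs(depth - prev)):
            if depth > prev:            # push one element
                if buffered == capacity:
                    buffered -= 1
                    in_memory += 1
                    spills += 1
                buffered += 1
            else:                       # pop one element
                if buffered == 0 and in_memory:
                    in_memory -= 1
                    buffered += 1
                    fills += 1
                buffered -= 1
        prev = depth
    return spills, fills
```

Because the simulation is driven purely by the depth trace, any change upstream (a scheduling optimisation, a different benchmark) is reflected directly in the spill and fill counts.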

3.1.7 VHDL models and logic synthesis

Using VHDL (VHSIC Hardware Description Language), an RTL-level description of the UTSA processor design is possible. Apart from proving the concept of the UTSA design, the hardware description model permits synthesis of netlists of the design: in other words, we may ’compile’ a gate-level netlist which corresponds to the described hardware. With industry standard design tools, the design has been optimised in a number of ways to produce a design prototype.

Using the gate-level description, based upon 1 µm CMOS silicon fabrication technology, it was possible to execute a series of opcode and operand pairs as if they were fetched from a memory device, and hence observe the behaviour and timing characteristics of the stack machine. Hardware timings of a detailed nature were gathered, and such parameters as instruction format decode latency, and worst-case ALU timings were quantified. As a consequence, trade-offs between instruction decode latency and instruction bandwidth were measured in detail, whilst the overall timing characteristics of the machine suggested a respectable operating frequency for the UTSA prototype.

Because care has been taken to construct a fully synthesisable VHDL model, the complete UTSA VHDL description is capable of exportation to real silicon without further refinement. Using a given fabrication technology, the results presented could ultimately be put into practice, given a satisfactory combination of time and resources.

3.1.8 FORTH tracer

One of the first tools developed was a FORTH program tracer. This tool was developed as a result of modifying an existing interpreter tool such that it was able to generate a number of statistical measurement files. These included stack depth traces, for stack buffer simulation tests, and instruction execution frequencies.

———————— Chapter 4 ————————

Quantitative Assessment of Stack Behaviour

————————

4.0 Introduction

To fully understand the principles and effects of one or more optimisations upon a system, it is not sufficient to simply apply the proposed techniques and perform empirical comparisons. That is not to deny that previous work in the field is useful and informative. But it seems desirable to understand why certain techniques perform as they do, rather than simply establishing that they work as well, better, or worse, than the alternatives.

Identification of new trade-offs, or characteristics of behaviour, that can be exploited for better performance is a further benefit to be expected of such quantitative methods. This is demonstrated by the success of the RISC initiatives of the 1980s, which were based on a considerable foundation of measurement and modelling, used to guide the development of the RISC concepts toward efficient solutions of the problems considered at the time. The best known techniques may ultimately prove not to be the best in theoretical terms, and only a theoretical understanding of the problem can give us the opportunity to resolve such matters.

4.1 Stack behaviour, measurement and modelling

A good starting point for assessing the topic of buffering would be to understand the cause-and-effect relationship that exists in a buffered stack system. Stack buffers are hardware optimisations, augmenting a potentially inefficient processing model in order to recover some degree of respectable performance. They are driven by the stack behaviour of the system in which they are applied, and their behaviour and performance is therefore inextricably linked to the behaviour of the stacks during program execution. The stack behaviour is the problem, and the buffer is the solution.

Many studies have attempted to identify performance advantages of one algorithm over another, notably the work by Koopman (1989a), Wedig (1987), and Hayes et al. (1987). These investigations are often in the context of FORTH execution models (the predominant stack processor programming environment), and typically present results for stack memory references of selected algorithms with varying buffer size. However, there appears to be little published work that attempts to define exactly what this stack behaviour is, or how it is affected by other factors.

Work presented by Debaere and Van Campenhout (1989), for MODULA-2 environments, illustrates the kind of study that can be useful and which ought to be available for stack processor technology. In this section we present statistical and quantitative measurements for FORTH, and then for compiled C-code examples, illustrating the machine-stack behaviour of a stack processor system through quantitative study.

4.1.1 Introducing some terminology for stack behaviour

There are few well-established terms for implicitly addressed stack behaviour, so we will now introduce and define several terms before considering the results presented in later sections.

Stack depth varies as program execution progresses, and the amount by which it varies, the ’stack-depth modulation’, is a direct consequence of the program code being executed. Any memory references arising from the stack depth changes are usually termed ’stack traffic’ or ’stack spills’. The average number of memory references per machine primitive would be the ’baseline stack traffic’, but becomes the ’optimised stack traffic’ once an optimisation is applied.

The frequency of any particular stack depth occurring, or the ’stack-depth probability’, is consequently a function of stack depth modulation. This may be expressed in several contexts: individual operations or ’atoms’ (because they may be machine operations or FORTH primitives) can have an effect on stack depth, hence we have ’atomic’ stack-depth modulation. The effect of a sequence of instructions might, however, cause larger stack depth modulations than individual atoms, resulting in a ’cumulative stack-depth modulation’, defining the effect of un-interrupted sequences of push or pop operations.
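The terminology above can be made concrete with a small sketch that recovers both kinds of modulation from a stack-depth trace. The representation (a list of depths, one value per atom) is an illustrative assumption.

```python
from collections import Counter

def atomic_modulations(depth_trace):
    """Per-atom change in stack depth (one value per operation)."""
    return [b - a for a, b in zip(depth_trace, depth_trace[1:])]

def cumulative_modulations(depth_trace):
    """Net depth change of each uninterrupted run of pushes or pops
    (neutral atoms terminate a run)."""
    runs, current = [], 0
    for step in atomic_modulations(depth_trace):
        if step == 0 or (current != 0 and (step > 0) != (current > 0)):
            if current:
                runs.append(current)
            current = step  # zero, or the start of a new run
        else:
            current += step
    if current:
        runs.append(current)
    return runs

def probability(modulations):
    """Stack-depth-modulation probability distribution, as fractions."""
    counts = Counter(modulations)
    total = len(modulations)
    return {mod: count / total for mod, count in counts.items()}
```

Applying `probability` to the atomic and cumulative lists yields exactly the kind of distributions plotted in Figs. 4.3 and 4.4.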

4.2 The stack characteristics of FORTH programs

FORTH is an interpreted language, which relies upon data and program stacks to expedite computational objectives. The program code is executed as written by the programmer. No compilation or intermediate optimisation is performed on the code except to reduce it to a tokenised form, typically direct or indirect threaded code (Bell 1973, Dewar 1975, Kogge 1982). One consequence of this method of programming is that the program is often optimised in an intuitive manner, and utilisation of the stacks is far more efficient than most compilers could be expected to achieve.
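The threaded-code execution model referred to here can be sketched in outline. The fragment below models an inner interpreter in which the ’compiled’ program is simply a list of primitive routines and inline literals; real FORTH systems thread through addresses rather than language-level callables, so this is an analogy, not an implementation.

```python
def make_primitives(stack):
    """Build primitive routines closed over a shared data stack.

    Each primitive receives the instruction pointer and the program,
    and returns the next instruction pointer."""
    def lit(ip, program):
        stack.append(program[ip])   # push the inline literal cell
        return ip + 1               # skip over the literal
    def add(ip, program):
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
        return ip
    def dup(ip, program):
        stack.append(stack[-1])
        return ip
    return lit, add, dup

def interpret(program, stack):
    """The inner interpreter: fetch the next token and execute it."""
    ip = 0
    while ip < len(program):
        word = program[ip]
        ip += 1
        ip = word(ip, program)
    return stack
```

The entire ’compilation’ step here is the construction of the token list; no optimisation is performed, mirroring the behaviour described above.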

4.2.1 Stack Depth Probability of FORTH programs

In order to test the behaviour of FORTH programs, we performed a limited study of commonly used benchmarks, with the aid of the modified FORTH interpreter (see Chapter 3 for more details). This interpreter allowed FORTH code to be executed on a completely standard FORTH platform, but was also able to provide trace files of program and stack behaviour for later analysis. The benchmarks chosen were as listed below. The associated source code listings may be found in Appendix-G.

1. Towers of Hanoi;
2. 8 Queens Problem;
3. Fibonacci Recursion;
4. Eratosthenes Sieve for Prime Numbers.

The following series of graphs, Figs. 4.1(a) to 4.1(d), represent stack depth probabilities for the listed FORTH benchmarks. Each graph shows the probability for both the data and program (return) stacks. The stack-depth probabilities presented display a variety of behavioural characteristics. One important feature is the way in which the stack depth is skewed away from small or zero stack depths, because of the effects of procedural nesting and the programmer’s desire to maintain frequently used variables on the data stack. Eratosthenes Sieve displays only a minimal range of stack depths, and this is due to its implementation as a single procedure within which iterative looping occurs.

[Figs. 4.1(a)–4.1(d): stack-depth probability plots, probability (%) against stack depth, for the data and return stacks: 4.1(a) Fibonacci (recursive), 4.1(b) Eight-Queens (recursive), 4.1(c) Eratosthenes Sieve (iterative), 4.1(d) Towers of Hanoi (recursive).]

Figs. 4.1(a) to 4.1(d) Stack-depth probabilities for FORTH programs

The Eratosthenes Sieve algorithm is more likely to be representative of the general behaviour of individual FORTH sub-routines, whilst the other benchmarks show the cumulative effect of a series of nested procedures. This point will be emphasised further when we come to consider compiler-generated code later in this section.

The three programs exhibiting procedural nesting all display a tendency toward diminishing probability of large stack depths (of the order of 20 to 30 deep). This is dictated by the program’s nesting extremes, and the average number of procedurally-localised data stack elements. This helps to explain why studies of buffering, such as those by Koopman (1989a), indicate that stack spilling can be almost negligible when buffers approaching 24 or 32 elements are used.

Clearly, given a large enough buffer, one will have negligible stack spilling, regardless of any sensibly chosen buffering mechanism. However, such an absolute viewpoint is based on the average attributes of a system over the entire program execution period. This reflects nothing of the true short-term dynamics, of stack growth and contraction, that must be suspected to ultimately dictate which buffer is optimal under more specific conditions.

4.2.2 Stack-Depth Modulation for FORTH programs

Stack behaviour tends to show a superposition of long term trends and short term variations, as Fig. 4.2 illustrates. Another feature of some importance is functional specialisation, that is to say, the differing characteristics shown by the data and return stacks. The activity of the return stack is greatly subdued in comparison to the data stack, and we should therefore expect it to be less significant for performance.

[Fig. 4.2: stack depth (data and return stacks) plotted against execution cycle n, n+1, …]

Fig. 4.2 Long term trends and short term ’dynamics’ in FORTH execution

In order to quantitatively measure the dynamic features of stack behaviour, we can project these variations in a more formal way, rather than relying upon simply ’looking’ at stack trends in an intuitive manner. The next set of graphs, Figs. 4.3(a) and 4.3(b), reduce this complex behaviour into a plot of cumulative stack-depth modulations for the tested benchmarks.

[Figs. 4.3(a) and 4.3(b): probability (%) against size of stack-depth modulation (−5 to +5) for the data and return stacks respectively.]

Figs. 4.3(a) and 4.3(b) Composite models for cumulative stack-depth modulation

What is now apparent is that the likelihood of a sustained change in stack depth is quite small for anything greater than two or three elements. Most cumulative stack-depth changes cancel each other out. For example, two items may be placed on the stack, and after an addition, one item is destroyed, leaving the result (which is typically stored or subject to further destructive computation).

The intention here is to illustrate that a path can be drawn from the complex and program-specific behaviour of the overall stack-depth probabilities (see Figs. 4.1(a) to 4.1(d)), toward the simpler, more general behaviour of stack-depth modulations. To complete this process, we now present the atomic stack-depth modulations, as in Figs. 4.4(a) and 4.4(b).

[Figs. 4.4(a) and 4.4(b): probability (%) against size of stack-depth modulation (−5 to +5) for the data and return stacks respectively.]

Figs. 4.4(a) and 4.4(b) Composite data for atomic stack-depth modulation

Once the analysis is reduced to the level of FORTH primitives, where operations such as ’dup’, ’swap’ and ’add’ are considered individually, we find that the picture is simplified yet further and a truly general view of dynamic stack behaviour is possible.

The occurrence of stack-depth modulations is now concentrated into a narrower band of values: no FORTH primitives change stack depth by more than two elements in one operation. The complex behaviour of a program is evidently nothing more than the combined effects of very simple actions, which may be more readily understood.

4.2.3 Limited depth change and the Cut-Back-K controversy

The quantitative measurements of stack depth modulation presented here appear to answer some nagging contradictions in the field of buffer management. In Wedig (1987), for example, it is speculated that ’since it cannot be predicted if a string of pushes or pops is about to occur, it is always safest to keep the stack [buffers] half full’. Wedig neglected to validate this assumption, and only assessed buffers based on this expectation. Similarly, the Cut-Back-K algorithm postulated by Hasegawa (1985) should deliver optimal performance when the buffer is kept half-full.

The behavioural studies presented here suggest that, in practice, the stack-depth modulations are restricted to a fairly small margin of change. Although it is possible for significantly larger changes to occur, the likelihood is very small. This agrees with the empirical findings of Hayes (1989), which show that such ’half-full’ buffer policies do not deliver superior performance in comparative studies, and that it is actually better to allow the buffer to fill up completely during stack growth.
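The contrast can be illustrated with a toy simulation. The sketch below is a simplified reading of the Cut-Back-K policy, not Hasegawa’s published algorithm in full: when a push finds the buffer full, K elements are copied back to memory in one burst, leaving room for K further pushes before the next spill event. The depth trace and parameters in the test are invented for illustration.

```python
def cut_back_k_spills(depth_trace, capacity, k):
    """Replay a stack-depth trace through a simplified Cut-Back-K buffer.

    A push that finds the buffer full triggers one spill event moving
    K elements to memory; an empty-buffer pop triggers one fill event
    moving up to K elements back.  Returns (spill_events, fill_events);
    note that each event transfers K words, not one."""
    buffered = in_memory = spill_events = fill_events = 0
    prev = depth_trace[0]
    for depth in depth_trace[1:]:
        for _ in range(abs(depth - prev)):
            if depth > prev:            # push one element
                if buffered == capacity:
                    buffered -= k
                    in_memory += k
                    spill_events += 1
                buffered += 1
            else:                       # pop one element
                if buffered == 0 and in_memory:
                    refill = min(k, in_memory)
                    buffered += refill
                    in_memory -= refill
                    fill_events += 1
                buffered -= 1
        prev = depth
    return spill_events, fill_events
```

On a trace that grows well past the buffer capacity, spill events occur roughly once per K pushes beyond capacity rather than on every push, but each event transfers K words: the trade-off between event count and burst size is precisely what the comparative studies measure.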

4.3 The stack characteristics of compiled C-code

Recently, a small but growing interest has been shown in the area of C execution on stack processor environments (Miller 1987, Koopman 1989a and 1992). The lack of fundamental research and quantitative measurement in this field needs rectification in order to better understand the issues of importance. To this end, a set of benchmarks was written in C source code, and compiled using the compiler developed by Kelly[6]. The assembler code resulting from compilation was then executed on the UTSA stack processor simulation platform, introduced in Chapter 3, and the resulting statistics analysed. The source code is given in Appendix-H, and the assembly code in Appendix-J.

The assembler code was not subject to any code optimisation techniques other than to minimise the use of long instruction formats whenever possible. Optimisation was avoided at this stage, since the objective was to measure the raw performance of a stack processor system with C code and form a foundation for gauging the impact of optimisation techniques. The resulting stack depth profiles are shown in Figs 4.5(a) to 4.5(h).

The difference between hand-coded FORTH and the compiler-generated code (shown in Figs. 4.1 to 4.4) is clear. Stack depth is constrained within a narrow range, even though most of the benchmarks have frequent procedural nesting. This is largely attributable to the behaviour of the C compiler, which is ’naive’ in terms of its use of the data stack[7]. The compiler has no ability to keep useful variables on the stack, but stores all results as they are computed. Hence, stack depth always returns to zero at the end of a basic block unless parameters are being passed to a procedure. It is possible to plot a composite stack depth profile for the whole benchmark suite, as shown in Fig. 4.6.

[6] The compiler tool was developed by Damien Kelly in 1993-1994, a research student at the University of Teesside. [7] It is important to understand that the poor efficiency of ’naive’ compiler output is a general feature of stack-oriented compilers, and not a consequence of the inherent restrictions of Kelly’s compiler tool.

[Figs. 4.5(a)–4.5(h): data stack depth probability plots, probability (%) against stack depth, for the C-code benchmarks: 4.5(a) Image Smoothing, 4.5(b) Fibonacci Recursion, 4.5(c) Eratosthenes Sieve, 4.5(d) Bubble Sort, 4.5(e) Empty Loop, 4.5(f) Conway’s Life, 4.5(g) Towers of Hanoi, 4.5(h) Matrix Multiply.]

Figs. 4.5(a) to 4.5(h) Data stack depth probabilities for ’C’ programs

[Fig. 4.6: composite (benchmark average) data stack depth probability, probability (%) against stack depth 0–16.]

Fig. 4.6 Data stack depth probability for C-code

The data in Fig. 4.6 shows a much more restrained use of the data stack than for the FORTH examples. Hence traffic might be expected to be reduced quite effectively with small buffers. As before, the dynamic behaviour of the stack is an important factor for performance, and this is represented by Figs. 4.7(a) and 4.7(b).

[Figs. 4.7(a) and 4.7(b): probability (%) against size of stack-depth modulation (−5 to +5) for the cumulative and atomic modulations respectively.]

Figs. 4.7(a) and 4.7(b) Data-stack depth modulations for C code

It is evident once again that the apparently complex behaviour that produces the stack depth profiles of Figs. 4.5 and 4.6 is in fact a result of the simpler behaviour of individual instructions as identified in Figs. 4.7(a) and (b).

4.4 FORTH and C-code, behavioural comparison

It is interesting to compare the measured C-code behavioural model against that presented for FORTH programs. Figures 4.8(a) and 4.8(b) show such a comparison, where Fig. 4.8(a) compares the cumulative stack-depth modulations, and Fig. 4.8(b) compares their respective atomic stack-depth modulations.

[Figs. 4.8(a) and 4.8(b): probability (%) against stack-depth modulation (−5 to +5), FORTH versus C-code, for the cumulative and atomic modulations respectively.]

Figs. 4.8(a) and 4.8(b) Data-stack depth modulation, FORTH and C compared

It can be seen that a relative excess of stack-depth modulations of ’+2’ exists for C-code when examining cumulative stack behaviour, which is compensated for by an increased tendency to have stack depth reductions of magnitude ’-1’. Normalisation[8] of the data emphasises this more clearly, as shown in Figs. 4.8(c) and (d). It is also apparent that the relative frequency of neutral operators is lower in the case of C-code (with respect to Fig. 4.8).

[Figs. 4.8(c) and 4.8(d): normalised ratio against stack-depth modulation (−5 to +5), FORTH versus C-code, for the cumulative and atomic modulations respectively.]

[8] The effect of normalisation in this case is to express results relative to the neutral operators (i.e. 0 stack-depth change) rather than in absolute terms.

Figs. 4.8(c) and 4.8(d) Data-stack depth modulation, FORTH and C compared

These results illustrate the case of compiler-generated code blindly performing dyadic operations by fetching operands on demand, and storing results without attempting to retain them for further use.

4.5 Baseline stack traffic for FORTH and C-code

One of the key arguments against stack based processing is that stack management incurs a heavy overhead in terms of memory traffic (in the form of stack spilling). Proponents of this view include Amdahl et al. (1964), Bell et al. (1970), and Patterson (1990). This overhead exists only if we neglect the effects of stack buffering which, as we will see in later sections, can virtually eliminate the problem. However, it is possible to assess the scale of the unoptimised traffic by measuring ’baseline stack traffic’ from the data used to plot stack modulation characteristics.

One can calculate the number of memory references generated for an average machine operation, which is the sum of the weighted probabilities given in the stack depth modulation profiles. Alternatively, the data can be gathered from simulation results as tabulated in Table 4.1.

Table 4.1 Absolute baseline stack traffic

Program          sd      sr      Total
[F] Fib          0.857   0.263   1.120
[F] Sieve        0.796   0.000   0.796
[F] 8 Queens     0.553   0.280   0.833
[F] Towers       0.467   0.336   0.803
[C] Sieve        0.952   0.045   0.997
[C] Fib          0.839   0.129   0.968
[C] Empty loop   0.889   0.000   0.889
[C] Bsort        0.896   0.000   0.896
[C] Fact         0.825   0.122   0.947
[C] Img Smooth   0.906   0.017   0.923
[C] Life         0.880   0.045   0.925
[C] Towers       0.886   0.042   0.928
[C] Matrix       0.782   0.008   0.790
AVERAGE          0.810   0.099   0.909

The symbols sd (baseline data-stack traffic) and sr (baseline return-stack traffic) are introduced here, and their application will be enlarged upon in Section 4.7. Table 4.1 illustrates that total stack traffic in an unbuffered system would be of the order of one memory reference per instruction or atom - a significant bottleneck for performance and a barrier against attempts to achieve single cycle execution.
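The weighted-probability calculation mentioned above can be made concrete. The sketch below assumes, as a simplification, that every element pushed to or popped from an unbuffered memory-resident stack costs one memory reference; the profile values used in the test are illustrative, not measured data.

```python
def baseline_traffic(modulation_profile):
    """Expected stack memory references per operation.

    Computed as the sum of |modulation| weighted by its probability,
    under the simplifying assumption that every element crossing the
    stack/memory boundary costs exactly one memory reference."""
    return sum(prob * abs(mod) for mod, prob in modulation_profile.items())
```

Applied to the data-stack and return-stack modulation profiles separately, a calculation of this form yields figures comparable with the sd and sr columns of Table 4.1.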

4.6 C-Code and bus bandwidth utilisation

One of the main research aims is to assess the impact of optimisations for C-code program performance on stack processor architecture. To do this we need to identify and understand the major bottlenecks for system performance. As with most processor technologies, the problems of bus bandwidth are a key limitation for economic delivery of high performance. However, the stack processor has the added problems that come from being favoured in embedded systems and real-time-control environments. Here the impact of cache upon deterministic execution times is not always welcome (Koopman 1993). Often, the highest throughput offered by a processor family cannot be fully exploited since worst-case execution timing evaluations must assume zero cache coherency. Inherently unpredictable external influences may result in an interrupt servicing agenda that is essentially random and cannot therefore be accounted for in its effect on cache flushing and warm-up.

A careful examination of bus utilisation would help to highlight problem areas. A series of simulations were therefore performed on the UTSA simulator platform, using compiler-generated C-code. Code optimisation was limited as for the stack-behaviour analysis presented earlier. Figures 4.9(a) to 4.9(j) show a breakdown of individual memory traffic components and their relative contributions to memory traffic. Nine benchmarks of varying complexity are presented, followed by a benchmark average.

It is assumed that local variables are memory resident in the unoptimised case presented here, resulting in one memory access per variable reference. It is also assumed that instruction-fetch and stack-data transfers have a cost of one memory reference per event. This is compatible with the generation of stack machines usually considered when negative claims are made for stack processor performance.

[Figs. 4.9(a)–4.9(j): pie charts of memory traffic components (data stack, return stack, instruction fetch, explicit memory references, local variable memory references) for 4.9(a) Empty Loop, 4.9(b) Matrix Multiply, 4.9(c) Bubble Sort, 4.9(d) Conway’s ’Life’, 4.9(e) Towers of Hanoi, 4.9(f) Factorial, 4.9(g) Fibonacci, 4.9(h) Image Smooth, 4.9(i) Eratosthenes Sieve, and 4.9(j) the benchmark average. The benchmark average gives data stack 37 %, return stack 2 %, instruction fetch 43 %, explicit memory references 2 %, and local variable references 16 % (Sd = 0.88, Sr = 0.05, If = 1.00, me = 0.05, ml = 0.36).]

Figs. 4.9(a) to 4.9(j) Bus bandwidth components for C-code execution (for an explanation of symbols, see section 4.7)

The data gathered for bus bandwidth components shows that, although some programs have specific characteristics such as minimal return stack usage, or heavier-than-typical explicit memory references, the overall picture is of the memory system being driven by three significant components. These three components are stack traffic (39 %), instruction fetch overheads (just over 43 %), and local variable management (16 %).

It might be thought that a figure of 16 % for local variable management is not a major concern given the size of the other two significant components. However, it has already been shown that stack traffic can be all but eliminated through the use of stack buffering (Koopman 1989a, Hayes et al. 1987, Wedig 1987). In a buffered system, local variable management therefore becomes a substantial fraction of the remaining traffic, at approximately 30 % of remaining bus traffic.

Re-examining the benchmark average result of Fig. 4.9(j) in a more subtle way reveals certain important characteristics. For example, the figures for instruction-fetch (43 %) and return stack spills (2 %) illustrate that only 2 out of 43 instructions executed (approximately 5 %) result in a call/return event. Similarly, the comparison of instruction-fetch traffic against local-variable traffic shows that 16 out of 43 instructions (37 %) are local variable references. Thus it is clear that local variable references not only waste significant memory cycles in the process of reading or writing variable contents, but also waste instruction-fetch bandwidth and cause stalling of the CPU. Any optimisation or elimination of local variables would therefore have a three-fold benefit for performance.

4.7 A memory traffic model for stack processor systems

The mathematical model shown in eqn 4.1 is proposed in order to represent the memory bandwidth requirements of a stack processor system. It is based upon the information on traffic components revealed in previous sections.

St = if + Sd + Sr + ml + me (4.1)

where:
  St = total memory cost per instruction;
  if = instruction fetch overhead;
  Sd = data stack spill traffic;
  Sr = return stack spill traffic;
  ml = local variable accesses;
  me = other (explicit) memory accesses.

To make use of the above formula, one needs to look at the data presented in Figs. 4.9(a) to 4.9(j), in absolute rather than relative terms, as shown in Table 4.2.

Table 4.2 Absolute traffic contributions (units are memory references per instruction)

      Fib    Fact   Eloop  Bsort  Image  Sieve  Life   Tower  Matrix Ave.
if    1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Sd    0.839  0.825  0.889  0.896  0.906  0.952  0.880  0.886  0.782  0.873
Sr    0.129  0.122  0.000  0.000  0.017  0.000  0.045  0.042  0.008  0.040
ml    0.193  0.222  0.333  0.372  0.445  0.464  0.479  0.464  0.309  0.365
me    0.000  0.000  0.000  0.118  0.085  0.061  0.052  0.120  0.067  0.056
St    2.161  2.169  2.222  2.386  2.453  2.477  2.456  2.512  2.166  2.333

The absolute bus traffic figures show the components for each benchmark to be of the order of 2.3 memory references per instruction executed. This is far from satisfactory, given the values quoted for typical single-cycle-execution architectures, where one instruction fetch plus 0.1 to 0.3 memory-resident data-references is not unusual. This initial analysis of stack processor technology would seem to agree with the traditional view of an inefficient computational model, as highlighted in Section 2.4. Heavy penalties for stack management and poor support for C code execution could be claimed to be a reasonable consequence of observations made here. However, it will be shown in the following chapters that appropriate optimisation strategies can be applied to counter these penalties.
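Eqn 4.1 can be evaluated directly from the columns of Table 4.2; the following sketch reproduces the St row for one benchmark, as a worked check of the model.

```python
def total_traffic(i_f, s_d, s_r, m_l, m_e):
    """Eqn 4.1: St = if + Sd + Sr + ml + me, in memory references
    per instruction executed."""
    return i_f + s_d + s_r + m_l + m_e

# The Fib column of Table 4.2: 1.000 + 0.839 + 0.129 + 0.193 + 0.000
st_fib = total_traffic(i_f=1.000, s_d=0.839, s_r=0.129, m_l=0.193, m_e=0.000)
```

Because the model is a simple sum of independent components, the effect of any single optimisation (e.g. the elimination of Sd and Sr by buffering) can be predicted by zeroing the corresponding term.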

———————— Chapter 5 ———————— University of Teesside Stack Architecture (UTSA)

————————

5.0 Preamble

The UTSA is a stack processor design developed as part of the research programme. The architectural objectives were to create a 32-bit stack-based processor with highly visible features that enable optimisation of hardware, firmware, or software, hence permitting important issues to be investigated in depth. The model’s role as an investigative platform has been exploited through construction of software tools, such as the UTSA simulator, compiler, and assembler modules.

VHDL modelling and hardware synthesis of the UTSA has allowed timing measurements to be included in the analysis of certain trade-offs, albeit in terms of a technology-specific gate model rather than a more accurate floor-planned silicon die. This section highlights some of the main features of the UTSA. A full specification including a register model may be found in Appendix-D.

5.1 The UTSA concept

The UTSA design was driven by a number of key factors, some of which are important from the point of view of low-level constraints of likely application areas, such as embedded systems, whilst others were linked to high-level issues such as language support features. Key considerations are listed below:

1. Support of C style activation records is important for HLL performance;

2. Instruction set design should be sympathetic to HLL models and their associated optimisation techniques;

3. Memory may be uncached, so CPU speed may be bound by economical memory devices;

4. Context switch times must be kept to a minimum;

5. System response time and deterministic behaviour are important;

6. The implicitly addressed stack philosophy should be maintained;

7. FORTH-specific optimisations should not be considered to be of overriding importance.

The remaining sections of this chapter discuss the various issues raised above, and introduce related work by other researchers where appropriate. A more detailed description of specific architectural features may be found in the UTSA architecture specification document (Appendix-B).

5.2 The local variable question

In several notable studies (Tanenbaum 1978), (Patterson 1982 and 19902), and in the comprehensive presentation by Weicker (1984), a number of features have been repeatedly identified as important for HLL support. One particularly important factor is the issue of local variables. FORTH has often been considered unusual for its lack of support for local variables. This is not entirely fair, as FORTH programs keep localised data entities on the data stack, but as implicit entities rather than as explicit activation records where each variable is allocated a memory address at run-time.

Even FORTH is now adopting new standards which include limited forms of local variables. This is seen as an important step which should allow initiatives such as the FORTH scientific library (Carter 1995) to improve FORTH programming methodology. The implication for the hardware engineer is that future stack processor technology, be it FORTH or C-oriented, will have to support local variable management efficiently.

In mainstream computing terms, the C language is an ideal example of the typical use of local variables. With an external memory-resident stack, as typified in CISC models, the activation record is allocated by extending the reserved stack space in main memory by an amount equal to the storage requirement of the currently invoking procedure. References are then made within that newly allocated area to individual variable storage locations. The 68000 processor, for example, might use address-register indirect addressing, as in Fig. 5.1, although there are more efficient ways to code this example.

MOVE.L -8(A6), D0    (fetch local ’x’ from stack)
MOVE.L -12(A6), D1   (fetch local ’y’ from stack)
ADD.L  D1, D0
MOVE.L D0, -4(A6)    (store result in local ’z’)

Fig. 5.1 68000 coding of ’ z = sum(x, y); ’

Research shows that the number of local variables required within a given procedure is of the order of 8 to 12 items, and rarely reaches 16 or more (Katevenis 1986). This is reflected in the choice of RISC register window sizes, which range from 8 and 16 to 32 registers, depending upon the architecture. The perceived wisdom is that for register-file architectures, eight registers are the minimum that will deliver high performance when allocating registers to local variables, whilst 16 registers permit more flexibility and reduce instruction counts (Bunda et al. 1993). RISC architecture effectively allows a form of frame-pointer offset addressing, with the ’frame’ being the current register

window, and the offset being the register address within the frame. CISC architectures tend toward the external C-stack approach, with optimisation techniques mapping locals into the register file as required.

5.2.1 The UTSA local variable implementation

The UTSA adopts a CISC-like scheme with a third external ’frame-stack’ area. A machine register points to the current top-of-frame, with an offset applied to address the local variable of interest. Management of the frame stack in the UTSA relies upon a frame-pointer register, which can be altered through addition of a literal from the data stack to either allocate or de-allocate frame-stack space. This owes some ancestry to the experimental RTX4000 (Koopman 1989b) in which a frame pointer is also adopted. This would normally result in reduced code density, but is treated suitably in the UTSA instruction format to avoid such a consequence. Local variables must be fetched and stored on demand through the use of the explicit instructions ’@loc’ and ’!loc’ (fetch and store local, respectively). The two operations are used in a similar way to the 68000 coding shown earlier, but all locals are stored in long-word format, alleviating the requirement for large offset address fields, as can be seen in Fig. 5.2.

@loc 1   (fetch local ’x’)
@loc 2   (fetch local ’y’)
add      (add the two values)
!loc 0   (store result in local ’z’)

Fig. 5.2 UTSA coding of ’ z = sum( x, y ); ’

It is evident that the number of instructions used in the simple example above is equal to that of the 68000 architecture (the number of bits used is much lower however). The UTSA code makes no reference to registers and, due to destructive computation, leaves no operand contents for re-use. It might then be argued that register-file architectures are better, since a ’real’ example would make repeated use of those register contents.
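The stack semantics of the fragment in Fig. 5.2 can be illustrated with a minimal interpreter sketch. This is a software model of the instruction behaviour only, not of the UTSA microarchitecture, and the opcode set is truncated to the three operations used in the example:

```python
# Minimal model of the UTSA fragment of Fig. 5.2: a data stack plus a
# frame of local variables addressed by offset. Arithmetic is
# destructive - 'add' consumes both of its operands, as noted above.
def run(program, frame):
    stack = []
    for op, *arg in program:
        if op == "@loc":              # fetch local onto the stack
            stack.append(frame[arg[0]])
        elif op == "!loc":            # store top-of-stack into a local
            frame[arg[0]] = stack.pop()
        elif op == "add":             # destructive: both operands consumed
            stack.append(stack.pop() + stack.pop())
    return frame

# z = x + y, with locals z, x, y at frame offsets 0, 1, 2
frame = {0: None, 1: 3, 2: 4}
run([("@loc", 1), ("@loc", 2), ("add",), ("!loc", 0)], frame)
assert frame[0] == 7
```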

Flynn observed (after Cragon 1979) that a 32-bit register-file architecture generates almost equal instruction traffic to that of a stack architecture with certain cache arrangements, provided that no variables are held in registers (Flynn 1990). With register-allocation of local variables, this parity is lost, but Flynn’s work of 1990 does not take into account the new stack-scheduling local-variable optimisations later proposed by

Koopman (1992). Applying similar optimisations to both architectural classes may restore the comparable performance originally observed, as will be shown in Chapter 7.

5.3 Stack manipulation - generalisation and scalability

Early generations of stack processor architecture, such as the Burroughs machines, adopted a two-cell top-of-stack scheme, with additional stack entities being stored in main memory. Manipulation of stack elements was basic - only the top two elements were accessible, so the most that could be expected was to duplicate the top of stack item, or swap it with the second item.

Subsequent generations of stack processor architecture relied upon a FORTH execution model, and it was common for FORTH programmers to manipulate two or three items on the stack. Machine architectures reflected these demands by including three stack cells on chip. Operations included those that permitted three stack items to be juggled on the stack, or for an item to be ’picked’ from a point within the stack.

Modern stack processors sometimes adopt a scheme of four on-chip stack cells, as in the case of the FRISC-3 (Hayes1 et al 1987); this is as much a convenience of the instruction encoding techniques as a desire for more flexibility. However, those four-cell architectures have not necessarily adopted any extensions to the stack manipulation schemes already employed in less flexible architectures.

In the evolution of FORTH interpreters and FORTH engines, new instructions have been introduced to supplement basic operations such as ’dup’ and ’swap’. Such operations include ’nip’, which deletes the second stack item, ’over’, which copies the second item, and ’tuck’, which pushes a copy of the top-of-stack under the second item. This progression seems to paint a picture of an incoherent collection of ’special’ operations arrived at through a combination of experience, intuition, and convenience. As a result, most stack processors support a subset of this menagerie of stack manipulations, but do not reflect any formally identified objectives.

In an attempt to bring some order out of the current situation, and to make stack manipulation operations simple and scalable, a classification of the existing stack manipulations has been developed during the research period, as illustrated in Fig. 5.3. The figure arranges stack manipulations into degrees of stack accessibility, such that the central ring represents stack operators which require access only to the top-of-stack element (TOS), whilst increasing the degree of access results in two, three, or four stack cells being accessible to permit efficient performance of the operator shown.

[Figure: four concentric zones of stack accessibility (degree 1 to 4), with the traditional operators grouped into four quadrants - retrieve (dup, over, 2 pick), rotate (swap, rot, -rot), preserve (tuck), and discard (drop, nip).]

Fig. 5.3 A functional classification of traditional stack manipulators

The proposed classification arranges fundamental functions of stack manipulation into four areas - preserve, retrieve, rotate, and discard. It can be seen that the majority of operations are enclosed within the 1st and 2nd degrees of stack access, with a few additional operations in the 3rd zone. It is also noticeable that whilst ’rotate’ functions are provided for both two and three stack-cell manipulation, the ’discard’ and ’preserve’ groups are not well supported. The ’retrieve’ group has full support for the 1st, 2nd, and 3rd degrees of stack access. But whilst ’dup’ and ’over’ are both direct stack operations, ’2-pick’ implies a stack-relative address calculation which may in practice be synthesised out of several operations (i.e. it would be relatively inefficient). Increasing degrees of stack accessibility are treated in software terms in most architectures, rather than being implemented as hardware operators.

The use of a collection of special operations to organise stack contents is a convenience from a human programmer’s viewpoint, and will no doubt remain a characteristic of FORTH syntax. However, it is not necessarily ideal for compilation platforms, or as a model for machine architectures and their quantitative investigation.

What is clear from the grouping introduced in Fig. 5.3 is that these four functional groups could theoretically be generalised in a scalable and symmetric fashion, rather than the patchy scheme that is perceived to be current practice. Whether or not those instructions would be useful is a matter for investigation, and is addressed in later chapters of this thesis. However, at this point one can propose four instruction types based upon the functional groups identified, as listed in Table 5.1.

Table 5.1 Stack manipulator functions and FORTH-engine equivalents (if any)

Function   UTSA label   degree 1   degree 2   degree 3   degree 4
Preserve   Tuck         dup        tuck       ~~         ~~
Retrieve   Copy         dup        over       2 pick     3 pick
Rotate     Rsd/Rsu      nop        swap       rot        3 roll
Discard    Drop         drop       nip        ~~         ~~

In the UTSA design, all four instruction types are implemented with the mnemonic labels, ’drop’, ’copy’, ’rsd’ and ’rsu’, and ’tuck’[9]. The ’rot’ and ’tuck’ operations are useless in the 1st degree, so would not be implemented. It would also be found that ’copy-1’ and ’tuck-1’ are identical so ’tuck-1’ would not be implemented either, although it may simplify compiler coding to provide this as an alias for copy-1.

The ’rsd’ and ’rsu’ mnemonics represent rotate-stack-up or rotate-stack-down, respectively, and indicate the direction of circulation of stack cells within the range two, three, or four. It should be apparent that ’rsd2’ and ’rsu2’ are identical, so a further simplification is made here. The final instruction set implementation of these four instruction types is as in Table 5.2, with the redundant operators excluded:

Table 5.2 Scalable stack-manipulator set for degrees one to four

Function   Degree 1   Degree 2   Degree 3   Degree 4
Preserve   not used   Tuck-2     Tuck-3     Tuck-4
Retrieve   Copy-1     Copy-2     Copy-3     Copy-4
Rotate     not used   Rsd-2      Rsd-3      Rsd-4
           not used   not used   Rsu-3      Rsu-4
Discard    Drop-1     Drop-2     Drop-3     Drop-4

The final result is that we have 16 stack manipulators for a four stack-cell architecture, and 11 operators for a three stack-cell architecture. It can hence be seen that we can identify a scalable and symmetric stack management scheme whose complexity can be quantified in terms of a simple ’degree of complexity’ or ’stack-accessibility’.
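The counts quoted above follow directly from the scheme of Table 5.2, as a short enumeration sketch shows (the mnemonic spellings here are illustrative):

```python
# Enumerating the scalable manipulator set of Table 5.2 for a given
# number of on-chip stack cells ('degree'), confirming the counts quoted
# in the text: 16 operators at degree four, 11 at degree three.
def manipulators(max_degree):
    ops = []
    for d in range(1, max_degree + 1):
        ops += [f"copy-{d}", f"drop-{d}"]     # retrieve/discard: all degrees
        if d >= 2:
            ops += [f"tuck-{d}", f"rsd-{d}"]  # preserve/rotate-down: degree 2+
        if d >= 3:
            ops += [f"rsu-{d}"]               # rotate-up: degree 3+ (rsu-2 = rsd-2)
    return ops

assert len(manipulators(4)) == 16
assert len(manipulators(3)) == 11
assert "rsu-2" not in manipulators(4)         # redundant operator excluded
```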

[9] For further explanation of these mnemonics, refer to Appendix-B, ’UTSA Architecture Specification’

The new family of stack manipulators is shown in Fig. 5.4, which replaces the previous scheme of Fig. 5.3.

[Figure: the scalable operator family arranged by degree 1 to 4 - copy1 to copy4 (retrieve), rsd2 to rsd4 with rsu3 and rsu4 (rotate), tuck2 to tuck4 (preserve), and drop1 to drop4 (discard).]

Fig. 5.4 New scalable stack manipulator classification

Having defined a scalable instruction scheme for stack management, it will be seen in later chapters that the application of such a scheme, and the variation of the degree of complexity, has an important effect upon the true performance of Koopman’s recently proposed local-variable optimisation strategies. Furthermore, the proposed generalisation of stack-management operators is reflected in the ’extension’ of standard stack-operators, which Koopman (1992) found to be required for ’good optimisation’.

UTSA adopts this scheme of stack manipulators in order to allow the investigation of the issues highlighted. As Chapter 7 will show, the impact of instruction set architecture can be significant in the assessment of optimisation techniques.

5.4 Call, branch, and operand size

Program flow operations are an important consideration in mainstream machine architecture, and this is also true of stack-based processors. It will become apparent in Section 5.5 that properly matched branch and operand ranges are critical in achieving high code density.

Quantitative research has provided a good understanding of the requirements of branch and call operations. For example, Patterson reports that branch distance rarely exceeds 2^8 instructions (Patterson 19903), whilst 77 % of branches are reported to fall within an 8-bit range by Alexander (1975). Results presented in Hasegawa2 et al. (1995) show that conditional branches have a short 8-bit range, but unconditional branch ranges tend to cover the region of 2^8 through to 2^15 instructions in a less predictable fashion, which is a possible explanation of the more conservative findings of Alexander. The requirements for operand fields appear to be less well defined than those for branch target distance. However, Alexander finds that 95 % of numeric constants are in the range of 8 bits or less, offering justification for a short-constant capability.

5.4.1 UTSA branch operations

The UTSA branching scheme relies upon two types of branch. Unconditional branches have branch target ranges of 17 bits and 24 bits. The second type is the conditional branch, which has three ranges: ± 7 bits, 17 bits, and 24 bits. The choice of these branch offsets will become clearer in the following sections, as constraints of instruction formats have to be taken into account. Condition code evaluation is performed by separate instructions, so that we can test for a condition and then branch conditionally according to the result left on the data stack by that test operation. The branch instructions are summarised below:

1. BCR ± 7-bit Branch Conditional PC-Relative;

2. BCP 17-bit Branch Conditional PC-Page-Absolute;

3. BCL 24-bit Branch Conditional Long-Absolute;

4. BP 17-bit Branch PC-Page-Absolute;

5. BL 24-bit Branch Long-Absolute.

With the majority of branches being short range, the ’BCR’ instruction was found to be essential, after initial investigations indicated poor code density with just paged and long

branching modes available. Short unconditional branches are not possible within the current UTSA scheme, but experiments have shown that very high static code densities are still achievable, although dynamic performance is less promising.

5.4.2 Branch prediction strategies

Introduction of a simple and deterministic branch prediction strategy was considered a useful enhancement for the UTSA design. More complex branching schemes such as those which rely upon dynamic branch history could result in non-deterministic system behaviour, and require more complex hardware. By using simple static branch prediction, over 85 % of branches can be correctly predicted (Ditzel 1987). Hence we decided upon a simple fixed branch scheme, whereby all backward branches were assumed taken, and all forward branches were assumed to fall through. The results are fully investigated in Chapter 8, where it is found that branch prediction improves dynamic code density.
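The fixed rule amounts to a one-line predicate on the branch offset - a sketch, assuming offsets are PC-relative with negative values denoting backward branches:

```python
# Static branch prediction as adopted for UTSA: backward branches
# (typically loop back-edges) are assumed taken, forward branches are
# assumed to fall through. No branch history is kept, so behaviour
# remains deterministic and the hardware stays simple.
def predict_taken(branch_offset):
    return branch_offset < 0          # backward => predict taken

assert predict_taken(-12) is True     # loop back-edge: predicted taken
assert predict_taken(8) is False      # forward skip: predicted fall-through
```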

5.4.3 Call operations

Research findings for call target addressing are not as complete as those for branching, and in any case call targets would normally be expressed in absolute rather than relative terms. An absolute address in the UTSA design has a maximum of 24 bits, so the worst-case call would be a 24-bit operation. However, it is rare for program code to be able to exploit this range of addressing, and it was considered suitable to introduce some shorter call operations. In practice, a program memory block could be mapped onto any base address in the 24-bit address range, but its actual size is likely to be only a small fraction of the address space available.

A PC-paged call operation permits a 16-bit call within the current code page to be made without any target address calculations. The current PC-page is determined by the PC’s most-significant bits, whilst the lower 16 bits come from the call address operand field. Two call modes therefore exist, each of which can be conditional or unconditional:

1. CCP Call Conditionally to 16-bit page address;

2. CCL Call Conditionally to 24-bit absolute address;

3. CP Unconditional 16-bit paged call;

4. CL Unconditional 24-bit absolute call.
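Target formation for the paged call can be sketched as follows; the exact bit positions are an assumption consistent with the 24-bit address space and the 16-bit operand field described above:

```python
# Hedged sketch of PC-paged call target formation: the most-significant
# bits of the 24-bit program counter select the current page, and the
# 16-bit operand supplies the address within that page. The 8/16 bit
# split shown here is an illustrative assumption.
def paged_call_target(pc, operand16):
    page = pc & 0xFF0000              # keep the PC's most-significant bits
    return page | (operand16 & 0xFFFF)

assert paged_call_target(0x12ABCD, 0x0042) == 0x120042
```

A call within the current page thus needs no adder at all - the page bits pass through unchanged while the operand field replaces the low-order bits.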

Finally, it was considered that most embedded systems environments will require frequent calls to low-level kernel code, whether in the form of a FORTH interpreter core or C library code. To facilitate this, a zero-page-call operation was introduced, which permits addressing of one of 128 blocks of zero-page code. Each code block would preferably be 32 or 64 instruction words in size to accommodate compact performance-critical kernel routines.

5.5 UTSA instruction packing scheme

After addressing some of the key issues outlined in previous sections of this chapter, the next step in the UTSA design specification was to develop a practical instruction packing scheme. Work by several researchers had shown promise for a multiple-instruction-per-word scheme (Bunda et al. 1993, Koopman 1989b, Ting 1995), and the application of this concept to a stack processor was thought to be worthy of investigation. In order to pack multiple instructions into a single memory word fetch, it is important to maintain a scheme which supports minimal decoding and high code density, but which also serves the identified requirements of the instruction set architecture without undue restriction on machine capabilities. The major requirements identified were:

1. Local variable references with at least 4- to 6-bit offsets;

2. Branch targets of short, medium, and long modes;

3. Call targets of medium and long range;

4. Zero-page call;

5. Literals of short and long range;

6. Opcodes of implicit type, e.g. drop, add, etc.;

7. Avoidance of complex addressing modes.

It was clear that operations with short operands would be damaging for code density unless a scheme could be found that treated those operations as special cases. The vast majority of operations in stack processor code are implicit, with literals, locals, and branches making up the majority of instructions requiring operand fields. The scheme in Fig. 5.5 was arrived at after a series of experiments, and is a compromise that delivers high code density without restricting operand fields too much.

Class 3:  [00] 10-OP 10-OP 10-OP
Class 2:  [10] 10-OP 20-OP
          [11] 10-OP 20-OP
Class 1:  [10] 30-OP

Fig. 5.5 UTSA instruction formats

The UTSA scheme relies upon three major instruction formats, which permit combinations of 10-bit, 20-bit, and 30-bit operations. Each 20-bit and 30-bit operation is

limited in scope and supports only simple addressing operations, such as call, branch, load/store, and immediate literals. Class-two operations can be executed either in logical order or reversed, to reduce code space redundancy when having to pack 10-bit and 20-bit operations together. The 10-bit operations are further subdivided into those operations that require small operand fields, and those that are implicit opcodes. Fig. 5.6 shows the breakdown of each instruction type.

10-op                      20-op (nnn + 17-bit field)     30-op (nnn + 24-bit field)
00    8-bit opcode         000  Branch                    000  24-bit field
01    8-bit Literal        001  Branch Conditional        001 to 111  Unused
10    8-bit BCR            010  Call
110   7-bit ZPC            011  Call Conditional
1110  6-bit @loc           100  Load (from addr)
1111  6-bit !loc           101  Store (to addr)
                           110  Load Immediate
                           111  Unused

Fig. 5.6 UTSA encoding formats and functions

It is seen that progressive decoding of the 10-bit instruction field allows an embedded operand field to be encoded within selected operations, such as literal, local, and ’bcr’ operations. At the same time, their respective opcode fields are very small, so the compromise of adopting selected instructions with an embedded operand field carries no significant penalty for code density. Decoding latency might be a concern for a scheme which apparently over-complicates matters, but the latency in this case is equivalent to a single stage for the progressive decoding of the 10-bit operations, as illustrated in Fig. 5.7. The ’special’ instructions need only select the appropriate range of operand bits when executed to complete the decoding process.
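The progressive decode of the 10-bit field can be sketched in software as a behavioural model of the prefix scheme of Fig. 5.6 (this is illustrative only, not the gate-level decoder):

```python
# Behavioural sketch of progressive decoding of a 10-bit UTSA operation:
# the leading bits select the class, and the remaining bits form either
# an 8-bit opcode or an embedded operand field (8, 7, or 6 bits).
def decode10(word):
    assert 0 <= word < 1 << 10
    top2 = word >> 8
    if top2 == 0b00:
        return ("opcode", word & 0xFF)     # 8-bit implicit opcode
    if top2 == 0b01:
        return ("literal", word & 0xFF)    # 8-bit short literal
    if top2 == 0b10:
        return ("bcr", word & 0xFF)        # 8-bit conditional branch offset
    if word >> 7 == 0b110:
        return ("zpc", word & 0x7F)        # 7-bit zero-page call
    if word >> 6 == 0b1110:
        return ("@loc", word & 0x3F)       # 6-bit local fetch
    return ("!loc", word & 0x3F)           # prefix 1111: 6-bit local store

assert decode10(0b01_00000101) == ("literal", 5)
assert decode10(0b1110_000011) == ("@loc", 3)
```

Note that the prefixes 00, 01, 10, 110, 1110, 1111 form a prefix-free code, so each step of the cascade examines only one further bit - consistent with the single-stage latency claim above.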

[Figure: a bank of eight 2:1 multiplexers (8 x 2:1 mux) selects between the 8-bit opcode field and the 8-bit operand field of a 10-bit operation.]

Fig. 5.7 A simple decode scheme suitable for UTSA progressive decoding

———————— Chapter 6 ———————— Stack Buffering, Traffic Behaviour, and Performance Comparison

————————

6.0 Preamble to Chapter 6

Stack buffering techniques have been investigated widely in the context of FORTH-optimised stack-processor platforms, with little attention given to the issues that might arise when transferring such optimisation strategies to a more general C-oriented arena. Analysis has tended to rely upon straightforward graphical comparisons without attempting to understand or present the underlying causes of the stack traffic which buffers are intended to counteract.

In this chapter, a number of stack buffering techniques are introduced, including those concentrated upon or proposed in previous research. Additionally, a potentially new and original buffering strategy is introduced - the ’zero-pointer dual-tagged buffer’ (Bailey 1995b). An assessment of this buffer’s performance is made in order to establish zero-pointer buffer performance in relation to existing studies. Results indicate that this new buffer is equal, and often superior, to the (previously best) demand-fed buffer strategy.

Unlike previous studies, this chapter considers the stack-processor and buffer relationship in the context of C-code execution rather than FORTH alone. It is shown that existing views about buffer performance hold true for unoptimised compiler output, although the demand-fed algorithm appears to be best here, rather than the zero-pointer strategy.

Examination of underlying stack behaviour is used to better understand the various results presented, and relationships between dynamic stack characteristics and optimal buffer capacity are considered. Finally, a mathematical approach to modelling buffer performance is introduced; its usefulness will become apparent in later chapters.

6.1 The stack-buffer concept

The concept of a stack-buffer has some similarities and some important differences when compared with on-chip cache. A stack buffer is a hardware optimisation, and is intended to reduce the number of transfers between main-memory and CPU-resident stack partitions.

A small cache would perform stack traffic reduction quite effectively with the correct choice of cache spilling policy. It has been shown from analysis of stack data, such as those presented in Chapter 4, that the normal mode of operation is for the stacks to grow and decay in an incremental manner. Since the traffic we wish to eliminate is always closely related to the stack depth, a randomly accessible cache structure would be wasteful of silicon area. The same silicon area could support a buffer with twice the capacity of the cache, whilst avoiding the penalties of tag-compare latency and address decoding of its contents.

6.2 Automatically managed stack buffering algorithms

Once we dispense with the idea of randomly accessible cache, the far simpler optimisation methods of sequentially accessed buffers can be assessed. Typical buffer strategies consist of nothing more than a small block of CPU-resident stack locations which form a bridge between the machine-accessible top-of-stack cells and the remaining memory-resident stack space. The way in which this small group of stack locations is managed in order to minimise stack traffic is determined by the buffering algorithm chosen. There are a number of algorithms with varying degrees of complexity and performance.

There are several considerations to be addressed in examining buffering strategies. The capacity of the buffer is an overriding issue, since all buffering strategies reduce stack traffic significantly as the buffer grows relatively large. However, task-switch latencies also increase as a function of buffer capacity, and a trade-off is required to determine the best compromise. It is also necessary to consider the spill policy, particularly how many items are spilled at a time: one item, two, or more? The buffering algorithms considered will be outlined briefly in the following sections.

6.2.1 Demand-fed algorithm

The demand-fed algorithm has been quite popular in recent stack-processor implementations such as the FRISC-3 (Hayes1 1988), the RTX2000 (Hand 1990), and the RTX4000 (Koopman 1989b). It can, however, be found in a very limited way in earlier machines such as the Burroughs B5500, which had only two stack cells on chip (Wedig 1987) but allowed the second stack element to refill on demand rather than by default - in essence a demand-fed buffer with a capacity of one element.

Demand-fed buffers in current designs consist of a small block of CPU-resident stack space which is addressed by two pointers, the top and bottom of buffer pointers. Pushing items onto the stack causes the CPU’s stack cells to spill into the top of the buffer space until the buffer is completely full, incrementing the top-of-buffer pointer as it progresses, whereupon a memory transfer must take place to accommodate further stack growth.

Sustained popping of the stack causes items to be pulled out of the buffer space from the top down until the buffer is empty. Again, further pops will require main memory transfers. Hence the buffer only spills on demand, rather than attempting to do so in anticipation of the causative event. Typically, stack transients are limited to short sequences of pushes or pops (Bailey 1993a) (see also Chapter 4), so it will be rare for a buffer to be emptied or filled completely.
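The demand-fed policy can be sketched as a toy software model. This is an illustration of how spill and refill traffic is incurred, not a model of any particular silicon implementation; single-item transfers are assumed:

```python
# Toy model of a demand-fed stack buffer: pushes and pops move items
# through a fixed-capacity buffer, and main-memory transfers occur only
# when the buffer overflows (on a push) or underflows (on a pop).
class DemandFedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = []                 # CPU-resident stack locations
        self.mem = []                 # memory-resident stack space
        self.transfers = 0            # spill/refill traffic count

    def push(self, item):
        if len(self.buf) == self.capacity:
            self.mem.append(self.buf.pop(0))   # spill one item on demand
            self.transfers += 1
        self.buf.append(item)

    def pop(self):
        if not self.buf:
            self.buf.append(self.mem.pop())    # refill one item on demand
            self.transfers += 1
        return self.buf.pop()

b = DemandFedBuffer(capacity=4)
for i in range(6):                     # six pushes into a 4-deep buffer
    b.push(i)
assert b.transfers == 2                # only two demand spills occurred
assert b.pop() == 5                    # LIFO order is preserved
```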

6.2.2 Cut-back-k buffering.

Hasegawa (1985) proposed that spilling multiple items whenever a demand-initiated event occurs would be beneficial. Further continuation of that stack depth transient would be satisfied by the additional items transferred between buffer and memory spaces.

The cut-back-k theory states that the best spill size (represented by ’k’) is k = b/2, where b is the buffer capacity. The cut-back-k algorithm was a purely mathematical presentation, but was subsequently applied in silicon by researchers such as Hayes1 et al. (1987). However, later performance analysis showed that this was not an optimal strategy in practice (Hayes1 1989), and it was rejected in favour of the demand-fed approach.
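The cut-back-k policy can likewise be sketched as a toy model (illustrative only, with k = b/2 as Hasegawa suggests); a single overflow event transfers k items at once, so spill events are less frequent even though the items moved per event increase:

```python
# Toy model of the cut-back-k spill policy (Hasegawa 1985): when the
# buffer overflows, k items are spilled in one event, with k = b/2
# suggested as optimal in the original mathematical treatment.
class CutBackKBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.k = capacity // 2        # the suggested cut-back size
        self.buf, self.mem = [], []
        self.spill_events = 0

    def push(self, item):
        if len(self.buf) == self.capacity:
            self.mem += self.buf[:self.k]      # cut back k items at once
            del self.buf[:self.k]
            self.spill_events += 1
        self.buf.append(item)

b = CutBackKBuffer(capacity=8)
for i in range(16):
    b.push(i)
assert b.spill_events == 2             # two multi-item spill events,
                                       # versus eight single-item demand spills
```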

6.2.3 Wedig’s Single and Double Pointer Algorithms

Wedig (1987) considered a 68000 system and the efficiency gains made by adding stack buffers to the (simulated) system. A single-pointer algorithm was proposed, in which a single pointer increments to accommodate growth. When the buffer is full, the contents are spilled out by shifting the buffer locations by one, hence avoiding the extra pointer required in demand-fed buffers. Wedig claims that it is best to maintain a half-full buffer, since we do not know how many pops or pushes will occur.

The double-pointer algorithm has an additional pointer and attempts to copy items into main memory or into the buffer in order to maximise the coherency between main memory and stack buffer contents. The strategy used by Wedig is to define three regions, using two pointers. An initial state might have a fully coherent buffer, with the pointers at top and bottom of buffer space. If an item is pushed into the buffer, no spill takes place since the buffer contents are also in main memory. However, the item pushed into the buffer is not in memory and the top of buffer pointer shifts down to reflect this. At the next opportunity the item will be copied to memory and the pointer rolled-back to its original position. A pop causes the buffer contents to shift up by one, leaving an unfilled space which must be filled from memory at a later stage, and is reflected by the bottom of buffer pointer having moved by one cell.

It can be envisaged that in practice a series of pushes and pops would cause the buffer to have two encroaching zones of incoherence imposed at the periphery of the otherwise coherent buffer. The read and write transfers are ’deferred’ until spare memory cycles can satisfy the need for data transfer. Hence we have the complementary approaches of demand-fed and deferred-transfer buffers.

Wedig’s method of implementing the buffer pointers was to use a one-bit sliding field, where one bit is mapped onto each buffer cell. One bit from this bit-field can be set uniquely to indicate the buffer pointer position, acting as a ’barometer’ (Wedig 1987). This scheme allows some clever reductions in logic overheads, such as elimination of pointer addressing of the buffer block.

The drawback of Wedig’s approach is that there must be opportunities to reduce the encroachment of the coherent buffer contents. This implies a cycle-stealing approach. This may be viable on a 68000, but where stack processors are concerned, memory bandwidth is typically the limiting factor for performance, as the results presented in Section 4.6 have illustrated.

6.2.4 A new Algorithm - Zero-Pointer with Dual Tagging

Having examined the existing buffer strategies, a new buffering technique has been identified. The factors that guided the development of the new algorithm included minimisation of logic and elimination of pointer indexing (hence the zero-pointer algorithm). In order to completely eliminate pointers and ’barometer’ techniques, it is necessary to implement a shift register whose depth is equal to the buffer capacity, and whose width is n+2 bits[10].

Any pushes into this buffer would cause buffer contents to shift down by one, and spill an item to memory. Any pop would reverse this process and cause a spill from memory to the buffer. Clearly there is no advantage for the shift-register arrangement alone. However, the two additional bits of each buffer-entry represent two tags, one for read-coherency, and one for write-coherency. Whenever an item is pushed into memory, it is often read back again later without changing, and then written yet again to memory. The write-tag indicates if the item is already in memory, and allows redundant write-back events to be eliminated. Similarly, when items are popped from the stack, an item should be read into the shifting buffer but, instead, a flag is set to indicate that it is not yet read-in (only its space is reserved in the buffer). If enough pops occur to bring this item to the top of the buffer, then it must be read in before transferring it to the top-of-stack cells, but this is an infrequent event with adequately sized buffers.

Subsequently, we find that when stack modulation remains in its usual narrow band, the majority of buffer space is reserved to accommodate stack depth changes, but rarely requires transfers to or from memory.

The new algorithm combines the read/write tagging philosophy of cache with a demand- initiated transfer policy, whilst eliminating pointer indexing of the buffer space. The tag fields, when shifted to the top or bottom of buffer space, indicate an impending read/write operation, requiring a simple true/false comparison that can be fed into the buffer-controller state-machine without adding significantly to logic or latency.
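As an illustration, the tagging behaviour described above can be sketched in a few lines of Python. This is a hypothetical model written for this discussion, not the simulator used in the research: each cell carries a read tag (valid) and a write tag (in_memory), and memory traffic is counted only when a write-incoherent cell is evicted, or when a reserved (not-yet-read) cell reaches the top of the buffer.

```python
from collections import deque

class ZeroPointerBuffer:
    """Sketch of the zero-pointer dual-tagging stack buffer.

    Each cell holds (value, valid, in_memory):
      valid     -- read tag: the cell's value has actually been fetched
                   (False means only the space is reserved in the buffer)
      in_memory -- write tag: main memory already holds this value, so a
                   write-back on eviction would be redundant
    """
    def __init__(self, depth):
        # the entire buffer starts as reserved, memory-coherent space
        self.cells = deque([(None, False, True) for _ in range(depth)],
                           maxlen=depth)
        self.reads = 0      # demand reads from memory
        self.writes = 0     # demand writes (spills) to memory

    def push(self, value):
        # shifting down evicts the bottom cell; a write-back is needed
        # only if its value is valid and not already coherent with memory
        _, valid, in_memory = self.cells[-1]
        if valid and not in_memory:
            self.writes += 1
        # the new top entry exists only in the buffer
        self.cells.appendleft((value, True, False))

    def pop(self):
        value, valid, _ = self.cells[0]
        if not valid:
            # reserved space reached the top: the deferred read is forced
            self.reads += 1
        # shifting up reserves a new bottom cell, marked not-yet-read
        # but coherent (its notional value is still held in memory)
        self.cells.popleft()
        self.cells.append((None, False, True))
        return value
```

For stack activity that stays within the buffer depth, the tags suppress all transfers; only excursions beyond the buffer capacity generate demand traffic.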

[10] here ’n’ represents the word-length of the architecture, 32-bits in the case of UTSA for instance.

6.2.5 Flynn's 'stack architecture'

It might be claimed that we can relate the zero-pointer algorithm to that alluded to in (Flynn et al 1992). There it is suggested that 'valid' and 'dirty' bits may be used in a similar manner to reduce memory traffic. However, the buffering technique to which this is applied is a randomly accessible buffer for an auxiliary stack, where local variables are dynamically allocated. The stacks to which Flynn's dirty/valid approach was applied are an activation record stack and a system heap, both with random accessibility. Their 'buffer' is actually more akin to cache than a true stack buffer.

The only correlation Flynn makes with standard evaluation-stack concepts is the assumption of a three-deep data stack for evaluation of expressions. This has finite depth and no connection to main memory, very similar to the stack used in the Inmos Transputer (Whitby-Strevens 1985). Flynn's stack-architecture does not reflect modern stack processor techniques, and, as will be seen in Chapter 7, the impact of local-variable scheduling can significantly reduce memory references to an externally held activation record stack.

Flynn (1992) states that with 3 stack-cells for evaluation purposes, ’evaluation traffic’ is eliminated. This betrays the simplicity of the stack model assumed therein. However, Flynn’s measurements for evaluation traffic, which we refer to as "base-line stack-spill traffic", were found to be 47% of data traffic, or in absolute terms: 0.88 memory references per instruction. This agrees almost perfectly with the information presented in Table 4.1 of Chapter 4, where the C-code program set can be seen to average 0.87 memory references per instruction. Such close agreement increases confidence in the benchmark suite used, and allows the results of Flynn et al. to be incorporated in the further analysis of Chapter 7, with a high degree of justification.

6.3 Buffering characteristics of FORTH code

Several benchmarks were interpreted on the modified FORTH interpreter platform (see Chapter 3), and trace files for data and return stack behaviour were generated. The traces were then used in the trace-driven buffer simulator (again, see Chapter 3), to generate a series of buffer characteristics. Four FORTH benchmarks were used: 8-Queens, Towers of Hanoi, Fibonacci, and Eratosthenes' Sieve. Taking a composite result, for Fibonacci, Queens, and Towers allows the comparisons of figures 6.1(a) and 6.1(b) to be presented. The Sieve benchmark had very sharp traffic reduction, due to its non-procedural implementation, and was not used in the composites presented here.

[Figure: two plots of spill traffic (%) against buffer size (4 to 20 cells), for the return stack (left) and data stack (right), comparing the cut-back-k (k=4, with SRAM, page-mode, and DRAM timings), demand-fed, and zero-pointer algorithms.]

Figs. 6.1(a) and 6.1(b) Composite buffer profiles for FORTH program set

The cut-back-k algorithm assumed a block size of four (for spilling events), and was weighted to reflect the bus penalties that would be incurred with page-mode DRAM (Cut k-4-P) and in a single-cycle SRAM (Cut-k-4-S)[11].

The results presented are in agreement with other findings (Hayes 1987 and 1989, Wedig 1987, Koopman 1989a), and reinforce the view that 16 elements of buffer space is enough to virtually eliminate stack-spilling, regardless of the choice of algorithm. Performance for smaller buffer capacities is more variable from one algorithm to the next, as the comparison shows. It is also seen that data stack traffic is dampened significantly even with buffer sizes of the order of 8 locations, when the appropriate algorithm is chosen.

[11] Since the Cut-Back-K algorithm employs block spilling, it should perform better in memory hierarchies that are characterised by shorter memory latencies for incremental memory access sequences, i.e., page mode DRAM. It is therefore fair to include this in the analysis.
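A minimal sketch of the cut-back-k block-spilling idea may help here. The class and counters below are invented for illustration, under the simplifying assumptions that refilled values are not tracked (they come back as None) and that every transferred word costs one count:

```python
class CutBackKBuffer:
    """Toy model of a cut-back-k stack buffer: on overflow, a block of k
    items is spilled to memory in one burst; on underflow, a block of k
    items is refilled. Block transfers suit page-mode DRAM, where
    sequential accesses after the first are comparatively cheap."""
    def __init__(self, capacity, k):
        self.capacity, self.k = capacity, k
        self.buf = []            # top of stack is the end of the list
        self.in_memory = 0       # items spilled below the buffer
        self.transfers = 0       # words moved to or from memory

    def push(self, value):
        if len(self.buf) == self.capacity:
            # cut back by k: spill the k bottom-most items as one block
            self.in_memory += self.k
            del self.buf[:self.k]
            self.transfers += self.k
        self.buf.append(value)

    def pop(self):
        if not self.buf and self.in_memory:
            # refill a block of (up to) k items from memory
            n = min(self.k, self.in_memory)
            self.buf = [None] * n    # refilled values, not tracked here
            self.in_memory -= n
            self.transfers += n
        return self.buf.pop()
```

A larger k amortises the page-mode setup cost, but spills items that may soon be needed again; this is why, as shown later in this chapter, performance improves as k approaches 1 under single-cycle SRAM timing.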

6.3.1 Data and return stack differences

Data and return stack performance appear to have slightly different outcomes in comparative analysis. Whilst the Zero-Pointer and Demand-Fed algorithms perform almost identically for the return stack spill-traffic, it is clear that both algorithms are more effective with data-stack traffic and that the zero-pointer algorithm is significantly better than the demand-fed algorithm in this case.

Some light is shed upon this contrast when we look again at the data and return stack modulation characteristics previously discussed in Chapter 4, but this time we present them in terms of the contribution to stack traffic made by each order of cumulative stack depth modulation. Figures 6.2(a) and 6.2(b) show the relative stack-spill contributions for the data and return stack spilling traffic with the FORTH work-loads.

[Figure: two histograms showing contribution (%) against size of stack-depth modulation (−6 to +6), for the FORTH data stack (left) and the FORTH return stack (right).]

Figs. 6.2(a) and 6.2(b) Composite models for relative stack-spill contributions.

It can be seen that the data stack behaviour tends to have a narrower band of significant stack-spill components, with diminishing significance for increased orders of depth modulation. The return stack has a more dispersed set of depth-modulation components, with larger modulations present[12]. This reflects the return stack’s role in supporting program nesting rather than the monadic and dyadic computations typical of the data stack.

[12] Note that a depth modulation of zero size naturally implies no stack depth change, and hence contributes nothing to the stack spilling characteristics of the system.

6.3.2 The applicability of hardware-buffers to Modula-2 Platforms

Interestingly, the data stack modulation pattern seems to agree quite well with findings reported by Deberae (1989) for the expression stack of a Modula-2 interpretation model. Stack depth changes are restricted to the same band of ±3 (a span of 6 in effect), which implies that the results may be true of stack based computing models in general.

It is claimed by Deberae (1989) that applying register optimisation on the top of the expression-stack permits significant reductions of stack-related memory references, effectively a form of stack buffering quite similar to the Burroughs B5500. However, Deberae's approach to stack buffering used software techniques rather than hardware buffering. As a consequence of this only very small effective buffer sizes are practical, typically four or less. The reasons for those restrictions are outside the scope of this thesis, but are discussed more fully in (Deberae 1989). Similar work by Ertl (1995) for FORTH interpreters may also be of interest for further explanation.

Having compared the stack-depth modulation profiles of FORTH and Modula-2, we can be confident that the stack behaviour of the two execution models bears some significant similarities. It is not unreasonable to suggest that the use of simple hardware buffers could be applicable to Modula-2 platforms, with gains as significant as those reported for FORTH interpreter platforms.

Even when maintaining the interpreter-embedded approach to buffering, the particular case of a zero-pointer dual-tagging buffer would reduce external stack references by about 40 % to 50 % compared to the pseudo demand-fed mode applied in the Modula-2 interpreters (as discussed above). It is believed that this would not require methods significantly different from those used previously, if it were applied in the form of optimisation of the interpretation mechanism itself.

6.3.3 A relationship between stack modulation and buffer size

These figures may also assist in understanding the relationship between stack-depth modulation and optimal buffer size. The narrow banding of data-stack spill components would be expected to be captured in a smaller buffer of the same order as the span of the significant components (i.e. between 6 and 8). Similarly, the return stack, with a wider span of around 10, requires a slightly larger stack buffer to capture its behaviour adequately.
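This rule of thumb can be stated computationally. The helper below is written for this discussion, with an invented histogram format (a mapping from modulation size to percentage contribution); it measures the span of modulation sizes whose spill contribution exceeds a small threshold, on the reasoning that a buffer of roughly that capacity should capture most stack activity:

```python
def modulation_span(contrib, threshold=1.0):
    """Span of the significant stack-depth modulations (cf. Figs. 6.2):
    the width of the band of modulation sizes whose contribution to
    spill traffic exceeds `threshold` percent. Histogram values are
    hypothetical; zero-sized modulations contribute nothing and are
    simply absent from the mapping."""
    sizes = [d for d, pct in contrib.items() if pct > threshold]
    return max(sizes) - min(sizes) + 1
```

Applied to a data-stack-like histogram with significant components between −3 and +3, this yields a span of 7, in line with the 6-to-8 cell estimate above.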

It can be seen in Figs. 6.1(a) and 6.1(b) that stack traffic becomes insignificant at approximately those capacities expected from considering the span of the stack-spill components in Figs. 6.2(a) and 6.2(b). It is true that increasing the stack buffer capacity beyond the limits derived from the span of depth-modulation would continue to improve performance. However, those benefits would diminish rapidly beyond the established point of significance, and would quickly be overtaken by the growing overheads of context switching.

In conclusion, it can be said that significant reduction in stack traffic occurs when the buffer capacity is of the same order as the span of stack depth modulations, independent of the underlying algorithm applied. Overall the zero-pointer algorithm is equal or superior to the alternatives.

6.3.4 Comparison with Wedig's algorithms

It is slightly unfair to make a direct comparison of FORTH buffer techniques on a true stack processor, with those based on C-execution on a CISC platform. The work by Wedig (1987) is not FORTH based and is also oriented toward 68000 code optimisation rather than stack processor technology. It is interesting to compare them nonetheless, and it will be useful when results for C-code are examined in subsequent sections.

The graph of Fig. 6.3 shows the three previously simulated algorithms alongside the normalised performance of Wedig's algorithms using a 'naive' compiler platform.

[Figure: spill traffic (%) against buffer size (4 to 20 cells), comparing Wedig's single-pointer (Wedig SP) and double-pointer (Wedig DP) algorithms with the cut-back-k, zero-pointer, and demand-fed algorithms.]

Fig. 6.3 A comparison of the FORTH study with Wedig's algorithms

It can be seen that the efficiency of the single and double pointer algorithms at reducing stack traffic is lower than that of the other algorithms. This might be attributed to some peculiarities of the dynamic behaviour of the stacks in the Motorola 68000 C programming environment, details of which are not given in Wedig's analysis. It is noticeable, however, that the 'knees' of the curves appear at the same position as those of the other algorithms, which tends to point to a similar span of stack depth-modulation (in the region of 8 items) as found in the previous FORTH results (see Fig. 6.3). It is apparent beyond the knee of the curves that the double pointer algorithm quickly achieves a respectable limitation of spill-traffic, with a buffer size of 8 or 12 being satisfactory.

6.4 C-Code buffering characteristics

It has already been shown that compiler generated code has some behavioural differences in comparison to FORTH (see Chapter 4), and it is reasonable to suppose that this may have an effect upon buffer performance. The investigation of buffer behaviour with C-code presented in this section shows that raw code behaviour from C compiler generated benchmark code does indeed exhibit differing results for buffering issues.

This section concentrates upon data stack behaviour, since the return stack's role in C-targeted system performance has already been shown to be all but negligible (refer to Section 4.6). Taking the six most substantial benchmarks (as listed below) from the suite of nine C programs introduced earlier, a second series of trace-driven simulations for buffer behaviour were performed. The benchmark code is provided for reference in Appendix-H.

1. Matrix - Matrix Multiply, 100 x 100;
2. Image - Image smoothing of 100 x 100 bitmap;
3. Life - Conway's Life, operating on 40 x 20 grid;
4. Sieve - Eratosthenes' Sieve of 8192 numbers;
5. Towers - Towers of Hanoi;
6. Bsort - Bubble sort of 100 integers.

After running the programs on the UTSA simulator and then simulating the buffer algorithms using the buffer simulator (Chapter 3), a comprehensive set of benchmark results was produced. Curves for each benchmark and each algorithm were generated, and a full table of results can be found in Appendix-K. The immediate difference in the plotted results, as shown by Fig. 6.4, is that the superiority of the zero-pointer algorithm has been lost in the transition to poorer quality compiler-generated C-code, with the demand-fed algorithm performing best.

[Figure: stack traffic (%) against buffer size (4 to 16 cells), comparing Wedig's single-pointer (WSP) and double-pointer (WDP) algorithms with the demand-fed and zero-pointer algorithms and their cut-back-k (k=2, SRAM) variants.]

Fig. 6.4 Stack buffer characteristics for C-code benchmark suite

For the case of the cut-back-k variants, which are using k=2 here and assume SRAM timing, it is shown that cut-back-k is not particularly efficient even with this small block transfer size. Normalising the cut-back-k variants for a DRAM regime, as before, still yields inferior performance. The problem of making purely graphical comparisons when results are nearly equal is addressed in section 6.5.

It is also useful to note that the demand-fed algorithm is actually demand-fed-cut-back-K with k=1, and this applies similarly to the Zero-pointer algorithm. Hence, it is clear that performance is improved as k approaches 1, rather than the b/2 rule suggested by Hasegawa (1985). This is explained by fundamental stack behaviour as originally discussed in section 4.2.3.

In absolute terms, the performance of each buffer strategy appears to be significantly better for C-code than for FORTH, as Figs. 6.5(a) and 6.5(b) attest. This is partly attributable to the C compiler’s poor utilisation of the stack as a computational resource. It relies instead upon excessive reference to externally stored local variables, which would contribute to memory overheads and eliminate any apparent gains that could be otherwise claimed.

[Figure: two plots of stack traffic (%) against buffer size (4 to 16 cells), comparing FORTH and C-code behaviour for the demand-fed algorithm (left) and the zero-pointer algorithm (right).]

Figs. 6.5(a) and 6.5(b) FORTH vs. C buffer performance comparison

Examination of the stack traffic components, as for the FORTH analysis, shows some evidence of the reasons for the demand fed algorithm becoming more effective (see Fig. 6.6). The band in which stack-depth modulation components exist is unaltered from that of FORTH and Modula-2, but it is also apparent that an excessive predisposition toward stack depth change of ’+2’ is present in the behaviour of the C code under test.

[Figure: histogram of contribution (%) against cumulative stack-depth modulation (−5 to +5) for the C-code test suite.]

Fig. 6.6 Stack traffic spill-components for C-code test suite

The cause of this anomaly is the naive nature of the compilation platform being used to generate the compiled machine code. Compilers do not generate code to use the stack effectively unless they have some form of back-end optimisation. Typically, the raw compiled code fetches operands to the stack in pairs (i.e. dyadic operators predominate), and then computes the result. Instead of retaining the new result on the stack for progressive use within the procedure, it is stored back into main memory, forcing it to be fetched again later, and continuing the need for operand pairs to be fetched on demand.

As shown in later chapters, the effect of recently proposed optimisation strategies (Koopman 1992) is to correct this anomalous behaviour, and in doing so stack behaviour and buffer performance are altered (Bailey 1995a). The question that arises is: will this improve or degrade system performance?

6.5 A mathematical approximation of stack buffer behaviour

A mathematical model for buffer behaviour has not been investigated in previous research, yet there are a number of reasons for having such an option available. With a large set of buffering algorithms and variations to consider, it is not satisfactory to simply compare two performance curves, and state that algorithm ’a’ is better than algorithm ’b’. Furthermore, it is interesting to know how much better an algorithm is found to be in a given comparison so that the performance gains expected can be weighed up against other considerations (gate-level complexity, and logic latency for example).

The impact of a buffering algorithm upon system performance could be assessed more readily if buffer behaviour could be represented in simple mathematical terms, allowing enhancement of the original equation presented in Section 4.7. This would ultimately allow complex trade-offs to be evaluated in a straightforward manner, and allow system level interactions to be quantified. A clue to development of a mathematical approximation of buffer behaviour is given when the buffer profiles are plotted logarithmically, as shown in the graphs of Figs. 6.7(a) and 6.7(b).

[Figure: stack traffic (%) on a logarithmic scale against buffer size, for the C-code suite (left) and the FORTH suite (right), comparing Wedig's single- and double-pointer algorithms with the demand-fed, zero-pointer, and cut-back-k variants.]

Figs. 6.7(a) and 6.7(b) C-code & FORTH buffer performance plotted on a log scale (data is for composite benchmark-suite behaviour)

It is now possible to express some general characteristics which are common to all of the buffer profiles shown. In most cases the logarithmic plots can be approximated by a best-fit straight line, with slight variations. It may be possible to consider second order terms, but the margin of error in the first-order approximation is sufficient for most comparative studies, and the residual errors are likely to be sensitive to program specific behaviour rather than reflecting general characteristics.

Making a best-fit straight line approximation for each of the buffer profiles plotted above allows us to define a general formula for approximation of stack buffer behaviour of the form shown in eqn 6.1.

S = s × e^(−t·b)    (6.1)

Given that ’S’ represents the stack spill traffic we wish to approximate, and ’b’ represents the size of the buffer being considered, then eqn 6.1 states that spill traffic is an exponential attenuation of baseline spill traffic ’s’ as a function of increasing buffer size. The damping factor ’t’ determines how efficient the buffer is at attenuating stack traffic, and is the parameter responsible for the gradient of the straight-line approximations discussed above. The damping factor is a consequence of the buffering algorithm chosen and the underlying stack modulation pattern.

The information in Table 6.1 shows the damping factor for each of the buffering characteristics presented in this section, and hence represents a numerical measure of each buffer's efficiency. We can approximate the damping factor by simply averaging several samples along a curve such that each term is a natural log of the traffic level, divided by the buffer size at that point. This is a simpler approach than actually plotting graphs, finding a best fit, and then reducing the terms.
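The averaging procedure just described can be expressed directly. The helper below is illustrative only, and assumes spill traffic S is expressed as a percentage of a normalised 100 % baseline:

```python
import math

def damping_factor(samples):
    """Estimate the damping factor t of eqn 6.1, S = s * exp(-t*b), by
    averaging ln(s/S)/b over several (buffer size b, spill traffic S)
    samples taken along the curve, with baseline s = 100 %."""
    return sum(math.log(100.0 / S) / b for b, S in samples) / len(samples)
```

For a perfectly exponential profile every sample yields the same value, and the average simply recovers the gradient of the log-scale straight line.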

Table 6.1 Damping factors for buffer strategies presented

Algorithm                  Model        t
Demand-fed                 C            0.86   (better)
Zero-pointer               C            0.72
Zero-pointer               FORTH        0.68
Demand-fed                 FORTH        0.64
Zero-pointer k=2           C            0.52
Demand-fed k=2             C            0.45
Demand-fed k=4             FORTH        0.44
Demand-fed (return)        FORTH        0.44
Zero-pointer (return)      FORTH        0.43
Demand-fed k=4             FORTH        0.30
Wedig Single-Pointer       C (68000)    0.25
Wedig Double-Ptr.          C (68000)    0.17   (worse)

It is now the case that a range of buffer strategies can be quantified in terms of their efficiency at reducing stack traffic, and that the damping figures reflect the visual comparisons that can be made with the plotted buffer profiles already presented. Hence, with a normalised baseline of 100 %, we can predict the relative stack spill-traffic for a given buffer size and damping factor, or (with a known baseline) predict absolute spill-traffic conditions.

There are a few slightly unsatisfactory points about this approach, mainly the tendency for the damping factor to drift away from a straight-line fit as the buffer reaches a point where stack traffic becomes negligible. This implies that a second-order term of small significance may be present, but agrees with the earlier statement that diminishing returns are observed once we reach buffer sizes of the same order as the span of stack-depth modulation components. The accuracy of the models is also affected by using small program suites. A larger and more general program set would improve reliability beyond the limitations of the program sets presented here by reducing the significance of individual program 'quirks'.

Having established an approximation formula for the stack traffic generated by a given buffer, it can now be applied to the general formula for stack processor memory traffic (eqn 4.1), which was introduced in Section 4.7. Hence we have the revised formula[13] of eqn 6.2.

St = if + [sd × e^(−td·b)] + [sr × e^(−tr·b)] + me + ml    (6.2)

Now that a computable stack buffer term has been introduced to the mathematical model, and is a variable of several parameters (such as buffer size and buffer efficiency), it will be possible for future researchers to evaluate trade-offs that would otherwise require lengthy simulations and empirical study.
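As a sketch of how such a trade-off evaluation might be mechanised, the function below implements eqn 6.2 directly. The parameter names (i_f, s_d, and so on) mirror the symbols of Section 4.7, and the separate buffer sizes b_d and b_r for the data and return stacks are an assumption made here for generality; the numerical values used in the test are illustrative only:

```python
import math

def stack_traffic(i_f, s_d, t_d, b_d, s_r, t_r, b_r, m_e, m_l):
    """Evaluate eqn 6.2: total memory references per instruction for a
    stack processor with buffered data and return stacks.
      i_f        -- instruction fetch traffic
      s_d, s_r   -- baseline data/return stack spill traffic
      t_d, t_r   -- damping factors of the chosen buffer algorithms
      b_d, b_r   -- data/return buffer sizes
      m_e, m_l   -- explicit and local-variable memory references
    """
    return (i_f
            + s_d * math.exp(-t_d * b_d)
            + s_r * math.exp(-t_r * b_r)
            + m_e + m_l)
```

With both buffer sizes set to zero the exponential terms reduce to 1, recovering the unbuffered baseline traffic; increasing either buffer size attenuates the corresponding spill term exponentially.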

[13] For a full definition of these terms, refer to the ’Symbols’ section at the beginning of this thesis, and/or Section 4.7.

———————— Chapter 7 ———————— Local Variable Support: Optimisation strategies, and Trade-offs

————————

7.0 Introduction

The issue of local variable support in stack based computing environments has not received much attention previously. The largely FORTH-oriented family of stack based processors have had little need to optimise for such features, since FORTH did not include local variables. However, with recent attention turning toward the wider applications of stack processors, with languages such as C and Java, it is logical to explore the issues of local variable management and trade-offs.

In this chapter the basic issues of local variable support and implementation are discussed, and a model for local variable support is presented. From this standpoint, the behaviour of compiled C-code is examined in terms of its local variable utilisation. Both static and dynamic analyses are presented, with the dynamic analysis being the primary factor for performance.

Recent work has indicated that a code optimisation technique known as ’intra-block scheduling’ (Koopman 1992) is effective in eliminating many local-variable references by maintaining some items on the data stack for fast access, without incurring main memory references. This algorithm is first considered in terms of its own efficiency, as in Koopman’s original study, but results presented here also quantify its absolute and relative influence upon overall system performance.

Further work is presented on the issue of variable optimisation. It is shown that such modification of assembler code (as a consequence of optimisation) results in altered stack behaviour and a quantifiable degradation of buffer performance. Hence a new trade-off is defined, in which reductions in explicit access of local variables in main memory are exchanged for increased implicit memory referencing due to stack buffer spilling overheads. The issue of variable optimisation is pursued in further depth in later sections, where the impact of instruction set complexity is brought into the evaluation.

Finally, the impact of these new findings is applied to previous research in which their effects were ignored. The results indicate a reversal of the previous view of inferior stack processor performance, and those previous results are used to project superior data traffic performance for stack processors in comparison with various register-file architectures.

7.1 Local variables and intra-block scheduling

Repeated reference to local variables is the pattern of typical C-program execution. With stack processor architectures there are good reasons to keep local variable frames in main memory, and maintain minimal context switch latency. Even with efficient support of local variable references, as in the UTSA, the problem of memory references generated by local variable use is of some concern. We have already seen that this can represent in the region of 40 % of instructions. This is illustrated in more detail in Fig. 7.1, which identifies the memory references expended by local-variable fetch and store actions for a range of compiled C benchmarks.

[Figure: percentage of instructions (0 to 50 %) expended on local-variable store and fetch operations, for the Life, Bsort, Sieve, Image, Matrix, and Towers benchmarks and their average.]

Fig. 7.1 Memory cycles attributed to locals after buffering the stacks

This behaviour is a result of the unoptimised 'naive' compiler output, which fetches local variables whenever they are used as operands, and stores local variables whenever they are the result of an assignment. What the compiler fails to recognise is that variables are often referred to several times within a short block of code, and that it is far more efficient to maintain copies of locals on the stack for later use without memory penalties.

Koopman has proposed a technique to optimise stack-oriented native code and named this technique 'intra-block-scheduling' (Koopman 1992). This technique is intended to eliminate any unnecessary store operations, and to keep copies of fetched variables on the data stack whenever possible. This appears to be quite effective, and it will be investigated thoroughly in later parts of this section. The C-code benchmarks used are directly comparable to those employed by Koopman (1992).

7.1.1 Short term invariance and fetch/store ratios

Investigation of the behaviour of local-variables in program execution has allowed us to identify some quantitative results that confirm the applicability of Koopman's techniques. Examination of Fig. 7.1 shows that there is a distinct difference in the frequency of local-variable fetch/store operations. Local-variable fetches outweigh stores by more than 4 to 1, confirming that (on average) variables are being actively assigned-to far less often than they are referenced as passive operands. This implies that a principle, which we term 'short-term-invariance', is at play, and that variables often remain unchanged over short sequences of program code.

Hence there is evidence to show that keeping an invariant copy of a variable on the data stack for later use should prove advantageous. This confirms Koopman's preliminary findings, but this chapter goes further by presenting measurements of the absolute effect on machine performance and the impact of instruction-set architecture, rather than simply gauging the algorithm's success rate in its own right.

7.1.2 Static and Dynamic Variable Reduction

Taking a series of C source-code benchmarks, intra-block-scheduling was performed on each compiled program and then peephole optimisation was applied to complete the operation. Each program was executed on the UTSA simulator before and after optimisation to yield dynamic measurement of local-variable utilisation, whilst the assembler files were examined for static results. Figures 7.2(a) and 7.2(b) present the results gathered.

[Figure: two bar charts (0 to 50 %): 7.2(a) static analysis, showing the reduction of local-variable references in the code, and 7.2(b) dynamic analysis, showing the reduction in accesses executed, for the Life, Bsort, Sieve, Image, Matrix, and Towers benchmarks and their average.]

Figs. 7.2(a) and 7.2(b) Reduction in locals for Static and Dynamic code analysis

For static code analysis, between 15 % and 40 % of local-variable references are eliminated, with an average reduction of almost 30 %. The static figures indicate the

success of the algorithm in removing variable references from the original code. Nonetheless 70 % of variable references still remain, but they may be spread across basic blocks, and thus resist removal by intra-block scheduling alone.

More important for performance are the dynamic figures. The variables which are removed may be within loops and procedures, so their actual contribution to performance depends upon the dynamic execution characteristics of the program. The differences between static and dynamic analysis are sometimes small and in other cases quite large. On average the reduction of local-variables actually executed is only 20 %, much less than that which might have been expected from a purely static code analysis. However, this still represents a substantial reduction in memory traffic.

An example of intra-block optimisation is given below in Fig. 7.3 [14], based upon the expression for the surface area of a cuboid { 2 × [ (h × w) + (h × l) + (w × l) ] }.

ORIGINAL CODE             OPTIMISED CODE

_surf:                    _surf:
lit 4                     lit 4    ;
fp-                       fp-      ; adjust frame pointer
!loc 2                    rsu3     ; to allocate new frame
!loc 1                    tuck2
!loc 0                    rsu4
lit 2                     dup
@loc 0                    rsu3
@loc 1                    lit 2
mul                       rsu3
@loc 0                    mul
@loc 2                    rsd3
mul                       rsd4
add                       tuck3
@loc 1                    mul
@loc 2                    add
mul                       rsd4
add                       rsd3
mul                       mul
!loc 3                    add
@loc 3                    mul
lit 4                     lit 4    ; adjust frame pointer
fp+                       fp+      ; to remove frame
exit                      exit     ;

Locals: 11                Locals: 0

[note: @loc = fetch local, !loc = store local]

Fig. 7.3 Stack-based code to calculate surface area of a cuboid: int Surf(int x, int y, int z) { return 2*((x*y) + (x*z) + (y*z)); }

[14] The UTSA instruction set mnemonics are explained in Appendix-B.

This example is a particularly good case since all of the variables have been removed. Unfortunately this is rarely achieved with such success. It would be entirely possible to rename any remaining variables to permit a smaller stack frame to be allocated. In this example we can dispense with the stack-frame altogether and make the new program shorter than the original code.
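To make the transformation concrete, the toy routine below applies just one of the rewrite rules involved in intra-block scheduling; the routine and its rule are invented here for illustration, whereas Koopman's full algorithm tracks every use of each local across the basic block and inserts the appropriate stack manipulations. The rule shown: a fetch of a local that was stored by the immediately preceding instruction is satisfied by duplicating the value on the stack before the store, removing one memory reference.

```python
def schedule_block(block):
    """Toy sketch of one intra-block scheduling rewrite on a list of
    stack-code mnemonics (using the @loc/!loc notation of Fig. 7.3):

        ... !loc n  @loc n ...   ->   ... dup  !loc n ...

    The stored value is kept on the stack via dup, so the subsequent
    fetch (a memory reference) disappears."""
    out = []
    for ins in block:
        if ins.startswith('@loc') and out and out[-1] == '!loc' + ins[4:]:
            store = out.pop()
            out += ['dup', store]       # duplicate before storing
        else:
            out.append(ins)
    return out
```

Applied to a fragment such as ['!loc 2', '@loc 2'], the fetch is eliminated; a dead store (one whose value is never fetched again) would additionally need liveness information spanning the whole block.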

7.2 Instruction count versus variable reduction

It is interesting to note Table 7.1, where the effects of optimisation on code length are tabulated. The optimisation does not simply trade increased instruction fetch and execute cycles for a reduction in local-variable traffic; it gains significant advantages for local-variable traffic with only modest penalties for instruction fetch overheads. However, as we shall see in following sections, there is the additional issue of altered data stack behaviour to contend with before true gains can be assessed.

Table 7.1 Effects of optimisation on variable and instruction traffic

Benchmark        Local Vars.    Instr. Count

Sieve             -15.4 %          0.0 %
Towers            -18.8 %         -0.2 %
Life              -48.7 %         +0.9 %
Image             -44.4 %         +0.9 %
Matrix            -12.1 %         +3.7 %
Bsort             -38.2 %        +11.4 %
Surf(x,y,z)      -100.0 %          0.0 %

Average           -29.6 %         +3.9 %
(average excludes Surf(x,y,z))

Typically programs exhibit marginal increases in instruction counts, and occasionally we actually observe a reduction in program length. The bubble-sort algorithm shows a more significant increase in code length, but even so, this should not immediately be taken to imply worse performance.

7.3 Trade-offs for instruction set complexity

One particular aspect of variable optimisation, absent from previous research, is the impact of instruction set complexity within the target architecture that executes the optimised code. The effectiveness of variable scheduling, and the essential post-scheduling peephole optimisations, must define some limitations on the scope of variable optimisation that is possible in a given architecture. An architecture in which efficient access to top-of-stack elements is limited, as in the Burroughs B5500 for example, may not be able to exploit fully the available opportunities for optimisation. An architecture with a greater degree of flexibility in top-of-stack access might offer fuller use of optimisation. This issue was not examined in Koopman’s original investigation, and it was therefore considered worthwhile to resolve it here. This was achieved by altering the degree of instruction set flexibility allowed in the optimisation tools, in a manner derived from the newly proposed model for scalable stack manipulators of Chapter 5[15].

Analysis of the instruction-set-limited simulation-runs indicates that there is indeed a relationship between instruction set complexity and local-variable scheduling efficiency. The results are presented in Fig. 7.4, which shows the cumulative effect upon local variable references for a given degree of instruction set complexity.

[Figure 7.4: local variable references (%), dynamic and static, plotted against degree of optimisation (none, 1-reg, 2-reg, 3-reg, 4-reg); y-axis spans roughly 60 to 100 %.]

Fig. 7.4 Stack-cell accessibility vs. local references

The trends illustrated in Fig. 7.4 indicate that increasing the degree of instruction set flexibility leads to an increase in local-variable reduction. The relationship is approximately linear for degrees of zero to three, but starts to exhibit diminishing effectiveness for degrees of four or more. The gain yielded by each increment in instruction set complexity is plotted in Fig. 7.5.

[15] Section 5.3 of Chapter 5 introduces a new model for scalable stack manipulation operations.

[Figure 7.5: reduction achieved (%), dynamic and static, for each increment in degree of optimisation (1-reg to 4-reg); y-axis spans roughly 0 to 14 %.]

Fig. 7.5 The relative gain of increased degrees of stack access

With stack access limited to one register there is little gain. With two and three registers, analogous with Burroughs B5500 and RTX2000 architectures respectively, we find gains of around 10 to 12 %. However, gains are diminished for a fourth degree of stack accessibility, as represented by the UTSA and FRISC-3 designs. Ultimately, the trends presented imply that no more than 40 % of variable references can be expected to be eliminated with intra-block scheduling.

One can conclude from Figs. 7.4 and 7.5 that only 30 % to 40 % of variable references are amenable to intra-block scheduling, even with larger degrees of instruction-set complexity. This is compatible with Koopman’s findings, which quote 90 to 100 % of ’optimisable’ variables being removed, given that only around 40 % of local variables appear to be optimisable within a chosen basic block. Further reductions in local variable traffic are only possible with a technique that can optimise variables across basic-block boundaries, and ultimately on an inter-procedural basis. Although Koopman has highlighted this proposition, there is currently no codeable algorithm available (Koopman 1992). Appendix-J contains assembler listings of each C-code benchmark before optimisation, and after applying each degree of optimisation.

7.3.1 Generalisation of variable scheduling

Koopman’s original paper (Koopman 1992) presents stack-scheduling of variables as a technique for optimising local variables. However, it is proposed here that the technique is generally applicable to other data objects, including both constants and global variables.

The case for data-stack scheduling of constants is only viable with long constants, which reduce code density and performance. For the UTSA processor, short constants (literals)

pack into a single instruction slot, but longer constants result in reduced code density and hence reduced throughput. The situation is at its worst when using 24-bit constants, but can be optimised as in the example shown in Fig 7.6.

Before Optimisation       After Optimisation

    lei FFCD08h               lei FFCD08h
    add                       tuck2
    lei FFCD08h               add
    div                       swap
                              div

Fig. 7.6 UTSA code: Constant scheduling for the operation ’(x+c)/c’

The case for scheduling of global variables rests upon a global variable access being more costly than accessing a local variable. The penalty can be reduced by making localised copies of globals on the data stack. The example given in Fig. 7.7 illustrates this concept, with the assumption that global variable references are macros which are later expanded and handled as indicated by the inset panel[16].

Before Optimisation          After Optimisation

    @glob 2                      @glob 2
    Lit 5                        dup
    add                          Lit 5
    @                            add
    tos++                        @
    @glob 2                      tos++
    Lit 5                        rsd2
    add                          Lit 5
    !                            add
                                 !

Instructions = 15            Instructions = 13
Mem Refs. = 6                Mem Refs. = 4

Macro expansion of ’@glob 2’ (inset panel):
    @loc 0    - get base addr
    ptfp      - push fp, fp=addr
    @loc 2    - fetch glob as local
    popfp     - restore frame ptr.

Fig. 7.7 UTSA-code: Scheduling of the global operation ’x[5]=x[5]+1;’

[16] The macro fetches the global base address (@loc 0), pushes it into the frame pointer FP (using ptfp also pushes the old value onto the return stack), performs a ’local’ fetch, then restores FP with popfp.

Both constants and globals may be scheduled in a way identical to that of local variables, although in the case of global references there may be a need to generate ’artificial’ instructions, as in the case of ’@glob’, so that the optimiser can easily recognise and manipulate global references that actually consist of a series of machine operations.

The effectiveness of these generalised strategies could not be evaluated fully with the compiler tool available, as it did not support the declaration or use of global variables. Global variables are used less frequently than local variables, but their relative cost is much higher. Hence inter-block optimisation of globals might reap worthwhile benefits, and should be investigated in future work.

7.4 Variable scheduling and stack behaviour

In earlier chapters the general behaviour of stacks in a stack-processor environment was introduced for FORTH and C workloads, and the impact of this behaviour upon stack buffering was examined. Consideration will now be given to the consequences of applying code optimisation techniques, in terms of their indirect effects upon stack behaviour.

The data stack behaviour may be defined in terms of stack-depth probability and stack-depth modulation. Applying local variable optimisation must have some effect upon one or both of those quantitative measures of system behaviour. Whilst the objective of intra-block scheduling is to remove references to memory-resident local variables, and replace them with stack-resident copies of their contents, the indirect result is increased stack depth as a side-effect of optimisation.
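Both measures can be recovered directly from an execution trace. The sketch below (Python; the per-instruction depth-change trace format is hypothetical, standing in for the simulator output used here) computes stack-depth probability and atom depth modulation from such a trace:

```python
from collections import Counter

def depth_profile(deltas):
    """Derive stack-depth probability and atom depth modulation from a
    trace of per-instruction stack-depth changes (hypothetical format).
    """
    depth, depths = 0, []
    for d in deltas:
        depth += d              # running stack depth after each instruction
        depths.append(depth)
    n = len(deltas)
    prob = {k: c / n for k, c in Counter(depths).items()}  # depth probability
    mod = {k: c / n for k, c in Counter(deltas).items()}   # atom modulation
    return prob, mod

prob, mod = depth_profile([+1, +1, -1, +1, -2])
print(prob)  # {1: 0.4, 2: 0.4, 0: 0.2}
print(mod)   # {1: 0.6, -1: 0.2, -2: 0.2}
```

Cumulative depth modulation follows by the same counting applied over windows of consecutive changes rather than single instructions.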

Applying the greatest degree of significantly beneficial optimisation, with four-cell stack access to the identified suite of benchmarks, results in an interesting comparison between stack-depth probabilities of the pre- and post-optimised code. This is illustrated by Fig. 7.8, which shows composite stack depth probabilities before and after optimisation.

[Figure 7.8: stack-depth probability (%, 0 to 25) against stack depth (0 to 12), comparing normal and 4-reg-optimised code.]

Fig.7.8 Stack-depth probabilities of pre- and post-optimised C-code

It is clear that stack depth characteristics are altered exactly in the way implied from a knowledge of how intra-block scheduling techniques operate. The likelihood of large stack depth is increased, whilst the probability of shallower stack depth is diminished.

Note also that the narrowness of the original curve is broadened across a wider range of stack depths in the optimised case.

Stack depth-modulation, as shown in Figs. 7.9(a) and 7.9(b), also exhibits evidence of the influence of variable scheduling. This may be explained with careful consideration of the optimisation process (presented in following sections).

[Figures 7.9(a) and 7.9(b): probability (%, 0 to 40) against stack-depth change (-4 to +4), optimised vs. original; (a) atom-depth modulation, (b) cumulative depth modulation.]

Fig.7.9(a) and 7.9(b) Atom and cumulative stack-depth change comparisons

The impact of intra-block scheduling upon atom depth changes is clearly negligible, as would be expected if memory-resident references are replaced by stack resident references (both having an equal effect upon stack depth). The change in cumulative stack-depth modulation shows more significant changes for stack depth modulations of +1 and +2 however. This is a beneficial change in behaviour, indicating that the previous disposition of the compiled code toward excessive stack depth increments of ’+2’ has been tempered by the scheduling technique.

It can be seen that instructions which cause a ’paired’ increase in stack depth of two (on consecutive machine cycles) are diminished. Instead, items are placed on the stack in a more isolated fashion. The optimisation technique has removed many cases where two memory resident local variables are fetched to stack on demand, and instead fetches items individually from the data stack (for optimised locals) or main memory (for unoptimised locals).

It is now apparent in a comparison between FORTH, raw Compiled C, and optimised C, that the model of stack behaviour has moved toward a more FORTH-like pattern, such that it mimics the good practices observed in hand-coded FORTH, rather than the poorer compiler output presented before optimisation. This is shown in Figs. 7.10(a) & (b).

[Figures 7.10(a) and 7.10(b): probability (%, 0 to 40) against stack-depth change (-5 to +5) for FORTH, raw C-code, and optimised C-code; (a) cumulative modulation, (b) atom stack-depth modulation.]

Figs. 7.10(a) and 7.10(b) FORTH, raw C-code, and optimised C-code behaviour

The effect upon stack-depth modulation is made clear by examination of Figs. 7.10(a) and 7.10(b), which show the three forms of stack behaviour (FORTH, raw C, and optimised C). Stack-depth probability exhibits clearly altered behaviour as a function of the degree of stack optimisation permitted (as presented in Figs. 7.11(a) to 7.11(d)). The profiles in this case were based upon a composite of the normalised profiles of the individual benchmarks. Normalisation allowed any initial shifts in stack depth to be removed, reflecting only the significant dynamic portion of stack behaviour.

[Figures 7.11(a) to 7.11(d): change in probability (%, -20 to +40) against normalised stack depth (1 to 8), for 1-reg, 2-reg, 3-reg, and 4-reg optimisation relative to no optimisation.]

Figs. 7.11(a) to 7.11(d) Normalised stack depth profiles

The patterns for stack depth probability indicate that the narrow band of stack depths, as exhibited by the raw compiled code, have been converted to use the data stack more effectively as a storage resource, but at a cost of almost doubling the significant range of stack depths encountered during program execution.

7.5 Variable scheduling and buffer performance degradation

It has been shown that stack behaviour changes as a consequence of code optimisation, and the role of stack behaviour in buffer performance has been studied in Chapter 6. It is therefore likely that local-variable optimisation will alter the behaviour of a given stack buffer’s performance characteristics. The issue is to determine if this change is positive or negative, and identify what trade-offs may exist as a result.

The measurements presented in Fig. 7.12(a) to 7.12(d) show four buffering strategies, as considered in previous sections for C-code behaviour, but in this case each buffer characteristic is presented both before and after intra-block scheduling was applied to the assembler code. The results show a clear negative trade-off as a result of code optimisation, with buffer characteristics tending toward increased traffic in all cases.
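The spill-traffic figures plotted in these characteristics can be obtained by replaying a depth-change trace against a model of the buffer. A minimal sketch of one policy, a demand-fed buffer that spills a single word when a push finds it full and fills a single word when a pop finds it empty, might look like this (Python; the zero-pointer and K=2 variants of Chapter 6 differ in when, and how many, words they transfer):

```python
def spill_traffic(deltas, capacity):
    """Count spill/fill memory traffic for a demand-fed stack buffer.

    deltas: per-instruction stack-depth changes (hypothetical trace).
    Spill one word on a push into a full buffer; fill one word on a
    pop from an empty buffer that has items spilled below it.
    """
    held = spilled = traffic = 0
    for d in deltas:
        while d > 0:                      # pushes
            if held == capacity:
                spilled += 1; traffic += 1   # spill bottom word to memory
            else:
                held += 1
            d -= 1
        while d < 0:                      # pops
            if held == 0 and spilled:
                spilled -= 1; traffic += 1   # fill from memory
            else:
                held = max(held - 1, 0)      # ignore underflow of empty stack
            d += 1
    return traffic

print(spill_traffic([1, 1, 1, -1, -1, -1], capacity=2))  # 2
```

Running the same trace before and after scheduling, for a range of capacities, reproduces the shape of the before/after comparison made here.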

[Figures 7.12(a) to 7.12(d): spill traffic (%, 0 to 50) against buffer capacity (1 to 8), before and after scheduling, for four buffer strategies: zero-pointer, demand-fed, zero-pointer (K=2), and demand-fed (K=2).]

Figs. 7.12(a) to 7.12(d) Buffer characteristics before and after optimisation

Buffer performance has been significantly degraded in most cases, with single-spill zero-pointer and demand-fed buffers suffering in particular. Stack processor technology based upon very small buffering schemes would suffer substantial increases in stack spilling after applying optimisation techniques such as intra-block scheduling. It is necessary to ask whether the reduction in local-variable traffic compensates sufficiently for this performance penalty.

In an absolute comparison it was previously noted that unoptimised code behaviour led to the demand-fed buffer being slightly superior in absolute terms for stack-traffic reduction (see Chapter 6). The comparison between zero-pointer buffering and demand-fed buffering with an optimised workload now shows a different story, with both algorithms nearly equal in performance, as shown in Figs. 7.13(a) and 7.13(b).

[Figures 7.13(a) and 7.13(b): spill traffic (%, logarithmic scale 0.1 to 100) against buffer capacity (1 to 8), demand-fed vs. zero-pointer, before and after optimisation.]

Figs. 7.13(a) and 7.13(b) Zero-pointer algorithm becomes more desirable after optimisation

This change in performance is a case of interaction between software optimisation and hardware optimisation. Previous work did not recognise these effects: the true performance gains of any system will not be as expected from previous studies, and new work must take the effects presented here into account in order to be truly representative.

Quantifying the degradation in buffer performance can be achieved easily, now that an exponential approximation method for stack buffer characteristics has been introduced (Section 6.5). Comparing the original figures for damping factor t with the new figures measured from the curves presented gives the results of Table 7.2 below, and

confirms what is visually apparent in Figs. 7.13(a) and 7.13(b): the (logarithmic) gradient of traffic reduction has been lessened considerably.

Table 7.2 Comparison of buffer damping efficiency and baseline stack traffic

                 Un-optimised    Optimised    Change

Demand-Fed       t = 0.85        t = 0.64     -25 %
Zero-Pointer     t = 0.71        t = 0.58     -18 %
Base Traffic     s = 0.85        s = 0.79      -7 %

The results of Table 7.2 show that buffer efficiency is degraded by 18 % to 25 % due to intra-block scheduling, but also that baseline spill traffic decreases by a small amount. Hence there is a slight reduction in the baseline traffic, but reduced effectiveness in damping that traffic before it reaches memory. The original stack buffer spilling data is provided in Appendix-K for reference.

According to our approximation model of Chapter 6 (which has some margin of error), we can estimate the effect on buffering capacity. For example, demand-fed buffering with a buffer size of two generates 15.5 % spill traffic relative to the unbuffered system, calculated as shown below:

s × e^(−t·b) = 0.155   (when t = 0.85, s = 0.85, and b = 2)

We can now determine the required buffer size, after optimising the assembler code, that delivers equal performance with the same buffer strategy. Transposing to give equation 7.1 leads us to the following result using the new values of s and t:

by transposition:   b = −ln( S ÷ s ) ÷ t   (7.1)

such that:   b = −ln( 0.155 ÷ 0.79 ) ÷ 0.64 = 2.54   (when t = 0.64 and s = 0.79)

Thus, a demand-fed buffer of two elements would have to be increased to three elements to deliver the same degree of traffic reduction. The figure of b=3 arises since we naturally cannot have a buffer with precisely 2.54 elements. Similarly, a buffer of size b=8 would need to be increased to a size of b=11 to achieve similar performance after ’optimisation’. Re-examination of Figs. 7.13(a) and 7.13(b) will confirm that the approximation formula agrees with the actual quantitative data within margins of error.
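Both capacity figures follow directly from the model. A short check of the arithmetic (Python, using the s and t values of Table 7.2):

```python
import math

def spill_fraction(s, t, b):
    """Spill traffic fraction from the Section 6.5 model: s * exp(-t*b)."""
    return s * math.exp(-t * b)

def required_capacity(target, s, t):
    """Equation 7.1 transposed: b = -ln(target / s) / t, rounded up
    because buffer capacities are whole numbers."""
    return math.ceil(-math.log(target / s) / t)

# Demand-fed buffer on the unoptimised workload (t = 0.85, s = 0.85):
for b in (2, 8):
    target = spill_fraction(0.85, 0.85, b)
    # capacity giving equal spill traffic on the optimised workload:
    print(b, "->", required_capacity(target, 0.79, 0.64))
# 2 -> 3
# 8 -> 11
```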

7.6 The impact of scheduling on overall performance

Taking the findings presented into consideration, the ultimate question is simply what the overall effect of intra-block scheduling is upon performance. The degradation of buffer performance worsens machine performance, but the reduction in local variable references improves it. Thus a trade-off between the hardware buffering mechanism and the software code optimisation must be evaluated in order to resolve this issue. This relationship has clear implications when viewed in the form of the revised mathematical model presented in Chapter 6.

Let us now present a case for system performance, making the assumption that stack buffers are of adequate capacity to eliminate buffer degrading effects. A buffer capacity of 12 to 16 would be sufficient in this case. Here the overriding factor is the reduction in local-variable traffic as a fraction of the sum of the memory components present. As a function of instruction set complexity, we can explore the relative execution times of each benchmark program, resulting in the series of curves presented in Fig. 7.14.
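Under that assumption, relative execution time reduces to a weighted sum of the memory-traffic components. The sketch below (Python) illustrates the shape of the calculation; the traffic fractions are hypothetical placeholders, not measured values from the benchmark suite:

```python
def relative_time(instr_scale, local_scale,
                  frac_instr=0.5, frac_local=0.2):
    """Execution time relative to baseline when instruction fetches and
    local-variable references scale by the given factors and the
    remaining memory traffic is unchanged (fractions illustrative only).
    """
    other = 1.0 - frac_instr - frac_local
    return frac_instr * instr_scale + frac_local * local_scale + other

# e.g. +3.9 % instructions, -29.6 % local references (Table 7.1 averages):
print(round(relative_time(1.039, 0.704), 3))  # 0.96
```

Benchmark-specific fractions are what make individual curves in Fig. 7.14 diverge: a program whose local-variable traffic is a small fraction of the total gains little, however many references are removed.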

[Figure 7.14: relative execution time (%, 60 to 110) against degree of optimisation (none to 4-cell) for the Matrix, Sieve, Bsort, Towers, Life, and Image benchmarks.]

Fig. 7.14 Relative execution time as a function of instruction set complexity, assuming UTSA instruction density

Figure 7.14 shows both good and bad results for intra-block scheduling. Most cases tend toward an eventual positive trade-off for instruction count versus local variable reduction. However, Sieve gains nothing from optimisation, whilst Matrix suffers worse performance after optimisation! Bubble Sort requires higher degrees of optimisation to

yield a gain for performance, but it is clear that the 11 % increase in instruction count resulting from that optimisation has not produced the loss of performance that might have been expected from the simpler analysis of Section 7.2.

With such program-specific behaviour, it is not certain that intra-block scheduling can be used arbitrarily as a code-enhancing optimisation technique, as might have been hoped from the initial work by Koopman (1992). However, taking an average model, based upon all six benchmarks presented, we can estimate the overall performance effects for a machine with standard and compact code density models as in Fig. 7.15 below.

[Figure 7.15: relative execution time (%, 70 to 100) against degree of optimisation (none to 4-reg), for relative code densities of 1.0 and 2.2.]

Fig. 7.15 Execution time vs. degree of accessibility for 1-fetch & compact-fetch schemes

Various assumptions have been made in generating the curves of Fig. 7.15. A code density of 1.0 corresponds to a model where each instruction fetch costs one memory reference, whilst a code density of 2.2 instructions per word represents the typical characteristic of the UTSA model (as will be confirmed in Chapter 8). In absolute terms, a code density of 1.0 would result in far worse performance than that of the UTSA architecture. However, in terms of relative execution times, it can be seen that the gains are still modest enough to raise the possibility of being swamped by degraded buffer spilling.

7.7 The implications for previous research studies

In Section 6.2.5 a comparison was made between the proposed zero-pointer stack buffer algorithm and the ’stack buffer’ concept alluded to by Flynn (1992). The zero-pointer algorithm relies upon implicit variable allocation and referential locality, applied to an evaluation stack (to use Flynn’s terminology). The buffer utilised by Flynn is applied to an activation-record stack and makes use of randomly addressed local variables. Clearly the two scenarios are quite different, but both buffering algorithms utilise the read/write tagging concept introduced in Chapter 6.

The findings presented previously in this chapter raise a major issue when considered in the context of the work presented by Flynn (1992). That work compares data traffic for different machine architectures[17]:

1. srs: Single-Register-Set, analogous with CISC paradigms;
2. mrs: Multiple-Register-Set, analogous with RISC paradigms;
3. stack: an evaluation-stack model.

The graph of Fig. 7.16(a) below, reproduces the comparison as given in Fig.14 of ’Processor Architecture and Data Buffering’ (Flynn 1992).

[Figure: traffic relative to srs(16,i,r) (0.6 to 1.2) against buffer size in registers (32 to 256), for mrs(8,8,inter), mrs(4,4,global), stack(split,dvb), and srs(16,inter,reg-rtv).]

Fig. 7.16(a) Flynn’s comparison of stack and register-file architectures (after Flynn 1992).

Flynn’s results indicated that the evaluation-stack model with buffered activation-record stacks could only show an advantage with large buffer sizes. For smaller buffers, the srs and/or mrs models performed better (although with minimal buffers the stack model appears equal to the mrs model). However, Flynn’s stack model does not take into

[17] For a full explanation of the notation used here, refer to Flynn’s original paper (Flynn 1992).

account Koopman’s later work presenting a technique for scheduling of local variables on the evaluation stack (Koopman 1992). It is unfair to compare optimised register-file models against unoptimised stack models, and doing so gives a misleading result (with the hindsight of later stack optimisation work). However, results presented in this chapter have already quantified the proportion of local variables that may be expected to be removed by optimisation, and have shown that any attendant increase in stack spilling will be negligible if a sensible stack-buffering approach is followed. It is therefore quite possible to adapt Flynn’s results to account for these new effects, as will be shown.

Flynn states that 45 % of the baseline data references in the comparative study are local variables, whilst 12 % are globals. We have shown already that with an unrestricted evaluation-stack (i.e. data-stack) we can eliminate something approaching 40 % of local variables by applying local-variable scheduling. Hence, it is possible to correct this anomaly to an extent, and present a revised comparison as in Fig. 7.16(b).
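The correction amounts to simple scaling. A sketch of the arithmetic (Python; Flynn’s 45 % and 12 % fractions are as quoted, while scaling the plotted relative-traffic curves in direct proportion is an assumption of this revision):

```python
def revised_traffic(base, local_kept=0.60, glob_kept=1.00,
                    local_frac=0.45, glob_frac=0.12):
    """Scale a point on Flynn's stack-model traffic curve to reflect
    variable scheduling: locals are 45 % and globals 12 % of baseline
    data references (Flynn 1992); *_kept are the fractions remaining
    after optimisation."""
    removed = local_frac * (1.0 - local_kept) + glob_frac * (1.0 - glob_kept)
    return base * (1.0 - removed)

print(round(revised_traffic(1.0), 3))                 # stack(loc60,glob100): 0.82
print(round(revised_traffic(1.0, glob_kept=0.7), 3))  # stack(loc60,glob70):  0.784
```

Each point of Flynn’s stack(split,dvb) curve is scaled in this way to produce the revised curves of Fig. 7.16(b).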

[Figure: traffic relative to srs(16,i,r) (0.4 to 1.2) against buffer size in registers (32 to 256), adding the new stack(loc60,glob100) and stack(loc60,glob70) models to the curves of Fig. 7.16(a).]

Fig. 7.16(b) Revision of Flynn’s comparison, with the new stack models

In Fig. 7.16(b) it is assumed that the curve labelled stack(loc60,glob100) has been optimised to remove 40 % of the local variables identified by Flynn. The curve labelled stack(loc60,glob70) includes a further optimisation, with 30 % of globals also being eliminated[18]. It can readily be seen that data traffic is greatly reduced when local variable optimisation is applied, with the revised stack models now significantly superior to single or multiple register-set models. The cost of the improved stack model performance is to expand Flynn’s fixed 3-cell stack to a 4-cell model with main-memory over-spill, and to incorporate a small 32-element zero-pointer stack buffer to allow optimal transport of

[18] The assumption of 30% reduction in global references is purely arbitrary, but is included to permit a view of what may be expected when improved optimisation strategies are developed.

the evaluation stack into main memory. What is most important about this result is the change in performance for the various stack models. It is seen in Fig. 7.16(c) that the original stack(split,dvb) model generates far more data traffic than either of the revised stack models when the new stack-based optimisation techniques are included in the analysis.

[Figure: traffic relative to srs(16,i,r) (0.4 to 1.0) against buffer size in registers (32 to 256), for stack(split,dvb), stack(loc60,glob100), and stack(loc60,glob70).]

Fig. 7.16(c) Flynn’s stack model vs. optimised stack model.

Without exploiting Flynn’s activation-record buffer[19] concept, the optimised stack processor outperforms all of the other models presented by Flynn once the optimisation techniques are factored into the analysis. This reverses the whole conclusion of that research study. The stack processor model is no longer the underdog in the analysis, but is now seen to be superior to both single and multiple register-set paradigms in terms of data traffic, as summarised in the final graph (Fig. 7.17).

[Figure: traffic relative to srs(16,i,r) (0.4 to 1.0) for the single register set, multiple register set, stack (unoptimised), and stack (optimised) models.]

Fig. 7.17 Comparison with Flynn’s mrs, srs, and unoptimised stack models with the new optimised stack processor model.

[19] Flynn’s data does not present the case for no buffer at all, but the data for the smallest quoted buffer result (with 32 entries) is used here instead.

———————— Chapter 8 ———————— Instruction Fetch Bandwidth and Instruction Packing Techniques

————————

8.0 Preamble

Instruction bandwidth is a major component of stack processor memory traffic, as it is in most other machine architectures. We have thus far examined methods to reduce implicit memory references generated by stack management, and also the explicit memory references generated by local variables. The remaining issue is that of the instruction fetch bandwidth required to maximise throughput on a stack processor platform. This chapter examines a technique for packing multiple instructions into a single memory word, and assesses the performance of such a hardware optimisation in the context of stack processor technology and HLL models.

The alternatives for memory bandwidth reduction are considered first. Cache is often presented as the ’cure-all’ for bottlenecks in memory system performance, but in embedded systems and real-time control environments cache often leads to unacceptable system behaviour. Increasing the utilisation of existing memory bandwidth is therefore considered, and the proposed UTSA instruction packing technique is compared with other investigations. Subsequently, performance assessments are presented which indicate that instruction packing is highly effective once optimisation methods are applied to the raw compiler output.

The hardware issues arising from instruction packing techniques are investigated, principally the effect of aligned or non-aligned branch/call target addressing, and branch prediction. It is shown that, whilst word-aligned code increases program storage requirements, it does not consistently deliver a performance advantage. Non-aligned branch/call target addressing permits more compact static program storage, but does not offer significant performance gains in the absence of cache (where hit rates might otherwise be enhanced). Branch prediction is shown to deliver modest gains in performance, with or without branch/call target alignment.

Finally, the hardware and timing implications of the UTSA packing scheme are considered. Schematics are presented for possible implementations of decoding hardware for the UTSA instruction word encoding, and a VHDL model of the decoder is examined after hardware synthesis. The results allow trade-offs to be quantified for increased instruction bandwidth utilisation against increased critical-path latency in the CPU. Overall performance gains are substantial in comparison with earlier work with RISC and 16-bit stack processor technology.

8.1 Cache and deterministic system behaviour

Whilst cache is often a cost-effective means of improving overall system performance, it is known to greatly reduce system determinism and predictability (Koopman 1993). Cache does not always improve performance in specific conditions, such as those encountered in embedded systems and real-time control, an area that is increasingly popular for C and RISC-based solutions. The rising popularity of C and RISC technology has been at the cost of stack-processor technology, which was a popular choice in partnership with FORTH.

Illustrating the problem with an example, suppose that two processors ’A’ and ’B’ operate at 20 MHz and 10 MHz respectively, each with 100 ns main-memory access time. Processor ’A’ has single-cycle cache and can therefore achieve a maximum of 20 mips. Processor ’B’ has no cache, and can only manage 10 mips at best. It might be thought that processor ’A’ is the best choice, since it clearly delivers more throughput and ought to process any interrupt task in less time. However, as Koopman (1993) has shown, this is not always the case.

Processor ’B’ would always execute a critical interrupt service routine (ISR) at the speed dictated by main memory. But processor ’A’ must also be assumed to operate at main-memory speed: since one cannot determine the precise point in time when an interrupt will occur, one cannot reliably assess the coherency of the cache at that point. The impact of a cache write-back, required to permit the incorporation of new (missed) memory references, may actually make performance worse than with no cache at all when forced into such assumptions. Consequently the faster processor ’A’ may offer no more processing advantage than processor ’B’ when servicing safety-critical real-time interrupt events, and may even prove inferior in those circumstances.
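The argument reduces to a worst-case bound. Using the hypothetical figures of the example (100 ns memory, one memory access per instruction, and an assumed 50-instruction service routine), both processors bound identically once cache state cannot be guaranteed:

```python
MEM_NS = 100  # main-memory access time from the example above

def worst_case_isr_ns(instr_count, clock_mhz):
    """Deterministic worst-case ISR bound: with cache state unknown,
    every instruction is charged a full main-memory access."""
    cycle_ns = 1000.0 / clock_mhz
    return instr_count * max(cycle_ns, MEM_NS)

print(worst_case_isr_ns(50, 20))  # processor A (20 MHz, cached): 5000.0
print(worst_case_isr_ns(50, 10))  # processor B (10 MHz, no cache): 5000.0
```

For processor B this bound is also the typical case, which is precisely the determinism argument: its worst case and best case coincide.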

8.1.1 The memory wall

Increasingly, the problem of memory bandwidth shows itself even in general computing scenarios, and not just in ’special’ circumstances. As Wulf (1995) has observed, with CPU clock rates improving at much higher rates than memory speed, the impact of a cache miss becomes ever more significant. We cannot consider on-chip cache alone to be a long-term solution to this problem. As CPU cycle times reduce, the relative latency of a given cache unit is increased. Ultimately the cache will not deliver data in a single cycle, but attempts to reduce the cache size to yield better latencies only increase the rate of cache misses. This problem is not pure speculation; it is already a reality. The latest 300 MHz DEC Alpha™ chip, for instance, was forced to use two-level on-chip

cache to deliver satisfactory overall memory latencies (Geppert 1996), since a single-level on-chip cache was too slow for single-cycle access. This grossly enlarged the silicon area required for the chip. Clearly, as clock rates continue to rise, any architecture that delivers compact code density is well placed to exploit the limited cache available to it. Stack processors have a clear potential to take the advantage in this respect.

8.2 Instruction packing

Instruction packing is a technique whereby multiple opcodes are encoded into a single memory word to improve code density and instruction fetch efficiency. The concept of instruction packing is not new. Von Neumann’s IAS architecture, for example, utilised a 40-bit memory width and packed two 20-bit instructions into each word (Hayes 1988). More recently, RISC pioneers Patterson and Hennessy examined the effect of packing two instructions into a 32-bit RISC memory word, but reported poor overall gains; even mathematical code delivered only a 15 % improvement in performance (Patterson 1985). This view appears to be slightly revised by the work of Bunda et al. (1993), which makes more positive claims for a similar scheme, but relies upon cache to support overall performance.

Bunda’s work on RISC instruction packing schemes showed that trade-offs existed between register set capacity and instruction density. With larger register windows and three-register addressing, there was no opportunity for instruction packing in a 32-bit memory word, but a smaller register window permitted two instructions to be packed per word. The problem is that reductions in register window capacity, and the adoption of two-register addressing schemes, lead to longer code sequences and increased memory referencing due to the limitations of a small working set of registers. Nevertheless, an overall gain can still be achieved, with the reduction in instruction fetch overheads offering compensation.

8.2.2 UTSA and stack-processor instruction packing

Stack processor architecture has some advantages over RISC architecture when considering instruction packing techniques. Without explicit register addressing, the trade-off against register-file size does not arise, and the implicit nature of most instructions eliminates the need for instructions to contain register address fields. Hence, we should expect to be able to pack more instructions into a memory word than typical RISC and CISC architectures, whilst losing little in the way of instruction expressiveness. Instruction packing should permit increased utilisation of instruction bandwidth without reducing the semantic value of the resulting instruction stream.

The RTX-4000 architecture is discussed in detail by Koopman (1989b), and is a rare example of instruction packing in practice. The RTX-4000 was a prototype processor which never reached final production. The instruction format supports a single 32-bit mode for long instructions, and a compact two-instruction mode where two operations are combined in a single 32-bit word.

Whilst claims were made that packing densities would reach two instructions per instruction fetch, this is not entirely accurate in practice, since procedure calls, branches, and returns may occur in such a way as to make one of the packed instructions redundant. In practice an instruction fetch density of less than 2.00 would be observed, although details of actual assessments have not been published.

The UTSA scheme, as introduced in Section 5.5, takes the instruction packing scheme to more extreme levels. With a maximum packing density of three instructions per memory word the UTSA offers higher packing densities, even after taking into consideration occasionally redundant instruction slots due to branch, call, and return actions. This should allow the UTSA design to efficiently exploit memory bandwidth both in cached and cache-less systems.

8.3 C-Code performance of UTSA instruction packing

In order to assess instruction packing efficiency, a number of parameters must be considered. Static packing density indicates the absolute code density, and hence quantifies average instruction length and program size for a given set of benchmarks. Compact instructions will imply potential for improved instruction-fetch bandwidth utilisation. Dynamic measurements permit a more realistic analysis of performance implications by giving quantitative figures of merit for instruction bandwidth utilisation. Dynamic instruction packing density is thus defined as the number of useful instructions fetched as a fraction of the total number of instruction words referenced. Any code falling beyond a taken branch is considered useless, as is any unused instruction slot (usually filled by a ’nop’ instruction).
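The definition above can be made concrete with a small sketch. The trace representation used here (a list of fetched words, each holding up to three slot entries) is purely illustrative, and is not the UTSA simulator's own format:

```python
# Sketch of the dynamic packing-density metric defined above. A fetched
# word carries up to three instruction slots; 'nop' padding, and slots
# falling beyond a taken branch, count as wasted bandwidth.

def dynamic_packing_density(trace):
    """trace: list of fetched words; each word is a list of up to three
    (mnemonic, executed) entries, where executed is False for 'nop'
    padding or for slots skipped after a taken branch."""
    useful = sum(1 for word in trace
                 for (op, executed) in word
                 if executed and op != "nop")
    return useful / len(trace)

# Three fetched words: the second ends in a taken branch, so its final
# slot is discarded; the third carries a nop pad.
trace = [
    [("lit", True), ("add", True), ("store", True)],
    [("lit", True), ("br", True), ("add", False)],   # slot after taken branch
    [("dup", True), ("xor", True), ("nop", False)],  # nop padding
]
print(dynamic_packing_density(trace))  # 7 useful ops / 3 fetches ≈ 2.33
```

A perfectly packed, branch-free trace would score 3.0; the metric degrades towards 1.0 as padding and branch waste accumulate.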

8.3.1 Static packing density and operand field reduction

A set of nine benchmarks was used to gauge the effect of static and dynamic code density. Some of these benchmarks are mathematical, such as Image and Matrix, whilst others are based upon repetitive looping or branching (eloop and fib for example). This provides several types of code to widen the scope of measurement across a reasonable range of program behaviours.

Each object file generated by the UTSA C compiler was then processed by the UTSA assembler (see Chapter 5 for details). The assembler mapped the object code onto the UTSA instruction word formats, and permitted word-aligned or non-aligned program flow (as discussed later) to be applied during assembly. On completion of assembly, static packing statistics were generated, and provide the data used to plot Fig. 8.1.

It was found in the initial stages of investigation that static code density was extremely poor, averaging about 1.7 instructions per word for the benchmarks shown in Fig. 8.1. This was identified as being due to the compiler, which has no knowledge of the UTSA instruction formats, and hence generates the longest forms of branch, call, and immediate literal instructions by default. It is only possible to determine the absolute value of symbolic object code addresses by use of a multi-pass assembler/optimiser. A simple optimisation tool was developed to implement operand field reduction by resolving target addresses and then reducing the operand size of each reference to make use of the UTSA’s shorter instruction modes. The static code density was found to be greatly improved, packing 2.3 instructions per word on average. Both optimised and raw compiler code are shown in Fig. 8.2.

[Figure: bar chart] Fig. 8.2 Static packing densities (instructions per word) for the benchmarks fib, life, fact, bsort, sieve, eloop, image, towers, matrix, and their average; series: raw code (aligned) and optimised (aligned).

The raw compiler output averaged a code density of 1.7 instructions per word, implying an instruction length of almost 19 bits per instruction. This does not compare well with RISC or CISC architectures, where 16-bit instructions have been common. However, by applying the back-end code optimisation discussed, the code density reaches an average of 2.3 instructions per word, which implies an instruction length of about 14 bits. As this includes all operand field overheads, unlike the RISC/CISC figures mentioned earlier, the result is quite satisfactory.
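The implied instruction lengths quoted here follow directly from the packing densities. As a quick check, assuming the 32-bit UTSA memory word:

```python
# Average instruction length implied by a static packing density,
# for a 32-bit memory word (the figures quoted in the text).

def bits_per_instruction(density, word_bits=32):
    return word_bits / density

print(round(bits_per_instruction(1.7), 1))  # ≈ 18.8 bits (raw compiler output)
print(round(bits_per_instruction(2.3), 1))  # ≈ 13.9 bits (after operand field reduction)
```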

8.4 Branch prediction and dynamic packing density

Dynamic packing density, the number of instructions executed per instruction fetch, is quantified through simulation runs on the UTSA simulator platform. Results were produced with and without a branch prediction strategy and are plotted in Fig. 8.3, which shows two separate averages[20]. Branch prediction was based purely on the direction of the branch, with backward branches always predicted taken. This was highly effective, averaging an 80-90 % success rate for the majority of programs.

[Figure: bar chart] Fig. 8.3 Dynamic packing densities (instructions per fetch) for each benchmark and the averages Ave-1 and Ave-2; series: aligned, and aligned with branch prediction.

From examination of Fig. 8.3 it can be seen that the dynamic packing density is initially reduced in comparison to the static values of Fig. 8.2. However, with simple branch prediction techniques, the dynamic code density reaches 2.3 instructions per fetch for ’Ave-2’. This is as good as the static packing density figures already presented.

If the UTSA scheme is viewed as an instruction queuing mechanism, then branch prediction is effectively eliminating any bubbles in the queue and reducing the average branch penalty. More complex branch prediction strategies (Young et al. 1995, Lee 1984, DeRosa 1987) were considered but dismissed. Most strategies would not offer significant advantages given the high success rate already achieved, and often these more advanced strategies involve predictive methods, indeterministic behaviour, and/or require complex logic.
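The prediction rule used above is the classic static backward-taken/forward-not-taken heuristic, which can be sketched in a few lines (the addresses are illustrative only):

```python
# Static branch prediction as described in the text: backward branches
# (loop-closing jumps to lower addresses) are always predicted taken,
# forward branches are predicted not taken. No history state is needed,
# so the behaviour is fully deterministic.

def predict_taken(branch_addr, target_addr):
    return target_addr < branch_addr

# A loop's closing branch jumps backwards and is predicted taken:
print(predict_taken(branch_addr=0x120, target_addr=0x100))  # True
# A forward skip over an else-clause is predicted not taken:
print(predict_taken(branch_addr=0x120, target_addr=0x140))  # False
```

Because loop back-edges dominate dynamic branch behaviour, this stateless rule alone accounts for the 80-90 % success rates reported above.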

[20]Average ’Ave-1’ is for all nine benchmarks. Average ’Ave-2’ excludes Fib, Fact, and Eloop, since they are not very representative of branch behaviour and might tend to distort the dynamic results.

8.5 Word-alignment of call/branch target addresses

A packing density of 2.3 instructions per word, and an average instruction length of 14 bits, is still rather unimpressive, but the code was optimised for size rather than speed, and further improvements are possible. Word-alignment of call/branch target addresses carries a severe penalty for code density. The static and dynamic analyses were therefore repeated with the assembler options set to perform non-aligned branch/call targeting, so that a branch could jump to any of the instructions packed within the target word. This improved matters significantly for the static code densities measured, with an average of 2.7 instructions per word (11.6 bits per instruction). However, mixed results were obtained for dynamic packing density. This is shown in Figs. 8.4 and 8.5.

[Figure: bar chart] Fig. 8.4 Static packing densities before and after operand field reduction, for each benchmark and the average; series: raw code aligned, raw code unaligned, optimised aligned, optimised unaligned.

[Figure: bar chart] Fig. 8.5 Dynamic packing densities after operand field reduction, for each benchmark and the averages Ave-1 and Ave-2; series: aligned, aligned with branch prediction, unaligned, unaligned with branch prediction.

Comparison of the four combinations of aligned/non-aligned and optimised/raw compiler output (as shown in Figs. 8.4 and 8.5) gives some interesting results for static and dynamic packing density. Clearly the issue of code alignment is significant for static packing density, whilst it has little to offer for dynamic performance. Evidently some programs, Matrix for instance, actually perform worse with unaligned code, whilst others are improved. The benchmark average shows very little change between aligned and unaligned code mapping methods (less than 5 % is gained). Branch prediction appears to have no effect upon the relationship.

With a limited set of benchmarks one should not make any firm statements on word-alignment, as a realistic assessment for C-based code must ultimately include library code performance and behaviour. Also, applying an optimisation to improve dynamic packing density may change the pattern of results shown here in a way that cannot yet be predicted. A selective alignment strategy may offer better results than the blind optimisations applied here. The overall effects of branch prediction and instruction alignment are represented in Fig. 8.6, which shows a 10 % improvement for unaligned code and branch prediction together. Again, Eloop, Factorial, and Fibonacci are excluded to improve the quality of the benchmarks for these specific measurements.

[Figure: bar chart] Fig. 8.6 Dynamic effects of word alignment (using ’Ave-2’ program set). Dynamic densities: aligned 2.14, unaligned 2.19, aligned with branch prediction 2.31, unaligned with branch prediction 2.35.

The dynamic results, with a dynamic packing density of 2.35 instructions per issued fetch, imply that the instruction fetch overhead for each instruction is about 0.42 memory references. This is quite a respectable figure, being equivalent to an approximate 85-90 % hit rate for an instruction cache in a 2-wait-state memory hierarchy. However, the instruction packing technique has none of the disadvantages of indeterminism that cache implies, nor the large silicon demands that such hardware structures require.

8.6 Hardware Considerations of Packing Schemes

Determining the efficiency of the UTSA packing scheme allows us to establish its usefulness and compare it with other schemes. However, the true performance gains can only be quantified if hardware trade-offs are given full consideration, as in the case of Patterson (1985).

The UTSA instruction packing scheme significantly reduces instruction fetch overheads without degrading the expressive power of the instruction set, something that cannot be achieved so easily in register-file architectures. This gain must come at a cost of additional hardware in the CPU design, which will undoubtedly increase worst-case latencies for the CPU cycle time. The question that must be answered is how significant these hardware penalties are in comparison to the gains in memory bandwidth utilisation yielded.

Figure 8.7 shows one possible layout for the operand field decode section of the hardware required for the UTSA instruction formats. Similar logic is required to generate the opcode field select buffer. Bits 31 and 30 of the instruction word indicate the instruction format being decoded (i.e. 3, 2, or 1-instruction modes). The remainder of the logic simply multiplexes appropriate fields of the instruction word to the output, depending upon the instruction required in the current machine cycle.
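The multiplexing behaviour described above can be sketched in software. The format-select bits (31 and 30) follow the text, but the slot field widths and encodings below are hypothetical placeholders, chosen only to illustrate the structure of the decode, and are not the actual UTSA encodings:

```python
# Illustrative decode of a packed 32-bit instruction word. Bits 31-30
# select the format (per the text); the field widths per format are
# ASSUMED for illustration only.

def decode_slots(word):
    fmt = (word >> 30) & 0b11
    if fmt == 0b00:      # three-instruction mode: 3 x 10-bit slots (assumed)
        return [(word >> s) & 0x3FF for s in (20, 10, 0)]
    elif fmt == 0b01:    # two-instruction mode: 2 x 15-bit slots (assumed)
        return [(word >> s) & 0x7FFF for s in (15, 0)]
    else:                # long single-instruction mode: 30-bit field (assumed)
        return [word & 0x3FFFFFFF]

word = (0b00 << 30) | (0x123 << 20) | (0x2AB << 10) | 0x0FF
print([hex(s) for s in decode_slots(word)])  # ['0x123', '0x2ab', '0xff']
```

In hardware, each arm of the `if` corresponds to one multiplexer path selected by the two format bits, which is why the gate-depth of the real circuit is so shallow.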

It is possible to show that the gate delays in this scheme are of the order of three AND gates, three OR gates, and three inverters. Whilst this is a small overall delay, it may still be a measurable fraction of the worst-case delays encountered in an optimised ALU with carry propagation, and hence represents a factor which must be investigated before claiming benefits for packed instruction formats.

[Figure: logic schematic] Fig. 8.7 Possible implementation of UTSA operand-field decode buffer. Bits IR31 and IR30 select the format, and a 2-bit instruction index i selects among the operand fields of the 32-bit instruction word register IR to produce operand[i].

The series of projections given in Table 8.1 show the relative performance and true speed increases yielded by adopting the UTSA instruction format, over a range of CPU cycle time degradations. Local variable and stack traffic are not optimised in this analysis. We also assume that memory and CPU speed are matched, but it will be shown in Section 8.6.2 that the true impact of instruction packing is influenced by the presence of other optimisations.

Table 8.1 Performance estimate for UTSA Instruction Format.

Cycle latency   UTSA relative exec time   Speedup with no other optimisations
100 %           70 %                      30 %
105 %           74 %                      26 %
110 %           77 %                      23 %
115 %           80 %                      20 %
120 %           84 %                      16 %

The table shows that for a realistic hardware penalty of 10 % increased cycle time, the instruction packing method proposed still delivers a 23 % overall performance gain in an otherwise unoptimised stack processor system. Even for extreme increments in hardware latency, of 20 % or more, one still expects gains of the order reported by Patterson (1985), although these diminish as the hardware cost increases.
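The rows of Table 8.1 are consistent with a simple linear model: instruction packing reduces the unoptimised machine's execution time to 70 % of baseline, and this is then scaled by the cycle-time penalty. The following sketch reconstructs the table on that assumption; the model is inferred from the figures, not stated in the source:

```python
# Reconstruction of Table 8.1 under an ASSUMED model: packing cuts the
# memory-bound execution time of the unoptimised machine to 70 % of its
# baseline, and execution time then scales linearly with the degraded
# CPU cycle time.

BASE_EXEC = 0.70  # relative execution time with packing, no cycle penalty

for penalty in (1.00, 1.05, 1.10, 1.15, 1.20):
    exec_time = BASE_EXEC * penalty
    speedup = 1.0 - exec_time
    print(f"cycle {penalty:.0%}  exec {exec_time:.0%}  speedup {speedup:.0%}")
```

Run as written, this reproduces the five rows of the table (70/30 through 84/16) to the nearest percent.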

8.6.1 VHDL synthesis and timing analysis of instruction packing

Quantifying the actual change in CPU cycle time resulting from instruction packing techniques is not a simple matter. We can measure the gate delay of a circuit which decodes the UTSA instruction words, and then compare it with an existing CPU model. However, this necessitates the existence of such a model, or data for existing technology with comparable hardware implementation characteristics.

The FRISC3, for example, has been fabricated in 1.2 µm CMOS technology, and operates at 10 MHz, implying a worst case of 100 ns for CPU cycle time. A 10 ns delay in a 1.2 µm CMOS implementation of the UTSA instruction-word-decoder would hence be expected to imply a new cycle time which is approximately 110 % of the original, giving a new clock rate of 9 MHz. This would be compensated for by greatly reduced memory dependency.

To assess the effect of these trade-offs more accurately, it is clear that a model must exist which one can fully understand and control. The UTSA processor design has been modelled in synthesisable VHDL code, including the instruction word decoding unit. These models were synthesised with 1 µm CMOS technology files, and then simulated with UTSA instruction words to generate technology-specific timing analysis. Full timing analysis results are reported in Chapter 9.

It will be shown in Chapter 9 that the additional latency for instruction issue is typically 3 ns with the UTSA instruction prefetch buffer scheme, whilst the original critical path latency was 36 ns. The maximum clock rate for UTSA, without the prefetch buffer, would have been no better than 27.8 MHz for a 36 ns delay; after applying the instruction packing technique, this drops to 25.6 MHz for a 39 ns delay. This increases the (prototype) CPU cycle time to 108 % of its original value using exact frequency comparisons, but in practice even a single-fetch scheme would contribute some decode latency, making this figure slightly conservative.

8.6.2 Trade-offs between CPU cycle time and memory bandwidth

Having quantified the effects of UTSA instruction packing on CPU timing, and its effects upon memory bandwidth requirements, we can now estimate the true speed up achieved under several co-existing conditions, rather than studying each issue in isolation. Figure 8.8 shows projections for memory traffic under a number of processing conditions.

[Figure: bar chart] Fig. 8.8 Single and multi-fetch performance under various co-optimisation conditions (no optimisation, stack buffered, variable optimised). Relative execution time falls from 100 % (single fetch, no optimisation) to 80 % with multi-fetch alone, and to approximately 35 % with multi-fetch plus stack buffering and variable optimisation; intermediate bar values are 62 %, 58 %, and 37 %.

The series of projections in Fig. 8.8 show the effects of adding UTSA instruction packing to a stack processor with various other optimisations, such as stack buffers and intra-block scheduling of local variables. It is seen that stack buffers and variable optimisation each deliver significant gains. With the addition of the UTSA instruction packing scheme, the final performance is approximately 35 % of the original unoptimised architecture, even though the CPU cycle time is enlarged by 8 % due to decoding.

The importance of taking into account interactions between optimisations is clearly illustrated by the results. Without any optimisation, the gain for applying instruction packing techniques is 20 %, as was implied in Table 8.1 previously. However, after applying stack buffering to the system and then optimising local-variable traffic, the relative contribution made by instruction fetch traffic to the remaining overhead increases (since the other contributions are reduced). Thus it is seen that the instruction packing scheme actually offers gains of 40 % for execution time in a system in which other optimisation issues have also been addressed, as is highlighted by Fig. 8.9.

[Figure: bar chart] Fig. 8.9 Gain in execution time achieved by UTSA instruction packing, with various co-optimisations applied. Reduction in memory references: 20 % (no optimisation), 37 % (stack buffered), 40 % (variable optimised).

Clearly these gains in execution time are far in excess of the ’15 % gain for mathematical workloads’ reported by Patterson (1985). This may be partly attributed to the compactness of program code in stack based architectures, but may also be affected by the fact that the measurement was made in a system in which other bottlenecks had already been attended to by appropriate optimisation. Full utilisation of the RISC optimisation technology now available, before measuring the impact of instruction packing, may well have allowed greater gains to be reported by Patterson than at the time of his original study.

———————— Chapter 9 ———————— VHDL Modelling, Hardware Synthesis, and Timing Analysis

————————

9.0 Preamble

Evaluation of the UTSA concept cannot be limited to simulation of various performance issues. The hardware, its timings, and trade-offs are also of some importance for a thorough investigation of the proposed design, and the applied optimisations.

The instruction decoding of the UTSA packed instruction scheme was examined in Chapter 8 by considering architectural aspects and their resulting timing trade-offs. In this chapter the VHDL modelling and hardware synthesis of the whole UTSA prototype are detailed, and results are presented for word-decoder latency, ALU timing, instruction decoding, and overall timing evaluation.

The findings presented here indicate that performance of the (rudimentary) UTSA prototype will be in the region of 25 MHz with 1µm CMOS fabrication technology, implying that memory devices of 120 ns cycle time can be used to deliver 25 mips performance. By adopting a 2-cycle ALU regime, UTSA yields 50 MHz performance in simulation, with a peak throughput of 50 mips using 80 ns RAM.

Component counts and die area costs are presented in the initial sections, with a breakdown of system units showing individual contributions. The results indicate that hardware costs for the UTSA’s instruction packing scheme are not insignificant as a fraction of the whole design. However, the low overall gate counts presented for the complete prototype mean that this is of minimal significance in absolute terms, particularly when compared to the overheads that would be incurred for implementation of a cache with equal performance.

9.1 VHDL Modelling of a UTSA prototype

The UTSA model was developed in a modular fashion, with major system modules being grouped hierarchically to build up the whole design. Six major design units were required to complete the UTSA prototype, as illustrated in the high-level schematic of Fig. 9.1. Each design module typically consists of several functional sub-divisions. For example, the pre-fetch buffer, which is responsible for decoding the UTSA packed instruction word into individual instructions, contains a pre-fetch state controller, an instruction fetch buffer, and an instruction issue controller.

[Figure: block schematic] Fig. 9.1 The modular implementation of the UTSA prototype. The pre-fetch buffer connects to the address and data buses and issues opcode/operand pairs to the RTL sequencer and RTL engine; a bus arbiter (with Breq/Bgrnt handshake), return stack buffer, data stack buffer, and stack buffer control complete the design. All modules were synthesised to 1 µm gate technology and modelled in generic VHDL simulation.

The construction of VHDL models for each module was performed on a module-by-module basis, with each module being simulated at a functional level before being synthesised into a technology-specific netlist. Once a logic-level description was available, each module was separately simulated for timing analysis before integration with higher-level hierarchical groupings. The final top-level model was then developed and subjected to the analysis presented in the remainder of this chapter. VHDL source code files for the UTSA design may be found in Appendix-I.

9.1.1 Prototype logic synthesis, and assessment of area cost

The prototype core was synthesised with 1 µm CMOS fabrication technology. The design was optimised to include a single ALU core, rather than distributed arithmetic functions. This results in minimisation of logic cost, but is of course a compromise that reduces performance. The major arithmetic/logic functions are listed in Fig. 9.2, as reported by the synthesis tool.

// Logic blocks synthesised:
// 1 3-bit comparator (=) with 3-bit lookahead.
// 1 32-bit adder/subtractor with 8-bit lookahead.
// 1 32-bit shared comparator with 8-bit lookahead.
//
// Sequential components instantiated:
// 433 D flip-flops.
// 395 D latches.
Fig. 9.2 Synthesis report for core modules

After synthesis the tool reported a series of gate-count measurements in terms of area and component counts. The total component count was reported to be 7013 components, broken down as illustrated in Fig. 9.3.

[Figure: pie chart] Fig. 9.3 Breakdown of component utilisation in the UTSA prototype design: gates 77 %, latches 14 %, 2-to-1 multiplexers 6 %, tristates 3 %.

The area-weighting of each component differs, such that the total area is reported to be 38561 units. As a 2-input NAND gate has an area of 2.95 units, the whole design, as it stands, has an equivalent area to that of approximately 13000 NAND gates. A rough extrapolation for transistor count may be made on the basis that a typical CMOS NAND gate requires four transistors, giving a figure of around 52000 transistors. The individual area-cost contributions of major system modules are given in Fig. 9.4, which indicates equivalent transistor counts for each major module.
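The area arithmetic quoted above can be checked directly, using the figures stated in the text:

```python
# NAND-gate and transistor estimates from the synthesis tool's area
# report. All constants below are those stated in the text.

total_area_units = 38561     # reported total area
nand_area_units = 2.95       # area of one 2-input NAND gate
transistors_per_nand = 4     # typical CMOS 2-input NAND

nand_equiv = total_area_units / nand_area_units
transistor_estimate = nand_equiv * transistors_per_nand

print(round(nand_equiv))         # ≈ 13000 NAND-gate equivalents
print(round(transistor_estimate))  # ≈ 52000 transistors
```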

[Figure: bar chart] Fig. 9.4 Equivalent transistor counts for system modules: RTL engine 40676, instruction prefetch 8012, with the 32-bit add/sub, RTL state control, 32-bit comparator, and bus arbiter contributing 1780, 900, 888, and 382 respectively.

Assessing each block in terms of its contribution to the total area, we can see that although the instruction word prefetch/decode buffer contributes perhaps 15 % of the total gate count, its absolute contribution is only equivalent to 2000 NAND gates, or 8000 transistors. With complete implementation of the UTSA instruction set, this figure may reduce further as a proportion of the final design.

9.2 Instruction packing versus cache - the silicon trade-off

It could be argued that effort would be better devoted to implementing a cache to enhance instruction fetch latencies and reduce memory bandwidth dependency (setting aside the arguments of non-determinism for the moment). However, the component area occupied by the UTSA instruction prefetch buffer is equivalent to only 8000 transistors. A 1-bit SRAM cell can be implemented with as little as four transistors and two resistors; a more area-efficient design utilises six transistors with no additional components. Thus we may estimate that the UTSA instruction word decode logic is equivalent in area to about 1300 bits of storage space. In practice, a simple cache would require a 24-bit address field and a 32-bit data field, such that we cannot expect more than 24 entries to be implemented in the same silicon area, even neglecting the control logic needed to detect cache hits.[21]

Data for stack-oriented instruction traffic presented by Flynn (1990) indicate that cache performance would be very poor with such a small capacity, with hit rates of less than 25 % even when exploiting more advanced cache configurations. Hence, we should expect effective memory traffic to be of the order of 0.75 references per instruction, with fine-grain timing being indeterministic. In contrast, the UTSA packing scheme reduces memory traffic to 0.42 references per instruction[22] in a manner that can be precisely determined on an instruction-by-instruction basis.

Flynn’s results also show that achieving a similar reduction in memory traffic with instruction cache requires a capacity of the order of 512 entries (or 172,000 transistors on the same basis as argued in previous paragraphs). Whilst careful optimisation of a cache structure can improve performance, the UTSA scheme still appears to deliver a significant gain for very little silicon and, with appropriate code optimisation for better dynamic packing density, has potential to improve further.
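The silicon trade-off argued in this section reduces to a few lines of arithmetic, using the figures given in the text and its footnotes:

```python
# Translating the prefetch buffer's transistor budget into equivalent
# SRAM capacity and simple cache entries, per the argument above.

buffer_transistors = 8000       # UTSA prefetch/decode buffer (Fig. 9.4)
transistors_per_sram_bit = 6    # six-transistor SRAM cell
bits_per_cache_entry = 24 + 32  # 24-bit address field + 32-bit data word

sram_bits = buffer_transistors / transistors_per_sram_bit
entries = sram_bits / bits_per_cache_entry

print(round(sram_bits))  # ≈ 1333 bits of storage (text rounds to 1300)
print(round(entries))    # ≈ 24 cache entries

# By contrast, matching the packing scheme's 1/2.35 ≈ 0.42 fetches per
# instruction would, on Flynn's data, need a cache of around 512 entries:
print(512 * bits_per_cache_entry * transistors_per_sram_bit)  # 172032 transistors
```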

[21] This figure is arrived at by dividing the equivalent storage capacity of the UTSA decoder in terms of SRAM cells (1300 bits) by the number of SRAM cells required per cache entry (24+32=56 bits). Hence 1300/56 ≈ 24, i.e. the UTSA decode buffer is equal in area to a 24-entry cache of simple design.

[22] The instruction fetch overhead of 0.42 memory references per instruction is derived by taking the reciprocal of the dynamic packing density (see Chapter 8), thus: 1÷2.35 = 0.42.

9.3 Timing analysis and determination of clock frequencies

Determining the maximum clock rate for the UTSA prototype depends upon measuring several logic latencies in order to establish the maximum latency within the design. The diagram of Fig. 9.5 illustrates the key timing parameters that need to be quantified in order to establish the best operating conditions for UTSA.

[Figure: timing diagram] Fig. 9.5 UTSA timing model, showing the key parameters tdec, treq, tarb, trel, talu, treg, and tprop across the clock cycle.

In the clock-high portion of the clock cycle the key events are as follows. Instruction word decoding takes place, and a new instruction is issued to the RTL controller (labelled ’tdec’ in Fig. 9.5). At the same time, the contents of the stack registers are updated, becoming stable within the period labelled ’treg’. As soon as the register contents become stable, the ALU begins computation based upon the new stack-register contents. The control logic responds to decoding of the issued operation by changing the state of the external bus request line, taking treq to settle to a steady state. This process can be seen in the timing plot of Fig. 9.6. Once the request line has settled, its state becomes useful to the bus arbitration circuit. Arbitration between bus-requesting machine modules must be resolved before the falling edge of the clock, hence the clock must remain high for a minimum period determined by eqn (9.1).

thigh = tdec + treq + tarb (9.1)

In the low period of the clock there are two possible courses of action. In the first case, an arithmetic operation, the hardware must allow enough time for the ALU to perform a worst-case rollover of 32 bits and propagate the results for latching into the top-of-stack register. In the case of non-arithmetic operations, the new register contents are resolved in time tprop, and the previous bus request status is released concurrently during trel. The timing measurements and waveform diagram for a 32-bit ALU roll-over are shown in Fig. 9.7, as taken from the screen-dump of the actual design system used.[23]

[23] Further waveform timing plots may be found in Appendix-F.

Fig. 9.6 UTSA instruction format decode propagation

Fig. 9.7 Logic timings for 1 µm CMOS, 32-bit ALU operation

In non-arithmetic operations, the main concern is the transfer of register contents, as in the case of dup and swap. Here the critical quantity is the time taken for the hardwired control logic to select and propagate the source data to the destination, represented by tRTL. Added to this is the requirement to relinquish the current bus request status before the clock’s rising edge, a requirement which does not arise when an ALU operation is active.

In determining the maximum operating frequency of the design, it is clear that for a single-cycle execution model the ALU operation represents the worst-case delay. It is also seen that the entire clock period must be greater than that shown in eqn (9.2):-

tlow + thigh = tdec + talu + tprop (9.2)

It is also clear that for non-arithmetic operations the maximum clock rate is related to the formula of eqn (9.3), (i.e. frequency being the reciprocal of time).

tlow + thigh = tdec + treq + tarb + trel (9.3)

9.3.1 Technology specific timing measurements

With the UTSA design modelled in VHDL, and subsequently synthesised into 1 µm CMOS technology, timing measurement was simply a matter of running simulations on the synthesised netlists, with various instruction words supplied in each case to the instruction prefetch buffer. The results are summarised in Table 9.1.

Table 9.1 UTSA timing measurements

Parameter                     Symbol   Min (ns)   Typ (ns)   Max (ns)
ADD (32-bit rollover)         talu     14.6       30.4       61.0
ADD (16-bit rollover)         talu     11.7       22.8       45.7
TOS++ (32-bit rollover)       talu     15.7       32.0       65.0
TOS++ (16-bit, no rollover)   talu     7.0        15.5       34.6
Register update               treg     < 2.4      < 5.0      10.1
Bus request                   treq     1.6        3.1        6.8
Internal bus arbitration      tarb     0.9        1.9        3.8
Bus release                   trel     3.0        5.8        11.6
Result propagation            tprop    ---        < 4.0      ---
Instruction decode-issue      tdec     1.5        3.0        6.1

Results suggest that the ALU latency is 32 ns for a 32-bit increment operation, as may be observed in the timing measurements of Fig. 9.7. The ALU was synthesised with 8-bit look-ahead carry. The instruction decode-issue latency, tdec, was found to be 3 ns, as also illustrated in a previous diagram (Fig. 9.6), which is a small fraction of overall cycle time (an issue discussed in Chapter 8). The external bus request time, treq, was measured to be 3.1 ns, and internal bus arbitration was 1.9 ns.

Hence, using eqn (9.1), one may determine the minimum clock high period to be:-

thigh = tdec + treq + tarb = 8.0 ns.

The overall clock period is dependent upon whether the operation is arithmetic or non-arithmetic. The total length of the clock cycle can be determined in either case, using eqns (9.2) and (9.3) respectively.

tdec + talu + tprop = 39 ns (arithmetic operation)

tdec + treq + tarb + trel = 13.8 ns (non-arithmetic operation.)

Therefore we can state that the UTSA would operate at 25 MHz if a strict single-cycle regime were adhered to. The alternative strategy of making ALU operations into multi-cycle operations allows non-arithmetic operations to operate much faster. With two-cycle ALU operations, the machine would have to be down-rated to 50 MHz (allowing two 20 ns cycles for an ALU operation). A 66 MHz maximum operating frequency is suggested by the evaluations above, but requires three 15 ns cycles to complete a 32-bit arithmetic operation.
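As a cross-check, the cycle-time arithmetic above can be reproduced in a few lines. This is a sketch only: the typical-case figures are those of Table 9.1, and the candidate cycle times are the 39 ns, 20 ns, and 15 ns values discussed above.

```python
# Cycle-time arithmetic for the UTSA timing analysis, using the
# typical-case values from Table 9.1 (all figures in nanoseconds).
t_dec, t_alu, t_prop = 3.0, 32.0, 4.0   # decode-issue, 32-bit ALU, result propagation
t_req, t_arb, t_rel = 3.1, 1.9, 5.8     # bus request, arbitration, release

# Eqn (9.2): full clock period for a single-cycle arithmetic operation.
t_arith = t_dec + t_alu + t_prop        # 39.0 ns
# Eqn (9.3): full clock period for a non-arithmetic operation.
t_other = t_dec + t_req + t_arb + t_rel # 13.8 ns

def max_freq_mhz(alu_cycles, t_cycle_ns):
    # An n-cycle ALU operation fits if n cycles cover the arithmetic
    # path, and each cycle still covers the non-arithmetic path.
    assert alu_cycles * t_cycle_ns >= t_arith
    assert t_cycle_ns >= t_other
    return 1000.0 / t_cycle_ns

print(max_freq_mhz(1, 39.0))   # single-cycle regime: ~25.6 MHz
print(max_freq_mhz(2, 20.0))   # two-cycle ALU operations: 50 MHz
print(max_freq_mhz(3, 15.0))   # three-cycle ALU operations: ~66.7 MHz
```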

The results measured for multi-cycle ALU operation are interesting when compared with other stack-processor projects. The results for UTSA, with over fifty 32-bit instructions implemented, are comparable with the findings of Moore and Ting, who have presented a 100 MHz architecture (the twenty-instruction 20-bit MµP21) in which ’ALU operations must be allowed extra cycles to settle’ (Ting and Moore 1995). In the final UTSA design, more careful optimisation of the ALU may allow 16-bit operations to execute in a single cycle, minimising the impact of longer and less frequent 32-bit ALU operations.

These estimates take into account fan-out and fan-in characteristics derivable from logic synthesis back-annotation, but do not account for layout-specific effects, such as wire-lengths and clock skew. In practice the operating characteristics would be likely to be revised downward in a final fabricated prototype, but layout and fabrication of a prototype was beyond the scope of the defined research programme.

9.4 Estimating power consumption

With a 52,000 transistor design, we can make some initial estimates of the prototype device’s power consumption. These figures will not be highly accurate, but are based upon certain ’rules-of-thumb’ which are typically applied in the industry for such ’first-guess’ estimates of device performance.

The power consumption of a CMOS device can be estimated with the following formula, which neglects the negligible transistor power consumption and estimates the effect of interconnection power dissipation (Herbst 1996).

P = n × 0.1Lc × Vdd² × f × M × Cw′ (9.5)

In view of the fact that an actual die layout has not yet been attempted, one must estimate several parameters by assuming typical values. For example, the 1µm CMOS i486 chip produced by Intel in 1989 has a die area of 10.5 × 15.7 mm, which is 164,850,000 µm². The chip contains approximately 1,200,000 transistors plus their interconnections (Alpert 1993). From this we can extrapolate a die size of 7,143,500 µm² for a 52,000 transistor design. Taking the square root of this figure yields a value for Lc, the die-edge dimension, of 2,672 µm. This implies that the die would be 2.6 mm on each side, but I/O pads on the die would prevent such a small die from being realised in practice.
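The extrapolation can be reproduced with a few lines of arithmetic. This is a sketch using only the i486 figures quoted above; linear scaling of die area with transistor count is the stated first-guess assumption.

```python
import math

# Die-size extrapolation from the i486 reference point quoted above,
# assuming die area scales linearly with transistor count.
i486_area_um2 = 10.5e3 * 15.7e3      # 164,850,000 um^2
i486_transistors = 1_200_000
utsa_transistors = 52_000

utsa_area_um2 = i486_area_um2 * utsa_transistors / i486_transistors
L_c = math.sqrt(utsa_area_um2)       # die-edge dimension Lc, in um

print(round(utsa_area_um2))          # 7,143,500 um^2
print(L_c)                           # ~2,672 um, i.e. a ~2.7 mm die edge
```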

The parameter ’n’ represents the total number of transistor interconnections. A figure of 2.5t to 3.0t is usually considered reasonable for a first estimate (where t is the estimated transistor count). The figure of 0.1Lc represents a rule-of-thumb for the average interconnection length, which is typically in the region of 0.05Lc to 0.1Lc, such that a figure of 0.1Lc is erring on the side of caution.

The remaining parameters are as defined below:-

Vdd ... Supply voltage, typically 5 volts for 1 µm CMOS technology.

Cw′ ... Wire capacitance = 0.136 fF/µm.

f ... Clock frequency, determined to be 25 MHz for UTSA.

M ... Proportion of transistors active for a given clock event, typically 0.1 or less (i.e. M < 10%).

Applying eqn (9.5) with the parameters as introduced, we find that the power consumption estimate is in the region of 354 mW (0.35 watts) for the UTSA core processor prototype. Assuming more realistic figures for average wire-length, which would be achievable through the use of advanced die-layout software tools, figures of 0.05Lc or less might be found to be representative of the final device, leading to power consumption figures of the order of 180 mW for 25 MHz operation, and 475 mW at 66 MHz.

In each of the above cases for power consumption, we should also attempt to add the impact of driving any external I/O pads in the design. The UTSA is designed to fit within an 84-pin packaging technology. The I/O pad dissipation might represent 10% to 30% of the final power consumption, if we take the example of the SH3 architecture (Hasegawa et al. 1995). Taking these factors into account, an initial estimate of power consumption that falls within the range of 200 mW to 460 mW would be reasonable, and it is unlikely that this would exceed 0.5 watts at 50 MHz unless significant architectural additions had been made to the final design.
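As a rough cross-check of eqn (9.5), the sketch below applies the parameters introduced above. Taking n = 3.0t, the upper end of the 2.5t to 3.0t rule of thumb, reproduces a core estimate of roughly 354 mW; this choice of n is an assumption of the sketch, not a figure stated in the text.

```python
import math

# First-guess CMOS power estimate, eqn (9.5):
#   P = n * 0.1*Lc * Vdd^2 * f * M * Cw'
t = 52_000                    # transistor count
n = 3.0 * t                   # interconnection count (upper end of 2.5t-3.0t)
L_c = math.sqrt(164_850_000 * t / 1_200_000)  # die edge in um, from i486 scaling
wire_len = 0.1 * L_c          # average interconnection length, um (cautious)
Vdd = 5.0                     # supply voltage, volts (1 um CMOS)
f = 25e6                      # clock frequency, Hz
M = 0.1                       # proportion of transistors active per clock event
Cw = 0.136e-15                # wire capacitance, farads per um

P = n * wire_len * Vdd**2 * f * M * Cw
print(round(P * 1000))        # core estimate in mW, ~354
```

Halving the wire-length figure to 0.05Lc halves the estimate to roughly 177 mW, in line with the ~180 mW figure quoted above for 25 MHz operation.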

———————— Chapter 10 ———————— Models, projections, and performance.

————————

10. Preamble

Reviewing the new work presented in the previous chapters, particularly Chapters 6, 7, and 8, it can be seen that major issues for performance have been investigated and modelled in mathematical terms. Important questions that may be addressed include the final impact of applying local variable optimisation, and its comparison with alternatives. One may also ask how instruction density influences matters, and how the peak performance (established in Chapter 9) is degraded by the memory hierarchy employed. Ultimately, the ability to deliver scalable performance in comparison with other architectural families may be assessed, and with particular respect to the established trend toward slower memory systems (relative to CPU cycle times).

10.1 The local variable issue.

Recalling the model first presented in eqn (4.1) allows us to consider the question of local variable optimisation in a wider context. Presenting the expanded forms, eqns (6.1) and (6.2), as may be seen from previous chapters:

Mt = 1/if + Sd + Sr + mL + me (4.1)

Mt = 1/if + sd.e^(-tb) + sr.e^(-tb) + mL + me (6.1)

Mt = 1/if + sd.e^(-(td.bd)) + sr.e^(-(tr.br)) + mL + me (6.2)

Equation (6.2) makes the distinction between data and return stack behavioural parameters, although in practice, return stack traffic is relatively small and could typically be neglected in general performance projections.

An alternative to local variable optimisation might be to simply place the relatively small (local, data, and return) stacks in a fast (and completely deterministic) SRAM even if the remaining code and data spaces are in slower DRAM memory. This might be expected to reduce the penalties of local variable management, as represented by eqn (10.1).

Mt = ocode.(1/if) + odata.me + ostack.(mL + Sd + Sr) (10.1)

A typical memory hierarchy might include 3-cycle DRAM (2-wait-state), and single-cycle SRAM, yielding ocode = odata = 3, and ostack = 1, where o represents the number of machine cycles per memory access. One can now estimate performance with a fast SRAM, or with local variable scheduling, by simply using eqn (10.1). We may of course estimate a third choice: the effect of fast SRAM for stacks in combination with local optimisation. Assuming a zero-pointer stack buffer algorithm results in the parameters of table 10.1:-

Table 10.1 Parameters before and after local variable optimisation.

Symbol   No local optimisation   With optimisation
me       0.094                   0.094 (no effect)
mL       0.326                   0.212
td       0.71                    0.58
sd       0.85                    0.79
1/if     0.42                    0.42 (no effect)

Plotting mathematical projections for each case gives the results of Fig. 10.1, which project performance under various conditions as a function of buffer capacity.

Fig. 10.1 Average number of memory cycles per instruction (oave), as a function of buffer size, for various system configurations[24]. (The figure plots memory cycles ’T-ave’ against buffer size ’b’, for the No-Opt, Opt, SRAM, Opt+SRAM, and ALL SRAM configurations.)

It is clear from Fig. 10.1 that using a fast SRAM would be better than applying local variable optimisation for the case where DRAM requires three CPU cycles. However, the software optimisation does not increase system cost or complexity, and with improved variable optimisation techniques it should prove possible to improve matters further. When combining the SRAM and local-variable optimisation techniques, one finds that performance is maximised, with each instruction requiring the equivalent of 1.84 DRAM accesses. This is a 2.8-fold improvement over the unoptimised and unbuffered architecture, which requires 5.16 cycles per instruction.
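The projections of eqn (10.1) can also be sketched numerically. The fragment below is illustrative only: the parameters are those of Table 10.1, return-stack damping is neglected (as the text suggests it may be), and a return-stack spill term of sr = 0.03, which is not tabulated above, is assumed so that the unbuffered, unoptimised case reproduces the 5.16-cycle figure.

```python
import math

# Sketch of eqn (10.1), with the data-stack spill term S_d damped by
# buffer size b as in eqn (6.2).  The return-stack term s_r = 0.03 is
# an assumption (not tabulated in Table 10.1), chosen so the unbuffered,
# unoptimised all-DRAM case matches the 5.16-cycle figure.
def mem_cycles(b, o_code, o_data, o_stack, inv_if, m_e, m_L, s_d, t_d, s_r=0.03):
    S_d = s_d * math.exp(-t_d * b)   # damped data-stack spill traffic
    S_r = s_r                        # return stack: small, left undamped here
    return o_code * inv_if + o_data * m_e + o_stack * (m_L + S_d + S_r)

# Unoptimised, unbuffered, all memory in 3-cycle DRAM:
no_opt = mem_cycles(b=0, o_code=3, o_data=3, o_stack=3,
                    inv_if=0.42, m_e=0.094, m_L=0.326, s_d=0.85, t_d=0.71)
# Local-variable optimisation plus fast SRAM stacks, 8-element buffer:
opt_sram = mem_cycles(b=8, o_code=3, o_data=3, o_stack=1,
                      inv_if=0.42, m_e=0.094, m_L=0.212, s_d=0.79, t_d=0.58)

print(round(no_opt, 2))    # 5.16 cycles per instruction
print(round(opt_sram, 2))  # ~1.8, broadly consistent with the figure above
```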

The projections imply that using standard 70 ns 2-wait state DRAM, the UTSA design might yield 23.3 mips without using main memory caching mechanisms. With all memory space held in SRAM, UTSA achieves 0.75 memory cycles per instruction, such that 20 ns SRAM would support a sustained throughput of 66.6 mips without additional cache.

10.2 Memory traffic distribution.

Taking into account the impact of the various optimisations presented in earlier chapters, one can now see the final impact of optimisations upon the C-code behaviour first introduced in Chapter 4. Figures 10.2(a) to 10.2(e) present the C-code memory traffic components in absolute and relative terms, with various optimisations applied cumulatively. Figures 10.2(a) to 10.2(c) illustrate the impact of stack buffering, local-variable optimisation, and instruction packing. Figures 10.2(d) and 10.2(e) show the relative distribution of memory traffic components before and after optimisation.

It can be seen that the effects of progressive optimisation result in an eventual reduction of memory traffic by 70% of the original overhead. This represents potential for substantial performance improvements to be made.

The relative contributions of the key traffic components also change significantly. For example, instruction traffic represented about 40% of the original traffic, but in the final stages of optimisation this rises to over 60%. This would have an important bearing upon any assessment of further optimisation techniques. Branch optimisation, for

[24] ’No-Opt’ represents the unoptimised system, where all memory is DRAM. ’Opt’ represents the same system with local variable optimisation applied. ’SRAM’ indicates the system with fast SRAM instead of DRAM for stacks and locals. ’Opt+SRAM’ indicates both fast SRAM and local variable optimisation. ’ALL SRAM’ indicates a system with all memory in fast single-cycle SRAM.

instance, will offer larger gains when applied in a system where other optimisations are already present than in one which is unoptimised, as Figs. 10.2(d) and 10.2(e) highlight.

Figures 10.2(a) to 10.2(e) Absolute effects of optimisation on memory traffic, and relative effects on the distribution of the associated components. (Panels: (a) with stack buffers, (b) with local optimisation, (c) with instruction packing, (d) original distribution, (e) final distribution. Components: data stack (Sd), return stack (Sr), instruction fetch (f), explicit memory references (me), and local variable references (mL).)

———————— Chapter 11 ———————— Conclusions and Future Research

————————

11.0 Conclusions and future research

Within this thesis, some key issues for stack processor performance have been examined, with emphasis placed upon assessing their interactions, rather than considering each issue in isolation. As a result of this approach, new trade-offs have been identified that would otherwise have been missed, and whilst these effects can be accommodated by refined design practices, they can change the results that would otherwise have been expected. Overall, substantial evidence has been found that the old view of stack processor technology is in need of revision. The key issues of stack buffering, local variable optimisation, instruction set architecture, and instruction encoding have been examined in detail in this thesis, and the results reflect this argument. Taking into account the latest practices employed allows at least a hope that stack processors can be rated on equal terms with more mainstream processor architectures in the years to come.

Even as this thesis was being completed, exciting new developments were being made in the field of stack processor technology. The emergence of JAVA as a new high level programming language, with stack-based interpretation of its compiled code being a key feature, has suddenly brought a new focus upon the stack processor as a computing platform. Major players in the computer market-place are now talking about plans for ’JAVA engines’ based on stack processor technology, not only for Internet applications, but also in embedded systems. Although this could not have been foreseen at the beginning of the research conducted, it adds a very interesting postscript to the historical trends in stack processor design highlighted earlier, and makes the findings of this thesis all the more topical.

11.1 On stack behaviour and buffering:

Stack buffering offers the most substantial gain of the optimisations evaluated. Performance has been found to be dependent upon the fundamental behaviour of the stacks during execution. The quantitative measurements presented in this thesis have provided a view of what this behaviour consists of, and show that hand-coded FORTH has very much in common with compiler-generated C-code. The differences that are highlighted are due to inefficient use of the stack as a work-space, and can be addressed using other optimisation techniques. The key findings were as follows:

• FORTH and C-code behaviour

Compiler-generated C-code behaviour was found to be fundamentally similar to that of hand-coded FORTH, in terms of atom and cumulative characteristics, but also makes poor utilisation of the stack. Buffering algorithms that are employed in FORTH systems appear to perform almost as well with compiled C-code, but the ranking of algorithms is not maintained in the transition from FORTH to C-code.

• The Zero-Pointer Algorithm

A new ’zero-pointer’ algorithm was proposed in this thesis, and found to be superior to the previously considered best algorithm (demand fed) with FORTH work-loads.

• C-code and Buffers.

The performance of raw C-code does not suit the zero-pointer algorithm, and the demand-fed strategy is found to be better. However, after applying local-variable optimisation to the C-code, and refining its stack behaviour, the two algorithms are nearly identical in performance (Sections 7.4 and 7.5).

• Mathematical representation.

A mathematical formula, eqn (6.1), has been proposed which permits a crude approximation of stack buffer behaviour. By measuring the ’damping efficiency’ of a given buffer, one may perform quantitative comparisons of competing algorithms, and use the parametric

measurements to approximate behaviour in performance projections. The approximation formula has been applied in this thesis to quantify specific effects of co-optimisations in a stack processor system, and has provided direct measurements of the effects represented. Future research studies can now utilise this method rather than relying upon empirical comparisons of system behaviour.

11.2 Optimisation of instruction traffic

In terms of instruction traffic and performance, it is clear that instruction packing offers significant gains, perhaps possible only because of the implicit nature of stack processor instruction codes. Whilst there was no measurable benefit for execution speed in the application of non-aligned branch target code, the static program size was reduced considerably.

The key conclusions are summarised:

• Instruction Packing

Instruction packing mechanisms can be effective in reducing instruction fetch overheads, but the additional logic circuits create some penalties. There appears to be no severe restriction in instruction set design as a consequence of adopting such a scheme. The implicit stack processor instructions do not suffer the same drawbacks as register-oriented instruction formats when this method is employed.

• Branch alignment issues

On the issue of branch-target alignment, results are inconclusive. There are clear gains for static code density with non-aligned branch targets, but in dynamic performance terms, there is little improvement in program execution speed (although it is not degraded either).

• The trade-off for memory latency against logic latency

Trade-offs for instruction decode hardware against reduced memory dependence appear to lean significantly toward improved overall performance. VHDL synthesis and simulation shows that CPU cycle times are increased by less than 10 %, whilst memory traffic is reduced by over 40% in a fully optimised system (Section 8.6.2).

11.3 Local variables and memory traffic optimisation

The work of Koopman (1992) in presenting a method of optimising local variable traffic is clearly valuable. The results presented here have confirmed his initial investigation, and presented new trade-offs that permit a deeper understanding of the architectural issues involved. It has however been found that stack buffer performance is adversely affected by this optimisation, and as such a trade-off exists that should be accounted for in any future evaluations. The key findings in this area are:-

• Local variable optimisation

Local-variable optimisability is proportional to instruction set complexity, specifically in terms of the stack manipulation scheme employed and the number of stack registers accessible in the architecture.

A diminishing return is yielded for local variable optimisation as instruction set complexity is increased. This is due to the limited form of optimisation applied (only within basic blocks). A more aggressive optimisation strategy, such as inter-block scheduling, may yield a quite different trade-off.

Local-variable scheduling can eliminate up to 40% of local variables, but significantly less with more restrictive degrees of stack accessibility (Section 7.3). Continued work on optimisation is needed to make further gains.

• Effects on stack buffering

The efficiency of a given stack buffer is degraded as a consequence of altered stack behaviour resulting from local-variable optimisation techniques (Section 7.5). This implies that for C-code and other HLLs, larger buffers are required to accommodate these new practices. These effects have been quantified using the equation for buffer performance approximation of eqn (6.1).

• Wider implications

Previous comparisons of stack and register-file performance evaluated stack processors on the basis of an unoptimised stack model, whilst the register-file model had the benefit of available optimisation techniques. Including the effects of local variable optimisation in those studies, as in Section 7.7, suggests that stack architectures can now claim to deliver significantly lower data traffic than the register-file models used. This reverses the previous conclusion that register-files were best.

In order to avoid reduced performance due to altered stack behaviour, it was found that slightly larger stack buffers were required. A small absolute increase, of the order of four elements, would be satisfactory in most projected cases. However, it is stressed that this would only rectify the effects of optimisations applied strictly within a basic block. It is suggested that an inter-block or inter-procedural optimisation strategy may well result in changes that accumulate in a fashion dependent upon procedure nesting depth, and would hence be of greater concern than the effects presented here.

11.4 Interaction, optimisation, and the new view

Perhaps the most important point emphasised in this thesis is not that individual optimisations can address perceived performance penalties of stack processor architecture, but that each optimisation can be affected by the others. The effects of such ’co-optimisation’ may change the whole evaluation of a performance enhancing technique.

Instruction packing delivers substantial gains if measured purely in terms of instruction traffic but, in terms of overall performance, the true effect depends upon the level of optimisation applied in other areas. Without any additional optimisation, the gains delivered by instruction packing seem small. However, when applied to a fully optimised system, where instruction traffic is the major remaining bottleneck, the instruction packing techniques evaluated show substantial gains, even if CPU latencies are increased slightly as a side effect.

The issue of local variable optimisation also illustrates the need to account for interacting optimisations. Application of local variable optimisation may reduce local variable references, but at the same time, instruction counts can be increased, and stack behaviour altered. In the final analysis, a longer program with degraded stack buffer efficiency may deliver a slower execution time than the ’unoptimised’ program. This is illustrated by the results presented in Chapter 7. Overall, most programs do however exhibit positive benefits of local variable optimisation, and the implications for previous studies favouring register-file architectures may now have to be revised.

In the early sections of this thesis, the case against stack processors was reviewed, and two of the three points were considered worthy of review. The idea that stack spilling is a bottleneck for performance is now clearly out-dated. True enough, the stack can generate a lot of memory traffic, but stack buffers are highly efficient at removing this penalty, and it has been shown in this thesis that this is true of compiled HLL code as well as hand-coded FORTH.

The complaint that ’much manipulation is needed to manage the stack’ is reflected in the findings for local-variable optimisation, where instruction set complexity affects the rate of success. Clearly a more complex instruction set is better able to support the complexity of stack-operand management. The argument is not against the stack itself, but how well it is utilised, and how best to support that objective.

11.5 Directions for future research

This thesis has highlighted investigations into a number of key areas. Having considered the work to date, and its limitations, the following research objectives are felt to be worthy of future attention:

• On local variable support:

An investigation of codeable algorithms for inter-block and inter-procedural local-variable optimisation strategies would be of great interest, and would open up the possibility of eliminating the 60% of local variable references that cannot be treated by intra-block scheduling alone. The effects upon stack behaviour, buffer performance, and instruction execution characteristics must be examined in each case, as was the practice within this thesis. It is thought likely that such effects will be substantially more pronounced with the application of these more aggressive optimisations.

• On instruction set behaviour and design:

[1] The effects of instruction set complexity upon hardware latencies of a core-CPU design should be explored. Increased instruction set complexity may increase machine cycle times, leading to worse performance; on the other hand, the increases in cycle time may not be significant. This requires careful examination of a range of well defined instruction set features, such as the proposed scalable and symmetric stack manipulation scheme (Chapter 5), in order to resolve the question convincingly.

[2] Compare stack-based code to register-based code, using identical benchmarks, and including the optimisations evaluated in this thesis. This will allow a fairer comparison of stack processors to be made.

[3] Develop code optimisation techniques that can improve dynamic packing density, and possibly exploit instruction re-ordering techniques to improve throughput.

• Other issues:

[1] Refine the mathematical models presented, and attempt to relate them to models for register-based designs. Success in this area would allow a numerical evaluation of stack and register-file computation to be made under chosen conditions, and help to settle the long-standing controversy over stack vs. register file superiority, without the handicap of comparing various unrelated studies.

[2] Investigate superscalar and pipeline techniques for stack processor hardware. This issue is quite complicated, given that the top of stack is in use for nearly every instruction executed. However, with high dynamic code densities available, the ability to execute multiple instructions in a single cycle would be a significant advance for stack processor architecture.

The science of stack processor design has come a long way since the early years of Hamblin and Łukasiewicz, yet there is still room for improvement. With the emergence of JAVA, the future holds a new importance for stack processor technology, and poses new questions that must be answered by future research. Research in this area will certainly offer rewards in terms of stack processor performance, and ultimately will raise the profile of this neglected processor paradigm to new levels of respectability. That will only happen, however, through the efforts of future researchers in this field, and to them one must wish the best of luck in meeting the new challenges ahead.
