A Formal Process for Systolic Array Design Using Recurrences

A Formal Process for Systolic Array Design Using Recurrences Jonathan Puddicombe Revised version of thesis submitted for degree of Doctor of Philosophy University of Edinburgh October 1992 (Revised version submitted June 1993) Declaration This thesis was composed by me and the work described in it is my own, except where indicated. Acknowledgements My thanks go to my supervisors, Peter Denyer, especially for his faith, optimism and encouragement, and Peter Grant, especially for his detailed comments on my work and his faithfulness, patience and perseverance with me. My thanks also go to Sanjay Rajopadhye, whose work was the springboard for mine, to S.E.R.C. for funding me for two years, and to everyone who gave me support of a technical, financial or moral nature. W Abstract A systolic array is essentially a parallel processor which consists of a grid of locally- connected sub-processors which receive, process and pump out data synchronously in such a way that the pattern of data-flow to and from each processor is identical to the flow to and from the other processors. Such arrays are repetitive and modular and require little length of communication interconnection, so that they are relatively simple to design and are amenable to efficient VLSI implementation. The systolic architecture has been found suitable for implementing many of the algorithms used in the field of signal- and image-processing. A formal design method is a well-defined process for constructing, given a well-defined function from a certain class, a well-defined object (e.g. a design) which performs that function. When proven correct, such methods are useful for designing equipment which is safety-critical or where a design fault discovered after manufacture would be expensive. This thesis presents a formal design method for producing high-level implementations for certain signal-processing and other algorithms. These high-level implementations can themselves usually be easily implemented as systolic arrays. As a necessary preliminary to the method, a calculus is defined. The basic concept, that of a "computation", is powerful enough to express both abstract algorithms and those whose suboperations have been assigned a place and a time to execute. Computations may be composed or abstracted (by having their variables hidden) or may have their variables renamed. The "simulation" of one computation by another is defined. Using this calculus it is possible to formalise concepts like "dependency" (of data or control) and "system of recurrence equations", which often appear in the literature on systolic array design. The design method is then presented. It consists of five stages: pipelining of data dependencies, scheduling, pipelining of the control variables, allocation of subprocessors to the subcomputations, and the final stage (in which the design is constructed). The main concepts are not new, but here they have been formalised, iv arranged and linked in a clearly defined way. The output of the method is a high-level design description which defines the functionality of each subprocessor in the array (for both data and control). It also defines scheduling and allocation of all the operations which are to be executed and the data and control input requirements of the array. The method is used to design a simple one-dimensional systolic convolver and then to design a more complicated two-dimensional systolic array which performs Given's algorithm for QR-factorisation, a task required in certain signal-processing applications such as adaptive estimation and bearing measurement. Alternative designs are briefly discussed. For the convolver and the two arrays for QR-factorisation, sketches of the architectures are given but these are hand-produced and are not the product of the method. A detailed proof is given that, subject to assumptions about the well-deflnedness of the computations handled and created, the design method will produce only designs which meet their specifications; however the final high-level design may imply a low-level implementation which may contain an interconnection structure which is arguably non- local. A proof is given that the well-defmedness conditions hold which are required for the validation of data-pipelining. V Contents Terminology ......._ -_...-.......... ..----".- IX Glossary of Terms 1 Introduction 1 1.1 Subject of Thesis . 1 1 .2 Systolic Arrays ..................................................................................2 1.2.1 What is a systolic array? ......................................................... 2 1.2.2 How do SM compare with other parallel processors? ........... 5 1.3 Formal Design Methods .................................................................... 9 1.3.1 What are formal design methods and their advantages over informalmethods? ..................................................................9 1.4 A Formal Design Method for Systolic Arrays ................................11 1.5 Overview of the Thesis ...................................................................15 2 Systolic Arrays and Formal Design Methods ...................................... 16 2.1 Examples of Systolic Arrays ...........................................................16 2.2 Formal Design Methods ..................................................................32 2.3 Design of Systolic Arrays ...............................................................33 2.3.1 Beginnings............................................................................33 2.3.2 A developing discipline........................................................34 2.4 summary .........................................................................................37 3 Computations and Recurrences ........................................................... 38 Vi 3.1 Computations 38 3.1.1 Composition.......................................................................... 39 3.1.2 Hiding ................................................................................... 44 3.1.3 Renaming.............................................................................. 45 3.1.4 Simulation............................................................................. 48 3.1.5 Example: TripleAdd ............................................................. 49 3.2 Embedded Computations ................................................................ 51 3.3 Recurrences ..................................................................................... 53 3.3.1 Example: Convolution.......................................................... 58 3.3.2 Shorthand expressions for computations ..............................64 3.4 Space-time networks ....................................................................... 65 3.5 Summary, discussion and further work .......................................... 70 3.5.1 Summary............................................................................... 70 3.5.2 Discussion............................................................................. 70 3.5.3 Further work ......................................................................... 72 4 The Formal Design Method .................................................................73 4.1 Data-pipelining ............................................................................... 78 4.1.1 Example................................................................................ 83 4.2 Scheduling ...................................................................................... 92 4.2.1 Example................................................................................ 93 4.3 Control-pipelining ........................................................................... 94 4.3.1 Example ................................................................................. 98 4.4 Allocation ......................................................................................105 4.4.1 Example .............................................................................. 105 vu 4.5 The Final Stage and Summary of the Design Process . 106 4.6 The Architecture ........................................................................... ill 114 4.6.1 Summary of section ............................................................ 4.7 Summary of chapter, discussion and further work .......................114 114 4.7.1 Summary............................................................................. 114 4.7.2 Discussion............................................................................ 116 4.7.3 Further work ....................................................................... 5 The Formal Design Method Applied to QR-Factorisation Example 119 129 5.1 Data-pipelining ............................................................................. 134 5.1.1 OtherOptions...................................................................... 134 5.2 Scheduling .................................................................................... 5.2.1 Other Options...................................................................... 135 135 5.3 Control pipeiining ......................................................................... 5.3.1 Pipelining of cont................................................................ 135 5.3.2 Pipe1iningof cox ........................................................... 138 5.3.3 Pipelining of coy ......................................................... 142 5.3.4 Amalgamation of just-generated computations..................1

Load more