On Fast Algorithm and Vlsi Design of Finite Computational Structures
Total Page:16
File Type:pdf, Size:1020Kb
ON FAST ALGORITHM AND VLSI DESIGN OF FINITE COMPUTATIONAL STRUCTURES By ROM- S HEN KAO A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1991 Dedicated to my parents ACKNOWLEDGMENTS In the years leading to this dissertation, I have had the great pleasure of working for and with an adviser who provided an environment conducive to learning. I would like to take this opportunity to thank my advisor, Dr. Fred J. Taylor. I would also like to thank Dr. Y. C. Chow, Dr. H. Lam, Dr. M. Law, and Dr. J. Principe for serving on my supervisory committee. Special thanks to my fellow students, who have offered useful advice. Thanks go to Ah- mad Ansari, Glenn Dean, Jun Li, Jon Mellott, Jeremy Smith, Eric Strom, Glenn Zelniker, and especially, John Ruckstuhl, for his courtesy and professional work on setting up the SUN systems. I would also like to thank those I have come to know at the University of Florida. They make my years at the university a unique and memorable experience. This work would not be possible without the encouragement and support from my wife, Jeannie. Finally, I would like to thank my parents for teaching me the philosophy of life and the value of education, and most of all for their love, patience, and support. - in - TABLE OF CONTENTS ACKNOWLEDGMENTS iii LIST OF TABLES vii LIST OF FIGURES viii KEY TO SYMBOLS AND ABBREVIATIONS xiii ABSTRACT xv CHAPTERS 1 INTRODUCTION 1 1.1 Overview 1 1.2 Problem Statement and Objectives 2 1.3 Dissertation Organization 5 2 FINITE COMPUTATIONAL STRUCTURES 8 2.1 Introduction 8 2.2 Algebraic Foundations 8 2.3 Modular Arithmetic and Number Theory 10 2.3.1 Chinese Remainder Theorem (CRT) for Integers 17 2.3.2 Residue Number System Arithmetic (RNS) 17 2.4 Residue Polynomials 19 2.4.1 Properties of Polynomial 19 2.4.2 Convolutions and Polynomial RNS 21 2.5 Finite Fields Based on Polynomial Rings 25 m 2.5.1 Construction of the Finite Field GF( p ) 25 m 2.5.2 Properties of the Finite Field GF(p ) 29 m 2.5.3 Bases of GF( p ) over GF(p) 32 3 COMPUTATIONS IN FINITE FIELDS 35 3.1 Introduction 35 3.2 Index Calculus 35 3.3 The Table Lookup Method for Small Fields 36 - iv - 3.4 Discrete Logarithm and Exponentiation Problems 45 3.5 Primal Basis Multiplier 49 3.6 Dual Basis Multiplier 55 3.7 Normal Basis Multiplier 60 4 FINITE STRUCTURE TRANSFORMS 63 4.1 Introduction 63 4.2 Properties of Transforms and Some Prime Field Transforms 65 4.3 Complex Fields Transforms and Complex Arithmetic 69 4.3.1 Complex Number Theoretic Transform (CNNT) 70 4.3.2 Complex Residue Number System (CRNS) 71 4.3.3 Quadratic Residue Number System (QRNS) 71 4.3.4 Extended Algebraic Integer and Polynomial Residue Number System (PRNS) 73 4.4 Transforms and Computations in Extension Fields 80 4.4.1 Index Calculus Complex Residue Number System (ICCRNS) .... 80 4.4.2 Finite Extension Field Transforms 83 5 FAST DISCRETE FOURIER TRANSFORMS OVER FINITE FIELDS 93 5.1 Introduction 93 5.2 Fast Prime-Factor Finite Field Transform 93 5.2.1 The Application of Normal Basis and Conjugacy Properties 99 5.2.2 A Normal Basis Architecture for the Finite Field Transform 100 5.2.3 System Performance Analysis and Discussions 103 5.3 The Basis-Change Algorithm for Fast Finite Field Transforms 105 5.3.1 The Conjugate Sets in a Fast Finite Field Transform 106 5.3.2 The Basis-Change Algorithm 107 5.3.3 System Development 109 5.3.4 System Complexity Evaluation and Example Ill 6 THE PIPELINE POLYNOMIAL RNS PROCESSOR 7.1 WITH FERMAT NUMBER TRANSFORM 115 6.1 Introduction 115 6.1.1 Reasons to Use Fermat Prime Number As M 116 6.1.2 Algebraic Congruence to a Fermat Prime 117 6 . 1.3 The Multiplier-Free Fast FFT-Type Isomorphic Mappings 119 6.2 A Pipeline Third-order Polynomial RNS Processor 122 6.3 System Implementation of the Third-order Polynomial RNS Processor . 128 6.3.1 A Nine-bit PRNS Processor Development on the HP-DCS 130 6.3.2 A Five-bit PRNS Processor Development on the Magic CAD Tool 148 7 CONCLUSION AND FUTURE RESEARCH 167 Conclusion 167 - v - 7.2 Future Research 173 APPENDICES A ALGEBRAIC FOUNDATIONS: DEFINITIONS AND THEOREMS 176 A. 1 GROUPS 176 A. 2 RINGS AND FIELDS 177 A. 3 THEOREMS 180 B CIRCUIT DESIGNS: HP-DCS AND MAGIC CAD TOOLS 182 B. l The HP-DCS Schematics of the Nine-Bit Third-Order PRNS Processor . 182 B.2 The HP-DVI Timing Diagrams of the Nine-Bit Third-Order PRNS Processor 188 B.3 The Magic Design of the Five-Bit Third-Order PRNS Processor 199 B.4 CIF and Postscript Files of a Two-Input NAND Gate 212 B.4.1 CIF: A Geometry Language 212 B.4. 2 PostScript: A Geometry Language 215 BIBLIOGRAPHY 218 BIOGRAPHICAL SKETCH 223 - vi - a LIST OF TABLES Table 2. 1 The Elements and Their Order of the Multiplicative Group of the Ring Z15 14 Table 2.2 The Possible and Actual Order of the Elements in the Multiplicative Group of the Ring Z15 15 Table 2.3 The Elements and Their Order of the Field Z 17 16 2.4 Table Elements ft = + aia~ + a\ + ao of Fiia ) Generated by f\{x) ... 28 Table 2.5 Elements = + ci\ ft (liar + a + ao of F2 (a ) Generated by fi(x) . 29 Table 3. 1 Illustration of Indices for N = 9 36 4 3.2 Table The Zech’s Logarithms in GF(2 ) 39 4 Table 4.3 Field Elements of GF(2 ) Represented Under Primal, Primitive Normal, and Normal Bases 92 Table 5. 1 The Complexity Comparison Between Conventional and Normal Basis Approaches 103 Table 6.1 The Timing Information of the Simulation of the PRNS Processor 147 - Vll - LIST OF FIGURES Figure 2.1 Example of Abelian Finite Groups 9 Figure 3.1 A Finite Field Arithmetic Unit Based On the All-Table Approach .... 37 Figure 3.2 A Discrete Logarithm/Exponentiation Multiplier 38 Figure 3.3 A Logarithmic Finite Field Arithmetic Unit 41 Figure 3.4 The Building Block of the Finite Field Multipliers, a) Storage element such as register and accumulator, b) Multiple input Modulop adder. c) Modulo p multiplier for mul- tiplying a data with gi . d) Modulo p adder 49 Figure 3.5 A Generic Primal Basis aa -Multiplier 50 Figure 3.6 A Primal Basis Accumulated aa^-Multiplier 52 Figure 3.7 A Primal Basis Nested ab Multiplier 54 Figure 3.8 A Generic Dual Basis aa -Multiplier 56 Figure 3.9 A Dual Basis Accumulated ao^-Multiplier 57 Figure 3.10 A Dual Basis Summed -Multiplier 59 Figure 3.11 A Dual Basis Nested ab Multiplier 60 m Figure 3.12 A Cyclic Shift Register for Power Forming in GF(p ) 61 Figure 3.13 A Normal Basis Multiplier 62 - viii - 1 Figure 4.1 The Representation of Complex Number t in Z[co 1 76 Figure 4.2 The Pictorial Representation of the Mh- order Single Modulus PRNS System Figure 4.3 The Cyclic Shift Property of Normal Basis Representation of V 91 Figure 5.1 The VLSI Butterfly Module of a Cyclotomic CB . Coset of Length / . 100 Figure 5.2 The s Stages Fast Galois Field Transform System 101 Figure 5.3 The 255-point Good-Thomas FFT Pipeline Stages 102 Figure 5.4 The Conjugate Set Evaluation Circuit 108 Figure 5.5 An Example of the Scaling Shifter 110 Figure 5.6 An Example of the Vector Combiner 110 Figure 5.7 The Primal Basis Generator 1 1 Figure 5.8 The Normal Basis Generator 112 Figure 6.1 The Isomorphic Forward Transform f4 124 Figure 6.2 The Isomorphic Inverse Transform 125 Figure 6.3 The Isomorphic Mapping Scalar Modules 127 Figure 6.4 The 1 Isomorphic Inverse Transform F4 128 Figure 6.5 System Block of the 9-bit Third-order PRNS Arithmetic Unit with FFT-Type Fast Isomorphic Mappings 129 Figure 6.6 Top Level Schematic of the PRNS System 131 Figure 6.7 The Hierarchical Design of the Third-Order PRNS Processor 132 - ix - Figure 6.8 The Boolean Equation of the Module - negmodp 134 Figure 6.9 The Multiwindow Demonstration of FTI Device Selection 135 Figure 6.10 The Modulo M Adder 136 Figure 6. 11 The MDL.vj Unit Implemented Using a Table Lookup Method 137 Figure 6.12 The Boolean Equation of the Module - summodp 138 Figure 6.13 The Boolean Equation of the Module - sumovck 139 Figure 6.14 A Nine-bit Modular Adder 140 Figure 6.15 The Modulo M Multiplier 141 Figure 6. 16 The Scalar Module RSQ 143 Figure 6.17 The Isomorphic Forward Mapping Module 144 Figure 6.18 The Isomorphic Inverse Mapping Module 145 Figure 6.19 The Logic and Timing Simulation Result of the Third-Order PRNS Processor 146 Figure 6.20 Floorplan of the Five-bit Modular Multiplier 149 Figure 6.21 A Hierarchical Design of the Five-Bit Modular Multiplier 150 Figure 6.22 Basic CMOS Cells 151 Figure 6.23 A Two-Input CMOS NAND Gate Magic Layout 152 Figure 6.24 An AND-OR Programmable Logic Array 153 Figure 6.25 A Pseudo-nMOS Programmable Logic Array Design 154 Figure 6.26 Typical Input Protection Circuit 155 Figure 6.27 The PLA Transistor-Level Layout of the Negator Cell - palneg 156 - x - Figure 6.28 A CMOS Magic Layout of the Five-bit Modular Multiplier 158 Figure 6.29 The Embedded Test Circuitry of the Five-bit Modular Multiplier 159 Figure 6.30 The Timing Diagram of the Operation of the Modular Multiplier 160 Figure 6.31 Floorplan of the Five-Bit Polynomial RNS Processor 162 Figure 6.32 A Hierarchical Design of the Five-Bit Polynomial RNS Processor 163 Figure 6.33 A Split-then-Add Scaler Module 164 Figure 6.34 A Transistor Layout of the Scaler Module scalerRSQ 165 Figure B.