Bit Slicing Repetitive Logic

Where logic blocks are duplicated within a system, there is much to gain from optimization.

• Optimize a single block for size. The effect is amplified by the number of identical blocks.
• Optimize interconnect. With careful design, the requirement for routing channels between the blocks can be eliminated. Interconnection is by butting in 2 dimensions.

Bit Slicing

Instead of creating an ALU function by function, we create it slice by slice.

[Figure: an ALU with inputs A[7:0] and B[7:0] and output Z[7:0], divided into slices from A[7], B[7], Z[7] down to A[0], B[0], Z[0].]

• Each bit slice is a full 1-bit ALU.
• N slices are used to create an N-bit ALU.

[Figure: 1-bit ALU. Control lines FnAND, FnOR, FnXOR, FnADD, FnSHR, FnSHL; data lines A, B, Z; a full adder (FA) with Cin and Cout; shift inputs SHRin and SHLin.]

Control line    Function
FnAND           Z = A&B
FnOR            Z = A|B
FnXOR           Z = A!=B
FnADD           Z = A+B
FnSHR           Z = A>>1
FnSHL           Z = A<<1

• Simple 1-bit ALU.
• The control lines select which function of A and B is fed to Z.
• Some functions, e.g. Add, require extra data I/O.

[Figure: gate matrix layout of the FnAND portion of the slice, with Vdd and GND rails, data lines A, B and Z, and the FnAND control line.]

• Data busses horizontal, control lines vertical.
• Compact gate matrix implementation (1).

(1) Bit slice designs can also be built around standard cells, although the full custom approach used here gives better results.

• Bit Sliced ALU (3-bits, scalable).

[Figure: three 1-bit ALU slices for bits 2, 1 and 0, sharing the control lines FnAND, FnOR, FnXOR, FnAdd, FnSHR, FnSHL. CarryOut, ShiftRightIn and ShiftLeftOut connect at the bit 2 end; CarryIn, ShiftRightOut and ShiftLeftIn connect at the bit 0 end.]

We can extend this principle to the whole datapath.

[Figure: registers Reg1, Reg2 and Reg3 connected to the ALU through lines A, B and Z.]

• 1-bit datapath.
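The way N identical slices combine into an N-bit ALU can be sketched in software. This is a minimal behavioural model, not the gate matrix design from the slides: the names `alu_slice` and `alu` are hypothetical, the control lines are modelled as a single string rather than six separate wires, and the neighbour wiring for the shifts (each slice's Z taking A from the slice above or below) is an assumption read from the SHRin and SHLin labels.

```python
def alu_slice(fn, a, b, cin, shr_in, shl_in):
    """One 1-bit ALU slice (hypothetical model). Returns (z, cout)."""
    if fn == "AND":
        return a & b, 0
    if fn == "OR":
        return a | b, 0
    if fn == "XOR":
        return a ^ b, 0
    if fn == "ADD":
        total = a + b + cin          # full adder: sum bit and carry out
        return total & 1, total >> 1
    if fn == "SHR":
        return shr_in, 0             # take A from the slice above
    if fn == "SHL":
        return shl_in, 0             # take A from the slice below
    raise ValueError(f"unknown function {fn!r}")


def alu(fn, a, b, width=8, carry_in=0, shift_right_in=0, shift_left_in=0):
    """N-bit ALU built only from alu_slice. Returns (z, carry_out)."""
    z = 0
    carry = carry_in
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        # Assumed neighbour wiring: end slices use the external shift inputs.
        shr_in = (a >> (i + 1)) & 1 if i + 1 < width else shift_right_in
        shl_in = (a >> (i - 1)) & 1 if i > 0 else shift_left_in
        z_i, carry = alu_slice(fn, a_i, b_i, carry, shr_in, shl_in)
        z |= z_i << i
    return z, carry


if __name__ == "__main__":
    print(alu("ADD", 200, 100))      # 8-bit add with carry out
    print(alu("SHR", 0b10110010, 0))
```

Only the add uses the carry chain and only the shifts use the neighbour inputs, which mirrors the point that some functions require extra data I/O on the slice.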
[Figure: 1-bit datapath slice. Control lines FnAND, FnOR, FnXOR, FnADD, FnSHR, FnSHL, LDr1, ENr1A, ENr1B, LDr2, ENr2A, ENr2B, LDr3, ENr3A, ENr3B. Registers Reg1, Reg2 and Reg3 (D/Q with Load) sit alongside the 1-bit ALU (FA, Cin, Cout, SHRin, SHLin) on data lines A, B and Z.]

Bit Sliced Datapath

[Figure: four datapath slices for bits 3 to 0, sharing the same control lines. CarryOut, ShiftRightIn and ShiftLeftOut connect at the bit 3 end; CarryIn, ShiftRightOut and ShiftLeftIn connect at the bit 0 end.]

• Distributed Multiplexing
• Local Multiplexing

[Figure: two versions of the datapath slice, one per multiplexing style. Labels: F[2:0], F[1:0], F[2], FnAND, FnOR, FnXOR, FnADD, FnSHR, FnSHL, the register controls LDr1 to ENr3B, Reg1, Reg2, Reg3, FA, Cin, Cout, SHRin, SHLin, A, B, Z.]

Bit Slicing - Example

[Figure: 4-bit ALU with inputs A[3:0] and B[3:0], control Fn, outputs Y[3:0] and flags C and Z. Bits 3 to 1 use a full adder (FA); bit 0 uses a half adder (HA).]

1 Bit ALU (no consideration of bitslicing)

[Figure: bit 0 cell with A[0], B[0], Y[0], HA and Fn, producing C and Z.]

Bitslice Design for ALU

[Figure: slice with A, B, Y and FA; Cin, Fn and nZin on one edge; C and nZ on the other.]

[Figure: the 4-bit ALU rebuilt from four FA slices, with control Fn and flags C and Z.]

[Figure: the same 4-bit ALU with flags C, V, Y[3] and Z.]

Original Bitslice Design for ALU / Modified Bitslice Design for ALU

• Wiring is arranged so that no over cell routing is required.
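The example ALU slice, with its Cin/C carry chain and nZin/nZ chain, can be modelled roughly as follows. Several things here are assumptions rather than facts from the figures: that Fn selects between add and OR (the two functions the slides list for this ALU), that nZ is the running OR of the result bits so that Z is its inverse at the top of the stack, and that V is formed outside the slices from the carry into and out of the top slice. The names `example_slice` and `example_alu` are hypothetical.

```python
def example_slice(fn_add, a, b, cin, nz_in):
    """One slice of the example ALU (assumed behaviour).

    Returns (y, cout, nz_out).
    """
    if fn_add:
        total = a + b + cin
        y, cout = total & 1, total >> 1
    else:
        y, cout = a | b, 0
    # The not-zero signal ripples through the slices like the carry does,
    # so no wide gate spanning every bit is needed.
    return y, cout, nz_in | y


def example_alu(fn_add, a, b, width=4):
    """Stack `width` identical slices and derive the flags at the top."""
    carry, nz, y = 0, 0, 0
    carry_into_msb = 0
    for i in range(width):
        if i == width - 1:
            carry_into_msb = carry   # Cin of the top slice
        y_i, carry, nz = example_slice(
            fn_add, (a >> i) & 1, (b >> i) & 1, carry, nz
        )
        y |= y_i << i
    flags = {
        "C": carry,                        # carry out of the top slice
        "V": carry ^ carry_into_msb,       # assumed overflow logic
        "N": (y >> (width - 1)) & 1,       # top result bit, Y[3] for 4 bits
        "Z": 1 - nz,                       # inverse of the rippled nZ
    }
    return y, flags


if __name__ == "__main__":
    print(example_alu(True, 0b0111, 0b0001))
    print(example_alu(False, 0b0101, 0b0010))
```

Every slice is identical, including bit 0: the half adder of the unsliced design is replaced by a full adder whose Cin is tied low, and the flag logic lives outside the stack.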
[Figure: original and modified ALU bitslices side by side. Original: A, B, Y and FA, with Cin, Fn, nZin, C and nZ. Modified: the same ports, with additional Y, Cin and Fn connections on the cell boundary.]

[Figure: 4-bit ALU built from the modified slice, with control Fn and flags C, V, Y[3] and Z. Asterisks (*) mark ports on the slices.]

* A number of wires are left unconnected.

Single Bit Shift

[Figure: ALU bitslice extended with a Shift control and an ASin input. Other ports: A, B, Y, FA, Cin, C, Fn, nZ, nZin.]

This simple ALU bitslice now supports 4 functions:

• Y = A + B
• Y = A | B
• Y = (A >> 1) + B
• Y = (A >> 1) | B

[Figure: 4-bit ALU built from the shift-capable slice, with controls Shift and Fn, outputs Y[3:0] and flags C, V and Z.]

Multi-Bit Shift

This barrel shifter bitslice includes wiring for a 4-bit right shift, wiring for a 2-bit right shift and wiring for a 1-bit right shift.

[Figure: barrel shifter bitslice. Feedthroughs X1, X2, X3, X4; inputs A, ASin, BSin, CSin; intermediate nodes B and C; controls RShift[2], RShift[1], RShift[0]; output Y.]

• Y = A >> RShift[2:0]

[Figure: five barrel shifter slices, A[4:0] to Y[4:0], controlled by RShift[2], RShift[1] and RShift[0].]

Multi-Bit Rotate

By providing extra feedthrough paths we can create a bitslice for multi-bit rotation.

[Figure: barrel rotate bitslice. Feedthroughs X1 to X4 plus the extra paths Xa, Xb, Xc, Xd, Xe, Xf, Xg; inputs A, ASin, BSin, CSin; intermediate nodes B and C; controls RRotate[2], RRotate[1], RRotate[0]; output Y.]

N.B. Most single bit rotate functions will rotate through the carry. This barrel rotation, based on the SPARC implementation, does not make use of the carry for input or output.
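The staged structure of the barrel shifter (a 4-bit stage, then a 2-bit stage, then a 1-bit stage, each enabled by one control bit) can be sketched behaviourally. This is an illustration of the staging only, not of the feedthrough wiring; the function names are hypothetical, and zero fill on the right shift is an assumption since the slides do not say what ASin, BSin and CSin are tied to at the top of the stack. A single-bit rotate through carry is included for contrast with the carry-free barrel rotate.

```python
STAGES = ((2, 4), (1, 2), (0, 1))   # (control bit index, shift distance)


def barrel_shift_right(a, rshift, width=8):
    """Y = A >> RShift[2:0], built from 4-, 2- and 1-bit stages (zero fill)."""
    y = a & ((1 << width) - 1)
    for bit, distance in STAGES:
        if (rshift >> bit) & 1:
            y >>= distance
    return y


def barrel_rotate_right(a, rrotate, width=8):
    """Rotate right by RRotate[2:0] using the same three stages, no carry."""
    mask = (1 << width) - 1
    y = a & mask
    for bit, distance in STAGES:
        if (rrotate >> bit) & 1:
            d = distance % width
            # Bits leaving the bottom re-enter at the top.
            y = ((y >> d) | (y << (width - d))) & mask
    return y


def rotate_right_through_carry(a, carry, width=8):
    """Single-bit rotate in which the carry is part of the ring."""
    new_carry = a & 1
    y = ((a & ((1 << width) - 1)) >> 1) | (carry << (width - 1))
    return y, new_carry


if __name__ == "__main__":
    print(bin(barrel_shift_right(0b10110010, 5)))
    print(bin(barrel_rotate_right(0b10110010, 3)))
    print(rotate_right_through_carry(0b00000001, 0))
```

Three stages cover every shift distance from 0 to 7 because the distances are the binary weights of the control bits.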
[Figure: Rotate Through Carry shows bits 7 to 0 and the carry C joined in one ring. Rotate Avoiding Carry shows bits 7 to 0 in a ring without C.]

[Figure: five barrel rotate slices, A[4:0] to Y[4:0], controlled by RRotate[2], RRotate[1] and RRotate[0].]

[Figure: a column of 1-bit ALU slices, each with Reg1, Reg2 and Reg3, interleaved with 4-bit carry lookahead units and a CLA unit. Signals: Cin, Cout, FA, P0 to P3, G0 to G3.]

• Bitslice Exceptions

Where full bitslicing is not suitable we attempt to disrupt the bitslice as little as possible.
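Carry lookahead is a typical case where the logic does not divide evenly into one cell per bit: each slice can still produce its own propagate (P) and generate (G) signals, but the carries are computed by a separate unit serving several slices. The following is a minimal sketch of the standard 4-bit lookahead equations, not a reconstruction of the arrangement in the figure; `lookahead_unit` and `add4` are hypothetical names.

```python
def lookahead_unit(p, g, cin):
    """Standard 4-bit carry lookahead equations.

    p and g are lists of four bits, least significant first.
    Returns the carries out of bits 0 to 3 as [c1, c2, c3, c4].
    """
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & cin))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & cin))
    return [c1, c2, c3, c4]


def add4(a, b, cin=0):
    """4-bit add: per-slice P and G, carries from the shared unit."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # propagate, per slice
    g = [x & y for x, y in zip(a_bits, b_bits)]   # generate, per slice
    carries = lookahead_unit(p, g, cin)
    carry_in = [cin] + carries[:3]                # carry entering each slice
    total = 0
    for i in range(4):
        total |= (p[i] ^ carry_in[i]) << i        # sum bit of slice i
    return total, carries[3]


if __name__ == "__main__":
    print(add4(9, 8))
```

The slices stay identical and regular; only the lookahead unit breaks the one-cell-per-bit pattern, which is the sense in which the bitslice is disrupted as little as possible.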
