FPGA Technology in Beam Instrumentation and Related Tools

Javier Serrano, CERN, Geneva, Switzerland
DIPAC 2005. Lyon, France. 7 June 2005.

Plan of the presentation

- FPGA architecture basics
- FPGA design flow
- Performance boosting techniques
- Doing arithmetic with FPGAs
- Example: RF cavity control in CERN’s Linac 3

A preamble: basic digital design

[Schematic: registered 32-bit inputs DataInA and DataInB (dataAC, dataBC), a 32-bit adder (sum), a multiplexer controlled by the registered DataSelect (dataSelectC), and an output register DataOut, all clocked by Clk.]
High clock rate: 144.9 MHz (6.90 ns) on a Xilinx Spartan IIE.

[Schematic: the same circuit with additional registers (dataACd1, sum_1, dataSelectCD1).]
Higher clock rate: 151.5 MHz (6.60 ns) on the same chip.

FPGA internal architecture 1/3

[Figure: example, Xilinx Spartan-IIE family architecture.]
CLB: Configurable Logic Block
DLL: Delay Locked Loop

FPGA internal architecture 2/3

[Figure: simplified view of a Spartan-IIE CLB slice (two identical slices inside each CLB).]
Members of the Spartan-IIE family range from the XC2S50E (16 × 24 = 384 CLBs) to the XC2S600E (48 × 72 = 3456 CLBs).

FPGA internal architecture 3/3

Other design resources in modern FPGAs:
- Clock control blocks (DLL or PLL).
- Fast differential signaling support (LVDS, LVPECL, …).
- Fast hard-wired DSP blocks made of multipliers and accumulators.
- High speed external RAM interfacing, plus lots of internal RAM.
- Multi gigabit transceivers (useful for global orbit feedback).
- Embedded CPU cores (PowerPC, ARM, …).
- Digitally Controlled Impedance active I/O termination.

FPGA vs. DSP chips

[Figure: 256 tap FIR filter on a DSP and on an FPGA. DSP: Data In, one register, a single MAC unit, Data Out. FPGA: Data In, registers Reg0, Reg1, … Reg255, each multiplied (X) by its coefficient C0, C1, … C255, the products summed (+) into Data Out.]
DSP: loop 256 times per Data In sample for a 256 tap FIR filter.
FPGA: Virtex-4SX55, 512 MAC units @ 500 MHz = 256 GMAC/s!

FPGA design flow

Steps: design entry, behavioral simulation, synthesis, place and route, post P&R simulation.

Design entry:

    DummyOut <= DummyInA when Selector='1' else DummyInB;

[RTL view: a two-input multiplexer; Selector picks DummyInB (input 0) or DummyInA (input 1) to drive DummyOut.]
[Technology view: IBUFs on Selector, DummyInA and DummyInB (Selector_ibuf, DummyInA_ibuf, DummyInB_ibuf), a LUT3 (LUT3_AC), and an OBUF (DummyOut_obuf) driving DummyOut.]

FPGA flow: P&R results

[Figure: place and route results.]

FPGA flow: floorplanning

    myCounter0: process(Reset(0), Clk)
    begin
      if Reset(0)='1' then
        counter0 <= (others=>'0');
      elsif Clk'event and Clk='1' then
        counter0 <= counter0 + 1;
      end if;
    end process myCounter0;

Increasing performance 1/5

Buffering
- Delay in modern designs can be as much as 90% routing, 10% logic. Routing delay is due to long nets + capacitive input loading.
- Buffering is done automatically by most synthesis tools and reduces the fan out on affected nets.
[Figure: nets before buffering (net1, net2) and after buffering (net1, net2, net3).]

Increasing performance 2/5

Replicating registers (and associated logic if necessary)
[Figure: a Producer driving Consumers 1 to 4, before and after replication.]

Increasing performance 3/5

Retiming (a.k.a.
register balancing)
[Figure: before, a large combinatorial logic delay next to a small delay; after, balanced delays.]

Increasing performance 4/5

Pipelining
[Figure: before, one large combinatorial logic delay; after, three small delays.]

Increasing performance 5/5

Time multiplexing
[Figure: a de-multiplexer splits the 100 MHz Data In stream across two parallel chains of 50 MHz logic; a multiplexer recombines them into Data Out.]

An example 1/2

Boosting performance of an IIR filter
Simple first order IIR: y[n+1] = a y[n] + b x[n]
Problem found in the phase filter of a PLL used to track bunch frequency in CERN’s PS.
[Block diagram: x multiplied by b, y fed back through Z^-1 and multiplied by a, both summed (+).]
Performance bottleneck in the feedback path.

An example 2/2

Boosting performance of an IIR filter
Look ahead scheme: from y[n+1] = a y[n] + b x[n] we get

    $y[n+2] = a\,y[n+1] + b\,x[n+1] = a^2 y[n] + ab\,x[n] + b\,x[n+1]$

Now we have two clock ticks for the feedback!
[Block diagram: an FIR filter with coefficients b and ab (can be pipelined to increase throughput), followed by a feedback loop with a Z^-2 delay and gain a^2.]

Performing arithmetic in FPGAs 1/2

- Binary adders: made of N full adders, each implementing:
  - s_k = x_k XOR y_k XOR c_k
  - c_{k+1} = (x_k AND y_k) OR (x_k AND c_k) OR (y_k AND c_k)
  - Easy to pipeline.
- Multipliers: hardwired (if your chip has them) or “pencil and paper”:

    $P = A \cdot X = \sum_{k=0}^{N-1} a_k 2^k X$

  X is successively shifted by k positions. Then, whenever a_k = 1, X·2^k is accumulated. These multipliers can be pipelined, as opposed to the hardwired variety.

Performing arithmetic in FPGAs 2/2

- Dividers: pencil and paper method.
  Start with an empty auxiliary register B and start shifting bits from A into it (right to left). Whenever B−X is not negative, replace B with B−X. After every shift we get a bit of the quotient: 0 if B−X is negative, 1 otherwise.

    $Q = \frac{A}{X} = \frac{\sum_{k=0}^{N-1} a_k 2^k}{X}$

- Keep in mind that these are good solutions when both operands are variable. Example with one fixed operand: 0.5625a = 9a/16 = a/2 + a/16. Used at CERN to get the baseline from a BPM signal through a lossy integrator.
- Sin, cos, sinh, cosh, atan, atanh, square root and vector rotation: CORDIC.

Distributed Arithmetic (DA) 1/2

Digital Signal Processing is about sums of products:

    $y = \sum_{n=0}^{N-1} c[n] \cdot x[n]$

Let’s assume:
- c[n] constant (prerequisite to use DA)
- x[n] input signal B bits wide

Then, with x_b[n] being bit number b of x[n] (either 0 or 1):

    $y = \sum_{n=0}^{N-1} c[n] \cdot \left( \sum_{b=0}^{B-1} x_b[n] \cdot 2^b \right)$

And after some rearrangement of terms:

    $y = \sum_{b=0}^{B-1} 2^b \cdot \left( \sum_{n=0}^{N-1} c[n] \cdot x_b[n] \right)$

This can be implemented with an N-input LUT.

Distributed Arithmetic (DA) 2/2

[Figure: the bits x_B[n] … x_1[n] x_0[n] of each input word x[0] … x[N-1] are presented to the address inputs of the LUT; the LUT output feeds an adder (+) and a Register, with a 2^-1 scaling block, to produce y.]

COordinate Rotation DIgital Computer (CORDIC) 1/2

General vector rotation:

    $x' = x \cos\varphi - y \sin\varphi$
    $y' = y \cos\varphi + x \sin\varphi$

Rearranging:

    $x' = \cos\varphi \, [x - y \tan\varphi]$
    $y' = \cos\varphi \, [y + x \tan\varphi]$

We restrict rotation angles to be: $\tan\varphi = \pm 2^{-i}$
The cosine can be treated as a constant since: $\cos(\delta_i) = \cos(-\delta_i)$

Giving the CORDIC equations:

    $x_{i+1} = K_i \, [x_i - y_i \cdot d_i \cdot 2^{-i}]$
    $y_{i+1} = K_i \, [y_i + x_i \cdot d_i \cdot 2^{-i}]$

With:

    $K_i = \cos(\arctan 2^{-i}) = \frac{1}{\sqrt{1 + 2^{-2i}}}, \qquad d_i = \pm 1$

CORDIC 2/2

- Two working modes:
  - Rotation mode: rotates the input vector by a specified angle given as an argument.
  - Vectoring mode: rotates the vector until it aligns with the x axis while recording the angle required to make that rotation.
- Usage examples:
  - To compute (ρ,φ) from (x,y) (cartesian to polar transformation), feed (x,y) to the CORDIC rotator in vectoring mode, then find the results in x and the phase accumulator.
  - To compute sin φ, feed (x=1, y=0) to the CORDIC in rotation mode, then find the result in y.

Case study: low level RF cavity control in CERN’s Linac 3 1/4

[Block diagram with labels: LRFSC card, Klystron amplifier, CAVITY, Forward, Reflected, Cavity Pickup, I and Q Set Points from Control Room.]

Case study: low level RF cavity control in CERN’s Linac 3 2/4

[Figure: the 100 MHz signal from the cavity is mixed (X) with an 80 MHz LO and low-pass filtered (LPF) to 20 MHz; two waveform plots, vertical axis from -1 to 1, horizontal axis from 0 to 20.]
Sampling the 20 MHz at exactly 4 times its frequency produces I, Q, -I, -Q, I, Q…
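
The I, Q, -I, -Q pattern can be checked numerically. The sketch below is only an illustration, not the LRFSC firmware: it assumes the 20 MHz signal is written as s(t) = I·cos(ωt) + Q·sin(ωt), a sign convention chosen so that the sample order matches the slide, and the function names `sample_if` and `demodulate` are made up for this example.

```python
import math


def sample_if(i_amp, q_amp, n_samples, f_if=20e6):
    """Sample s(t) = I*cos(wt) + Q*sin(wt) at exactly 4 * f_if.

    The signal model and its sign convention are assumptions made for
    this sketch; they are chosen to reproduce the I, Q, -I, -Q order.
    """
    fs = 4.0 * f_if                      # 80 MHz for a 20 MHz signal
    w = 2.0 * math.pi * f_if
    return [i_amp * math.cos(w * k / fs) + q_amp * math.sin(w * k / fs)
            for k in range(n_samples)]


def demodulate(samples):
    """Recover one (I, Q) pair from each complete group of four samples."""
    pairs = []
    for k in range(0, len(samples) - 3, 4):
        s0, s1, s2, s3 = samples[k:k + 4]
        # The samples arrive as I, Q, -I, -Q: undo the signs and average.
        pairs.append(((s0 - s2) / 2.0, (s1 - s3) / 2.0))
    return pairs


samples = sample_if(0.6, -0.3, 8)
print([round(s, 3) for s in samples])
print(demodulate(samples))
```

Taking differences of samples two positions apart also cancels any constant offset added to the sampled signal, which is one reason to combine the four samples rather than read I and Q directly.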

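Returning to the CORDIC slides, the two working modes and the two usage examples can be modelled in a few lines. This is a floating-point sketch under stated assumptions, not a hardware description: the iteration count `N_ITER`, the function names, and the choice to apply the product of all K_i once at the end are illustrative. In an FPGA the factors 2^-i would be shifts rather than multiplications.

```python
import math

N_ITER = 32  # number of micro-rotations (an assumption; sets the precision)

# Elementary angles arctan(2^-i), as allowed by tan(phi) = +/- 2^-i.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]

# Product of all K_i = 1 / sqrt(1 + 2^-2i), applied once after the loop.
GAIN = 1.0
for i in range(N_ITER):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic(x, y, z, vectoring):
    """Run N_ITER CORDIC iterations; z is the phase accumulator."""
    for i in range(N_ITER):
        if vectoring:
            d = -1.0 if y >= 0 else 1.0   # drive y towards zero
        else:
            d = 1.0 if z >= 0 else -1.0   # drive the residual angle to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x * GAIN, y * GAIN, z


def to_polar(x, y):
    """Vectoring mode: magnitude ends up in x, phase in the accumulator."""
    rho, _, phi = cordic(x, y, 0.0, vectoring=True)
    return rho, phi


def sin_cos(phi):
    """Rotation mode starting from (1, 0): sin ends up in y, cos in x."""
    c, s, _ = cordic(1.0, 0.0, phi, vectoring=False)
    return s, c


print(to_polar(3.0, 4.0))
print(sin_cos(0.5))
```

The sketch only converges for angles within roughly ±1.74 rad (the sum of all elementary angles), and `to_polar` assumes x > 0; a practical implementation would first fold the input into that range.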