
Hardware Implementations of Fixed-Point Atan2


Florent de Dinechin, Matei Iştoan

Université de Lyon, INRIA, INSA-Lyon, CITI-Lab

ARITH22

Introduction: Methods for Computing atan2 in Hardware

atan2 is yet another arithmetic function, but one that is:
• useful in telecommunications, to recover the phase of a signal (12 to 24 bits of precision)
• useful more generally for Cartesian-to-polar coordinate transformation
• interesting in its own right

[Figure: 3D surface plot of arctan(y/x) for x, y in [0, 1]]

Introduction: Common Specification

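• target function: $\alpha = \operatorname{atan2}(y, x)$, the angle of the point $(x, y)$, normalized as $f(x, y) = \frac{\alpha}{\pi} = \frac{1}{\pi}\arctan\left(\frac{y}{x}\right)$ (with the quadrant handling of atan2)

• input: fixed-point format in $[-1, 1)$; the input scale does not matter, since $\arctan\left(\frac{ky}{kx}\right) = \arctan\left(\frac{y}{x}\right)$ for any $k > 0$

• output: fixed-point format in $[-1, 1)$ using binary angles: the angle range $[-\pi, \pi)$ maps onto $[-1, 1)$, so that $(1, 0) \mapsto 0$, $(0, 1) \mapsto \frac{1}{2}$ and $(-1, 0) \mapsto -1$ (the angle $\pi$ wraps around to $-\pi$)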
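A minimal Python reference model of this specification can help when testing any of the architectures below. It assumes, for illustration only, that x, y and the result are integers scaled by $2^w$ (values in $[-1, 1)$, output ulp $2^{-w}$); the function name `atan2_binary_angle` is hypothetical, and double-precision `math.atan2` serves as a near-correctly-rounded ideal.

```python
import math

def atan2_binary_angle(xi: int, yi: int, w: int) -> int:
    """Reference model of f(x, y) = atan2(y, x) / pi as a binary angle.

    Assumed encoding: x = xi / 2**w and y = yi / 2**w, values in [-1, 1);
    the result r encodes r / 2**w in [-1, 1), so one output ulp is 2**-w.
    """
    if xi == 0 and yi == 0:
        raise ValueError("atan2(0, 0) is undefined")
    half_turn = 1 << w                     # the angle pi maps to 2**w, i.e. to 1.0
    # Round to nearest; double precision is far finer than 2**-w for small w.
    r = round(math.atan2(yi, xi) / math.pi * half_turn)
    # Wrap +1.0 (angle pi) onto -1.0 so that the result stays in [-1, 1)
    return ((r + half_turn) % (2 * half_turn)) - half_turn
```

For example, `atan2_binary_angle(0, 1, 8)` returns 128, the binary angle of $\pi/2$.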


Introduction: A Meaningful Comparison

Three different methods for evaluating atan2 in hardware, compared with:
• the same accuracy specification: f(x, y) computed with last-bit accuracy (faithful rounding)
• the same implementation effort


Target platform: FPGAs (Field Programmable Gate Arrays)

Introduction: Hello FPGAs!

Island-style homogeneous FPGAs

CORDIC: First Method, an Unrolled CORDIC

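$$x_0 = x, \qquad y_0 = y, \qquad \alpha_0 = 0$$

$$x_{i+1} = x_i - 2^{-i} s_i y_i, \qquad y_{i+1} = y_i + 2^{-i} s_i x_i, \qquad \alpha_{i+1} = \alpha_i - s_i \arctan 2^{-i}$$

where $s_i \in \{-1, +1\}$ is chosen to drive $y_i$ towards 0, so that

$$x_n \longrightarrow K\sqrt{x^2 + y^2}, \qquad y_n \longrightarrow 0, \qquad \alpha_n \longrightarrow \arctan\frac{y}{x}$$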
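The iteration can be sketched as a bit-accurate integer model of an unrolled vectoring CORDIC. This is a sketch under stated assumptions, not the authors' operator: x, y and the result are integers scaled by $2^w$ (values in $[-1, 1)$, output ulp $2^{-w}$), the input is pre-rotated by $\pm\pi$ so that $x \ge 0$, $s_i = -1$ when $y_i > 0$ and $+1$ otherwise, and a generous, hypothetical guard-bit count replaces the dimensioning rules of the next slides. Its accuracy degrades for inputs of very small magnitude, which it does not attempt to handle.

```python
import math

def cordic_atan2(xi: int, yi: int, w: int, guard: int = 8) -> int:
    """Sketch of an unrolled vectoring CORDIC returning atan2(y, x) / pi.

    Assumed encoding: x = xi / 2**w, y = yi / 2**w in [-1, 1); the result r
    encodes r / 2**w in [-1, 1). Guard bits and iteration count are
    illustrative choices, not the dimensioning from the slides.
    """
    n = w + 2                          # number of unrolled iterations (assumed)
    pa = w + guard                     # fractional bits of the alpha datapath
    x, y = xi << guard, yi << guard    # x/y datapath with w + guard fractional bits
    alpha = 0                          # alpha, in units of pi * 2**-pa
    if x < 0:                          # pre-rotate by +-pi into the half-plane x >= 0
        alpha = (1 << pa) if y >= 0 else -(1 << pa)
        x, y = -x, -y
    for i in range(n):
        s = -1 if y > 0 else 1                       # s_i drives y_i towards 0
        x, y = x - s * (y >> i), y + s * (x >> i)    # shifts truncate (floor), as in hardware
        atan_i = round(math.atan(2.0 ** -i) / math.pi * (1 << pa))  # constant table entry
        alpha -= s * atan_i                          # alpha_{i+1} = alpha_i - s_i arctan 2^-i
    r = (alpha + (1 << (guard - 1))) >> guard        # round to w fractional bits
    half_turn = 1 << w
    return ((r + half_turn) % (2 * half_turn)) - half_turn   # wrap the angle pi onto -1
```

In hardware, each loop iteration becomes one stage of shift-and-add operators, and `atan_i` becomes a wired constant.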

CORDIC: CORDIC Iteration, Accurate Datapath Implementation

⟹ $p = w - 1 - \lceil \log_2 \varepsilon_{w-1} \rceil$ bits for the $x_i$ and $y_i$ datapath
• we can stop updating $x_i$ when $2i - 1 > p$ (useful because the operator is unrolled)

⟹ $g_\alpha = 1 + \lceil \log_2((w - 1) \times 0.5) \rceil$ guard bits for the $\alpha_i$ datapath
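The dimensioning rules above are easy to evaluate for a given $w$. In this sketch, `alpha_guard_bits` and `first_frozen_x_iteration` are hypothetical helper names, and `alpha_rounding_fits` encodes one possible reading of the $g_\alpha$ rule (an assumption, not stated on the slide): with $p_\alpha = w + g_\alpha$ fractional bits, the $w - 1$ rounded constants $\arctan 2^{-i}$ must together stay within half an ulp.

```python
import math

def alpha_guard_bits(w: int) -> int:
    """g_alpha = 1 + ceil(log2((w - 1) * 0.5)), as stated on the slide."""
    return 1 + math.ceil(math.log2((w - 1) * 0.5))

def first_frozen_x_iteration(p: int) -> int:
    """Smallest i with 2i - 1 > p; from this iteration on, x_i is no longer updated.

    Rough intuition: |y_i| shrinks like 2**-(i-1), so the x update 2**-i * y_i
    is about 2**-(2i-1), which falls below the datapath ulp 2**-p.
    """
    return (p + 1) // 2 + 1

def alpha_rounding_fits(w: int) -> bool:
    """Hypothetical check, assuming p_alpha = w + g_alpha fractional bits:
    w - 1 table roundings of at most 2**-(p_alpha + 1) each stay within half
    an ulp, 2**-(w + 1). Multiplying through by 2**(w + g_alpha + 1) gives
    the integer test w - 1 <= 2**g_alpha."""
    return w - 1 <= 2 ** alpha_guard_bits(w)

print(alpha_guard_bits(16), first_frozen_x_iteration(20))  # 4 11
```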


CORDIC: Hello, again, FPGAs!

Current heterogeneous FPGAs

Recip-Mult-Atan: Polynomial Approximations

Polynomial approximations and their derivatives (bipartite, etc.):

• the straightforward solution for implementing univariate functions
• problem: the area grows asymptotically exponentially with the input width... and a bivariate function doubles the input width
• possible solutions:
  - range reduction?
  - multiple consecutive one-input functions?


Recip-Mult-Atan: The 1/x and arctan(x) Functions

$\arctan\left(\frac{y}{x}\right) = \arctan\left(y \times \frac{1}{x}\right)$

[Figures: the reciprocal function and the arctangent function]

Recip-Mult-Atan: Range Reductions – Symmetry and Parity

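$\arctan\left(\frac{y}{x}\right) = \frac{\pi}{2} - \arctan\left(\frac{|x|}{|y|}\right)$ for $x, y > 0$; together with the symmetries of the plane and the parity of arctan ($\arctan(-z) = -\arctan(z)$), every input reduces to the first octant $0 \le y \le x$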
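A sketch of how symmetry and parity let the core operator see only $0 \le y_r \le x_r$ (so $y_r / x_r \in [0, 1]$), with the quadrant recovered afterwards from the sign and swap flags. `math.atan` stands in for the hardware arctangent, results are in units of $\pi$ as in the specification, and the function name is hypothetical.

```python
import math

def atan2_by_octant(x: float, y: float) -> float:
    """atan2(y, x) / pi, with the core arctan evaluated only on [0, 1].

    math.atan stands in for the hardware arctangent operator (assumption).
    """
    if x == 0 and y == 0:
        raise ValueError("atan2(0, 0) is undefined")
    ax, ay = abs(x), abs(y)
    swap = ay > ax
    xr, yr = (ay, ax) if swap else (ax, ay)    # now 0 <= yr <= xr
    t = math.atan(yr / xr) / math.pi           # core evaluation, t in [0, 1/4]
    if swap:
        t = 0.5 - t                            # arctan(y/x) = pi/2 - arctan(|x|/|y|)
    if x < 0:
        t = 1.0 - t                            # reflection about the y axis
    if y < 0:
        t = -t                                 # parity: arctan is odd
    return t
```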

Recip-Mult-Atan: Range Reductions – Scaling

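$\arctan\left(\frac{2^s y}{2^s x}\right) = \arctan\left(\frac{y}{x}\right)$

A leading-zero count (LZC) of the bitwise OR of |x| and |y| gives the common shift s; ShiftX and ShiftY then shift both inputs left by s, mapping them to $x_r$, $y_r$ in the normalized domain.

[Figure: regions of the input domain with s = 0, 1, 2, 3, and the normalized domain]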
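The scaling step can be modelled on unsigned integers: `lzc` counts leading zeros in a `width`-bit word, and both magnitudes are shifted by the same amount, so their ratio is unchanged and no bit is lost. The names and the integer encoding ($|x| = a_x / 2^{\text{width}}$) are assumptions for illustration; when the symmetry reduction has already ensured $a_y \le a_x$, the scaled $x_r$ lands in $[0.5, 1)$.

```python
def lzc(v: int, width: int) -> int:
    """Leading-zero count of a width-bit unsigned value (returns width for v == 0)."""
    return width - v.bit_length()

def scale_inputs(ax: int, ay: int, width: int):
    """Shift |x| and |y| left by the same amount s = LZC(|x| OR |y|).

    ax, ay: width-bit unsigned magnitudes (value = a / 2**width). The ratio
    ay / ax, hence arctan(ay / ax), is unchanged, and no bit is lost because
    s never exceeds the leading-zero count of either input.
    """
    s = lzc(ax | ay, width)            # common leading zeros of both inputs
    return ax << s, ay << s, s
```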

Recip-Mult-Atan: The 1/x and arctan(x) Functions – Reduced Domain

[Figures: the reciprocal function on [0.5, 1) and the arctangent function on [0, 1)]

On these reduced domains, both functions can be evaluated with tables, multipartite tables, polynomial approximators, etc.

(all available as faithful FloPoCo operators)

Recip-Mult-Atan: Reciprocal-Multiply-Arctangent


Recip-Mult-Atan: Reciprocal-Multiply-Arctangent, Datapath Dimensioning

Goal: minimize the architecture cost, such that $|\varepsilon_{total}| < \mathrm{ulp} = 2^{-w}$

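$|\varepsilon_{total}| < \frac{1}{3}|\varepsilon_{recip}| + \frac{1}{3}|\varepsilon_{mult}| + |\varepsilon_{atan}|$

(errors on $z = y \times \frac{1}{x}$ are damped by the slope of $\frac{1}{\pi}\arctan(z)$, which is at most $\frac{1}{\pi} < \frac{1}{3}$, and the reciprocal error is further multiplied by $|y| < 1$)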
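The budget can be checked numerically. The sketch assumes the reduced domain $x_r \in [0.5, 1)$, $0 \le y_r \le x_r$, and models each operator (reciprocal, multiplier, arctangent) as rounding to the nearest multiple of $2^{-p}$, i.e. an error of at most $2^{-p-1}$; FloPoCo's faithful operators only guarantee an error below one ulp, so this is a simplification. The precisions `p_recip`, `p_mult`, `p_atan` are hypothetical parameters.

```python
import math

def rnd(v: float, bits: int) -> float:
    """Round v to the nearest multiple of 2**-bits (error at most 2**-(bits+1))."""
    return round(v * 2 ** bits) / 2 ** bits

def recip_mult_atan(xr: float, yr: float, p_recip: int, p_mult: int, p_atan: int) -> float:
    """Reduced domain assumed: xr in [0.5, 1), 0 <= yr <= xr. Returns ~ arctan(yr/xr) / pi."""
    r = rnd(1.0 / xr, p_recip)                  # reciprocal operator, r in (1, 2]
    z = rnd(yr * r, p_mult)                     # multiplier, z in [0, 1]
    return rnd(math.atan(z) / math.pi, p_atan)  # arctangent operator, scaled by 1/pi

def error_bound(p_recip: int, p_mult: int, p_atan: int) -> float:
    """Budget from the slide: 1/3 |eps_recip| + 1/3 |eps_mult| + |eps_atan|."""
    e_recip, e_mult, e_atan = (2.0 ** -(p + 1) for p in (p_recip, p_mult, p_atan))
    return e_recip / 3 + e_mult / 3 + e_atan
```

The 1/3 weights leave some slack, since the actual damping factor is $1/\pi$; a real design would trade that slack against smaller reciprocal and multiplier operators.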

Recip-Mult-Atan: Delays, Delays, Delays

Bi-variate Polynomial Approximations: The arctan(y/x) Function

[Figure: 3D surface plot of arctan(y/x) for x, y in [0, 1]]

Bi-variate Polynomial Approximations: First Order Bi-variate Polynomial Approximation


$\alpha \approx T_1(x, y) = ax + by + c$

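Goal: $|\varepsilon_{total}| < \mathrm{ulp} = 2^{-w}$

$\varepsilon_{total} = \varepsilon_{meth} + \varepsilon_{rnd} + \varepsilon_{final\,rnd}$, where $\varepsilon_{meth}$ is the method (approximation) error, $\varepsilon_{rnd}$ the rounding errors of the evaluation and $\varepsilon_{final\,rnd}$ the error of the final rounding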
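A sketch of the method error $\varepsilon_{meth}$ for a first-order scheme, under assumptions not spelled out on the slide: the table is addressed by the $k$ leading fractional bits of $x$ and $y$, the coefficients are the Taylor coefficients at the cell corner $(x_0, y_0)$ applied to the offsets $\delta x = x - x_0$, $\delta y = y - y_0$, and the domain is the reduced one, $x \in [0.5, 1)$, $y \in [0, 1)$. Coefficients stay in double precision, so $\varepsilon_{rnd}$ and $\varepsilon_{final\,rnd}$ are ignored; `taylor1_max_error` is a hypothetical name.

```python
import math

def f(x: float, y: float) -> float:
    """Target function on the (assumed) reduced domain x in [0.5, 1), y in [0, 1)."""
    return math.atan2(y, x) / math.pi

def taylor1_max_error(k: int, grid_bits: int = 8) -> float:
    """Method error of a hypothetical first-order scheme.

    The k leading fractional bits of x and y select a table entry (x0, y0)
    storing c = f(x0, y0), a = df/dx(x0, y0), b = df/dy(x0, y0); the value is
    T1 = a*dx + b*dy + c with dx = x - x0, dy = y - y0.
    """
    h = 2.0 ** -k
    n = 1 << grid_bits
    worst = 0.0
    for i in range(n // 2, n):                 # x = i / n sweeps [0.5, 1)
        for j in range(n):                     # y = j / n sweeps [0, 1)
            x, y = i / n, j / n
            x0, y0 = math.floor(x / h) * h, math.floor(y / h) * h
            r2 = x0 * x0 + y0 * y0
            a = -y0 / (math.pi * r2)           # df/dx at (x0, y0)
            b = x0 / (math.pi * r2)            # df/dy at (x0, y0)
            t1 = a * (x - x0) + b * (y - y0) + f(x0, y0)
            worst = max(worst, abs(f(x, y) - t1))
    return worst

print(taylor1_max_error(4), taylor1_max_error(5))
```

Since the first-order method error shrinks like $2^{-2k}$, reaching $2^{-w}$ needs roughly $k \approx w/2$ index bits on each input.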

Bi-variate Polynomial Approximations: Second Order Bi-variate Polynomial Approximation

$\alpha \approx T_2(x, y) = ax + by + c + dx^2 + ey^2 + fxy$


Goal: $|\varepsilon_{total}| < \mathrm{ulp} = 2^{-w}$

$\varepsilon_{total} = \varepsilon_{meth} + \varepsilon_{rnd} + \varepsilon_{final\,rnd}$
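The same experiment can be sketched for the second-order scheme, under the same assumptions (expansion at the cell corner, $k$ leading bits of $x$ and $y$, reduced domain, exact coefficients). The slide's coefficient $f$ is renamed `fxy` to avoid confusion with the target function; $d$ and $e$ are half the second partial derivatives of $\frac{1}{\pi}\arctan(y/x)$, and `fxy` is the mixed derivative. The function name is hypothetical.

```python
import math

def taylor2_max_error(k: int, grid_bits: int = 8) -> float:
    """Method error of a hypothetical second-order scheme on x in [0.5, 1), y in [0, 1):
    T2 = a*dx + b*dy + c + d*dx**2 + e*dy**2 + fxy*dx*dy around the table point
    (x0, y0) given by the k leading fractional bits of x and y."""
    h = 2.0 ** -k
    n = 1 << grid_bits
    worst = 0.0
    for i in range(n // 2, n):
        for j in range(n):
            x, y = i / n, j / n
            x0, y0 = math.floor(x / h) * h, math.floor(y / h) * h
            dx, dy = x - x0, y - y0
            r2 = x0 * x0 + y0 * y0
            r4 = r2 * r2
            c = math.atan2(y0, x0) / math.pi
            a, b = -y0 / (math.pi * r2), x0 / (math.pi * r2)   # first derivatives
            d = x0 * y0 / (math.pi * r4)                       # (1/2) d2f/dx2
            e = -x0 * y0 / (math.pi * r4)                      # (1/2) d2f/dy2
            fxy = (y0 * y0 - x0 * x0) / (math.pi * r4)         # d2f/dxdy
            t2 = a * dx + b * dy + c + d * dx * dx + e * dy * dy + fxy * dx * dy
            worst = max(worst, abs(math.atan2(y, x) / math.pi - t2))
    return worst

print(taylor2_max_error(4), taylor2_max_error(5))
```

Here the method error shrinks like $2^{-3k}$, which is consistent with the $k \ge \lceil (w+1)/3 \rceil$ constraint given in the dimensioning details.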

Comparisons: Logic-only Synthesis

CORDIC

| Bitwidth | LUT | Latency (ns) |
|---|---|---|
| 8 | 173 | 9.3 |
| 12 | 435 | 14.6 |
| 16 | 734 | 19.7 |
| 24 | 1504 | 31.0 |
| 32 | 2606 | 43.1 |

Taylor degree 1

| Bitwidth | LUT | Latency (ns) |
|---|---|---|
| 8 | 207 | 12.64 |
| 12 | 1258 | 14.74 |
| 16 | 37744 | 20.20 |

Taylor degree 2

| Bitwidth | LUT | Latency (ns) |
|---|---|---|
| 8 | 356 | 13.72 |
| 12 | 469 | 14.75 |
| 16 | 1509 | 17.90 |

RecipMultAtan

| Bitwidth | Method | LUT | Latency (ns) |
|---|---|---|---|
| 8 | degree 0 | 175 | 11.8 |
| 12 | degree 0 | 683 | 16.2 |
| 12 | degree 1 | 443 | 19.0 |
| 16 | degree 1 | 1049 | 19.1 |
| 24 | degree 2 | 2583 | 35.2 |
| 32 | degree 2 | 6190 | 40.7 |
| 32 | degree 3 | 5423 | 50.8 |

Comparisons: Logic-only Synthesis, Area

[Figure: LUT count (log scale, 10^3 to 10^4) versus bitwidth (8, 12, 16) for Taylor 1, Taylor 2, CORDIC and RecipMultAtan]

Comparisons: Logic-only Synthesis, Delay

[Figure: latency (ns) versus bitwidth (8 to 32) for CORDIC, Taylor 1, Taylor 2 and RecipMultAtan]

Comparisons: 16-bit Pipelined Architectures

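| Method | LUT + Reg. | BRAM + DSP | Speed (cycles@MHz) |
|---|---|---|---|
| CORDIC | 816 + 44 | 0 + 0 | 2@191 |
| CORDIC | 799 + 202 | 0 + 0 | 5@274 |
| CORDIC | 796 + 336 | 0 + 0 | 8@389 |
| RecipMultAtan 1 | 320 + 51 | 2 + 1 | 2@112 |
| RecipMultAtan 1 | 315 + 68 | 2 + 1 | 3@199 |
| RecipMultAtan 2 | 425 + 199 | 0 + 5 | 10@130 |
| RecipMultAtan 2 | 432 + 250 | 0 + 5 | 14@253 |
| Taylor degree 2 | 331 + 53 | 4 + 6 | 1@135 |
| Taylor degree 2 | 327 + 103 | 4 + 6 | 3@144 |
| Taylor degree 2 | 329 + 140 | 4 + 6 | 5@220 |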

Comparisons: Conclusions

Very unlike the conclusions of “Fixed-Point on FPGAs” (HEART 2013):

CORDIC?
• efficient
• scales well

Polynomial approximations?
• limited by memory requirements
• no unique optimal solution
• could they be saved by better bivariate polynomial approximations?


Thank you for your attention! Questions?


Hidden Frame

First Method: CORDIC

The CORDIC Iteration: Datapath Dimensioning

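The $x_i$ datapath: with computed values $\tilde{x}_i = x_i + \varepsilon^x_i$ and $\tilde{y}_i = y_i + \varepsilon^y_i$, and a rounding error $u^x_i$,

$$\tilde{x}_{i+1} = \tilde{x}_i - s_i 2^{-i}\tilde{y}_i + u^x_i = x_i + \varepsilon^x_i - s_i 2^{-i}(y_i + \varepsilon^y_i) + u^x_i = x_{i+1} + \varepsilon^x_i - s_i 2^{-i}\varepsilon^y_i + u^x_i$$

• using the error bound $\varepsilon_i$ for both $\varepsilon^x_i$ and $\varepsilon^y_i$, and $|u^x_i| < 2^{-p}$:

$$\varepsilon_{i+1} = \varepsilon_i(1 + 2^{-i}) + 2^{-p}$$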

⟹ $p = w - 1 - \lceil \log_2 \varepsilon_{w-1} \rceil$ bits for the $x_i$ and $y_i$ datapath

• we can stop updating $x_i$ when $2i - 1 > p$ (useful because the operator is unrolled)
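The error recurrence above can be iterated exactly with rational arithmetic. This sketch assumes exact inputs ($\varepsilon_0 = 0$); the function name is hypothetical. Because the factor $1 + 2^{0}$ only multiplies $\varepsilon_0 = 0$, each $2^{-p}$ term is amplified by at most $\prod_{i \ge 1}(1 + 2^{-i}) \approx 2.38$, so the bound grows at most linearly with the number of iterations, and roughly $\log_2$ of the iteration count in extra bits suffices.

```python
from fractions import Fraction

def xy_error_bounds(n: int, p: int) -> list:
    """Iterate eps_{i+1} = eps_i * (1 + 2**-i) + 2**-p from eps_0 = 0.

    eps_0 = 0 assumes exact inputs. Exact rational arithmetic keeps the
    bound itself free of rounding. Returns [eps_0, ..., eps_n].
    """
    u = Fraction(1, 2 ** p)                 # bound on one truncation error u_i
    eps = [Fraction(0)]
    for i in range(n):
        eps.append(eps[-1] * (1 + Fraction(1, 2 ** i)) + u)
    return eps

bounds = xy_error_bounds(16, 20)
print(float(bounds[-1] * 2 ** 20))          # eps_16 in units of 2**-20
```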

The CORDIC Iteration: Datapath Dimensioning (2)

The $\alpha_i$ datapath:

• $\varepsilon_{atan}(2^{-i}) = 2^{-p_\alpha - 1}$ (i.e. 0.5 ulp at precision $p_\alpha$)
• $\varepsilon_{final\,round} = 2^{-w}$

⟹ $g_\alpha = 1 + \lceil \log_2((w - 1) \times 0.5) \rceil$ extra guard bits needed

Bi-variate Polynomial Approximations: Datapath Dimensioning

Goal: $|\varepsilon_{total}| < \mathrm{ulp} = 2^{-w}$, with $\varepsilon_{total} = \varepsilon_{meth} + \varepsilon_{final\,rnd} + \varepsilon_{rnd}$

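• method error $\varepsilon_{meth}$:
  - due to the neglected terms of the Taylor series
  - constraint on $k$: $k \ge \left\lceil \frac{w+1}{3} \right\rceil$
• rounding errors: $\varepsilon_{rnd} = \varepsilon_a\delta x + \varepsilon_b\delta y + \varepsilon_c + \varepsilon_d\delta x^2 + \varepsilon_e\delta y^2 + \varepsilon_f\delta x\delta y$
  - depends on $\delta x$, $\delta y$ ⟹ $\varepsilon_{rnd}$ depends on $k$
• $\varepsilon_{meth} + \varepsilon_{rnd} < 2^{-w-1}$ ⟹ number of guard bits $g$
  ⟺ compromise between the size of the tables and the size of the multipliers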
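The constraint on $k$ can be evaluated for the bitwidths of the comparison. Reading it as the degree-2 case, where the neglected third-order terms scale like $2^{-3k}$ and must stay below $2^{-w-1}$, is an interpretation; the printed table size further assumes, hypothetically, that $k$ bits of each input address the coefficient table.

```python
def min_index_bits(w: int) -> int:
    """Smallest k with k >= ceil((w + 1) / 3), the method-error constraint above."""
    return -(-(w + 1) // 3)          # integer ceiling division

for w in (8, 12, 16, 24, 32):
    k = min_index_bits(w)
    # Hypothetical sizing: if k bits of x and k bits of y address the table,
    # it holds 2**(2k) entries, each with six coefficients (a, b, c, d, e, f).
    print(w, k, 2 ** (2 * k))
```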

The arctan(y/x) Function

[Figure: 3D surface plot of arctan(y/x) for x, y in [0, 1]]
