C.-S. Peng et al.: CORDIC-Based Architecture with Channel State Information for OFDM Baseband Receiver 403

CORDIC-Based Architecture with Channel State Information for OFDM Baseband Receiver

Chia-Sheng Peng, Student Member, IEEE, Yuan-Shin Chuang, and Kuei-Ann Wen, Senior Member, IEEE

Abstract: An efficient architecture for OFDM baseband receiver based on coordinate rotation digital computer (CORDIC) algorithm is proposed with channel state information (CSI). Two dual-mode CORDIC modules are designed for synchronization and equalization. A modified demapping method using CSI helps to provide sub-channel status, and therefore decreases packet error rates especially for some sub-channels with extremely low SNR. A combined algorithm suitable for CORDIC is proposed for not only the estimate and compensation of channels but synchronization for carrier frequency offset and sampling clock offset. Allocation, timing analysis and complexity for all functional blocks in the receiver are proposed, including front-end processing, FFT, inner receiver, and outer receiver. Complete tests for packet error rate are simulated under an integrated platform considering RF front-end non-ideal parameters, filters, quantization, and channel models. Simulation results of practical circuits on AWGN and channel models are presented and prove the improvement of the receiver. The design occupies about 424k equivalent gate count and 7.3 mm² core size in 0.18-µm CMOS.

Index Terms: OFDM, LAN, CORDIC, channel state information

I. INTRODUCTION

Orthogonal Frequency Division Multiplexing (OFDM) technique has many advantages to overcome multipath effects and narrowband interference by using guard intervals and splitting sub-carriers; moreover, easily working together with interleaving and coding techniques, OFDM-based systems substantially improve performance under fading channels [1]. Therefore, more and more wireless systems adopt coded OFDM as transmission techniques, for example, WLAN IEEE 802.11a/g and ETSI DVB-T [2]-[3]. However, OFDM-based systems are sensitive to channels, carrier frequency offset (CFO) and sampling clock offset (SCO); other effects caused from RF front-end non-ideal properties, such as phase noise, PA non-linearity, I/Q mismatch, etc, also degrade quality of transmission [4]. Among these effects, phase errors caused by CFO, SCO and phase noise (PN) result in phase rotation for all sub-carriers and seriously destroy quadrature property of QAM signals, especially for higher-order QAM signals. Multipath channels, on the other hand, generate channel response including amplitude and phase. Therefore it is necessary to estimate and compensate these effects by synchronization and equalization [5]. But under frequency-selective fading environment, some sub-carriers are probably in deep fading and result in extremely low SNR. Most part of the error bits results from these sub-carriers and obviously increase packet error rate (PER) in packet-based transmission. Using channel state information (CSI) to decrease PER under such environment has been proposed for WLAN [6]-[7].

CORDIC algorithm realizes conversions between rectangular and polar coordinates, and many other useful applications [8]. Some methods use CORDIC for frequency offset compensation and FFT calculation [9]. In this paper two dual-mode CORDIC modules are designed before and after FFT: the former serves for calculation and correction for CFO, and the latter for the algorithm unifying channel estimation, phase tracking and equalization. For performance improvement under multipath fading channels, a new demapping rule is proposed that reduces hardware complexity of the equalizer and employs CSI to achieve 3 to 6 dB improvement for different data rates. Unlike some methods adopting sub-channel power as CSI calculation in the decoder [7], the method employs sub-channel amplitude in the modified demapping to eliminate the use of complex-valued divisions from the equalizer. Under some special cases of severer channel responses 18 dB improvement (from 50 dB to 32 dB CNR required) for 64QAM with 3/4 coding rate can be achieved. Besides, for full packet transmissions the paper also proposes the designs of frame detection, automatic gain control (AGC), and symbol timing detection.

A software simulation platform is also introduced that integrates baseband, RF, analog components, and channel models. On the platform, non-ideal parameters, channels and noise can be co-simulated with the proposed baseband transceiver. Simulation results for PER of practical circuits on AWGN and channel models are also proposed to verify the improvement of using CSI method. Allocation, timing analysis and complexity for all functional blocks in the receiver are presented, including front-end processing, FFT, inner receiver, and outer receiver. The RAM-free design makes the baseband more convenient as part of system-on-a-chip and the total area occupies about 424k equivalent gate count and 7.3 mm² core size in 0.18-µm CMOS.

This work was sponsored jointly by the Ministry of Education and the National Science Council, Taiwan under the contract number NSC-93-2220-E-009-011.
The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, 300, Taiwan.

Contributed Paper Manuscript received January 12, 2005 0098 3063/05/$20.00 © 2005 IEEE 404 IEEE Transactions on Consumer Electronics, Vol. 51, No. 2, MAY 2005

Fig. 1. Packet format of the OFDM system

Fig. 2. Functional blocks of the proposed OFDM receiver (inner receiver: I/Q ADC, SRRCF, frame detection and AGC with gain to VGA and RSSI, symbol timing detection (STD), symbol timing controller (STC), CORDIC-assistant CFO acquisition and compensation, radix-8 FFT, filtered channel estimate, CORDIC-based equalizer, pilot-based phase tracking; outer receiver: data and pilots extractor, demapping, deinterleave, Viterbi decoder, descrambler, header check)

Fig. 3. (a) Decision statistic $M_n$ in frame detection under different SNR (b) Adjusted power of input signal toward the target PAPR (c) Matched-filter output from STD

II. DESIGN OF OFDM RECEIVER

The system uses a packet-based method in which a packet is transmitted with preamble, header and payload data as shown in Fig. 1. The preamble signals, including 10 short and 2 long preambles, are used for frame detection, gain control, CFO compensation, symbol timing detection, and channel estimation. The header signal stores payload length and data rate, which indicates the modulation type and the coding rate. All payload data are partitioned into OFDM symbols, L in total. Each OFDM symbol occupies 4 µs and contains 4 pilots and 48 data symbols. Data symbols can be modulated with BPSK, QPSK, 16QAM or 64QAM, and coded with punctured rates 1/2, 2/3 and 3/4. The allocated bandwidth and the sampling rate are both 20 MHz, and the data rates vary from 6 to 54 Mbit/s.

A. Functional blocks for OFDM receiver

Fig. 2 indicates the functional blocks and signals of the proposed OFDM receiver according to the packet format above. The design includes an inner receiver and an outer receiver: the former is defined for signal processing such as synchronization and equalization, whereas the latter for logic processing such as deinterleaving and decoding [10]. The inner receiver recovers the transmitted symbols from the effects of non-ideal parameters and channel response, and the demapping, as a bridge from inner receiver to outer receiver, translates these symbols into a bit-level sequence. The outer receiver detects and corrects error bits in the sequence.

B. Front-end processing

Front-end processing includes functions in front of inner receiver: SRRCF, frame detection, AGC, and STD. The complex-valued input signal after SRRCF is down-sampled and operates at 20 MHz with 10-b resolution. Because the short preamble repeats every 16 samples, the frame detection uses a sliding decision statistic that is calculated as

$$M_n = \frac{\left|\sum_{i=1}^{N} r_{n-N-16+i}\cdot r^{*}_{n-N+i}\right|^{2}}{\left(\sum_{i=1}^{N}\left|r_{n-N+i}\right|^{2}\right)^{2}} \tag{1}$$

where N is the length of the correlator [11]. The statistics of the sliding $M_n$ vary with SNR as shown in Fig. 3a. Therefore the threshold of the design is chosen for SNR values as low as 3 dB. The denominator of (1) is viewed as the power of input signal and can be translated into the decibel value $P_{n,dB}$ by a look-up table (LUT) that compares it with the peak power ratio of the ADCs. The VGA gain to adjust is obtained by

$$G_{next,dB} = G_{pre,dB} + P_{n,dB} - PAPR_{t,dB} \tag{2}$$

where $PAPR_{t,dB}$ is the target PAPR of input signal, and $G_{next,dB}$ and $G_{pre,dB}$ are the next and the previous gain, respectively. The adjustment of AGC is done twice so that $P_{n,dB}$ is adjusted toward the target PAPR. As shown in Fig. 3b there are many different initial input gains that vary from about 40 to 55 dB, i.e., the input signal to ADCs is too small and then it will be adjusted to the target value 9 dB.

Because of real-valued property in frequency domain, the long preambles are symmetric-conjugate and therefore STD can take half the long preamble, 32 taps in number, to implement a matched-filter that is designed with 2-b coefficients. As Fig. 3c shows, there are two peaks at the end of two long preambles.
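As a rough illustration of (1) and (2), the sketch below evaluates the sliding decision statistic on a sample buffer and applies one AGC gain update. It is a floating-point sketch under stated assumptions, not the fixed-point circuit of the design: the names `frame_metric` and `next_gain_db`, the 0-based indexing, the zero-power guard and the test pattern are all hypothetical.

```python
import cmath

def frame_metric(r, n, N=16, delay=16):
    """Decision statistic M_n of (1) for the N samples ending at index n (0-based)."""
    if n - N - delay + 1 < 0:
        raise ValueError("not enough past samples for this window")
    corr = 0j
    power = 0.0
    for i in range(1, N + 1):
        late = r[n - N + i]              # r_{n-N+i}
        early = r[n - N - delay + i]     # r_{n-N-16+i}, one short-preamble period earlier
        corr += early * late.conjugate()
        power += abs(late) ** 2
    if power == 0.0:
        return 0.0                       # assumption: report "no frame" when there is no input power
    return abs(corr) ** 2 / power ** 2

def next_gain_db(g_pre_db, p_n_db, papr_target_db):
    """One AGC update as written in (2); all quantities are in dB."""
    return g_pre_db + p_n_db - papr_target_db

# Hypothetical input: a 16-sample pattern repeated five times mimics the short preamble.
pattern = [cmath.exp(1j * 0.7 * k * k) for k in range(16)]
samples = pattern * 5
metric = frame_metric(samples, n=63, N=32)   # close to 1 for a noise-free periodic input
```

For a noise-free periodic input the numerator and denominator of (1) coincide, so the statistic approaches 1; a practical threshold would sit below that value.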

The proposed STD traces the second peak and acquires the symbol timing, which will be sent to STC shown in Fig. 2. Based on the symbol timing detected from STD, STC selects samples and also overcomes timing drift caused by SCO: even a tiny SCO in the clock source that drives ADCs or DACs makes the symbol timing lead or lag over a long period. STC in the design can compensate the timing slide according to the estimated SCO in the phase tracking.

C. CORDIC-based architecture for the inner receiver

The inner receiver architecture is illustrated in Fig. 4 where there are two main functional blocks both based on dual-mode CORDIC modules. In front of FFT, CFO will be estimated and compensated. The 64-point FFT transforms signals from time-domain into frequency-domain. After FFT, channel response will be estimated, filtered, and equalized; simultaneously, phase error will be tracked and compensated. A novel algorithm unifies equalization with phase tracking in polar-coordinate so that a dual-mode CORDIC module can deal with all the operations in the algorithm by precise allocation.

Fig. 4. Architecture of CORDIC-based inner receiver (16/64-point correlator and shift registers feeding a dual-mode CORDIC module with a CFO phase accumulator for CFO estimation and compensation, a radix-8 64-point FFT/IFFT, and a second dual-mode CORDIC module for channel estimation, equalization and phase tracking, followed by CSI-based demapping)

1) Design of dual-mode CORDIC module

The proposed dual-mode CORDIC is an iterative algorithm used to rotate a 2x1 vector with a certain angle (phase-rotation mode), or to acquire amplitude and angle from a 2x1 vector (rectangular-to-polar mode). General methods to realize these functions need LUTs, complex-valued multipliers, and dividers. CORDIC reduces complexity in hardware by use of simple components like adders, comparators and shifters. Besides, a CORDIC module is easily pipelined because every iteration performs similar operations. The accuracy of the CORDIC module increases with larger iterative number N and the ith iterative formula is defined as

$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\mu_i 2^{-i} \\ \mu_i 2^{-i} & 1 \end{bmatrix}\begin{bmatrix} x_i \\ y_i \end{bmatrix}, \qquad \theta_i = \tan^{-1}(2^{-i}) \tag{3}$$

where $i = 0, 1, \ldots, N-1$, $[x_i\ y_i]^T$ is the input vector, $[x_{i+1}\ y_{i+1}]^T$ the vector rotated, $\theta_i$ the practical rotative angle in this iteration, and $\mu_i$ decides the direction of the rotation. With (3), the vector $[x_i\ y_i]^T$ rotates anticlockwise by $\theta_i$ if $\mu_i = 1$ and clockwise if $\mu_i = -1$, as $\mu_i$ is given by

$$\mu_i = \begin{cases} -\mathrm{sign}(x_i)\cdot\mathrm{sign}(y_i) & \text{(rectangular-to-polar mode)} \\ \mathrm{sign}\left(-\sum_{r=0}^{i-1}\mu_r\theta_r - \psi\right) & \text{(phase-rotation mode)} \end{cases} \tag{4}$$

where $\psi$ is the reference angle to rotate. After the last iteration, the vector is multiplied by the factor $k_N = \prod_{i=0}^{N-1} 1/\sqrt{1+2^{-2i}}$ to maintain the same amplitude as the input vector.

In phase-rotation mode the direction of the rotation is decided by comparison of the accumulated angle and its reference in each rotation: $\mu_i = 1$ if the former is larger than the latter, and $\mu_i = -1$ otherwise. In rectangular-to-polar mode the vector rotates toward the positive x-axis. The desired angle of the input vector is the last accumulative angle $\xi_i = -\sum_{\kappa=0}^{i-1}\mu_\kappa\theta_\kappa$ and the amplitude is the length on x-axis after all iterative rotations are finished.

The architecture of the proposed dual-mode CORDIC is shown in Fig. 5. The equations in (3) and (4) are implemented as shown in Fig. 5a. The multiplication by $2^{-i}$ adopts shifters for $x_i$ and $y_i$. According to the control signal "mode", $\mu_i$ selects addition or subtraction in "ADD/SUB" blocks. Fig. 5b shows the symbol of a single iterative unit. In Fig. 5c the output ports $x_{out}$ and $\xi_N$ are amplitude and angle of the input vector in rectangular-to-polar mode whereas $x_{out}$ and $y_{out}$ represent the output vector being rotated in phase-rotation mode. The buffer located between every two iterative units can shorten critical path and therefore raise clock rate. The signals $\psi$ and "mode" are propagated through buffers and then all units are able to operate at different modes with different angles. However, increasing the number of buffers increases hardware complexity. Based on requirement of accuracy and clock rate in practical design, the module adopts nine iterative units and inserts three buffers in total.

Fig. 5. (a) Structure of an iterative unit (b) Symbol of a single unit (c) Architecture of the pipeline dual-mode CORDIC module

2) CFO estimation and compensation

Coarse CFO is estimated when AGC has been operated. The angle calculated from the same correlator in frame detection calculates CFO value in maximum likelihood method:

$$\Phi_c = \angle\left(\sum_{i=1}^{N} r_{n-N-16+i}\cdot r^{*}_{n-N+i}\right) \tag{5}$$

The value in (5) means the phase rotation caused by CFO within 16 samples and then the coarse CFO value can be calculated by the sampling period $T_s$:

$$\Delta f_c = \Phi_c / (2\pi\cdot 16 T_s)\ \text{Hz} \tag{6}$$

Because the angle in (5) only varies between $\pi$ and $-\pi$, the coarse CFO estimate has the largest range up to ±625 kHz. Because the phase compensation is operated at $T_s$, the true CFO value in (6) is not necessary: only the angle caused by CFO within a sample is required. Before fine CFO is estimated, the signal must be rotated according to the accumulated angle so that the residual CFO is within ±156.25 kHz, which is the largest estimate range of fine CFO:

$$c_n = r_n\cdot\exp(-jn\Phi_c/16) \tag{7}$$

where the sampling period is neglected and the division by 16 is easy to implement. After STD decides the symbol timing, the estimate of fine CFO can take the correlation value of the two identical long preambles:

$$\Phi_f = \angle\left(\sum_{i=1}^{64} c_{l-128+i}\cdot c^{*}_{l-64+i}\right) \tag{8}$$

where l is the last index of long preambles. The final CFO-compensated signal is represented as

$$z_n = r_n\cdot\exp[-j(\Phi_c/16 + \Phi_f/64)n] \tag{9}$$

which includes the coarse and the fine CFO estimate. Equations (5) and (8) calculate angles from complex-valued signals, whereas (7) and (9) rotate complex-valued signals by some angles. Therefore a dual-mode CORDIC module can serve for these operations. Both the coarse and the fine CFO estimate are acquired by the CORDIC module and stored to the CFO phase accumulator shown in Fig. 4, and then $\Phi_c$ and $\Phi_f$ in (9) will be sent into the CORDIC module to rotate the signal sequence $\{r_n\}$ in the shift registers.

3) Filtered channel estimation

After the pre-FFT signal is compensated by CFO and the guard interval is removed, the post-FFT signal of the lth OFDM symbol is given by

$$Y_{l,k} = \sum_{n=0}^{N-1} z_{l,n}\cdot\exp(-j2\pi(n/N)k) \tag{10}$$

where N is the number of FFT points. The wideband signal is transmitted over frequency-selective fading channel. Because the duration of a packet is relatively shorter than the coherence time of indoor channel, i.e. velocity less than 10 km/hr, the channel is assumed to be constant during a packet period. Then the channel transfer function at kth subcarrier frequency can be represented as time-invariant model:

$$H_k = \sum_i h_i\exp(-j2\pi k(\tau_i/T_u)) \tag{11}$$

where $h_i$ is the complex-valued tap, $\tau_i$ is its path delay, and $T_u$ the duration of FFT. Under consideration of residual CFO $\Delta f$, SCO value $\zeta = (T_s' - T_s)/T_s$, and symbol timing offset $n_\varepsilon$, the post-FFT signal can be simplified as an equivalent model [10]:

$$Y_{l,k} = \alpha(n_\varepsilon)\,a_{l,k}H_k\exp[j2\pi(k/N)n_\varepsilon]\cdot\exp[j2\pi\phi_k l\,(T_u+T_g)/T_u] + W_{l,k} \tag{12}$$

where $\alpha(n_\varepsilon)$ is attenuation function of $n_\varepsilon$, $a_{l,k}$ the transmit data, $T_g$ the guard interval, $T_s$ the sampling time, $T_s'$ the offset sampling time, $W_{l,k}$ the noise caused from AWGN, $n_\varepsilon$, ICI, phase noise and other non-ideal parameters, and $\phi_k \approx \Delta f T_u + \zeta\cdot k$. The channel transfer function combined with the effect caused from symbol timing error $n_\varepsilon$ together with attenuation function can be represented as

$$\tilde{H}_k = \alpha(n_\varepsilon)H_k\exp[j2\pi(k/N)n_\varepsilon] \tag{13}$$

Therefore (12) can be re-written as

$$Y_{l,k} = a_{l,k}\tilde{H}_k\exp[j2.5\pi l(\zeta\cdot k+\alpha)] + W_{l,k} \tag{14}$$

where $\alpha = \Delta f T_u$ is phase shift caused from the residual CFO in an OFDM symbol, $(T_u+T_g)/T_u = 1.25$, and $k = -26,\ldots,-1,1,\ldots,26$ in the system. Considering that $(\zeta\cdot k+\alpha)$ is small enough in (14) and the pre-known signal is defined as $\tilde{a}_{l,k}$ in the preamble with $l = \lambda, \lambda+1$, the estimate of kth sub-channel is generally given by

$$\hat{H}_k = (Y_{\lambda,k}/\tilde{a}_{\lambda,k} + Y_{\lambda+1,k}/\tilde{a}_{\lambda+1,k})/2 \tag{15}$$

And then the signal is equalized as

$$\hat{a}_{l,k} = Y_{l,k}/\hat{H}_k \tag{16}$$

As shown in Fig. 6 the amplitude of $\hat{H}_k$ is shorter than the practical value; moreover adjusting the phase shift in (14) can improve the estimates of channel response and phase tracking.
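The two modes of the CORDIC iteration in (3) and (4) can be sketched in software as follows. This is a minimal floating-point sketch, assuming nine iterations as in the module described above; the function name `cordic`, the string-valued `mode` argument and the use of multiplications instead of hardware shifters are illustrative assumptions, and the input is assumed to lie within the convergence range of roughly ±1.74 rad.

```python
import math

N_ITER = 9                                            # nine iterative units, as in the design
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
K_N = 1.0
for i in range(N_ITER):
    K_N /= math.sqrt(1.0 + 2.0 ** (-2 * i))           # amplitude correction factor k_N

def _sign(v):
    return 1.0 if v >= 0 else -1.0

def cordic(x, y, mode, psi=0.0):
    """mode 'polar': return (amplitude, angle) of (x, y), assuming x > 0.
    mode 'rotate': return (x, y) rotated clockwise by psi, i.e. multiplied by exp(-j*psi)."""
    xi = 0.0                                          # accumulated angle, -sum(mu_k * theta_k)
    for i in range(N_ITER):
        if mode == "polar":
            mu = -_sign(x) * _sign(y)                 # drive y toward zero
        else:
            mu = _sign(xi - psi)                      # compare accumulated angle with reference
        x, y = x - mu * 2.0 ** -i * y, y + mu * 2.0 ** -i * x
        xi -= mu * ANGLES[i]
    if mode == "polar":
        return K_N * x, xi
    return K_N * x, K_N * y

amplitude, angle = cordic(3.0, 4.0, "polar")          # near 5 and atan2(4, 3)
xr, yr = cordic(1.0, 0.0, "rotate", psi=math.pi / 6)  # near (cos 30deg, -sin 30deg)
```

With nine iterations the residual angle error is bounded by the last micro-rotation, about 0.004 rad, which is why the results above are only approximate.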

The design therefore uses a CORDIC module to translate the complex-valued signal from (14), which is divided by the known transmit signal $\tilde{a}_{l,k}$, into polar-coordinate:

$$Y_{l,k}/\tilde{a}_{l,k} = \tilde{H}_k\exp[j2.5\pi(\zeta\cdot k+\alpha)l] + W_{l,k}/\tilde{a}_{l,k} = \hat{A}_{l,k}\exp(j\hat{\phi}_{l,k}), \quad l = \lambda, \lambda+1 \tag{17}$$

Then the amplitude and the phase of kth sub-channel are estimated by the two long preambles:

$$\hat{A}_k = (\hat{A}_{\lambda,k} + \hat{A}_{\lambda+1,k})/2, \qquad \hat{\phi}_k = (\hat{\phi}_{\lambda,k} + \hat{\phi}_{\lambda+1,k})/2 \tag{18}$$

Fig. 6. Illustration of channel estimation (the averaged estimate $\hat{H}_k$ lies between the two long-preamble estimates $\hat{A}_{l,k}\exp(j\hat{\phi}_{l,k})$ and $\hat{A}_{l+1,k}\exp(j\hat{\phi}_{l+1,k})$, which are separated by the angle $2.5\pi(\zeta\cdot k+\alpha)$)

Note that the phase in (18) is not an actual phase of sub-channel but a reference for phase compensation. Based on the correlative property between adjacent sub-channels, the estimate of sub-channel can be improved further by delivering them into a filter with 3 taps [0.25, 0.5, 0.25], which reduces noise about 3 dB. Simulation reveals that the total system performance on PER will improve about 1 dB using the filter for 64QAM with 3/4 coding rate.

4) Phase tracking

The angle shown in Fig. 6 helps to estimate SCO and residual CFO, given by

$$\delta_k = \hat{\phi}_{\lambda+1,k} - \hat{\phi}_{\lambda,k} = 2.5\pi(\zeta\cdot k+\alpha) + N_{w,\lambda+1,k} - N_{w,\lambda,k} \tag{19}$$

where $N_{w,l,k}$ means phase noise caused by $W_{l,k}/\tilde{a}_{l,k}$ in (17). Because k is symmetric, the first estimate of residual CFO can be obtained by

$$\hat{\alpha}_{\lambda+1} = \frac{1}{2.5\pi\cdot 52}\sum_{k}\delta_k, \quad k = -26\ldots 26,\ k \neq 0 \tag{20}$$

and the first estimate of SCO is given by

$$\hat{\zeta}_{\lambda+1} = \frac{\sum_{k=1}^{26}\delta_k - \sum_{k=-26}^{-1}\delta_k}{2.5\pi\cdot 2\cdot\sum_{k=1}^{26}k} = \frac{\sum_{k=1}^{26}\delta_k - \sum_{k=-26}^{-1}\delta_k}{1755\pi} \tag{21}$$

The phase which shall be compensated for (λ+m)th OFDM symbol and kth sub-channel is defined as

$$\hat{\theta}_{\lambda+m,k} = \hat{\phi}_k + 2.5\pi(\hat{\zeta}_{\lambda+m-1}\cdot k + \hat{\alpha}_{\lambda+m-1})(m-0.5), \quad m = 2,3,4,\ldots \tag{22}$$

Therefore the post-FFT signal is compensated with the phase from (22) and shown as

$$\hat{Y}_{l,k} = Y_{l,k}\exp[-j(\hat{\theta}_{l,k} + k\cdot\rho_{STC})], \quad l = \lambda+2, \lambda+3, \ldots \tag{23}$$

where $\rho_{STC} = 0$ or $\pm\pi/64$ is the compensative phase from STC, depending on the criterion of SCO. Symbol timing error accumulates when SCO exists, and therefore, the design proposes the criterion to decide whether the STC should be triggered or not:

$$\left|2.5\pi(m-0.5)\hat{\zeta}_{\lambda+m}\right| \geq \pi/64 \tag{24}$$

Once the criterion in (24) is met, the STC moves the symbol timing forward or backward according to the sign of $\hat{\zeta}_{\lambda+m}$ and the compensative phase $\rho_{STC}$ is selected correspondingly.

The output signal from (23) extracts 4 pilots and 48 data signals, all adjusted with phase tracking: $\hat{P}_{l,k}$ and $\hat{D}_{l,k}$. The pilots occupy $k = -21, -7, 7, 21$ in each OFDM symbol. The phase recovers from the known pilots:

$$\hat{\phi}_{p,l,k} = \begin{cases} \angle\hat{P}_{l,k}, & P_{l,k} = 1 \\ \angle\hat{P}_{l,k} + \pi, & P_{l,k} = -1 \end{cases} \tag{25}$$

The adaptive tracking of residual CFO is then obtained by

$$\Theta_{\lambda+m} = \frac{\sum_{k=-21,-7,7,21}\hat{\phi}_{p,\lambda+m,k}}{4\cdot 2.5\pi\cdot(m-0.5)}, \qquad \hat{\alpha}_{\lambda+m} = \hat{\alpha}_{\lambda+m-1} + \mu_0\Theta_{\lambda+m} \tag{26}$$

and that of SCO is obtained by

$$\Lambda_{\lambda+m} = \frac{\sum_{k=21,7}\hat{\phi}_{p,\lambda+m,k} - \sum_{k=-21,-7}\hat{\phi}_{p,\lambda+m,k}}{56\cdot 2.5\pi(m-0.5)}, \qquad \hat{\zeta}_{\lambda+m} = \hat{\zeta}_{\lambda+m-1} + \mu_1\Lambda_{\lambda+m} \tag{27}$$

where $\mu_0$ and $\mu_1$ are step sizes that adaptively control the speed of convergence. The estimate of SCO is also used for the criterion in (24) for symbol timing adjustment. The amplitude and the phase of (17) and (25) are acquired by the dual-mode CORDIC module after FFT as shown in Fig. 4, whereas the phase rotation in (23) uses another mode of the same module. The data signal in (23) is only compensated with phase error caused by channel response, residual CFO, and SCO, but not normalized by amplitude of channel response, which will be considered in the demapping.

5) Modified demapping using CSI

As a bridge between the inner and the outer receivers, demapping translates complex-valued symbols into a bit-level sequence according to mapping rules and modulation types in the transmitter. In general, before demapping, the data signal extracted from (23) is normalized by channel amplitude estimated in (18) and given by

$$D'_{l,k} = \hat{D}_{l,k}/\hat{A}_k \tag{28}$$

Thus the signal in (28) can be regarded as the signal in the transmitter coupled with noise, as shown in Fig. 7. The noise mainly results from thermal noise and channel attenuation, which also inflict inaccuracy on synchronization, channel estimation and equalization, and cause more noise further. Also, the noise is enhanced by the division in (28) because of denominators with smaller channel amplitudes.
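The first estimates of residual CFO and SCO in (20) and (21) amount to a common-mode average and a slope fit over the 52 used sub-channels. The sketch below checks this on noise-free phase differences; the function name `first_estimates`, the dictionary input and the numeric values of `alpha_true` and `zeta_true` are hypothetical choices for illustration.

```python
import math

SUBCARRIERS = [k for k in range(-26, 27) if k != 0]    # the 52 used sub-channels

def first_estimates(delta):
    """delta maps sub-channel index k to the phase difference of (19).
    Returns (alpha_hat, zeta_hat) following (20) and (21)."""
    alpha_hat = sum(delta[k] for k in SUBCARRIERS) / (2.5 * math.pi * 52)
    upper = sum(delta[k] for k in range(1, 27))
    lower = sum(delta[k] for k in range(-26, 0))
    zeta_hat = (upper - lower) / (1755 * math.pi)      # 2.5 * 2 * (1 + 2 + ... + 26) = 1755
    return alpha_hat, zeta_hat

# Noise-free check with made-up offsets: delta_k = 2.5*pi*(zeta*k + alpha).
alpha_true, zeta_true = 1.0e-3, 2.0e-5
delta = {k: 2.5 * math.pi * (zeta_true * k + alpha_true) for k in SUBCARRIERS}
alpha_hat, zeta_hat = first_estimates(delta)
```

Because the sub-channel indices are symmetric about zero, the SCO term cancels in the sum of (20) and the CFO term cancels in the difference of (21), so both offsets are recovered exactly when noise is absent.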

A bit-level sequence after demapping is denoted as $\{d_{l,k,\nu}\}$, in which $\nu = 0, \ldots, M-1$, and M is the number of bits per symbol corresponding to different modulation, e.g., M = 1, 2, 4, 6 for BPSK, QPSK, 16QAM and 64QAM, respectively. Since the complex-valued signal in (28) is composed of real and imaginary parts, the demapping function of 64QAM, for example, extracts 3 bit-level signals from each part, as shown in Fig. 8. The slopes in Fig. 8 depend on the minimum distance between any two constellations and the space $\Delta$ is half the minimum distance. Based on requirements of soft-decision and hardware complexity, the output sequence from demapping is assigned to 4-b resolution. Besides, the resolution is restricted for saving more buffers in the deinterleaving and the decoder.

Fig. 7. Constellations from different mapping rules under different C/N conditions

Fig. 8. Demapping functions of 64QAM (panels b0 for $d_{l,k,0}, d_{l,k,3}$, b1 for $d_{l,k,1}, d_{l,k,4}$ and b2 for $d_{l,k,2}, d_{l,k,5}$; horizontal axis: input index, 10 bits, 0 to 1023; vertical axis: output, 4 bits, 0 to 15)

Due to frequency-selective fading channels, sub-channel power varies randomly. Some sub-channels attenuate severely and result in extremely low SNR, and then most part of error bits happens on these sub-channels. The modified demapping using CSI adopts sub-channel amplitude to weight the bit-level sequence; and therefore, the reliability of those bits acquired from these severer sub-channels can be lowered, and the performance of the decoder based on maximum likelihood will be improved. Weighting the path metric by sub-channel power from CSI has been highlighted in [7], but using amplitude as weighting factor is a better way because the division in (28) can be eliminated and the performance can be maintained. The demapping rules considering CSI for 64QAM in 802.11a, for example, are proposed, without any division:

$$\lambda \triangleq K_{mod}\hat{A}_k$$

$$\tau_{l,k,0} = \begin{cases} \lambda, & G_r d_{l,k} > \lambda \\ -\lambda, & G_r d_{l,k} < -\lambda \\ G_r d_{l,k}, & \text{otherwise} \end{cases}$$

$$\tau_{l,k,1} = \begin{cases} \lambda, & |G_r d_{l,k}| < 3\lambda \\ -\lambda, & |G_r d_{l,k}| > 5\lambda \\ G_r d_{l,k} + 4\lambda, & -5\lambda < G_r d_{l,k} < -3\lambda \\ 4\lambda - G_r d_{l,k}, & 3\lambda < G_r d_{l,k} < 5\lambda \end{cases}$$

$$\tau_{l,k,2} = \begin{cases} \lambda, & 3\lambda < |G_r d_{l,k}| < 5\lambda \\ -\lambda, & 7\lambda < |G_r d_{l,k}| \text{ or } |G_r d_{l,k}| < \lambda \\ G_r d_{l,k} + 6\lambda, & -7\lambda < G_r d_{l,k} < -5\lambda \\ -(G_r d_{l,k} + 2\lambda), & -3\lambda < G_r d_{l,k} < -\lambda \\ G_r d_{l,k} - 2\lambda, & \lambda < G_r d_{l,k} < 3\lambda \\ 6\lambda - G_r d_{l,k}, & 5\lambda < G_r d_{l,k} < 7\lambda \end{cases} \tag{29}$$

where $K_{mod}$ is a normalized factor of modulation, and $G_r$ the coefficient for different coding rates.

D. Design of FFT/IFFT and outer receiver

As shown in Fig. 2, FFT and outer receiver are regarded as independent modules in the receiver. Because of half-duplex operation in the system, an identical dual-mode module FFT/IFFT can be designed for both transmitter and receiver. Under analysis for radix-8 design, it needs 16 operations of radix-8 butterfly and 49 operations of complex-valued multiplication to implement a 64-point FFT/IFFT. Therefore the design employs a radix-8 butterfly and a complex-valued multiplier to acquire latency with only 113 clock cycles, by meticulous arrangement for timing as shown in Fig. 9.

Fig. 9. Timing diagram of radix-8 64-point FFT/IFFT operation (rows for the first and second symbols: input serial to parallel, butterfly computation, complex number multiplication, output parallel to serial)

Although outer receiver includes many functions, it is viewed as an integrated module fully processing bit-level sequence from demapping. The deinterleaving is the inverse operation of block interleaving, and the de-puncture fills up those punctured bits with neutral values. The decoder is implemented by Viterbi algorithm, which searches for the maximum path metric through

$$\sum_{l=0}^{L-1}\sum_{k=0}^{K-1}\sum_{\nu=0}^{M-1} d'_{l,k,\nu}\,c^{nrz}_{l,k,\nu} \tag{30}$$

where $\{c^{nrz}_{l,k,\nu}\}$ denotes the nonreturn-to-zero codeword associated with a particular path through the trellis.
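The division-free, amplitude-weighted demapping of one real or imaginary component of a 64QAM symbol can be sketched compactly by noting that the three piecewise-linear rules are clipped "folded" functions of the scaled input. This is a sketch under stated assumptions: the function name `soft_bits_64qam` is hypothetical, `k_mod` defaults to the 802.11a normalization 1/sqrt(42), the coding-rate coefficient `g_r` defaults to 1.0 as a placeholder, and the folded form assumes the piecewise regions carry absolute values where the rules are symmetric.

```python
def soft_bits_64qam(d, amp, k_mod=1 / 42 ** 0.5, g_r=1.0):
    """Soft values for the three bits carried by one component d of an
    unnormalized data symbol, weighted by the sub-channel amplitude amp."""
    lam = k_mod * amp                       # lambda = K_mod * A_k, half the minimum distance
    g = g_r * d
    a = abs(g)

    def clip(v):
        return max(-lam, min(lam, v))

    t0 = clip(g)                            # sign bit: linear inside +/- lambda
    t1 = clip(4 * lam - a)                  # +lambda inside 3*lambda, -lambda beyond 5*lambda
    t2 = clip(2 * lam - abs(a - 4 * lam))   # +lambda between 3*lambda and 5*lambda
    return t0, t1, t2

# Same transmitted level seen through a strong and a weak sub-channel
# (k_mod is set to 1.0 here only to keep the numbers easy to read).
strong = soft_bits_64qam(2.5, amp=1.0, k_mod=1.0)
weak = soft_bits_64qam(0.25, amp=0.1, k_mod=1.0)
```

The weak sub-channel yields soft values ten times smaller than the strong one for the same transmitted level, which is how low-amplitude sub-channels contribute less to the path metric without any division.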

The soft value $d'_{l,k,\nu}$ in (30) is acquired from (29), weighted by CSI. Based on wordlength analysis for soft-decision, 4-b resolution is adopted to represent $d'_{l,k,\nu}$ and an extra 1 bit for dummy neutral value from de-puncture. Based on convergence analysis, the traceback depth is set as 90 stages to achieve higher performance. After the decoder all bits are de-scrambled and then output to MAC with 8-b wordlength.

E. Implementation

The total latency of the receiver, defined as delay from the end of a packet to an instant that all payload bits are decoded and de-scrambled, directly affects practical transmission throughput, i.e. the longer the latency, the lower the throughput. The total latency is restricted to four OFDM symbols, i.e. 320 cycles at 20 MHz sampling clock rate. As shown in Fig. 10 each operation in the receiver results in different latencies. All operations involved with CORDIC occupy 4 cycles, and the equalizer therefore needs 8 cycles (4 for pilots and 4 for CORDIC). Because there are 16 cycles for guard interval in an OFDM symbol, STC reduces 5 cycles every symbol and continues for 12 times. Thus, with the 60 cycles saved by STC, the total latency is only 280 cycles, which are mainly occupied by STD, FFT and the traceback in the decoder.

Fig. 10. Timing diagram of the receiver (stages shown: coarse CFO compensation, symbol timing detection, fine CFO estimate and compensation, FFT processing, STC, equalization, demapping, deinterleaving, decoding. Total latency cycles: 4+128+4+4+113+(-5x12)+8+1+48+30 = 280, below the limit of 320 cycles, i.e. 4 OFDM symbols; (-5x12) denotes the latency cycles reduced by STC)

Main components in the baseband processor are listed in Table 1 with sub-items and gate counts. The inner receiver before FFT includes frame detection, AGC, STD, STC, and CFO synchronization with a dual-mode CORDIC, and occupies about 74k gate count. FFT/IFFT is designed for high performance (SQNR=63 dB) and low latency (113 clock cycles) with 66k gate count. The inner receiver after FFT occupies about 69k gate count, including a CORDIC module, phase tracking circuit, and a demapping using CSI. The outer receiver operates at the same clock as the inner receiver, and therefore the Viterbi decoder requires 3 modules of ACS to achieve 54 Mbps data rate. The outer receiver with 4-b soft-decision and traceback length=90 occupies about 179k gate count. The total equivalent gate count is about 424k and the core size is 7.3 mm² in 0.18-µm CMOS. The RAM-free design makes the baseband processor more convenient and process-independent in SoC design. For consideration of practical effects caused from channel and RF/analog, the integrated baseband processor written in Verilog is not only tested by test patterns, but also co-simulated in the simulation environment.

Table 1. Gate count list of the baseband processor

Block                            Gate count
Inner Receiver before FFT        74.2K
    CORDIC_1                     10.1K
    Frame detection              22.8K
    AGC                          1.7K
    Symbol timing detection      8.8K
FFT_processor                    66.1K
    Radix-8 butterfly            7.8K
    Complex multiplier           5.9K
Inner Receiver after FFT         68.8K
    CORDIC_2                     14.4K
    Phase tracking               25.4K
    CSI demapping                9.1K
Outer Receiver                   178.5K
    deinterleaver                35K
    depuncture                   0.3K
    ACS x 3                      98.1K
    traceback                    42.7K
SRRCF19                          10.2K
TX and Test                      21.9K
Total                            424.2K

III. SIMULATION AND RESULT

A. Simulation environment

Modern communication systems adopt more complex techniques and specifications, and therefore there are many modes to be selected: modulation, coding, payload length, etc; moreover, operating in an undetermined environment, the systems face problems to estimate random parameters under different noise level and channel conditions, and need hundreds of packets to be tested for PER under one condition. All these factors make it very difficult to use only test patterns to verify the design. Besides, it is important to consider relation with analog and RF components. Therefore an integrated simulation platform shown in Fig. 11 is adopted to co-simulate baseband, analog and RF signals.

Fig. 11. Simulation environment of the system (test pattern generator with CRC-16, MAC-TX interface, baseband transmitter with 2x-SRRCF, 10-b D/A, 5th-order elliptic LPF, up-convertor, PA, multipath channel, AWGN with C/N control, 40 MHz frequency synthesizer, down-convertor, A/D, baseband receiver with Frame Detect, Header Check and RSSI outputs, packet error check)

For PER testing the transmit sequence is generated and coded with CRC-16, and then the MAC-to-TX interface serves to move the data from MAC to transmitter. After coding and modulation in transmitter, the data signal conforms to OFDM WLAN specification and outputs with 2x over-sample (40 MHz/sample) with 10-b resolution. The LPFs after DAC use 5th-order elliptic filter to remove duplicate spectra but result in nonlinear phase response and transmission ripples.

[Fig. 12 plots. (a) Average output SNR (dB) versus C/N (dB); curves: No Loss, AWGN, ETSI Ch. A, ETSI Ch. B, ETSI Ch. C. (b) Average output SNR (dB) versus input C/N (dB); curves: No Loss, Ideal PA, back-off = 4, 5, 6, 7, and 8 dB. (c) Average output SNR (dB) versus C/N (dB); curves: No Loss, No PN, PN = -80, -85, -90, and -100 dBc. (d) Packet error rate versus C/N (dB); curves: Ideal, PN = -90 dBc at 100 kHz, PN = -100 dBc at 100 kHz, back-off = 6, 7, and 8 dB, GainIm = 0.5 dB with PhaseIm = 1 deg, GainIm = 1 dB with PhaseIm = 2 deg.]

Fig. 12. Evaluation of the inner receiver performance: (a) channel (b) power amplifier back-off (c) phase noise (d) PER under different conditions

be viewed as part of the channel response. In the RF transmitter the quadrature up-converter operates in the 2.4 GHz or 5 GHz band. Both the up-converter and the down-converter face problems of I/Q mismatch, gain and phase imbalance, and phase noise from the local frequency synthesizer, whereas the linear PA must consider an efficient back-off for the OFDM signal with its larger PAPR.

Multipath channel models use ETSI indoor models and exponentially decaying models. The C/N controller adjusts the attenuation of the transmit signal to set its power level relative to the noise floor, i.e., about -173 dBm/Hz. The VGA is controlled by a 6-b signal from the AGC of the receiver, and the adjustment range depends on the RF front-end components. The baseband receiver outputs a number of indicative signals: Frame Detect, Header Check, and RSSI. The data after decoding are checked by CRC-16 to count the PER value. In this environment the baseband processor written in HDL or MATLAB can be simulated in a full transceiver system, considering all deterministic and random conditions.

B. Simulation results

There are two main methods to evaluate the performance of a transceiver system: error vector magnitude (EVM), used for signal processing in the inner receiver, and PER or BER, used for full transmission including coding and modulation. EVM is calculated from the statistics of error vectors, defined as the differences between ideal signals and equalized signals before demapping:

$$\mathrm{EVM}^2 = \frac{1}{LK} \sum_{l} \sum_{k} \left| D'_{l,k} - \tilde{d}_{l,k} \right|^2 \qquad (31)$$

where $L$ is the number of OFDM symbols, $K$ the number of data sub-channels in an OFDM symbol, and $\tilde{d}_{l,k}$ the pre-known signal in the transmitter. The average SNR value can be acquired from $\mathrm{SNR} = -20\log_{10}(\mathrm{EVM})$, which is the ratio of signal to noise after processing in the inner receiver. The method using EVM or average SNR shortens simulation time because only several packets are required. Fig. 12 shows the SNR before demapping for 64QAM with 3/4 coding rate under different testing conditions: (a) AWGN and ETSI Channels A, B, and C (NLOS with 50, 100, and 150 ns rms delay spread, respectively); (b) PA back-off, defined as the difference between input P1dB and input average power; (c) phase noise at 100 kHz from the local frequency synthesizer. Fig. 12(a) indicates that the SNR loss of the inner receiver implemented in HDL is about 1 dB under AWGN and larger than 1.5 dB under the channel models. In Fig. 12(b) the observed SNR loss is used to select the back-off for the PA: a back-off larger than 7 dB is close to the ideal PA, and the practical loss for PER=0.1 is smaller than 0.5 dB as shown in Fig. 12(d). The SNR values in Fig. 12(c) also help to select the phase noise: PN = -100 dBc is close to the ideal value, with a practical loss of 0.25 dB as shown in Fig. 12(d). To predict the practical C/N value in the transceiver from EVM or average SNR before demapping, the PER and BER performance of the outer receiver is presented in Fig. 13 for 8 kinds of data

rates, based on an input signal with i.i.d. Gaussian noise. According to Fig. 13, all SNR values required for the data rates at PER=0.1 are listed in the column marked AWGN in Table 2. For example, the SNR required for 54 Mbps is at least 18.5 dB for PER less than 0.1, but the practical C/N required from the implementation shown in Fig. 12(d) is at least 20 dB. Therefore the implementation loss is 1.5 dB, caused by algorithms and quantization in the receiver. Table 2 also shows the performance for PER=0.1 and 1024 bytes using 4-b soft-decision and the CSI method under an i.i.d. Rayleigh channel, i.e., each complex-valued symbol before demapping is multiplied by a complex-valued response whose real and imaginary parts are both independent Gaussian noise. Under this condition the average SNR required is shown in the column marked Rayleigh Ch. in Table 2. The columns marked CSI and CSI Gain represent the SNR required using CSI and its improvement. There is a 3 dB to 6 dB improvement from 6 Mbps to 54 Mbps by use of CSI in the i.i.d. Rayleigh channel.

Fig. 13. Performance on BER and PER of outer receiver

Table 2. Performance results considering CSI

Rate (Mbps)  Mod.    Coding Rate  AWGN (dB)  Rayleigh Ch. (dB)  CSI (dB)  CSI Gain (dB)
6            BPSK    1/2          0.8        8.6                5.2       3.4
9            BPSK    3/4          3.6        15.8               11.1      4.7
12           QPSK    1/2          3.8        12.6               8.5       4.1
18           QPSK    3/4          6.5        20.6               14.6      6
24           16QAM   1/2          9.8        18.6               14.5      4.1
36           16QAM   3/4          12.8       27.5               21        6.5
48           64QAM   2/3          17.3       28.2               23.6      4.6
54           64QAM   3/4          18.5       32.7               26.5      6.2

C. Performance of the design

The performance of the baseband receiver is presented in Fig. 14, compared with two reference designs [12]-[13] and the 802.11a requirement. The test in this paper is based on PER=0.1 and a packet of 1024 bytes in AWGN, and considers RF and analog effects in the simulation platform. The required C/N in Fig. 14 reveals that the receiver design using the CORDIC-based architecture performs well under hardware evaluation. On the other hand, to emphasize the benefit of using CSI in the demapping, Fig. 15 shows a special case of ETSI Channel A with some sub-channels suffering extremely deep fading. The channel estimation extracts the power values of 48 sub-channels, of which two sub-channels are 22 dB below the average power, as shown in Fig. 15. The simulation reveals that PER=0.1 cannot be reached for 54 Mbps even when C/N=50 dB if no CSI is used. But in the design the target is reached when C/N=32 dB. This special case highlights the excellent improvement of the CSI method under frequency-selective fading channels.

[Fig. 14 plot: C/N or SNR (dB) versus data rate (Mbps) at 6, 9, 12, 18, 24, 36, 48, and 54 Mbps; annotation: PER=10%; curves: 802.11a C/N required [2], C/N required from ref. [12], C/N required from ref. [13], C/N required from the proposed design, SNR required from the outer Rx.]

Fig. 14. Performance comparison under PER=0.1

[Fig. 15 plot: normalized sub-channel power (dB) versus sub-channel index, k; annotation: 22dB.]

Figure 15. Power values of sub-channels from ETSI Channel A

IV. CONCLUSION

An efficient design for a WLAN OFDM receiver is proposed, including algorithm, architecture, and simulation results. Using a CORDIC-based design in the architecture helps to simplify and unify the algorithms for equalization and synchronization in the inner receiver. Employing CSI improves PER and BER performance in the outer receiver under severe multipath fading channels. The modified demapping combining equalization with CSI eliminates complex-valued divisions and reduces hardware implementation cost. The simulation platform, which co-simulates the hardware design with RF, analog parts and channel models, is very useful in verification and performance evaluation. Simulation results and complexity analysis reveal that the design is high-performance and efficient for OFDM systems.
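As an illustration of the EVM metric of (31) and the conversion SNR = -20log10(EVM) used in Section III-B, the following Python sketch computes both from nested lists of complex symbols. It is a minimal example rather than the authors' simulation code: the function names `evm` and `average_snr_db`, the QPSK test signal, and the assumption that the reference constellation has unit average power (needed for the SNR relation to hold) are all illustrative choices not taken from the paper.

```python
import math
import random


def evm(equalized, reference):
    """RMS error vector magnitude following (31).

    equalized[l][k]: equalized symbol before demapping
    reference[l][k]: known transmitted symbol for the same sub-channel
    Both are L x K nested lists of complex numbers.
    """
    num_symbols = len(reference)          # L, number of OFDM symbols
    num_subchannels = len(reference[0])   # K, data sub-channels per symbol
    total = 0.0
    for eq_row, ref_row in zip(equalized, reference):
        for d_eq, d_ref in zip(eq_row, ref_row):
            total += abs(d_eq - d_ref) ** 2
    return math.sqrt(total / (num_symbols * num_subchannels))


def average_snr_db(evm_value):
    """Average SNR in dB, assuming a unit-power reference constellation."""
    return -20.0 * math.log10(evm_value)


if __name__ == "__main__":
    rng = random.Random(0)
    num_symbols, num_subchannels = 200, 48   # 48 data sub-channels per symbol
    target_snr_db = 20.0
    # Noise standard deviation per real dimension for the target SNR.
    sigma = math.sqrt(10 ** (-target_snr_db / 10) / 2)
    # Unit-power QPSK points (an assumption made only for this demo).
    qpsk = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
    ref = [[rng.choice(qpsk) for _ in range(num_subchannels)]
           for _ in range(num_symbols)]
    eq = [[d + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for d in row]
          for row in ref]
    # The estimate should land close to target_snr_db.
    print(round(average_snr_db(evm(eq, ref)), 1))
```

Because the estimate averages over all L x K symbols, a few packets already give a stable SNR value, which is consistent with the paper's remark that the EVM method shortens simulation time compared with counting packet errors.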


ACKNOWLEDGMENT

This work was conducted by the Trans Wireless Technology Laboratory (TWT Lab.) and sponsored jointly by the Ministry of Education and the National Science Council, Taiwan under the contract number NSC-93-2220-E-009-011. The authors also appreciate Ching-Wen Kung, Chia-Hsin Lin and K.H. Lin for their assistance in the design.

REFERENCES

[1] R. van Nee and R. Prasad, OFDM Wireless Multimedia Communications, Artech House, Boston, 2000.
[2] IEEE, “Wireless LAN Medium Access Control and Physical Layer specifications: High-speed physical layer in the 5 GHz band,” P802.11a/D7.0, July 1999.
[3] ETSI, “Digital Video Broadcasting: Framing Structure, Channel coding, and Modulation for Digital Terrestrial Television,” European Telecommunication Standard, EN 300-744, Aug. 1997.
[4] B. Come, et al., “Impact of front-end non-idealities on bit error rate performance of WLAN-OFDM transceivers,” in Proc. of Radio and Wireless Conference, pp. 91-94, Sept. 2000.
[5] M. Speth, D. Daecke and H. Meyr, “Minimum Overhead Burst Synchronization for OFDM Based Broadband Transmission,” in Proc. of GlobeComm, vol. 5, pp. 2777-2782, Nov. 1998.
[6] W. Lee, H. Park, and J. Park, “Viterbi decoding method using channel state information in COFDM system,” IEEE Trans. on Consumer Electronics, vol. 45, no. 3, pp. 533-537, Aug. 1999.
[7] M.R.G. Butler, et al., “Viterbi decoding strategies for 5 GHz wireless LAN systems,” in Proc. of Vehicular Tech. Conf., pp. 77-81, 2001.
[8] Y.H. Hu, “CORDIC-based VLSI architectures for digital signal processing,” IEEE Signal Processing Magazine, vol. 9, pp. 16-35, 1992.
[9] H. Zhana, Z. Wang, and S.S. Chandra, “Implementation of frequency offset correction using CORDIC algorithm for 5 GHz WLAN applications,” in Proc. of ICCS, vol. 2, pp. 983-987, 2002.
[10] M. Speth, S.A. Fechtel, G. Fock, and H. Meyr, “Optimum Receiver Design for Wireless Broad-Band Systems Using OFDM-Part I,” IEEE Trans. on Comm, vol. 47, no. 11, pp. 1668-1677, Nov. 1999.
[11] John Terry and Juha Heiskala, OFDM Wireless LANs: A Theoretical and Practical Guide, pp. 51-55, Sams Publishing, Indianapolis, 2002.
[12] J. Thomson, et al., “An Integrated 802.11a Baseband and MAC Processor,” in Proc. of ISSCC, vol. 1, pp. 126-451, Feb. 2002.
[13] T. Fujisawa, et al., “A Single-Chip 802.11a MAC/PHY with a 32-b RISC Processor,” IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 2001-2009, Nov. 2003.

Chia-Sheng Peng (S’99) was born in Nantou, Taiwan, in 1976 and received his B.S. degree in Electrical Engineering from National Tsing Hua University, HsinChu, Taiwan, in 1998. He is currently working toward the Ph.D. degree in Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan. His research interests include wireless communication design on algorithm and architecture for synchronization, equalization, and modulation.

Yuan-Shin Chuang was born in Kaohsiung, Taiwan, in 1977. He received the B.S. and M.S. degrees in Electronics Engineering, National Chiao-Tung University, HsinChu, Taiwan, in 2001 and 2003, respectively. He is currently working toward the Ph.D. degree in Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan. His research interests include circuit design on wireless communication systems.

Kuei-Ann Wen (SM’02) received her B.S., M.S. and Ph.D. degrees in electrical engineering from National Cheng-Kung University, Taiwan, in 1983, 1985 respectively. She is currently a full Professor in the Dept. of EE at National Chiao Tung University, Taiwan. Dr. Wen's research interests are in the areas of SoC design and VLSI wireless communication circuit/system also including RF circuits and baseband design. She covers and enhances the curriculum for SoC integration, VLSI computing signal processing, and RF wireless communication. Dr. Wen has been involved in several key research projects including advanced IC/VLSI for academic excellence, and technology transfer for successful high-tech. She also set up the research cooperation laboratories with Agilent, Mentor Graphics and Synopsys for the development of wireless SoC design automation as well as the flow for industrial experts. Besides, she leads the Trans. Wireless Technology (TWT) Laboratory sponsored by United Microelectronics Co. (UMC) for the development of advanced CMOS RF. She also serves as president of IP Center and the leader of SoC Center at NCTU.