Low Density Parity Check Encoder and Decoder on SiLago Coarse Grain Reconfigurable Architecture
DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

WEIJIANG KONG

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Abstract

Low density parity check (LDPC) codes are error correction codes that have been widely adopted as an optional error correction scheme in most of today's communication protocols. Current ASIC- or FPGA-based LDPC accelerators can reach Gbit/s data rates. However, the hardware cost of ASIC-based designs and their interfaces is too high for them to be integrated into coarse grain reconfigurable architectures (CGRAs). Moreover, for platforms targeting high level synthesis or system level synthesis, these accelerators do not offer flexibility in low-performance, low-cost design scenarios.

In this degree project, we establish connectivity between the SiLago CGRA and a typical quasi-cyclic LDPC (QC-LDPC) code defined in the IEEE 802.11n standard. We design lightweight LDPC encoder and decoder blocks using the FSM+Datapath design pattern. The encoder provides sufficient throughput and consumes very little area and power. The decoder provides sufficient performance for low-speed modulations while consuming significantly fewer hardware resources. Both the encoder and the decoder can cooperate with the SiLago-based DRRA through DiMArch, the standard Network on Chip (NOC) based shared memory, so no extra interface hardware is necessary. We verified our design through RTL simulation and synthesis: the encoder went through both logic and physical synthesis, while the decoder went through logic synthesis only. The results show that our design is closely coupled with the SiLago CGRA and provides a low-performance, low-cost solution.
Keywords: LDPC, CGRA, Reconfigurable architecture, VLSI design, ASIC
Acronyms

ASIC: Application Specific Integrated Circuit
AGU: Address Generation Unit
AWGN: Additive White Gaussian Noise
BP: Belief Propagation
CAD: Computer Aided Design
CGRA: Coarse Grain Reconfigurable Architecture
CSD NOC: Circuit Switched Data NOC
DiMArch: Distributed Memory Architecture
DRRA: Dynamic Reconfigurable Resource Array
DSP: Digital Signal Processing
EDA: Electronic Design Automation
FSMD: FSM+Datapath
HLS: High Level Synthesis
IP: Intellectual Property
LDPC: Low Density Parity Check
MS: Min-Sum
NOC: Network On Chip
PSCC NOC: Packet Switched Control and Configuration NOC
QC: Quasi Cyclic
RACCU: Runtime Address Constraint Computational Unit
RCS NOC: Regional Circuit Switch NOC
RISC: Reduced Instruction Set Computer
RTL: Register Transfer Level
SCU: Sequencer Computational Unit
SLS: System Level Synthesis
SRAM: Static Random-Access Memory
TDMP: Turbo Decoding Message Passing
TPMP: Two-Phase Message Passing
VLSI: Very Large Scale Integration

Table of Contents

1 Introduction
1.1 Problem Statement and Basic Concept Design
1.2 Thesis Outline
2 SiLago Design Platform
2.1 Architecture of DiMArch
2.2 Architecture of DRRA
2.3 System-Level Architectural Synthesis Framework (Sylva)
3 Low Density Parity Check Codes
3.1 Background of LDPC Codes
3.2 LDPC in IEEE 802.11n
3.3 Back Substitution Encoding for IEEE 802.11n
3.4 TDMP Min-Sum Decoding
4 LDPC Encoder Architecture
4.1 Design Challenges
4.2 Design of Encoder Datapath
4.3 Design of Encoder Controller
5 LDPC Decoder Architecture
5.1 Design Challenges
5.2 Design of Decoder Datapath
5.3 Design of Decoder Controller
6 Result and Analysis
6.1 Result and Analysis for LDPC Encoder
6.2 Result and Analysis for LDPC Decoder
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work

1 Introduction

One of the driving factors behind the growing computation capacity of VLSI designs is the continuous shrinking of transistor dimensions, which also reduces their cost. To make such efforts sustainable, power consumption must be scaled down in proportion to transistor length.
As observed by Gordon Moore and formalized into a predictive, guiding rule for the semiconductor industry known as Moore's law, the progress of this trend is “measurable, predictable and unusually rapid” [1]. Moore's law can be briefly summarized as “complexity of integrated circuits doubles every two years” [2]. The trend is also reflected in the relationships between other cost and performance metrics of chips. As shown in Figure 1, the number of computations per kilowatt-hour achieved by computer systems has improved by a factor of 4 × 10^4 since 1985 [1]. The resulting cost efficiency and accessibility of computation have driven the inevitable trends toward mobile and embedded computing, where the behavior of integrated circuits is tightly constrained by battery technology.

Figure 1: Computations per kilowatt-hour over time [1]

Diverging technological trends have created gaps between applications, VLSI technology and design technology [3]. The gap between the growing complexity of applications and the improvement in device performance brought by technology scaling is known as the architecture efficacy gap. Architectural optimizations based on suitable VLSI technologies are developed to close this gap and thus meet the performance demands of applications. Meanwhile, the gap between scaling trends and the productivity actually available to designers is called the design productivity gap. This gap motivates researchers to keep developing new processes and Computer Aided Design (CAD) tools to meet the demands of advanced architecture design. In contrast to the engineering effort, which grows exponentially with design complexity, the market window for a chip usually lasts only 6 or 8 months, which is one third of its development period [4]. Missing the best sales window means that manufacturers risk their products becoming obsolete against next-generation competitors.
As a result, designers in industry have adopted methods for reusing modularized Intellectual Property (IP) blocks in order to eliminate duplicated workloads. Such efforts gave rise to the discipline of modular design. As pointed out in [5], a design scheme that is based on a higher level of abstraction and communicates through a standard NOC interface could support efficient and reliable design. SiLago Coarse