A Coprocessor for Fast Searching in Large Databases: Associative Computing Engine
Total Page:16
File Type:pdf, Size:1020Kb
A Coprocessor for Fast Searching in Large Databases: Associative Computing Engine DISSERTATION submitted in partial fulfillment of the requirements for the degree of Doktor-Ingenieur (Dr.-Ing.) to the Faculty of Engineering Science and Informatics of the University of Ulm by Christophe Layer from Dijon First Examiner: Prof. Dr.-Ing. H.-J. Pfleiderer Second Examiner: Prof. Dr.-Ing. T. G. Noll Acting Dean: Prof. Dr. rer. nat. H. Partsch Examination Date: June 12, 2007 c Copyright 2002-2007 by Christophe Layer — All Rights Reserved — Abstract As a matter of fact, information retrieval has changed considerably in recent years with the expansion of the Internet and the advent of always more modern and inexpensive mass storage devices. Due to the enormous increase in stored digital contents, multi- media devices must provide effective and intelligent search functionalities. However, in order to retrieve a few relevant kilobytes from a large digital store, one moves up to hundreds of gigabytes of data between memory and processor over a bandwidth- restricted bus. The only long-term solution is to delegate this task to the storage medium itself. For that purpose, this thesis describes the development and prototyping of an as- sociative search coprocessor called ACE: an Associative Computing Engine dedicated to the retrieval of digital information by means of approximate matching procedures. In this work, the two most important features are the overall scalability of the design allowing the support of different bit widths and module depths along the data path, as well as the obtainment of the minimum hardware area while ensuring the maximum throughput of the global architecture. After simulation, the correctness of the results delivered by the FPGA prototyping board could be verified using a text encoding scheme based on hashing theory. The performance of the hardware platform has been compared with the same application running in software on a general purpose processor. Accordingly, a speed-up of three to four orders of magnitude is achieved for the FPGA system versus many recent personal computers with different internal hardware configurations. Acknowledgments First and foremost, I express my gratitude to Prof. H.-J. Pfleiderer for his constant support, motivation and encouragements. Thank you for welcoming me in the Institute of Microelectronics at the University of Ulm and giving me the opportunity to pursue my research interests. Furthermore, I am very grateful to Prof. T. G. Noll of the RWTH Aachen for his interest in my work and his agreement to report about it. Thank you for the extremely helpful feedback, for the suggestions about the completion of the manuscript and for advising on this dissertation. I would also like to thank Prof. P. Ruj´an for his continuous encouragements and for pursuing my work over further international projects. Dr. G. Lapir is gratefully acknowledged for the discussions at an earlier stage of this project and for providing information and materials about the associative search method. I would like to thank my colleagues from the Institute of Microelectronics, especially M. Bschorr, C. G¨unter, O. Pf¨ander, J. Rauscher, W. Schlecker, K. Schmidt, R. Schreier and E. Schubert for all the constructive discussions and comments, as well as for the very comfortable atmosphere at work. Nonetheless, I am indebted to G. Kirilov for his expertise and contribution to the synthesis of the memory controller. Finally, special thanks go to Mrs. H¨ofer for her kindness, help and support. Ulm, October 18, 2007 Christophe Layer Table of Contents Chapter I ⋄ Introduction ......................................... 1 1 Information Retrieval Systems ....................... ................ 1 2 The Problem with Storage Devices ...................... ............. 3 2.1 Running into the Memory Wall ........................ .......... 3 2.2 Need for Hardware Support.......................... ............ 5 3 Organization of the Thesis ........................... ............... 6 Chapter II ⋄ Background ........................................ 9 1 Designing Digital Systems........................... ................ 9 1.1 Integrated Circuits Implementation Strategies . ................ 10 1.2 Microprocessor Design Techniques . .............. 14 1.3 Field Programmable Gate Arrays..................... ............ 16 1.4 Semiconductor Memory Devices . ............ 18 2 Pattern Matching in Strings .......................... ............... 20 2.1 Classical Algorithms............................. ............... 20 2.2 Flexible Pattern Matching......................... .............. 21 2.3 Dynamic Programming .............................. ........... 22 2.4 Hash Functions and Text Signatures . ............ 23 3 Sorting Techniques and Algorithms . ............... 25 3.1 Sequential Sorting............................... ............... 25 3.2 Bit-Level Structures ............................. ............... 28 3.3 Sorting Networks................................. .............. 30 3.4 Summarization ................................... ............. 31 Chapter III ⋄ Related Work ..................................... 33 1 Stateofthe Art in Software ........................... .............. 33 1.1 Web Search Engines ................................ ............ 33 1.2 Software Functionalities for Computers . ............... 34 2 Hardware Accelerators.............................. ................ 35 2.1 Associative and Parallel Processors Systems . .............. 35 2.2 Merging Logic and Memory ........................... .......... 36 2.3 Special Purpose Coprocessors ...................... .............. 38 2.4 Summarization ................................... ............. 39 3 Associative Access Method ........................... ............... 40 3.1 Building the Signature File ........................ .............. 40 3.2 The Retrieval Process............................. .............. 43 Chapter IV ⋄ System Level Analysis ........................... 49 1 Motivation and Expectations ......................... ............... 49 1.1 Problem Statement................................ ............. 49 1.2 Proposed Research ................................ ............. 50 2 Profiling andAnalysis ............................... ............... 52 2.1 Exploration of the Software Model................... ............. 53 VIII Table of Contents 2.2 Sequential Algorithm Analysis . .............. 54 3 Hardware Accelerator Design ......................... ............... 58 3.1 Parallelization of the Algorithm ................... ............... 58 3.2 Modular System Architecture ....................... ............. 62 Chapter V ⋄ Architectural Hardware Design ................. 65 1 System Management and Peripherals . ............. 65 1.1 Operation Scheduling ............................. .............. 65 1.2 Designing the Memory Interface..................... ............. 66 2 Building the Computational Data Path .................. ............. 71 2.1 Penalty Calculating Unit.......................... .............. 71 2.2 Score Calculating Unit............................ .............. 74 3 Hardware Sorting and Merging......................... .............. 80 3.1 Parallel Sorting with Bitonic Networks . .............. 80 3.2 Optimization Methodologies ....................... .............. 83 3.3 Hardware Merging Solutions ........................ ............. 85 Chapter VI ⋄ Results and Evaluation .......................... 91 1 Hardware Implementation............................ ............... 91 1.1 Adaptation to the Development Platform.............. ............ 91 1.2 Synchronizing the Processing Units . .............. 94 1.3 Synthesis Results................................ ............... 98 2 Evaluation of the Hardware Model ...................... ............. 100 2.1 Benchmarking Environment . ............ 100 2.2 Scaling the Design................................ .............. 103 2.3 Performance Appraisal............................ .............. 104 Chapter VII ⋄ Conclusion and Outlook ........................ 109 1 On the Associative Computing Engine . ............. 109 2 FutureWork ........................................ .............. 111 Appendix A ⋄ On the Realization of Logarithms ............. 113 1 ErrorAnalysis..................................... ................ 113 2 ErrorCorrection ................................... ................ 116 3 Exponential Function............................... ................ 118 Appendix B ⋄ High Throughput Memory Controller ........ 119 1 FiniteStateMachine ................................ ............... 119 2 SDRAMTimings ...................................... ............ 120 List of Abbreviations ............................................. 123 Bibliography and References .................................... 125 List of Figures 1.1 The process of retrieving information . ................... 2 1.2 Storage system trends ............................. .................. 4 1.3 Hardware performance trends and memory wall . ............... 4 1.4 Hierarchical organization of the thesis . .................... 6 2.1 Flexibility, performance and power consumption trade-off ................ 10 2.2 Increasing the throughput using pipelining . ................... 11 2.3 Accumulator design using the cut-set technique . .................. 12 2.4 Retiming an architecture with feedback . .................. 13 2.5 Representation of an island-style FPGA . ................. 16 2.6