Fast Division Algorithm with a Small Lookup Table
Patrick Hung


Patrick Hung, Hossam Fahmy, Oskar Mencer, Michael J. Flynn
Computer Systems Laboratory, Stanford University, CA 94305
email: {hung, hfahmy, oskar, flynn}@arithmetic.stanford.edu

Abstract

This paper presents a new division algorithm, which requires two multiplication operations and a single lookup in a small table. The division algorithm takes two steps. The table lookup and the first multiplication are processed concurrently in the first step, and the second multiplication is executed in the next step. This divider uses a single multiplier and a lookup table with 2^m (2m+1) bits to produce 2m-bit results that are guaranteed correct to one ulp. By using a multiplier and a 12.5 KB lookup table, the basic algorithm generates a 24-bit result in two cycles.

1. Introduction

Division is an important operation in many areas of computing, such as signal processing, computer graphics, networking, and numerical and scientific applications. In general, division algorithms may be divided into five categories: digit recurrence, functional iteration, high radix, table lookup, and variable latency. These algorithms differ in overall latency and area requirements. An overview of division algorithms can be found in [4].

This paper introduces a new high radix division algorithm based on the well-known Taylor series expansion. A number of high radix division algorithms were also proposed in the past based on the Taylor series. For example, Farmwald [2] proposed using multiple tables to look up the first few terms in the Taylor series. Later, Wong [5] proposed an elaborate iterative quotient approximation with multiple lookup tables. Wong demonstrated that only the first two terms in the Taylor series are necessary to achieve fast division, because of the time needed to evaluate all the power terms.

The previous algorithms consider each individual term in the Taylor series separately; hence, many lookup tables are needed and the designs are complicated. Our proposed algorithm combines the first two terms of the Taylor series, and only requires a small lookup table to generate accurate results. This algorithm achieves fast division by multiplying the dividend in the first step, which is done in parallel with the table lookup. In the second step, another multiplication operation is executed to generate the quotient.

2. Basic Algorithm

Let X and Y be two 2m-bit fixed point numbers between one and two, defined by Equations 1 and 2, where x_i, y_i ∈ {0, 1}.

  X = 1 + 2^-1 x_1 + 2^-2 x_2 + ... + 2^-(2m-1) x_(2m-1)    (1)
  Y = 1 + 2^-1 y_1 + 2^-2 y_2 + ... + 2^-(2m-1) y_(2m-1)    (2)

To calculate X/Y, Y is first decomposed into two groups: the higher order bits (Y_h) and the lower order bits (Y_l). Y_h contains the most significant m+1 bits and Y_l contains the remaining m-1 bits.

  Y_h = 1 + 2^-1 y_1 + ... + 2^-(m-1) y_(m-1) + 2^-m y_m    (3)
  Y_l = 2^-(m+1) y_(m+1) + ... + 2^-(2m-1) y_(2m-1)         (4)

The range of Y_h is between 1 and 2 - 2^-m (Y_hmax), and the range of Y_l is between 0 and 2^-m - 2^-(2m-1) (Y_lmax).
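The decomposition in Equations 3 and 4 can be checked with a short exact-arithmetic sketch (Python's `Fraction`; the example value of Y and the variable names are illustrative, not from the paper):

```python
from fractions import Fraction

m = 3
# example operand: Y = 1.10110 in binary, i.e. y_1..y_5 = 1,0,1,1,0
y = [1, 0, 1, 1, 0]                                    # y_1 .. y_(2m-1)
Y   = 1 + sum(Fraction(b, 2**(i + 1)) for i, b in enumerate(y))
Y_h = 1 + sum(Fraction(b, 2**(i + 1)) for i, b in enumerate(y[:m]))
Y_l =     sum(Fraction(b, 2**(i + 1)) for i, b in enumerate(y) if i >= m)

assert Y == Y_h + Y_l                                  # decomposition is exact
assert Y_h == Fraction(13, 8) and Y_l == Fraction(1, 16)
# range bounds stated in the text
assert 1 <= Y_h <= 2 - Fraction(1, 2**m)
assert 0 <= Y_l <= Fraction(1, 2**m) - Fraction(1, 2**(2*m - 1))
```

Here Y = 1.6875 splits into Y_h = 1.101 (1.625) and Y_l = 0.0001 (0.0625), matching the m+1 / m-1 bit split.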
Dividing X by Y, we get Equations 5 and 6. Since Y_l < 2^-m · Y_h, the maximum fractional error in Equation 6 is less than 2^-2m (or 1/2 ulp).

  X/Y = X / (Y_h + Y_l)                 (5)
  X/Y ≈ X (Y_h - Y_l) / Y_h^2           (6)

Using the Taylor series, Equation 5 can be expanded at Y_l = 0 as in Equation 7. The approximation in Equation 6 is equivalent to combining the first two terms in the Taylor series.

  X / (Y_h + Y_l) = (X / Y_h) (1 - Y_l/Y_h + (Y_l/Y_h)^2 - (Y_l/Y_h)^3 + ...)    (7)

Figure 1 shows the block diagram of the algorithm. In the first step, the algorithm retrieves the value of 1/Y_h^2 from a lookup table and multiplies X with (Y_h - Y_l) at the same time. In the second step, 1/Y_h^2 and X (Y_h - Y_l) are multiplied together to generate the result.

Figure 1: Basic Algorithm
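The two-step flow of Figure 1 can be sketched in exact rational arithmetic. This is a minimal model, not the hardware design: the table lookup is idealized as an exact 1/Y_h^2 and all truncation and rounding steps are omitted, so the exhaustive check below isolates the Equation 6 approximation error and its 2^-2m bound:

```python
from fractions import Fraction

m = 3

def divide_approx(X, Y):
    """Eq. 6: X/Y ~ X*(Y_h - Y_l)/Y_h^2, as a lookup plus two multiplications.
    Arithmetic is exact here; hardware truncation/rounding is omitted."""
    Y_h = Fraction(int(Y * 2**m), 2**m)   # most significant m+1 bits of Y
    Y_l = Y - Y_h                         # remaining m-1 bits
    recip = 1 / (Y_h * Y_h)               # step 1: table lookup of 1/Y_h^2
    p1 = X * (Y_h - Y_l)                  # step 1: first multiply, in parallel
    return recip * p1                     # step 2: second multiply

# exhaustive check over all 2m-bit operands in [1, 2):
step = Fraction(1, 2**(2*m - 1))
vals = [1 + k * step for k in range(2**(2*m - 1))]
worst = max(abs(divide_approx(X, Y) - X/Y) / (X/Y) for X in vals for Y in vals)
assert worst < Fraction(1, 2**(2*m))      # fractional error < 2^-2m (1/2 ulp)
```

For m = 3 the worst observed fractional error is about 0.008, comfortably below 2^-6 = 0.0156, consistent with the bound stated after Equation 6.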
2.1. Lookup Table Construction

To minimize the size of the lookup table, the table entries are normalized such that the most significant bit (MSB) of each entry is one. These MSBs are therefore not stored in the table.

A lookup table with m = 3 is shown in Table 1. Each entry represents the truncated value of 1/Y_h^2 to 2m+2 significant bits. The exponent part of 1/Y_h^2 may be stored in the same table, but can also be determined by some simple logic gates. In this example, the exponent is 100 (2^0) when y_1 = y_2 = y_3 = 0, the exponent is 010 (2^-1) when y_1 = 0 and y_2 or y_3 is 1, and the exponent is 001 (2^-2) when y_1 = 1.

Table 1: A simple lookup table example (m = 3)

  Y_h     1/Y_h^2 (2m+2 bits)   Exponent     Table entry
  1.000   1.0000000             100 (2^0)    0000000
  1.001   1.1001010             010 (2^-1)   1001010
  1.010   1.0100011             010 (2^-1)   0100011
  1.011   1.0000111             010 (2^-1)   0000111
  1.100   1.1100011             001 (2^-2)   1100011
  1.101   1.1000001             001 (2^-2)   1000001
  1.110   1.0100111             001 (2^-2)   0100111
  1.111   1.0010001             001 (2^-2)   0010001
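Table 1's entries can be regenerated mechanically. The sketch below (the helper name `table_entry` is ours, not the paper's) truncates 1/Y_h^2 to 2m+2 significant bits and recovers the mantissa and exponent shown above:

```python
from fractions import Fraction

m = 3

def table_entry(yh):
    """Truncate 1/Y_h^2 to 2m+2 significant bits; return (mantissa bits, exponent)."""
    v = 1 / (yh * yh)              # lies in (1/4, 1] for Y_h in [1, 2)
    e = 0
    while v < 1:                   # normalize so the mantissa starts with 1
        v *= 2
        e -= 1
    mant = int(v * 2**(2*m + 1))   # keep 2m+2 significant bits (truncation)
    return format(mant, '0%db' % (2*m + 2)), e

assert table_entry(Fraction(1))    == ('10000000', 0)    # Y_h = 1.000
assert table_entry(Fraction(9, 8)) == ('11001010', -1)   # Y_h = 1.001
assert table_entry(Fraction(3, 2)) == ('11100011', -2)   # Y_h = 1.100
# the stored entry drops the leading MSB, e.g. '1001010' for Y_h = 1.001
```

The stored entry is the mantissa with its leading one removed, and the one-hot exponent codes 100/010/001 correspond to 2^0, 2^-1, 2^-2.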
2.2. Booth Encoding

The Booth encoding algorithm [1] has been widely used to minimize the number of partial product terms in a multiplier. In our division algorithm, special Booth encoders are needed to achieve the multiplication X (Y_h - Y_l) without explicitly calculating the value of (Y_h - Y_l). Lyu and Matula [3] proposed a general redundant binary Booth recoding scheme. In our case, the Y_h and Y_l bits are non-overlapping, and a cheaper and faster encoding scheme is feasible.

We use Booth 2 encoding to illustrate our encoding algorithm, but the same principle can apply to the other Booth encoding schemes. In Booth 2 encoding, the multiplier is partitioned into overlapping strings of 3 bits, and each string is used to select a single partial product.

Unlike conventional Booth 2 encoding, the encoding of (Y_h - Y_l) consists of four types of encoders. Figure 2 shows the locations of these four types of encoders: the Y_l group contains all the 3-bit strings that reside entirely within Y_l; the boundary string contains some Y_l bits as well as some Y_h bits; the first Y_h string is located next to the boundary string; the Y_h group contains all the remaining strings within Y_h.

Figure 2: Booth Encoding

The Y_h bits represent positive numbers, whereas the Y_l bits represent negative numbers. Hence, conventional Booth 2 encoding is used in the Y_h group, but the partial products in the Y_l group are negated. As shown in the diagram, the boundary region between Y_h and Y_l requires two additional special encoders. Depending on whether m is even or odd, the encoding schemes for these two encoders are different. It is possible to use only one such encoder in the boundary region, but that encoder would then need to generate a 3× multiplicand (for even m). In order to speed up the multiplication and simplify the encoding logic, two special encoders are used to avoid the "difficult" multiples.

Table 2 summarizes the four different encoding schemes for both even and odd m. It is important to note that the first Y_h encoder actually needs to examine both the first string and the boundary string when m is odd. If the boundary string is 101, the LSB of the first string is set to 0 instead of 1. If the boundary string is not 101, the LSB of the first string is set to be the MSB of the boundary string (as usual). This encoding scheme uses all but two normal Booth encoders, and is particularly useful if the same multiplier hardware is used for both the first and the second multiplications.

Table 2: Booth Encoding of (Y_h - Y_l)

          Y_h     Y_l     Boundary          First Y_h string
  Bits    group   group   even m   odd m    even m   odd m
  000      0       0        0        0        0        0
  001     +1      -1       -1       -1        0       +1
  010     +1      -1       -1       +1       +1       +1
  011     +2      -2       -2        0       +1       +2
  100     -2      +2       +2       -2       -2       -2
  101     -1      +1       +1       +1       -2       -1
  110     -1      +1       +1       -1       -1       -1
  111      0       0        0       -2       -1        0

2.3. Error Analysis

There are four sources of error: the Taylor series approximation error (E_T), the lookup table truncation error (E_L), the rounding error of the first multiplication (E_M1), and the rounding error of the second multiplication (E_M2). The total error is equal to E = E_T + E_L + E_M1 + E_M2. To minimize this error, the divider can be designed such that E_T ≤ 0, E_L ≤ 0, E_M1 ≤ 0, and E_M2 ≥ 0. This means that the table entries are truncated to 2m+2 bits, the first multiplication is truncated to 2m+2 bits, and the second multiplication is rounded up to 2m bits.

The Taylor series approximation error (E_T) is determined by Equation 8. This error is most significant for large Y_l and small Y_h. The maximum approximation error is slightly less than 1/2 ulp, reached when Y_l = Y_lmax and Y_h = 1.

  E_T = - X · Y_l^2 / (Y_h^2 (Y_h + Y_l))    (8)

Since the result has 2m bits, one ulp corresponds to 2^-(2m-1). The lookup table has 2m+2 significant bits, so the maximum truncation error is |E_L| < 1/8 ulp. Similarly, the maximum truncation error of the first multiplication is |E_M1| < 1/8 ulp, and the maximum rounding error of the second multiplication is |E_M2| < 1 ulp. Thus, the maximum positive error is less than 1 ulp (E_M2), and the maximum negative error is also less than 1 ulp (E_T + E_L + E_M1, which is less than 3/4 ulp).
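For readers unfamiliar with the underlying recoding, a plain, conventional Booth 2 recoder is sketched below; it shows the overlapping 3-bit strings and the digit set {-2, -1, 0, +1, +2} that the four special encoders above build on. It does not implement the paper's sign-mixed boundary handling, and the function name is ours:

```python
def booth2(x, nbits):
    """Conventional Booth 2 recoding of an nbits-wide unsigned integer x.
    Each overlapping 3-bit string selects one partial-product digit."""
    DIGIT = {0: 0, 1: +1, 2: +1, 3: +2, 4: -2, 5: -1, 6: -1, 7: 0}
    x2 = x << 1                        # implicit 0 to the right of the LSB
    digits = []
    for i in range(0, nbits + 2, 2):   # covers the zero-extended operand
        digits.append(DIGIT[(x2 >> i) & 0b111])
    return digits

# each digit selects one partial product; sum(d_i * 4^i) reconstructs x
for x in range(256):
    assert sum(d * 4**i for i, d in enumerate(booth2(x, 8))) == x
```

The loop checks the defining identity of the recoding: the weighted sum of the selected partial-product digits equals the original multiplier, which is why at most ceil((n+1)/2) partial products suffice.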
3. Optimization

In order to minimize E_T, the table entry T(Y_h) is set to be slightly larger than 1/Y_h^2. For each Y_h, the optimum table entry is determined by setting the maximum positive error (at Y_l = 0) to be the same as the maximum negative error (at Y_l = Y_lmax). Equation 10 shows the expression for the optimum table entry T_opt(Y_h).

  T_opt(Y_h) = (2 Y_h + Y_lmax) / (Y_h (Y_h + Y_lmax) (2 Y_h - Y_lmax))    (10)

The approximation error is at its maximum when Y_h = 1 and Y_l = Y_lmax. Using Equation 9, the maximum approximation error can easily be derived as in Equation 11. In this case, the error is slightly less than 1/4 ulp.

  E_max = X · Y_lmax^2 / ((1 + Y_lmax) (2 - Y_lmax))    (11)

Using the round-to-nearest rounding mode, E_M1, E_T, and E_M2 are each bounded by a fraction of an ulp, and, as in Section 2.3, the total error of the alternative lookup table is also less than 1 ulp. Table 3 illustrates the same example shown in Section 2.1 with the alternative lookup table, where RNT(Y_h) represents the round-to-nearest value of T_opt(Y_h) to 2m+2 significant bits.

Table 3: Alternative Lookup Table (m = 3)

  Y_h     RNT(Y_h)              Exponent     Table entry
  1.000   1.0000001             100 (2^0)    0000001
  1.001   1.1001011             010 (2^-1)   1001011
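Equation 10's defining property, equal maximum positive and negative errors, can be verified exactly for m = 3 (the function name `t_opt` is ours, and the per-unit-of-X error expressions follow the text above):

```python
from fractions import Fraction

m = 3
Y_lmax = Fraction(1, 2**m) - Fraction(1, 2**(2*m - 1))

def t_opt(yh):
    """Optimum table entry per Equation 10."""
    return (2*yh + Y_lmax) / (yh * (yh + Y_lmax) * (2*yh - Y_lmax))

# Defining property: positive error at Y_l = 0 equals the magnitude of the
# negative error at Y_l = Y_lmax (errors taken per unit of X).
for k in range(2**m):
    yh = 1 + Fraction(k, 2**m)
    T = t_opt(yh)
    pos = T * yh - 1/yh                          # error at Y_l = 0
    neg = 1/(yh + Y_lmax) - T*(yh - Y_lmax)      # |error| at Y_l = Y_lmax
    assert pos == neg and pos > 0                # balanced, and T > 1/Y_h^2

# At Y_h = 1 the balanced error matches Equation 11 (per unit of X)
assert t_opt(Fraction(1)) - 1 == Y_lmax**2 / ((1 + Y_lmax) * (2 - Y_lmax))
```

Because the positive and negative extremes are balanced, the optimized table roughly halves the worst-case approximation error of the truncated table in Section 2.1.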