Elementary Functions: Towards Automatically Generated, Efficient
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The 'Crisis of Noosphere'
The ‘crisis of noosphere’ as a limiting factor to achieve the point of technological singularity Rafael Lahoz-Beltra Department of Applied Mathematics (Biomathematics). Faculty of Biological Sciences. Complutense University of Madrid. 28040 Madrid, Spain. [email protected] 1. Introduction One of the most significant developments in the history of human being is the invention of a way of keeping records of human knowledge, thoughts and ideas. The storage of knowledge is a sign of civilization, which has its origins in ancient visual languages e.g. in the cuneiform scripts and hieroglyphs until the achievement of phonetic languages with the invention of Gutenberg press. In 1926, the work of several thinkers such as Edouard Le Roy, Vladimir Ver- nadsky and Teilhard de Chardin led to the concept of noosphere, thus the idea that human cognition and knowledge transforms the biosphere coming to be something like the planet’s thinking layer. At present, is commonly accepted by some thinkers that the Internet is the medium that brings life to noosphere. Hereinafter, this essay will assume that the words Internet and noosphere refer to the same concept, analogy which will be justified later. 2 In 2005 Ray Kurzweil published the book The Singularity Is Near: When Humans Transcend Biology predicting an exponential increase of computers and also an exponential progress in different disciplines such as genetics, nanotechnology, robotics and artificial intelligence. The exponential evolution of these technologies is what is called Kurzweil’s “Law of Accelerating Returns”. The result of this rapid progress will lead to human beings what is known as tech- nological singularity. -
Reconfigurable Embedded Control Systems: Problems and Solutions
RECONFIGURABLE EMBEDDED CONTROL SYSTEMS: PROBLEMS AND SOLUTIONS By Dr.rer.nat.Habil. Mohamed Khalgui ⃝c Copyright by Dr.rer.nat.Habil. Mohamed Khalgui, 2012 v Martin Luther University, Germany Research Manuscript for Habilitation Diploma in Computer Science 1. Reviewer: Prof.Dr. Hans-Michael Hanisch, Martin Luther University, Germany, 2. Reviewer: Prof.Dr. Georg Frey, Saarland University, Germany, 3. Reviewer: Prof.Dr. Wolf Zimmermann, Martin Luther University, Germany, Day of the defense: Monday January 23rd 2012, Table of Contents Table of Contents vi English Abstract x German Abstract xi English Keywords xii German Keywords xiii Acknowledgements xiv Dedicate xv 1 General Introduction 1 2 Embedded Architectures: Overview on Hardware and Operating Systems 3 2.1 Embedded Hardware Components . 3 2.1.1 Microcontrollers . 3 2.1.2 Digital Signal Processors (DSP): . 4 2.1.3 System on Chip (SoC): . 5 2.1.4 Programmable Logic Controllers (PLC): . 6 2.2 Real-Time Embedded Operating Systems (RTOS) . 8 2.2.1 QNX . 9 2.2.2 RTLinux . 9 2.2.3 VxWorks . 9 2.2.4 Windows CE . 10 2.3 Known Embedded Software Solutions . 11 2.3.1 Simple Control Loop . 12 2.3.2 Interrupt Controlled System . 12 2.3.3 Cooperative Multitasking . 12 2.3.4 Preemptive Multitasking or Multi-Threading . 12 2.3.5 Microkernels . 13 2.3.6 Monolithic Kernels . 13 2.3.7 Additional Software Components: . 13 2.4 Conclusion . 14 3 Embedded Systems: Overview on Software Components 15 3.1 Basic Concepts of Components . 15 3.2 Architecture Description Languages . 17 3.2.1 Acme Language . -
YANLI: a Powerful Natural Language Front-End Tool
AI Magazine Volume 8 Number 1 (1987) (© AAAI) John C. Glasgow I1 YANLI: A Powerful Natural Language Front-End Tool An important issue in achieving acceptance of computer sys- Since first proposed by Woods (1970), ATNs have be- tems used by the nonprogramming community is the ability come a popular method of defining grammars for natural to communicate with these systems in natural language. Of- language parsing programs. This popularity is due to the ten, a great deal of time in the design of any such system is flexibility with which ATNs can be applied to hierarchically devoted to the natural language front end. An obvious way to describable phenomenon and to their power, which is that of simplify this task is to provide a portable natural language a universal Turing machine. As discussed later, a grammar front-end tool or facility that is sophisticated enough to allow in the form of an ATN (written in LISP) is more informative for a reasonable variety of input; allows modification; and, than a grammar in Backus-Naur notation. As an example, yet, is easy to use. This paper describes such a tool that is the first few lines of a grammar for an English sentence writ- based on augmented transition networks (ATNs). It allows ten in Backus-Naur notation might look like the following: for user input to be in sentence or nonsentence form or both, provides a detailed parse tree that the user can access, and sentence : := [subject] predicate also provides the facility to generate responses and save in- subject ::= noun-group formation. -
Turing's Influence on Programming — Book Extract from “The Dawn of Software Engineering: from Turing to Dijkstra”
Turing's Influence on Programming | Book extract from \The Dawn of Software Engineering: from Turing to Dijkstra" Edgar G. Daylight∗ Eindhoven University of Technology, The Netherlands [email protected] Abstract Turing's involvement with computer building was popularized in the 1970s and later. Most notable are the books by Brian Randell (1973), Andrew Hodges (1983), and Martin Davis (2000). A central question is whether John von Neumann was influenced by Turing's 1936 paper when he helped build the EDVAC machine, even though he never cited Turing's work. This question remains unsettled up till this day. As remarked by Charles Petzold, one standard history barely mentions Turing, while the other, written by a logician, makes Turing a key player. Contrast these observations then with the fact that Turing's 1936 paper was cited and heavily discussed in 1959 among computer programmers. In 1966, the first Turing award was given to a programmer, not a computer builder, as were several subsequent Turing awards. An historical investigation of Turing's influence on computing, presented here, shows that Turing's 1936 notion of universality became increasingly relevant among programmers during the 1950s. The central thesis of this paper states that Turing's in- fluence was felt more in programming after his death than in computer building during the 1940s. 1 Introduction Many people today are led to believe that Turing is the father of the computer, the father of our digital society, as also the following praise for Martin Davis's bestseller The Universal Computer: The Road from Leibniz to Turing1 suggests: At last, a book about the origin of the computer that goes to the heart of the story: the human struggle for logic and truth. -
Andre Heidekrueger
AMD Heterogenous Computing X86 in development AMD new CPU and Accelerator Designs Building blocks for Heterogenous computing with the GPU Accelerators and the Latest x86 Platform Innovations 1 | Hot Chips | August, 2010 Server Industry Trends China has Seismic The performance of the 265m datasets online gamers fastest supercomputer typically exceed a terabyte grew 500x in the last decade The top 8 systems Accelerator-based on the Green 500 list use accelerators 800 servers on the Green 500 images list are 3x as energy are uploaded to Facebook efficient as those without every second 2 | Hot Chips | August,accelerators 2010 2 Top500.org Performance Projections: Can the Current Trajectory Achieve Exascale? 1 EFlops Might get there on current trajectory, but… • Too late for major government programs leading to 2018 • System power in traditional x86 architecture would be unmanageable Source for chart: Top500.org; annotations by AMD 3 | Hot Chips | August, 2010 Three Eras of Processor Performance Multi-Core Heterogeneous Single-Core Systems Era Era Era Enabled by: Enabled by: Enabled by: Moore’s Law Moore’s Law Moore’s Law Voltage Scaling Desire for Throughput Abundant data parallelism Micro-Architecture 20 years of SMP arch Power efficient GPUs Constrained by: Constrained by: Temporarily constrained by: Power Power Programming models Complexity Parallel SW availability Communication overheads Scalability o o ? we are we are here o here we are Performance here thread Performance - Time Time Application Targeted Time Throughput Throughput Performance (# of Processors) (Data-parallel exploitation) Single 4 | Driving HPC Performance Efficiency Fusion Architecture The Benefits of Heterogeneous Computing x86 CPU owns GPU Optimized for the Software World Modern Workloads . -
Generating Highly Accurate Values for All Trigonometric Functions
GENERATING HIGHLY ACCURATE VALUES FOR ALL TRIGONOMETRIC FUNCTIONS INTRODUCTION: Although hand calculators and electronic computers have pretty much eliminated the need for trigonometric tables starting about sixty years ago, this was not the case back during WWII when highly accurate trigonometric tables were vital for the calculation of projectile trajectories. At that time a good deal of the war effort in mathematics was devoted to building ever more accurate trig tables some ranging up to twenty digit numerical accuracy. Today such efforts can be considered to have been a waste of time, although they were not so at the time. It is our purpose in this note to demonstrate a new approach for quickly obtaining values for all trigonometric functions exceeding anything your hand calculator can achieve. BASIC TRIGNOMETRIC FUNCTIONS: We begin by defining all six basic trigonometric functions by looking at the following right triangle- The sides a,b, and c of this triangle satisfy the Pythagorean Theorem- a2+b2=c2 we define the six trig functions in terms of this triangle as- sin()=b/c cos( )=a/c tan()=b/a csc()=c/b sec()=c/a) cot()=b/a From Pythagorous we also have at once that – 1 sin( )2 cos( )2 1 and [1 tan( )2 ] . cos( )2 It was the ancient Egyptian pyramid builders who first realized that the most important of these trigonometric functions is tan(). Their measure of this tangent was given in terms of sekels, where 1 sekhel=7. Converting each of the above trig functions into functions of tan() yields- tan( ) 1 sin( ) cos( ) tan( ) tan( ) 1 tan( )2 1 tan( )2 1 tan( )2 1 csc( ) sec( ) 1 tan( )2 cot( ) tan( ) tan( ) In using these equivalent definitions one must be careful of signs and infinities appearing in the approximations to tan(). -
Hierarchical Roofline Analysis for Gpus: Accelerating Performance
Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System Charlene Yang, Thorsten Kurth Samuel Williams National Energy Research Scientific Computing Center Computational Research Division Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Berkeley, CA 94720, USA Berkeley, CA 94720, USA fcjyang, [email protected] [email protected] Abstract—The Roofline performance model provides an Performance (GFLOP/s) is bound by: intuitive and insightful approach to identifying performance bottlenecks and guiding performance optimization. In prepa- Peak GFLOP/s GFLOP/s ≤ min (1) ration for the next-generation supercomputer Perlmutter at Peak GB/s × Arithmetic Intensity NERSC, this paper presents a methodology to construct a hi- erarchical Roofline on NVIDIA GPUs and extend it to support which produces the traditional Roofline formulation when reduced precision and Tensor Cores. The hierarchical Roofline incorporates L1, L2, device memory and system memory plotted on a log-log plot. bandwidths into one single figure, and it offers more profound Previously, the Roofline model was expanded to support insights into performance analysis than the traditional DRAM- the full memory hierarchy [2], [3] by adding additional band- only Roofline. We use our Roofline methodology to analyze width “ceilings”. Similarly, additional ceilings beneath the three proxy applications: GPP from BerkeleyGW, HPGMG Roofline can be added to represent performance bottlenecks from AMReX, and conv2d from TensorFlow. In so doing, we demonstrate the ability of our methodology to readily arising from lack of vectorization or the failure to exploit understand various aspects of performance and performance fused multiply-add (FMA) instructions. bottlenecks on NVIDIA GPUs and motivate code optimizations. -
What Is a Closed-Form Number? Timothy Y. Chow the American
What is a Closed-Form Number? Timothy Y. Chow The American Mathematical Monthly, Vol. 106, No. 5. (May, 1999), pp. 440-448. Stable URL: http://links.jstor.org/sici?sici=0002-9890%28199905%29106%3A5%3C440%3AWIACN%3E2.0.CO%3B2-6 The American Mathematical Monthly is currently published by Mathematical Association of America. Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/journals/maa.html. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals. For more information regarding JSTOR, please contact [email protected]. http://www.jstor.org Mon May 14 16:45:04 2007 What Is a Closed-Form Number? Timothy Y. Chow 1. INTRODUCTION. When I was a high-school student, I liked giving exact answers to numerical problems whenever possible. If the answer to a problem were 2/7 or n-6 or arctan 3 or el/" I would always leave it in that form instead of giving a decimal approximation. -
Run-Time Adaptation to Heterogeneous Processing Units for Real-Time Stereo Vision
Run-time Adaptation to Heterogeneous Processing Units for Real-time Stereo Vision Benjamin Ranft Oliver Denninger FZI Research Center for Information Technology FZI Research Center for Information Technology Karlsruhe, Germany Karlsruhe, Germany [email protected] [email protected] Abstract—Todays systems from smartphones to workstations task are becoming increasingly parallel and heterogeneous: Pro- data cessing units not only consist of more and more identical element cores – furthermore, systems commonly contain either a discrete general-purpose GPU alongside with their CPU or even integrate both on a single chip. To benefit from this trend, software should utilize all available resources and adapt to varying configurations, including different CPU and GPU performance or competing processes. This paper investigates parallelization and adaptation strate- gies using dense stereo vision as an example application – a basis e. g. for advanced driver assistance systems, but also robotics or gesture recognition. At this, task-driven as well as data element- and data flow-driven parallelization approaches data are feasible. To achieve real-time performance, we first utilize flow data element-parallelism individually on each processing unit. Figure 1. Parallelization approaches offered by our application On this basis, we develop and implement further strategies for heterogeneous systems and automatic adaptation to the units (GPUs) often outperform multi-core CPUs with single hardware available at run-time. Each approach is described instruction, multiple data (SIMD) capabilities w. r. t. frame concerning i. a. the propagation of data to processors and its rates and energy efficiency [4], [5], although there are relation to established methods. An experimental evaluation with multiple test systems and usage scenarious reveals advan- exceptions [6]. -
A Dual-Purpose Real/Complex Logarithmic Number System ALU
A Dual-Purpose Real/Complex Logarithmic Number System ALU Mark G. Arnold Sylvain Collange Computer Science and Engineering ELIAUS Lehigh University Universite´ de Perpignan Via Domitia [email protected] [email protected] Abstract—The real Logarithmic Number System (LNS) allows advantages to using logarithmic arithmetic to represent <[X¯] fast and inexpensive multiplication and division but more expen- and =[X¯]. Several hundred papers [26] have considered con- sive addition and subtraction as precision increases. Recent ad- ventional, real-valued LNS; most have found some advantages vances in higher-order and multipartite table methods, together with cotransformation, allow real LNS ALUs to be implemented for low-precision implementation of multiply-rich algorithms, effectively on FPGAs for a wide variety of medium-precision like (1) and (2). special-purpose applications. The Complex LNS (CLNS) is a This paper considers an even more specialized number generalization of LNS which represents complex values in log- system in which complex values are represented in log-polar polar form. CLNS is a more compact representation than coordinates. The advantage is that the cost of complex multi- traditional rectangular methods, reducing the cost of busses and memory in intensive complex-number applications like the plication and division is reduced even more than in rectangular FFT; however, prior CLNS implementations were either slow LNS at the cost of a very complicated addition algorithm. CORDIC-based or expensive 2D-table-based approaches. This This paper proposes a novel approach to reduce the cost of paper attempts to leverage the recent advances made in real- this log-polar addition algorithm by using a conventional real- valued LNS units for the more specialized context of CLNS. -
Elementary Functions and Their Inverses*
ELEMENTARYFUNCTIONS AND THEIR INVERSES* BY J. F. RITT The chief item of this paper is the determination of all elementary functions whose inverses are elementary. The elementary functions are understood here to be those which are obtained in a finite number of steps by performing algebraic operations and taking exponentials and logarithms. For instance, the function tan \cf — log,(l + Vz)] + [«*+ log arc sin*]1« is elementary. We prove that if F(z) and its inverse are both elementary, there exist n functions Vt(*)j <hiz), •', <Pniz), where each <piz) with an odd index is algebraic, and each (p(z) with an even index is either e? or log?, such that F(g) = <pn(pn-f-<Pn9i(z) each q>i(z)ii<ît) being substituted for z in ^>iiiz). That every F(z) of this type has an elementary inverse is obvious. It remains to develop a method for recognizing whether a given elementary function can be reduced to the above form for F(.z). How to test fairly simple functions will be evident from the details of our proofs. For the immediate present, we let the general question stand. The present paper is an addition to Liouville's work of almost a century ago on the classification of the elementary functions, on the possibility of effecting integrations in finite terms, and on the impossibility of solving certain differentia] equations, and certain transcendental equations, in finite terms.!" Free use is made here of the ingenious methods of Liouville. * Presented to the Society, October 25, 1924. -(•Journal de l'Ecole Polytechnique, vol. -
A Compiler Target Model for Line Associative Registers
University of Kentucky UKnowledge Theses and Dissertations--Electrical and Computer Engineering Electrical and Computer Engineering 2019 A Compiler Target Model for Line Associative Registers Paul S. Eberhart University of Kentucky, [email protected] Digital Object Identifier: https://doi.org/10.13023/etd.2019.141 Right click to open a feedback form in a new tab to let us know how this document benefits ou.y Recommended Citation Eberhart, Paul S., "A Compiler Target Model for Line Associative Registers" (2019). Theses and Dissertations--Electrical and Computer Engineering. 138. https://uknowledge.uky.edu/ece_etds/138 This Master's Thesis is brought to you for free and open access by the Electrical and Computer Engineering at UKnowledge. It has been accepted for inclusion in Theses and Dissertations--Electrical and Computer Engineering by an authorized administrator of UKnowledge. For more information, please contact [email protected]. STUDENT AGREEMENT: I represent that my thesis or dissertation and abstract are my original work. Proper attribution has been given to all outside sources. I understand that I am solely responsible for obtaining any needed copyright permissions. I have obtained needed written permission statement(s) from the owner(s) of each third-party copyrighted matter to be included in my work, allowing electronic distribution (if such use is not permitted by the fair use doctrine) which will be submitted to UKnowledge as Additional File. I hereby grant to The University of Kentucky and its agents the irrevocable, non-exclusive, and royalty-free license to archive and make accessible my work in whole or in part in all forms of media, now or hereafter known.