How Low Can You Go?
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
System Design for a Computational-RAM Logic-In-Memory Parailel-Processing Machine
System Design for a Computational-RAM Logic-In-Memory ParaIlel-Processing Machine Peter M. Nyasulu, B .Sc., M.Eng. A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy Ottaw a-Carleton Ins titute for Eleceical and Computer Engineering, Department of Electronics, Faculty of Engineering, Carleton University, Ottawa, Ontario, Canada May, 1999 O Peter M. Nyasulu, 1999 National Library Biôiiothkque nationale du Canada Acquisitions and Acquisitions et Bibliographie Services services bibliographiques 39S Weiiington Street 395. nie WeUingtm OnawaON KlAW Ottawa ON K1A ON4 Canada Canada The author has granted a non- L'auteur a accordé une licence non exclusive licence allowing the exclusive permettant à la National Library of Canada to Bibliothèque nationale du Canada de reproduce, ban, distribute or seU reproduire, prêter, distribuer ou copies of this thesis in microform, vendre des copies de cette thèse sous paper or electronic formats. la forme de microficbe/nlm, de reproduction sur papier ou sur format électronique. The author retains ownership of the L'auteur conserve la propriété du copyright in this thesis. Neither the droit d'auteur qui protège cette thèse. thesis nor substantial extracts fkom it Ni la thèse ni des extraits substantiels may be printed or otherwise de celle-ci ne doivent être imprimés reproduced without the author's ou autrement reproduits sans son permission. autorisation. Abstract Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-paralle1 applications because of the inherent parallelism and high data bandwidth inside the memory chip. -
Getting Started Computing at the Al Lab by Christopher C. Stacy Abstract
MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLI..IGENCE LABORATORY WORKING PAPER 235 7 September 1982 Getting Started Computing at the Al Lab by Christopher C. Stacy Abstract This document describes the computing facilities at the M.I.T. Artificial Intelligence Laboratory, and explains how to get started using them. It is intended as an orientation document for newcomers to the lab, and will be updated by the author from time to time. A.I. Laboratory Working Papers are produced for internal circulation. and may contain information that is, for example, too preliminary or too detailed for formal publication. It is not intended that they should be considered papers to which reference can be made in the literature. a MASACHUSETS INSTITUTE OF TECHNOLOGY 1982 Getting Started Table of Contents Page i Table of Contents 1. Introduction 1 1.1. Lisp Machines 2 1.2. Timesharing 3 1.3. Other Computers 3 1.3.1. Field Engineering 3 1.3.2. Vision and Robotics 3 1.3.3. Music 4 1,3.4. Altos 4 1.4. Output Peripherals 4 1.5. Other Machines 5 1.6. Terminals 5 2. Networks 7 2.1. The ARPAnet 7 2.2. The Chaosnet 7 2.3. Services 8 2.3.1. TELNET/SUPDUP 8 2.3.2. FTP 8 2.4. Mail 9 2.4.1. Processing Mail 9 2.4.2. Ettiquette 9 2.5. Mailing Lists 10 2.5.1. BBoards 11 2.6. Finger/Inquire 11 2.7. TIPs and TACs 12 2.7.1. ARPAnet TAC 12 2.7.2. Chaosnet TIP 13 3. -
Selected Filmography of Digital Culture and New Media Art
Dejan Grba SELECTED FILMOGRAPHY OF DIGITAL CULTURE AND NEW MEDIA ART This filmography comprises feature films, documentaries, TV shows, series and reports about digital culture and new media art. The selected feature films reflect the informatization of society, economy and politics in various ways, primarily on the conceptual and narrative plan. Feature films that directly thematize the digital paradigm can be found in the Film Lists section. Each entry is referenced with basic filmographic data: director’s name, title and production year, and production details are available online at IMDB, FilmWeb, FindAnyFilm, Metacritic etc. The coloured titles are links. Feature films Fritz Lang, Metropolis, 1926. Fritz Lang, M, 1931. William Cameron Menzies, Things to Come, 1936. Fritz Lang, The Thousand Eyes of Dr. Mabuse, 1960. Sidney Lumet, Fail-Safe, 1964. James B. Harris, The Bedford Incident, 1965. Jean-Luc Godard, Alphaville, 1965. Joseph Sargent, Colossus: The Forbin Project, 1970. Henri Verneuil, Le serpent, 1973. Alan J. Pakula, The Parallax View, 1974. Francis Ford Coppola, The Conversation, 1974. Sidney Pollack, The Three Days of Condor, 1975. George P. Cosmatos, The Cassandra Crossing, 1976. Sidney Lumet, Network, 1976. Robert Aldrich, Twilight's Last Gleaming, 1977. Michael Crichton, Coma, 1978. Brian De Palma, Blow Out, 1981. Steven Lisberger, Tron, 1982. Godfrey Reggio, Koyaanisqatsi, 1983. John Badham, WarGames, 1983. Roger Donaldson, No Way Out, 1987. F. Gary Gray, The Negotiator, 1988. John McTiernan, Die Hard, 1988. Phil Alden Robinson, Sneakers, 1992. Andrew Davis, The Fugitive, 1993. David Fincher, The Game, 1997. David Cronenberg, eXistenZ, 1999. Frank Oz, The Score, 2001. Tony Scott, Spy Game, 2001. -
Block Ciphers: Fast Implementations on X86-64 Architecture
Block Ciphers: Fast Implementations on x86-64 Architecture University of Oulu Department of Information Processing Science Master’s Thesis Jussi Kivilinna May 20, 2013 Abstract Encryption is being used more than ever before. It is used to prevent eavesdropping on our communications over cell phone calls and Internet, securing network connections, making e-commerce and e-banking possible and generally hiding information from unwanted eyes. The performance of encryption functions is therefore important as slow working implementation increases costs. At server side faster implementation can reduce the required capacity and on client side it can lower the power usage. Block ciphers are a class of encryption functions that are typically used to encrypt large bulk data, and thus make them a subject of many studies when endeavoring greater performance. The x86-64 architecture is the most dominant processor architecture in server and desktop computers; it has numerous different instruction set extensions, which make the architecture a target of constant new research on fast software implementations. The examined block ciphers – Blowfish, AES, Camellia, Serpent and Twofish – are widely used in various applications and their different designs make them interesting objects of investigation. Several optimization techniques to speed up implementations have been reported in previous research; such as the use of table look-ups, bit-slicing, byte-slicing and the utilization of “out-of-order” scheduling capabilities. We examine these different techniques and utilize them to construct new implementations of the selected block ciphers. Focus with these new implementations is in modes of operation which allow multiple blocks to be processed in parallel; such as the counter mode. -
An FPGA-Accelerated Embedded Convolutional Neural Network
Master Thesis Report ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network (edit) (edit) 1000ch 1000ch FPGA 1000ch Network Analysis Network Analysis 2x512 > 1024 2x512 > 1024 David Gschwend [email protected] SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet arXiv:2005.06892v1 [cs.CV] 14 May 2020 Supervisors: Emanuel Schmid Felix Eberli Professor: Prof. Dr. Anton Gunzinger August 2016, ETH Zürich, Department of Information Technology and Electrical Engineering Abstract Image Understanding is becoming a vital feature in ever more applications ranging from medical diagnostics to autonomous vehicles. Many applications demand for embedded solutions that integrate into existing systems with tight real-time and power constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Embedded CNNs thus call for small and efficient, yet very powerful computing platforms. This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The ZynqNet Embedded CNN is designed for image classification on ImageNet and consists of ZynqNet CNN, an optimized and customized CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture for its evaluation. ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and optimization of prior topologies using the custom-designed Netscope CNN Analyzer have enabled a CNN with 84.5 % top-5 accuracy at a computational complexity of only 530 million multiply- accumulate operations. The topology is highly regular and consists exclusively of convolu- tional layers, ReLU nonlinearities and one global pooling layer. -
Advanced Synthesis Cookbook
Advanced Synthesis Cookbook Advanced Synthesis Cookbook 101 Innovation Drive San Jose, CA 95134 www.altera.com MNL-01017-6.0 Document last updated for Altera Complete Design Suite version: 11.0 Document publication date: July 2011 © 2011 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off. and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders as described at www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. Advanced Synthesis Cookbook July 2011 Altera Corporation Contents Chapter 1. Introduction Blocks and Techniques . 1–1 Simulating the Examples . 1–1 Using a C Compiler . 1–2 Chapter 2. Arithmetic Introduction . 2–1 Basic Addition . 2–2 Ternary Addition . 2–2 Grouping Ternary Adders . 2–3 Combinational Adders . 2–3 Double Addsub/ Basic Addsub . 2–3 Two’s Complement Arithmetic Review . 2–4 Traditional ADDSUB Unit . 2–4 Compressors (Carry Save Adders) . 2–5 Compressor Width 6:3 . -
Usuba, Optimizing Bitslicing Compiler Darius Mercadier
Usuba, Optimizing Bitslicing Compiler Darius Mercadier To cite this version: Darius Mercadier. Usuba, Optimizing Bitslicing Compiler. Programming Languages [cs.PL]. Sorbonne Université (France), 2020. English. tel-03133456 HAL Id: tel-03133456 https://tel.archives-ouvertes.fr/tel-03133456 Submitted on 6 Feb 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE` DE DOCTORAT DE SORBONNE UNIVERSITE´ Specialit´ e´ Informatique Ecole´ doctorale Informatique, Tel´ ecommunications´ et Electronique´ (Paris) Present´ ee´ par Darius MERCADIER Pour obtenir le grade de DOCTEUR de SORBONNE UNIVERSITE´ Sujet de la these` : Usuba, Optimizing Bitslicing Compiler soutenue le 20 novembre 2020 devant le jury compose´ de : M. Gilles MULLER Directeur de these` M. Pierre-Evariste´ DAGAND Encadrant de these` M. Karthik BHARGAVAN Rapporteur Mme. Sandrine BLAZY Rapporteur Mme. Caroline COLLANGE Examinateur M. Xavier LEROY Examinateur M. Thomas PORNIN Examinateur M. Damien VERGNAUD Examinateur Abstract Bitslicing is a technique commonly used in cryptography to implement high-throughput parallel and constant-time symmetric primitives. However, writing, optimizing and pro- tecting bitsliced implementations by hand are tedious tasks, requiring knowledge in cryptography, CPU microarchitectures and side-channel attacks. The resulting programs tend to be hard to maintain due to their high complexity. -
Creativity in Computer Science. in J
Creativity in Computer Science Daniel Saunders and Paul Thagard University of Waterloo Saunders, D., & Thagard, P. (forthcoming). Creativity in computer science. In J. C. Kaufman & J. Baer (Eds.), Creativity across domains: Faces of the muse. Mahwah, NJ: Lawrence Erlbaum Associates. 1. Introduction Computer science only became established as a field in the 1950s, growing out of theoretical and practical research begun in the previous two decades. The field has exhibited immense creativity, ranging from innovative hardware such as the early mainframes to software breakthroughs such as programming languages and the Internet. Martin Gardner worried that "it would be a sad day if human beings, adjusting to the Computer Revolution, became so intellectually lazy that they lost their power of creative thinking" (Gardner, 1978, p. vi-viii). On the contrary, computers and the theory of computation have provided great opportunities for creative work. This chapter examines several key aspects of creativity in computer science, beginning with the question of how problems arise in computer science. We then discuss the use of analogies in solving key problems in the history of computer science. Our discussion in these sections is based on historical examples, but the following sections discuss the nature of creativity using information from a contemporary source, a set of interviews with practicing computer scientists collected by the Association of Computing Machinery’s on-line student magazine, Crossroads. We then provide a general comparison of creativity in computer science and in the natural sciences. 2. Nature and Origins of Problems in Computer Science December 21, 2004 Computer science is closely related to both mathematics and engineering. -
Fpnew: an Open-Source Multi-Format Floating-Point Unit Architecture For
1 FPnew: An Open-Source Multi-Format Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing Stefan Mach, Fabian Schuiki, Florian Zaruba, Student Member, IEEE, and Luca Benini, Fellow, IEEE Abstract—The slowdown of Moore’s law and the power wall Internet of Things (IoT) domain. In this environment, achiev- necessitates a shift towards finely tunable precision (a.k.a. trans- ing high energy efficiency in numerical computations requires precision) computing to reduce energy footprint. Hence, we need architectures and circuits which are fine-tunable in terms of circuits capable of performing floating-point operations on a wide range of precisions with high energy-proportionality. We precision and performance. Such circuits can minimize the present FPnew, a highly configurable open-source transprecision energy cost per operation by adapting both performance and floating-point unit (TP-FPU) capable of supporting a wide precision to the application requirements in an agile way. The range of standard and custom FP formats. To demonstrate the paradigm of “transprecision computing” [1] aims at creating flexibility and efficiency of FPnew in general-purpose processor a holistic framework ranging from algorithms and software architectures, we extend the RISC-V ISA with operations on half-precision, bfloat16, and an 8bit FP format, as well as SIMD down to hardware and circuits which offer many knobs to vectors and multi-format operations. Integrated into a 32-bit fine-tune workloads. RISC-V core, our TP-FPU can speed up execution of mixed- The most flexible and dynamic way of performing nu- precision applications by 1.67x w.r.t. -
2012, Dec, Google Introduces Metaweb Searching
Google Gets A Second Brain, Changing Everything About Search Wade Roush12/12/12Follow @wroush Share and Comment In the 1983 sci-fi/comedy flick The Man with Two Brains, Steve Martin played Michael Hfuhruhurr, a neurosurgeon who marries one of his patients but then falls in love with the disembodied brain of another woman, Anne. Michael and Anne share an entirely telepathic relationship, until Michael’s gold-digging wife is murdered, giving him the opportunity to transplant Anne’s brain into her body. Well, you may not have noticed it yet, but the search engine you use every day—by which I mean Google, of course—is also in the middle of a brain transplant. And, just as Dr. Hfuhruhurr did, you’re probably going to like the new version a lot better. You can think of Google, in its previous incarnation, as a kind of statistics savant. In addition to indexing hundreds of billions of Web pages by keyword, it had grown talented at tricky tasks like recognizing names, parsing phrases, and correcting misspelled words in users’ queries. But this was all mathematical sleight-of-hand, powered mostly by Google’s vast search logs, which give the company a detailed day-to-day picture of the queries people type and the links they click. There was no real understanding underneath; Google’s algorithms didn’t know that “San Francisco” is a city, for instance, while “San Francisco Giants” is a baseball team. Now that’s changing. Today, when you enter a search term into Google, the company kicks off two separate but parallel searches. -
Bit Slicing Repetitive Logic
Bit Slicing Repetitive Logic Where logic blocks are duplicated within a system, there is much to gain from optimization. • Optimize single block for size. The effect is amplified by the number of identical blocks. • Optimize interconnect. With careful design, the requirement for routing channels between the blocks can be eliminated. Interconnection is by butting in 2 dimensions. 4001 Bit Slicing Instead of creating an ALU function by function, we create it slice by slice. A[7:0] B[7:0] A[7] B[7] A[0] B[0] Z[7] Z[7:0] Z[0] • Each bit slice is a full 1-bit ALU. • N are used to create an N–bit ALU. 4002 Bit Slicing FnAND FnOR FnXOR FnADD FnSHR FnSHL Cout SHRin A Z FA B A Cin A SHLin Z= A&B Z= A|B Z= A!=B Z= A+B Z= A>>1 Z= A<<1 • Simple 1–bit ALU. • The control lines select which function of A and B is fed to Z. • Some functions, e.g. Add, require extra data I/O. 4003 Bit Slicing FnAND FnAND Z Z Vdd GND B B A A • Data busses horizontal, control lines vertical. • Compact gate matrix implementation1. 1bit slice designs can also be built around standard cells although the full custom approach used here gives better results 4004 • Bit Sliced ALU (3-bits, scalable). CarryOut ShiftRightIn ShiftLeftOut FnAND FnOR FnXOR FnAdd FnSHR FnSHL Cout SHRin A Z[2] FA B[2] A[2] Cin A SHLin Cout SHRin A Z[1] FA B[1] A[1] Cin A SHLin Cout SHRin A Z[0] FA B[0] A[0] Cin A SHLin CarryIn ShiftRightOut ShiftLeftIn 4005 Bit Slicing We can extend this principle to the whole datapath. -
Formally Verified Big Step Semantics out of X86-64 Binaries
Formally Verified Big Step Semantics out of x86-64 Binaries Ian Roessle Freek Verbeek Binoy Ravindran Virginia Tech, USA Virginia Tech, USA Virginia Tech, USA [email protected] [email protected] [email protected] Abstract or systems where source code is unavailable due to propri- This paper presents a methodology for generating formally etary reasons. Various safety-critical systems in automotive, proven equivalence theorems between decompiled x86-64 aerospace, medical and military domains are built out of machine code and big step semantics. These proofs are built components where the source code is not available [38]. on top of two additional contributions. First, a robust and In such a context, certification plays a crucial role. Certi- tested formal x86-64 machine model containing small step fication can require a compliance proof based on formal semantics for 1625 instructions. Second, a decompilation- methods [32, 41]. Bottom-up formal verification can aid in into-logic methodology supporting both x86-64 assembly obtaining high levels of assurance for black-box components and machine code at large scale. This work enables black- running on commodity hardware, such as x86-64. box binary verification, i.e., formal verification of a binary A binary typically consists of blocks composed by control where source code is unavailable. As such, it can be applied flow. In this paper, a block is defined as a sequence of instruc- to safety-critical systems that consist of legacy components, tions that can be modeled with only if-then-else statements or components whose source code is unavailable due to (no loops).