Design Methods for DSP Systems


Design Methods for DSP Systems
EURASIP Journal on Applied Signal Processing
Guest Editors: Markus Rupp, Bernhard Wess, and Shuvra S. Bhattacharyya

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved. This is a special issue published in volume 2006 of "EURASIP Journal on Applied Signal Processing." All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief: Ali H. Sayed, University of California, USA

Associate Editors: Kenneth Barner, USA; Mauro Barni, Italy; Richard Barton, USA; Ati Baskurt, France; Kostas Berberidis, Greece; Jose C. Bermudez, Brazil; Enis Cetin, Turkey; Jonathon Chambers, UK; Benoit Champagne, Canada; Joe Chen, USA; Liang-Gee Chen, Taiwan; Huaiyu Dai, USA; Satya Dharanipragada, USA; Frank Ehlers, Italy; Sharon Gannot, Israel; Fulvio Gini, Italy; Irene Gu, Sweden; Peter Handel, Sweden; R. Heusdens, The Netherlands; Ulrich Heute, Germany; Arden Huang, USA; Jiri Jan, Czech Republic; Sudharman K. Jayaweera, USA; Søren Holdt Jensen, Denmark; Mark Kahrs, USA; Thomas Kaiser, Germany; Moon Gi Kang, South Korea; Matti Karjalainen, Finland; Walter Kellermann, Germany; Joerg Kliewer, USA; Lisimachos P. Kondi, USA; Alex Kot, Singapore; Vikram Krishnamurthy, Canada; Tan Lee, Hong Kong; Geert Leus, The Netherlands; Bernard C. Levy, USA; Ta-Hsin Li, USA; Mark Liao, Taiwan; Yuan-Pei Lin, Taiwan; Shoji Makino, Japan; Stephen Marshall, UK; C. Mecklenbräuker, Austria; Gloria Menegaz, Italy; Ricardo Merched, Brazil; Rafael Molina, Spain; Marc Moonen, Belgium; Vitor H. Nascimento, Brazil; Sven Nordholm, Australia; Douglas O’Shaughnessy, Canada; Montse Pardas, Spain; Wilfried Philips, Belgium; Vincent Poor, USA; Ioannis Psaromiligkos, Canada; Phillip Regalia, France; Markus Rupp, Austria; Bill Sandham, UK; Bulent Sankur, Turkey; Erchin Serpedin, USA; Dirk Slock, France; Yap-Peng Tan, Singapore; Dimitrios Tzovaras, Greece; Hugo Van hamme, Belgium; Bernhard Wess, Austria; Douglas Williams, USA; Roger Woods, UK; Jar-Ferr Yang, Taiwan; Abdelhak M. Zoubir, Germany

Contents

"Design Methods for DSP Systems" (Editorial), Markus Rupp, Bernhard Wess, and Shuvra S. Bhattacharyya. Volume 2006 (2006), Article ID 47817, 3 pages.

"Macrocell Builder: IP-Block-Based Design Environment for High-Throughput VLSI Dedicated Digital Signal Processing Systems", Nacer-Eddine Zergainoh, Ludovic Tambour, Pascal Urard, and Ahmed Amine Jerraya. Volume 2006 (2006), Article ID 28636, 11 pages.

"Multiple-Clock-Cycle Architecture for the VLSI Design of a System for Time-Frequency Analysis", Veselin N. Ivanović, Radovan Stojanović, and LJubiša Stanković. Volume 2006 (2006), Article ID 60613, 18 pages.

"3D-SoftChip: A Novel Architecture for Next-Generation Adaptive Computing Systems", Chul Kim, Alex Rassau, Stefan Lachowicz, Mike Myung-Ok Lee, and Kamran Eshraghian. Volume 2006 (2006), Article ID 75032, 13 pages.

"Highly Flexible Multimode Digital Signal Processing Systems Using Adaptable Components and Controllers", Vinu Vijay Kumar and John Lach. Volume 2006 (2006), Article ID 79595, 9 pages.

"Rapid VLIW Processor Customization for Signal Processing Applications Using Combinational Hardware Functions", Raymond R. Hoare, Alex K. Jones, Dara Kusic, Joshua Fazekas, John Foster, Shenchih Tung, and Michael McCloud. Volume 2006 (2006), Article ID 46472, 23 pages.

"Rapid Prototyping for Heterogeneous Multicomponent Systems: An MPEG-4 Stream over a UMTS Communication Link", M. Raulet, F. Urban, J. F. Nezan, C. Moy, O. Deforges, and Y. Sorel. Volume 2006 (2006), Article ID 64369, 13 pages.

"A Fully Automated Environment for Verification of Virtual Prototypes", P. Belanović, B. Knerr, M. Holzer, and M. Rupp. Volume 2006 (2006), Article ID 32408, 12 pages.

"FPGA-Based Reconfigurable Measurement Instruments with Functionality Defined by User", Guo-Ruey Tsai and Min-Chuan Lin. Volume 2006 (2006), Article ID 84340, 14 pages.

"FPGA Implementation of an MUD Based on Cascade Filters for a WCDMA System", Quoc-Thai Ho, Daniel Massicotte, and Adel-Omar Dahmane. Volume 2006 (2006), Article ID 52919, 12 pages.

"A New Pipelined Systolic Array-Based Architecture for Matrix Inversion in FPGAs with Kalman Filter Case Study", Abbas Bigdeli, Morteza Biglari-Abhari, Zoran Salcic, and Yat Tin Lai. Volume 2006 (2006), Article ID 89186, 12 pages.

"Floating-to-Fixed-Point Conversion for Digital Signal Processors", Daniel Menard, Daniel Chillet, and Olivier Sentieys. Volume 2006 (2006), Article ID 96421, 19 pages.

"Optimum Wordlength Search Using Sensitivity Information", Kyungtae Han and Brian L. Evans. Volume 2006 (2006), Article ID 92849, 14 pages.

Hindawi Publishing Corporation
EURASIP Journal on Applied Signal Processing
Volume 2006, Article ID 47817, Pages 1–3
DOI 10.1155/ASP/2006/47817

Editorial: Design Methods for DSP Systems
Markus Rupp,1 Bernhard Wess,1 and Shuvra S. Bhattacharyya2
1 Institute of Communications and Radio Frequency Engineering, Vienna University of Technology, Gusshausstrasse 25/389, 1040 Vienna, Austria
2 Department of Electrical & Computer Engineering, University of Maryland, College Park, MD 20742, USA

Received 8 August 2005; Accepted 8 August 2005

Copyright © 2006 Markus Rupp et al.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Industrial implementations of DSP systems today require extreme complexity. Examples are wireless systems satisfying standards like WLAN or 3GPP, video components, or multimedia players. At the same time, often harsh constraints like low-power requirements burden the designer even more. Conventional methods for ASIC design are not sufficient any more to guarantee a fast conversion from initial concept to final product. In industry, the problem has been addressed by the wording design crisis or design gap. While this design gap exists in a complexity gap, that is, a difference between existing, available, and demanded complexity, there is also a productivity gap, that is, the difference between available complexity and how much we are able to efficiently convert into gate-level representations.

This special issue intends to present recent solutions to such gaps addressing algorithmic design methods, algorithms for floating-to-fixed-point conversion, automatic DSP coding strategies, architectural exploration methods, hardware/software partitioning, as well as virtual and rapid prototyping.

We received 20 submissions from different fields and areas of expertise from which finally only 12 were accepted for publication. These 12 papers can be categorised into four groups: pure VLSI design methods, prototyping methods, experimental reports on FPGAs, and floating-to-fixed-point conversions.

Most activities in design methods are related to the final product. VLSI design methods intend to deal with high complexity in a rather short time. In this special issue, we present five contributions allowing to design complex VLSI designs in substantially lower time periods.

In "Macrocell builder: IP-block-based design environment for high-throughput VLSI dedicated digital signal processing systems", N.-E. Zergainoh et al. present a design tool, called DSP macrocell builder, that generates SystemC register transfer level architectures for VLSI signal processing systems from high-level representations as interconnections of intellectual property (IP) blocks. The development emphasizes extensive parameterization and component reuse to improve productivity and flexibility. Careful generation of control structures is also performed to manage delays and coordinate parallel execution. Effectiveness of the tool is demonstrated on a number of high-throughput signal processing applications.

In "Multiple-clock-cycle architecture for the VLSI design of a system for time-frequency analysis," Veselin N. Ivanović et al. present a streamlined architecture for time-frequency signal analysis. The architecture enables real-time analysis of a number of important time-frequency distributions. By providing for multiple-clock-cycle operation and resource sharing across the design in an efficient manner, the architecture achieves these features with relatively low hardware complexity. Results are given based on implementation of the architecture on field-programmable gate arrays, and a thorough comparison is given against a single-cycle implementation architecture.

In "3D-SoftChip: a novel architecture for next-generation adaptive computing systems," C. Kim et al. present an architecture for real-time communication and signal processing through vertical integration of a configurable array processor subsystem and a switch subsystem. The proposed integration is achieved by means of an indium bump interconnection array to provide high interconnection bandwidth at relatively low levels of power dissipation. The paper motivates and develops the design of the proposed system architecture, along with its 2D subsystems and hierarchical interconnection network. Details on hardware/software codesign aspects of the proposed system are also discussed.

In "Highly flexible multimode digital signal processing systems using adaptable components and controllers", V. V.
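Two of the accepted papers concern floating-to-fixed-point conversion and optimum wordlength search. As a generic illustration of the problem those papers address (not of their specific algorithms), the sketch below quantizes a floating-point signal to a signed fixed-point format and reports the resulting signal-to-quantization-noise ratio (SQNR) for several wordlengths; the function names and the chosen wordlengths are illustrative assumptions, not anything defined in the issue.

```python
import numpy as np

def quantize_fixed(x, wordlength, fraclength):
    """Round x to a signed fixed-point format with `wordlength` total
    bits and `fraclength` fractional bits, saturating on overflow."""
    scale = 2.0 ** fraclength
    # Representable integer range of a signed wordlength-bit value
    qmin = -(2 ** (wordlength - 1))
    qmax = 2 ** (wordlength - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

def sqnr_db(x, xq):
    """Signal-to-quantization-noise ratio in dB."""
    noise = x - xq
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=10_000)  # signal in [-1, 1)
    for w in (8, 12, 16):
        # One fractional bit fewer than the wordlength leaves a sign bit
        xq = quantize_fixed(x, wordlength=w, fraclength=w - 1)
        print(f"{w:2d}-bit: SQNR = {sqnr_db(x, xq):.1f} dB")
```

For a full-scale uniform signal, each additional bit of wordlength buys roughly 6 dB of SQNR; the papers in this issue automate the harder part, choosing per-variable wordlengths that meet an accuracy constraint at minimum hardware cost.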