Optimizing Applications for Multicore by Intel Software Engineer Levent Akyil Welcome to the Parallel Universe
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Here I Led Subcommittee Reports Related to Data-Intensive Science and Post-Moore Computing) and in CRA’S Board of Directors Since 2015
Vivek Sarkar Curriculum Vitae Contents 1 Summary 2 2 Education 3 3 Professional Experience 3 3.1 2017-present: College of Computing, Georgia Institute of Technology . 3 3.2 2007-present: Department of Computer Science, Rice University . 5 3.3 1987-2007: International Business Machines Corporation . 7 4 Professional Awards 11 5 Research Awards 12 6 Industry Gifts 15 7 Graduate Student and Other Mentoring 17 8 Professional Service 19 8.1 Conference Committees . 20 8.2 Advisory/Award/Review/Steering Committees . 25 9 Keynote Talks, Invited Talks, Panels (Selected) 27 10 Teaching 33 11 Publications 37 11.1 Refereed Conference and Journal Publications . 37 11.2 Refereed Workshop Publications . 51 11.3 Books, Book Chapters, and Edited Volumes . 58 12 Patents 58 13 Software Artifacts (Selected) 59 14 Personal Information 60 Page 1 of 60 01/06/2020 1 Summary Over thirty years of sustained contributions to programming models, compilers and runtime systems for high performance computing, which include: 1) Leading the development of ASTI during 1991{1996, IBM's first product compiler component for optimizing locality, parallelism, and the (then) new FORTRAN 90 high-productivity array language (ASTI has continued to ship as part of IBM's XL Fortran product compilers since 1996, and was also used as the foundation for IBM's High Performance Fortran compiler product); 2) Leading the research and development of the open source Jikes Research Virtual Machine at IBM during 1998{2001, a first-of-a-kind Java Virtual Machine (JVM) and dynamic compiler implemented -
Ultra-Large-Scale Sites
Ultra-large-scale Sites <working title> – Scalability, Availability and Performance in Social Media Sites (picture from social network visualization?) Walter Kriha With a forword by << >> Walter Kriha, Scalability and Availability Aspects…, V.1.9.1 page 1 03/12/2010 Copyright <<ISBN Number, Copyright, open access>> ©2010 Walter Kriha This selection and arrangement of content is licensed under the Creative Commons Attribution License: http://creativecommons.org/licenses/by/3.0/ online: www.kriha.de/krihaorg/ ... <a rel="license" href="http://creativecommons.org/licenses/by/3.0/de/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by/3.0/de/88x31.png" /></a><br /><span xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://purl.org/dc/dcmitype/Text" property="dc:title" rel="dc:type"> Building Scalable Social Media Sites</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="wwww.kriha.de/krihaorg/books/ultra.pdf" property="cc:attributionName" rel="cc:attributionURL">Walter Kriha</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/de/">Creative Commons Attribution 3.0 Germany License</a>.<br />Permissions beyond the scope of this license may be available at <a xmlns:cc="http://creativecommons.org/ns#" href="www.kriha.org" rel="cc:morePermissions">www.kriha.org</a>. Walter Kriha, Scalability and Availability Aspects…, V.1.9.1 page 2 03/12/2010 Acknowledgements <<master course, Todd Hoff/highscalability.com..>> Walter Kriha, Scalability and Availability Aspects…, V.1.9.1 page 3 03/12/2010 ToDo’s - The role of ultra fast networks (Infiniband) on distributed algorithms and behaviour with respect to failure models - more on group behaviour from Clay Shirky etc. -
August 2009 Volume 34 Number 4
AUGUST 2009 VOLUME 34 NUMBER 4 OPINION Musings 2 Rik Farrow FILE SYSTEMS Cumulus: Filesystem Backup to the Cloud 7 Michael VR able, SteFan SaVage, and geoffrey M. VoelkeR THE USENIX MAGAZINE PROGRAMMinG Rethinking Browser Performance 14 leo MeyeRoVich Programming Video Cards for Database Applications 21 tiM kaldewey SECURITY Malware to Crimeware: How Far Have They Gone, and How Do We Catch Up? 35 daVid dittRich HARDWARE A Home-Built NTP Appliance 45 Rudi Van dRunen CoLUMns Practical Perl Tools: Scratch the Webapp Itch with CGI::Application, Part 1 56 daVid n. blank-edelMan Pete’s All Things Sun: T Servers—Why, and Why Not 61 PeteR baeR galVin iVoyeur: Who Invited the Salesmen? 67 daVe JoSePhSen /dev/random 71 RobeRt g. Ferrell BooK REVIEWS Book Reviews 74 elizabeth zwicky et al. USEniX NOTES USENIX Lifetime Achievement Award 78 STUG Award 79 USENIX Association Financial Report for 2008 79 ellie young Writing for ;login: 83 ConfERENCES NSDI ’09 Reports 84 Report on the 8th International Workshop on Peer-to-Peer Systems (IPTPS ’09) 97 Report on the First USENIX Workshop on Hot Topics in Parallelism (HotPar ’09) 99 Report on the 12th Workshop on Hot Topics in Operating Systems (HotOS XII) 109 The Advanced Computing Systems Association aug09covers.indd 1 7.13.09 9:21:47 AM Upcoming Events 22n d ACM Sy M p o S i u M o n op e r A t i n g Sy S t e ms 7t H uSENIX Sy M p o S i u M o n ne t w o r k e d Sy S t e ms prinCipleS (SoSp ’09) de S i g n A n d iM p l e M e n t A t i o n (nSDI ’10) Sponsored by ACM SIGOPS in cooperation with USENIX Sponsored -
Andrzej Nowak - Bio
Multi-core Architectures Multi-core Architectures Andrzej Nowak - Bio 2005-2006 Intel Corporation IEEE 802.16d/e WiMax development Theme: Towards Reconfigggurable High-Performance Comppguting Linux kernel performance optimizations research Lecture 2 2006 Master Engineer diploma in Computer Science Multi-core Architectures Distributed Applications & Internet Systems Computer Systems Modeling 2007-2008 CERN openlab Andrzej Nowak Multi-core technologies CERN openlab (Geneva, Switzerland) Performance monitoring Systems architecture Inverted CERN School of Computing, 3-5 March 2008 1 iCSC2008, Andrzej Nowak, CERN openlab 2 iCSC2008, Andrzej Nowak, CERN openlab Multi-core Architectures Multi-core Architectures Introduction Objectives: Explain why multi-core architectures have become so popular Explain why parallelism is such a good bet for the near future Provide information about multi-core specifics Discuss the changes in computing landscape Discuss the impact of hardware on software Contents: Hardware part THEFREERIDEISOVERTHE FREE RIDE IS OVER Software part Recession looms? Outlook 3 iCSC2008, Andrzej Nowak, CERN openlab 4 iCSC2008, Andrzej Nowak, CERN openlab Towards Reconfigurable High-Performance Computing Lecture 2 iCSC 2008 3-5 March 2008, CERN Multi-core Architectures 1 Multi-core Architectures Multi-core Architectures Fundamentals of scalability Moore’s Law (1) Scalability – “readiness for enlargement” An observation made in 1965 by Gordon Moore, the co- founder of Intel Corporation: Good scalability: Additional -
Efficiency, Energy Efficiency and Programming of Accelerated HPC Servers: Highlights of PRACE Studies
Efficiency, energy efficiency and programming of accelerated HPC servers: Highlights of PRACE studies Lennart Johnsson Department of Computer Science University of Houston and School of Computer Science and Communications KTH To appear in Springer Verlag “GPU Solutions to Multi-scale Problems in Science and Engineering”, 2011 2 Lennart Johnsson Abstract During the last few years the convergence in architecture for High-Performance Computing systems that took place for over a decade has been replaced by a di- vergence. The divergence is driven by the quest for performance, cost- performance and in the last few years also energy consumption that during the life-time of a system have come to exceed the HPC system cost in many cases. Mass market, specialized processors, such as the Cell Broadband Engine (CBE) and Graphics Processors, have received particular attention, the latter especially after hardware support for double-precision floating-point arithmetic was intro- duced about three years ago. The recent support of Error Correcting Code (ECC) for memory and significantly enhanced performance for double-precision arithme- tic in the current generation of Graphic Processing Units (GPUs) have further so- lidified the interest in GPUs for HPC. In order to assess the issues involved in potentially deploying clusters with nodes consisting of commodity microprocessors with some type of specialized processor for enhanced performance or enhanced energy efficiency or both for science and engineering workloads, PRACE, the Partnership for Advanced Com- puting in Europe, undertook a study that included three types of accelerators, the CBE, GPUs and ClearSpeed, and tools for their programming. The study focused on assessing performance, efficiency, power efficiency for double-precision arithmetic and programmer productivity. -
Design of a Parallel Multi-Threaded Programming Model for Multi-Core Processors
DESIGN OF A PARALLEL MULTI-THREADED PROGRAMMING MODEL FOR MULTI-CORE PROCESSORS By Muhammad Ali Ismail Thesis submitted for the Degree of Doctor of Philosophy Department of Computer and Information Systems Engineering NED University of Engineering & Technology University Road, Karachi - 75270, Pakistan 2011 DESIGN OF A PARALLEL MULTI-THREADED PROGRAMMING MODEL FOR MULTI-CORE PROCESSORS PhD Thesis By Muhammad Ali Ismail Batch: 2008-2009 Project Advisor: Prof. Dr. Shahid Hafeez Mirza Project Co-supervisor: Prof. Dr. Talat Altaf 2011 Department of Computer and Information Systems Engineering NED University of Engineering & Technology University Road, Karachi - 75270, Pakistan Certificate Certified that the thesis entitled, “DEVELOPMENT OF A NEW PARALLEL MULTI-THREADED PROGRAMMING MODEL FOR MULTI-CORE PROCESSORS” which is being submitted by Mr. Muhammad Ali Ismail for the award of degree of Doctor of Philosophy in Computer & Information Systems Engineering Department of NED University of Engineering and Technology is a record of candidate’s own original work carried out by him under our supervision and guidance. The work incorporated in this thesis has not been submitted elsewhere for the award of any other degree. ___________________ _________________________ Prof. Dr. Talat Altaf, Prof. Dr. Shahid Hafeez Mirza Dean (ECE ), NEDUET Professor, UIT PhD Co-supervisor PhD Supervisor Acknowledgements In first place, I would like to thank the Almighty Allah for His countless blessings. In fact, all praise and glory belongs to Him and none has the right and worth to be worshipped but He. Next, I would like to acknowledge my home university, NED university of Engineering and Technology, for giving me the opportunity and funding for conducting this PhD research. -
High Performance Neural Network Inference, Streaming, and Visualization of Medical Images Using FAST
Received August 15, 2019, accepted September 16, 2019, date of publication September 19, 2019, date of current version October 2, 2019. Digital Object Identifier 10.1109/ACCESS.2019.2942441 High Performance Neural Network Inference, Streaming, and Visualization of Medical Images Using FAST ERIK SMISTAD 1,2, ANDREAS ØSTVIK 1,2, AND ANDRÉ PEDERSEN 1 1SINTEF Medical Technology, 7465 Trondheim, Norway 2Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, 7491 Trondheim, Norway Corresponding author: Erik Smistad ([email protected]) This work was supported by the Research Council of Norway under Grant 270941 and Grant 237887. ABSTRACT Deep convolutional neural networks have quickly become the standard for medical image analysis. Although there are many frameworks focusing on training neural networks, there are few that focus on high performance inference and visualization of medical images. Neural network inference requires an inference engine (IE), and there are currently several IEs available including Intel's OpenVINO, NVIDIA's TensorRT, and Google's TensorFlow which supports multiple backends, including NVIDIA's cuDNN, AMD's ROCm and Intel's MKL-DNN. These IEs only work on specific processors and have completely different application programming interfaces (APIs). In this paper, we presents methods for extending FAST, an open-source high performance framework for medical imaging, to use any IE with a common programming interface. Thereby making it easier for users to deploy and test their neural networks on different processors. This article provides an overview of current IEs and how they can be combined with existing software such as FAST. The methods are demonstrated and evaluated on three performance demanding medical use cases: real-time ultrasound image segmentation, computed tomography (CT) volume segmentation, and patch-wise classification of whole slide microscopy images. -
Download (633Kb)
Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Intel, Intel Core and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Ct: A New Paradigm for Data Parallel Computing *Other names and brands may be claimed as the property of others. Hans–Christian Hoppe Intel Visual Computing Institute, Intel Labs Copyright © 2009. Intel Corporation. using material from http://intel.com/software/products Anwar Ghuloum, CJ Newburn, Michael McCool and Stefanus Du Toit Performance and Productivity Libraries, Developer Products Division, Software and Services Group Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. -
Comparison and Analysis of Parallel Computing Performance Using Openmp and MPI
Send Orders for Reprints to [email protected] 38 The Open Automation and Control Systems Journal, 2013, 5, 38-44 Open Access Comparison and Analysis of Parallel Computing Performance Using OpenMP and MPI Shen Hua1,2,* and Zhang Yang1 1Department of Electronic Engineering, Dalian Neusoft University of Information Dalian, 116023/Liaoning, China 2Department of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian Univer- sity of Technology, Dalian, 116024/Liaoning, China Abstract: The developments of multi-core technology have induced big challenges to software structures. To take full advantages of the performance enhancements offered by new multi-core hardware, software programming models have made a great shift from sequential programming to parallel programming. OpenMP (Open Multi-Processing) and MPI (Message Passing Interface), as the most common parallel programming models, can provide different performance characteristics of parallelism in different cases. This paper intends to compare and analyze the parallel computing ability between OpenMP and MPI, and then some proposals are provided in parallel programming. The processing tools used include Intel VTune Performance Analyzer and Intel Thread Checker. The find- ings indicate that OpenMP is in favor of implementation and provides good performance in shared memory systems , and that MPI is propitious to programming models which require nodes performing a large number of tasks and little commu- nications in processes. Keywords: Parallel computing, Multi-core, multithread programming, OpenMP, MPI. 1. INTRODUCTION library routines and environment variables. OpenMP em- ploys the fork-join model, where different tasks are imple- As the number of transistors has increased to fit within a mented by multiple threads within the same address space. -
Threading Building Blocks Getting Started Guide
Intel(R) Threading Building Blocks Getting Started Guide Intel® Threading Building Blocks is a runtime-based parallel programming model for C++ code that uses threads. It consists of a template-based runtime library to help you harness the latent performance of multicore processors. Use Intel® Threading Building Blocks to write scalable applications that: • Specify logical parallel structure instead of threads • Emphasize data parallel programming • Take advantage of concurrent collections and parallel algorithms This guide provides a complete example that uses Intel® Threading Building Blocks to write, compile, link, and run a parallel application. The example shows you how to explore a key feature of the library and to successfully build and link an application. After completing this guide, you should be ready to write and build your own code using Intel® Threading Building Blocks. Contents 10H Note Default Directory Paths...............................................................................29H 21H Set Up Environment ..........................................................................................310H 32H Develop an Application Using parallel_for .............................................................411H 43H Build the Application..........................................................................................812H 54H Run the Application .........................................................................................1013H 65H Next Steps.....................................................................................................1114H -
Programming Models & Environments Summit
REPORT OF THE 2014 Programming Models & Environments Summit EDITOR: Michael Heroux CO-EDITOR: Richard Lethin CONTRIBUTORS: Michael Garland, Mary Hall, Michael Heroux, Larry Kaplan, Richard Lethin, Kathryn O’Brien, Vivek Sarkar, John Shalf, Marc Snir, Armando Solar-Lezama, Thomas Sterling, and Katherine Yelick DOE ASCR REVIEWER: Sonia R. Sachs September 19, 2016 Sponsored by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) 17-CS-3270 Contents 1 Executive Summary 5 2 Introduction 7 2.1 Hardware Challenges for Programming Environments ...................... 8 2.2 Application Challenges for Programming Models and Environments .............. 10 2.3 Application Software Design Challenges .............................. 12 2.4 Principles of Exascale Programming Models and Environments ................. 12 2.5 Ecosystem Considerations ...................................... 14 3 Role of Programming Models and Environments Research 15 3.1 Challenges ............................................... 16 3.2 Metrics for Success .......................................... 17 4 Background, Challenges and Opportunities 18 4.1 Characterizing Programming Models and Environments ..................... 18 4.2 Design Choices ............................................ 22 4.2.1 Low Level ........................................... 22 4.2.2 Uniform vs. Hybrid ..................................... 23 4.2.3 Physical vs. Virtual ..................................... 23 4.2.4 Scheduling and Mapping .................................. -
General Purpose Programming on Modern Graphics Hardware
GENERAL PURPOSE PROGRAMMING ON MODERN GRAPHICS HARDWARE Robert Fleming, B.Sc. Thesis Prepared for the Degree of MASTER OF SCIENCE UNIVERSITY OF NORTH TEXAS May 2008 APPROVED: Robert Renka, Major Professor Armin Mikler, Committee Member Tom Jacob, Committee Member Krishna Kavi, Chair of the Department of Computer Science and Engineering Oscar Garcia, Dean of the College of Engineering Sandra L. Terrell, Dean of the Robert B. Toulouse School of Graduate Studies Fleming, Robert. General Purpose Programming on Modern Graphics Hardware. Master of Science (Computer Science), May 2008, 90 pp., 1 table, 3 figures, references, 124 titles. I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming. Copyright 2008 by Robert Fleming ii To Wanda and Cheesepuff, my partners in crime. iii TABLE OF CONTENTS Page LIST OF TABLES AND ILLUSTRATIONS ......................................................................vi Chapters 1. MOTIVATION ............................................................................................ 1 1.1 What Kinds of Computation Suit the