Openss>>Expview

Total Page:16

File Type:pdf, Size:1020Kb

Openss>>Expview How to Analyze the Performance of Parallel Codes with Open|SpeedShop Jim Galarowicz: Krell Instute Don Maghrak, Krell Instute LLNL-PRES-503451 Oct 27-28, 2015 Presenters anD ExtenDeD Team v Jim Galarowicz, Krell v Donald Maghrak, Krell Open|SpeedShop extended team: v William Hachfeld, Dave Whitney, Dane Gardner: Argo Navis/Krell v Jennifer Green, David Montoya, Mike Mason, David Shrader: LANL v MarHn Schulz, MaJ Legendre and Chris Chambreau: LLNL v Mahesh Rajan, Anthony Agelastos: SNL v Dyninst group (Bart Miller: UW & Jeff Hollingsworth: UMD) v Phil Roth, Mike Brim: ORNL v Ciera Jaspan: CMU How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 2 Outline Welcome IntroducHon to Open|SpeedShop (O|SS) tools Overview of performance experiments Ø How to use Open|SpeedShop to gather anD Display Ø Sampling Experiments Ø Tracing Experiments Ø How to compare performance Data for Different applicaon runs Advanced and New FuncHonality v Component Based Tool Framework (CBTF) based O|SS Ø New lightweight iop anD mpip experiments Ø New: memory experiment Ø New: CUDA/GPU experiment Ø New: pthreads experiment What is new / Roadmap / Future Plans Supplemental, or Interest Ø CommanD Line Interface (CLI) tutorial anD examples How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 3 How to Analyze the Performance of Parallel Codes with Open|SpeedShop Secon 1 IntroDucNon into Tools anD Open|SpeeDShop Oct 27-28, 2015 Open|SpeeDShop Tool Set v Open Source Performance Analysis Tool Framework Ø Most common performance analysis steps all in one tool Ø Combines tracing anD sampling techniques Ø Extensible by plugins for Data collecNon anD representaon Ø Gathers anD Displays several types of performance informaon v Flexible and Easy to use Ø User access through: GUI, Command Line, Python Scrip@ng, convenience scripts v Scalable Data CollecHon Ø Instrumentaon of unmodified applica@on binaries Ø New opNon for hierarchical online data aggregaon v Supports a wide range of systems Ø Extensively useD anD testeD on a variety of Linux clusters Ø Cray, Blue Gene, ARM, Intel MIC, GPU support How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 5 Open|SpeedShop Workflow srun –n4 –N1 smg2000 –n 65 65 65 srun –n4 –N1 smg2000 –n 65 65 65 osspcsamp “srun –n4 –N1 smg2000 –n 65 65 65” MPI Applicaon O|SS Post-mortem hp://www.openspeedshop.org/ Oct 27-28, 2015 Alternave Interfaces v Scripng language Ø ImmeDiate commanD interface Experiment Commands Ø O|SS interacNve commanD line (CLI) expView • openss -cli expCompare expStatus List Commands v list –v exp Python module list –v hosts import openss list –v src my_filename=openss.FileList("myprog.a.out")Session Commands my_exptype=openss.ExpTypeList("pcsamp") openGui my_id=openss.expCreate(my_filename,my_exptype) openss.expGo() My_metric_list = openss.MetricList("exclusive") my_viewtype = openss.ViewTypeList("pcsamp”) result = openss.expView(my_id,my_viewtype,my_metric_list) How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 7 Central Concept: Experiments v Users pick experiments: Ø What to measure and from which sources? Ø How to select, view, and analyze the resulting data? v Two main classes: Ø Statistical Sampling • Periodically interrupt execution and record location • Useful to get an overview • Low and uniform overhead Ø Event Tracing • Gather and store individual application events • Provides detailed per event information • Can lead to huge data volumes v O|SS can be extended with additional experiments Oct 27-28, 2015 Sampling Experiments in O|SS v PC Sampling (pcsamp) Ø RecorD PC repeateDly at user DefineD Nme interval Ø Low overhead overview of Nme DistribuNon Ø GooD first step, lightweight overview v Call Path Profiling (userme) Ø PC Sampling anD Call stacks for each sample Ø ProviDes inclusive anD exclusive Nming Data Ø Use to finD hot call paths, whom is calling who v Hardware Counters (hwc, hwcHme, hwcsamp) Ø ProviDes profile of harDware counter events like cache & TLB misses Ø hwcsamp: • PerioDically sample to capture profile of the coDe against the chosen counter • Default events are PAPI_TOT_INS anD PAPI_TOT_CYC Ø hwc, hwcme: • Sample a harDware counter Nll a certain number of events ( calleD thresholD) is recorDeD anD get Call Stack • Default event is PAPI_TOT_CYC overflows How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 9 Tracing Experiments in O|SS v Input/Output Tracing (io, iot) Ø RecorD invocaon of all POSIX I/O events Ø ProviDes aggregate anD inDiviDual Nmings Ø Store funcNon arguments anD return coDe for each call (iot) v MPI Tracing (mpi, mpit, mpio) Ø RecorD invocaon of all MPI rouNnes Ø ProviDes aggregate anD inDiviDual Nmings Ø Store funcNon arguments anD return coDe for each call (mpit) Ø Create Open Trace Format (OTF) output (mpiol) v FloaHng Point ExcepHon Tracing (fpe) Ø TriggereD by any FPE causeD by the applicaon Ø Helps pinpoint numerical problem areas How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 10 Performance Analysis in Parallel v How to deal with concurrency? Ø Any experiment can be applieD to parallel applicaon • Important step: aggregaon or selecNon of Data Ø Special experiments targeNng parallelism/synchronizaon v O|SS supports MPI and threaded codes Ø Automacally applieD to all tasks/threads Ø Default views aggregate across all tasks/threads Ø Data from inDiviDual tasks/threads available Ø Thread support (incl. OpenMP) baseD on POSIX threads v Specific parallel experiments (e.g., MPI) Ø Wraps MPI calls anD reports • MPI rouNne Nme • MPI rouNne parameter informaon Ø The mpit experiment also store funcNon arguments anD return coDe for each call How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 11 How to Run a First Experiment in O|SS? 1. Picking the experiment Ø What Do I want to measure? Ø We will start with pcsamp to get a first overview 2. Launching the applicaon Ø How Do I control my applicaon unDer O|SS? Ø Enclose how you normally run your applica7on in quotes Ø osspcsamp “mpirun –np 256 smg2000 –n 65 65 65” 3. Storing the results Ø O|SS will create a database Ø Name: smg2000-pcsamp.openss 4. Exploring the gathered data Ø How Do I interpret the Data? Ø O|SS will print a default report Ø Open the GUI to analyze data in detail (run: “openss”) How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 12 Example Run with Output (1 of 2) v osspcsamp “mpirun –np 4 smg2000 –n 65 65 65” Bash> osspcsamp "mpirun -np 4 ./smg2000 -n 65 65 65" [openss]: pcsamp experiment using the pcsamp experiment Default sampling rate: "100". [openss]: Using OPENSS_PREFIX installeD in /opt/ossoffv2.1u4 [openss]: Seng up offline raw Data Directory in /opt/shareD/offline-oss [openss]: Running offline pcsamp experiment using the commanD: "mpirun -np 4 /opt/ossoffv2.1u4/bin/ossrun "./smg2000 -n 65 65 65" pcsamp" Running with these Driver parameters: (nx, ny, nz) = (65, 65, 65) … <SMG nave output> … Final Relave ResiDual Norm = 1.774415e-07 [openss]: ConverNng raw Data from /opt/shareD/offline-oss into temp file X.0.openss Processing raw Data for smg2000 Processing processes anD threads ... Processing performance Data ... Processing funcons anD statements ... Resolving symbols for /home/jeg/DEMOS/workshop_demos/mpi/smg2000/test/smg2000 How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 13 Example Run with Output (2 of 2) v osspcsamp “mpirun –np 4 smg2000 –n 65 65 65” [openss]: Restoring anD Displaying Default view for: /home/jeg/DEMOS/workshop_demos/mpi/smg2000/test/smg2000-pcsamp.openss [openss]: The restoreD experiment iDenNfier is: -x 1 Exclusive CPU Nme % of CPU Time FuncNon (Defining locaon) in seconds. 7.870000 43.265531 hypre_SMGResiDual (smg2000: smg_resiDual.c,152) 4.390000 24.134140 hypre_CyclicReDucNon (smg2000: cyclic_reDucNon.c,757) 1.090000 5.992303 mca_btl_vader_check_oxes (libmpi.so.1.4.0: btl_vader_ox.h,108) 0.510000 2.803738 unpack_preDefineD_data (libopen-pal.so.6.1.1: opal_datatype_unpack.h,41) 0.380000 2.089060 hypre_SemiInterp (smg2000: semi_interp.c,126) 0.360000 1.979109 hypre_SemiRestrict (smg2000: semi_restrict.c,125) 0.350000 1.924134 __memcpy_ssse3_back (libc-2.17.so) 0.310000 1.704233 pack_preDefineD_data (libopen-pal.so.6.1.1: opal_datatype_pack.h,38) 0.210000 1.154480 hypre_SMGAxpy (smg2000: smg_axpy.c,27) 0.140000 0.769654 hypre_StructAxpy (smg2000: struct_axpy.c,25) 0.110000 0.604728 hypre_SMGSetStructVectorConstantValues (smg2000: smg.c,379) .... v View with GUI: openss –f smg2000-pcsamp.openss How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 14 Default Output Report View Performance Data Default view: by Function Toolbar to switch (Data is sum from all Views processes and threads) Select “Functions”, click D-icon Graphical Representation How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 15 Statement Report Output View Performance Data View Choice: Statements Select “statements, click D-icon Statement in Program that took the most time How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 16 Associate Source & Performance Data Double click to open Use window controls to source window split/arrange windows Selected performance data point How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 17 Library (LinkedObject) View Select LinkedObject View type and Click on D-icon Shows time spent in libraries. Can indicate imbalance. Libraries in the application How to Analyze the Performance of Parallel CoDes with Open|SpeedShop Oct 27-28, 2015 18 Loop View Select Loops View type and Click on D-icon Shows time spent in loops.
Recommended publications
  • Tech Tools Tech Tools
    NEWS Tech Tools Tech Tools LLVM 3.0 Released Low Level Virtual Machine (LLVM) recently announced ver- replacing the C/ Objective-C compiler in the GCC system with a sion 3.0 of it compiler infrastructure. Originally implemented more easily integrated system and wider support for multithread- for C/ C++, the language-agnostic design of LLVM has ing. Other features include a new register allocator (which can spawned a wide variety of provide substantial perfor- front ends, including Objec- mance improvements in gen- tive-C, Fortran, Ada, Haskell, erated code), full support for Java bytecode, Python, atomic operations and the Ruby, ActionScript, GLSL, new C++ memory model, Clang, and others. and major improvement in The new release of LLVM the MIPS back end. represents six months of de- All LLVM releases are velopment over the previous available for immediate version and includes several download from the LLVM re- major changes, including dis- leases web site at: http:// continued support for llvm- llvm.​­org/​­releases/. For more gcc; the developers recom- information about LLVM, mend switching to Clang or visit the main LLVM website DragonEgg. Clang is aimed at at: http://​­llvm.​­org/. YaCy Search Engine Online Open64 5.0 Released The YaCy project has released version 1.0 of its Open64, the open source (GPLv2-licensed) peer-to-peer Free Software search engine. YaCy compiler for C/C++ and Fortran that’s does not use a central server; instead, its search re- backed by AMD and has been developed by sults come from a network of independent peers. SGI, HP, and various universities and re- According to the announcement, in this type of dis- search organizations, recently released ver- tributed network, no single entity decides which sion 5.0.
    [Show full text]
  • Porting Open64 to the Cygwin Environment
    Porting Open64 to the Cygwin Environment Nathan Tallent Mike Fagan ∗ November 2003 Abstract Cygwin is a Linux-like environment for Windows. It is sufficiently com- plete and stable that porting even very large codes to the environment is relatively straightforward. Cygwin is easy enough to download and install that it provides a convenient platform for traveling with code and giving conference demonstra- tions (via laptop) – even potentially expanding a code’s audience and user base (via desktop). However, for various reasons, Cygwin is not 100% compatible with Linux. We describe the major problems we encountered porting Rice University’s Open64/SL code base to Cygwin and present our solutions. 1 Introduction Cygwin is a Linux-like environment for Windows. It provides a Windows DLL that emulates nearly all of the Linux API. Programs written for Linux can link with the Cygwin DLL and thus run on Windows. A reasonably proficient Unix user can easily download and install a relatively complete Linux environment, including shells, de- velopment tools (GCC, make, gdb), editors (emacs, vi) and a graphical environment (XFree86). Moreover, the environment is complete and stable enough that it is rela- tively straightforward to port even very large codes to Cygwin. Because of the preva- lence of Windows-based desktops and laptops, Cygwin provides an attractive means for traveling with code and demonstrating it at conferences – even potentially expand- ing a code’s audience and user base. However, for full-time development or for critical applications, Unix remains superior. More information about Cygwin can be found on its web site, http://www.cygwin.com.
    [Show full text]
  • Automated Malware Analysis Report for Funny Linux.Elf
    ID: 449051 Sample Name: funny_linux.elf Cookbook: defaultlinuxfilecookbook.jbs Time: 05:20:57 Date: 15/07/2021 Version: 33.0.0 White Diamond Table of Contents Table of Contents 2 Linux Analysis Report funny_linux.elf 3 Overview 3 General Information 3 Detection 3 Signatures 3 Classification 3 Analysis Advice 3 General Information 3 Process Tree 3 Yara Overview 3 Jbx Signature Overview 4 Mitre Att&ck Matrix 4 Malware Configuration 4 Behavior Graph 4 Antivirus, Machine Learning and Genetic Malware Detection 5 Initial Sample 5 Dropped Files 5 Domains 5 URLs 5 Domains and IPs 5 Contacted Domains 5 Contacted IPs 5 Runtime Messages 6 Joe Sandbox View / Context 6 IPs 6 Domains 6 ASN 6 JA3 Fingerprints 6 Dropped Files 6 Created / dropped Files 6 Static File Info 6 General 6 Static ELF Info 7 ELF header 7 Sections 7 Program Segments 8 Dynamic Tags 8 Symbols 8 Network Behavior 10 System Behavior 10 Analysis Process: funny_linux.elf PID: 4576 Parent PID: 4498 10 General 10 File Activities 10 File Read 10 Copyright Joe Security LLC 2021 Page 2 of 10 Linux Analysis Report funny_linux.elf Overview General Information Detection Signatures Classification Sample funny_linux.elf Name: SSaampplllee hhaass sstttrrriiippppeedd ssyymbboolll tttaabblllee Analysis ID: 449051 Sample has stripped symbol table MD5: e0ba4089e9b457… Ransomware SHA1: 21b3392a2fdab2a… Miner Spreading SHA256: d2544462756205… mmaallliiiccciiioouusss malicious Evader Phishing Infos: sssuusssppiiiccciiioouusss suspicious cccllleeaann clean Exploiter Banker Spyware Trojan / Bot Adware Score:
    [Show full text]
  • CPU Function (Defining Location) CPU Time in Seconds
    NASA Open|SpeedShop Tutorial Performance Analysis with Open|SpeedShop Contributors: Jim Galarowicz, Bill Hachfeld, Greg Schultz: Argo Navis Technologies Don Maghrak, Patrick Romero: Krell Institute Martin Schulz, Matt Legendre: Lawrence Livermore National Laboratory Jennifer Green, Dave Montoya: Los Alamos National Laboratory Mahesh Rajan, Doug Pase: Sandia National Laboratories Koushik Ghosh, Engility Dyninst group (Bart Miller: UW & Jeff Hollingsworth: UMD) Phil Roth, Mike Brim: ORNL LLNL-PRES-503451 Dec 12, 2016 Outline Introduction to Open|SpeedShop How to run basic timing experiments and what they can do? How to deal with parallelism (MPI and threads)? How to properly use hardware counters? Slightly more advanced targets for analysis How to understand and optimize I/O activity? How to evaluate memory efficiency? How to analyze codes running on GPUs? DIY and Conclusions: DIY and Future trends Hands-on Exercises (after each section) On site cluster available We will provide exercises and test codes Performance Analysis With Open|SpeedShop: NASA Hands-On Tutorial Dec 12, 2016 2 NASA Open|SpeedShop Tutorial Performance Analysis with Open|SpeedShop Section 1 Introduction to Open|SpeedShop Dec 12, 2016 Open|SpeedShop Tool Set Open Source Performance Analysis Tool Framework Most common performance analysis steps all in one tool Combines tracing and sampling techniques Gathers and displays several types of performance information Flexible and Easy to use User access through: GUI, Command Line, Python Scripting, convenience
    [Show full text]
  • 2010-2011 Annual Report
    Software in the Public Interest, Inc. 2010-2011 Annual Report 1st July 2011 To the membership, board and friends of Software in the Public Interest, Inc: As mandated by Article 8 of the SPI Bylaws, I respectfully submit this annual report on the activities of Software in the Public Interest, Inc. and extend my thanks to all of those who contributed to the mission of SPI in the past year. { Jonathan McDowell, SPI Secretary 1 Contents 1 President's Welcome3 2 Committee Reports4 2.1 Membership Committee.......................4 2.1.1 Statistics...........................4 3 Board Report5 3.1 Board Members............................5 3.2 Board Changes............................6 3.3 Elections................................6 4 Treasurer's Report7 4.1 Income Statement..........................7 4.2 Balance Sheet.............................9 5 Member Project Reports 11 5.1 New Associated Projects....................... 11 5.1.1 LibreOffice.......................... 11 5.1.2 Open64............................ 11 5.1.3 Jenkins............................ 11 5.1.4 ankur.org.in.......................... 12 A About SPI 13 2 Chapter 1 President's Welcome In the time that has passed since I originally agreed to serve as President sev- eral years ago, SPI has made significant and enduring progress resolving long- standing procedural problems that once impeded our ability to meet the basic service expectations of our associated projects. These accomplishments are of course not my own, but the collective result of work contributed by a dedicated core group of volunteers including the rest of our officers and board. A measure of our success is the steady stream of new SPI associated projects in recent years, including those we invited to join us in the last year listed later in this report.
    [Show full text]
  • Proceedings of the 7 Python in Science Conference
    Proceedings of the 7th Python in Science Conference SciPy Conference – Pasadena, CA, August 19-24, 2008. Editors: Gaël Varoquaux, Travis Vaught, Jarrod Millman Proceedings of the 7th Python in Science Conference (SciPy 2008) Contents Editorial 3 G. Varoquaux, T. Vaught, J. Millman The State of SciPy 5 J. Millman, T. Vaught Exploring Network Structure, Dynamics, and Function using NetworkX 11 A. Hagberg, D. Schult, P. Swart Interval Arithmetic: Python Implementation and Applications 16 S. Taschini Experiences Using SciPy for Computer Vision Research 22 D. Eads, E. Rosten The SciPy Documentation Project (Technical Overview) 27 S. Van der Walt Matplotlib Solves the Riddle of the Sphinx 29 M. Droettboom The SciPy Documentation Project 33 J. Harrington Pysynphot: A Python Re-Implementation of a Legacy App in Astronomy 36 V. Laidler, P. Greenfield, I. Busko, R. Jedrzejewski How the Large Synoptic Survey Telescope (LSST) is using Python 39 R. Lupton Realtime Astronomical Time-series Classification and Broadcast Pipeline 42 D. Starr, J. Bloom, J. Brewer Analysis and Visualization of Multi-Scale Astrophysical Simulations Using Python and NumPy 46 M. Turk Mayavi: Making 3D Data Visualization Reusable 51 P. Ramachandran, G. Varoquaux Finite Element Modeling of Contact and Impact Problems Using Python 57 R. Krauss Circuitscape: A Tool for Landscape Ecology 62 V. Shah, B. McRae Summarizing Complexity in High Dimensional Spaces 66 K. Young Converting Python Functions to Dynamically Compiled C 70 I. Schnell 1 unPython: Converting Python Numerical Programs into C 73 R. Garg, J. Amaral The content of the articles of the Proceedings of the Python in Science Conference is copyrighted and owned by their original authors.
    [Show full text]
  • The CUDA Compiler Driver NVCC
    The CUDA Compiler Driver NVCC Last modified on: <04-01-2008> Document Change History Version Date Responsible Reason for Change beta 01-15-2007 Juul VanderSpek Initial release 0.1 05-25-2007 Juul VanderSpek CUDA 0.1 release 1.0 06-13-2007 Juul VanderSpek CUDA 1.0 release 1.1 10-12-2007 Juul VanderSpek CUDA 1.1 release 2.0 04-01-2008 Juul VanderSpek CUDA 2.0 release nvcc.doc V2.0 ii April, 2006 Introduction Overview CUDA programming model The CUDA Toolkit targets a class of applications whose control part runs as a process on a general purpose computer (Linux, Windows), and which use one or more NVIDIA GPUs as coprocessors for accelerating SIMD parallel jobs. Such jobs are ‘self- contained’, in the sense that they can be executed and completed by a batch of GPU threads entirely without intervention by the ‘host’ process, thereby gaining optimal benefit from the parallel graphics hardware. Dispatching GPU jobs by the host process is supported by the CUDA Toolkit in the form of remote procedure calling. The GPU code is implemented as a collection of functions in a language that is essentially ‘C’, but with some annotations for distinguishing them from the host code, plus annotations for distinguishing different types of data memory that exists on the GPU. Such functions may have parameters, and they can be ‘called’ using a syntax that is very similar to regular C function calling, but slightly extended for being able to specify the matrix of GPU threads that must execute the ‘called’ function.
    [Show full text]
  • High-Level Programming Model for Heterogeneous Embedded Systems Using
    HIGH-LEVEL PROGRAMMING MODEL FOR HETEROGENEOUS EMBEDDED SYSTEMS USING MULTICORE INDUSTRY STANDARD APIS A Dissertation Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy By Peng Sun August 2016 HIGH-LEVEL PROGRAMMING MODEL FOR HETEROGENEOUS EMBEDDED SYSTEMS USING MULTICORE INDUSTRY STANDARD APIS Peng Sun APPROVED: Dr. Chapman, Barbara Dept. of Computer Science, University of Houston Dr. Gabriel, Edgar Dept. of Computer Science, University of Houston Dr. Shah, Shishir Dept. of Computer Science, University of Houston Dr. Subhlok, Jaspal Dept. of Computer Science, University of Houston Dr. Chandrasekaran, Sunita Dept. of CIS, University of Delaware Dean, College of Natural Sciences and Mathematics ii Acknowledgements First and foremost, I would like to thank my advisor, Dr. Barbara Chapman, for her invaluable advice and guidance in my Ph.D. study. I appreciate all her dedi- cated guidance, and great opportunities to participate in those worthwhile academic projects, and the funding to complete my Ph.D. study. Her passion and excellence on academic and contributions to communities encourage me to finish my Ph.D. degree. Specifically, I am very grateful to Dr. Sunita Chandrasekaran for the mentoring and guidance for my research and publications. That help is paramount to my Ph.D. Journey. I truly could not achieve the degree without her mentoring and advisory. Special thanks to all my committee members: Dr. Edgar Gabriel, Dr. Shishir Shah, Dr. Jaspal Subhlok, for their time, insightful comments and help. I would also like to thank my fellow labmates of the HPCTools group: Dr.
    [Show full text]
  • Build Linux Kernel with Open64
    Build Linux kernel with Open64 Qing Zhu1, Tao Wang2, Gang Yu1, Kun Ling2, Jian-xin Lai1 1Unix System Lab, Hewlett-Packard Company 2 State Key Laboratory of Computer Architecture, Institute of Computing Technology 1{qing.zhu2, yu.gang, jianxin.lai}@hp.com, 2{wangtao2010, lingkun}@ict.ac.cn Abstract architectures are principal and popular in IT industry, proving Open64 is an open source, optimizing compiler. The goal of the the success on these two platforms will impress more people on compiler is to be an alternative back-end of GCC. To improve Open64. Open64’s code quality, enhance GCC-compatibility and attrac- • Provide a good research platform to improve the kernel’s per- t more users to the Open64 community, it is important to build formance by advanced optimizations. Especially, to enable I- significant applications like Linux kernel with Open64. This paper PA/IPO (Inter-Procedure Analysis and Optimization) on kernel describes the work of building and running up Linux kernel with build, show how much performance improvement that IPA/IPO Open64. As an experience report, the general build, triage approach will give us on standard system software like operating system and tackling method for commonly issues on building kernel with kernel. non-gnu are discussed. Now, the officially maintained open64.net trunk on x8664 target and the Loongcc[1] branch on MIPS64 target In this paper, we make a summary of general practices in ker- have all successfully built up and run the Linux kernel. nel building process on different architectures. These approaches and experience may help other targets to conduct kernel building Keywords Open64, Linux kernel, Loongcc, X86-64, MIPS-64, and build other Open64 support platforms effectively.
    [Show full text]
  • For More Details, See Downloadable Description
    Title: Open64 Compiler infrastructure for emerging multicore/manycore architecture Presenters All Symposium Tutorial Organizer: Guang R. Gao, University of Delaware (open to all attendees at no charge) Barbara Chapman, University of Houston Tony Linthicum & Anshuman Dasgupta, Qualcomm Inc. Tuesday, April 15th at 7:00 PM Juergen Ributzka, University of Delaware Abstract Open64 was originally developed by SGI and released as the MIPSpro compiler. It has been well- recognized as an industrial-strength production compiler for high-performance computing. It includes advanced interprocedural optimizations, loop nest optimizations, global scalar optimizations, and code generation with advanced global register allocation and software pipelining. It was open-sourced in 2000 after it was retargeted to the Itanium processor. Now, Open64 are accepted by many compiler researchers as a good infrastructure for the research on new compiler optimizing technologies, especially the for the emerging multi-core/many-core architecture. Presentation Outline 7:00pm – 7:30pm A Briefly Introduction to Open64 Presenter: Guang R. Gao, University of Delaware Abstract: Open64 was originally developed by SGI and released as the MIPSpro compiler. It has been well recognized as an industrial-strength production compiler for high-performance computing. It includes advanced interprocedural optimizations, loop nest optimizations, global scalar optimizations, and code generation with advanced global register allocation and software pipelining. It was open- sourced in 2000 after it was retargeted to the Itanium processor. There have been a number of subsequent branches and improvements to the compiler since then. Intel adopted the Open64 compiler for compiler-related research and subsequently released it as the Open Research Compiler (ORC) starting Jan 2002.
    [Show full text]
  • Pathscale Fortran and C/C++ Compilers
    PathScale™ Compiler Suite User Guide Version 3.2 Page i PathScale Compiler Suite User Guide Version 3.2 Page ii PathScale Compiler Suite User Guide Version 3.2 Information furnished in this manual is believed to be accurate and reliable. However, PathScale LLC assumes no responsibility for its use, nor for any infringements of patents or other rights of third parties which may result from its use. PathScale LLC reserves the right to change product specifications at any time without notice. Applications described in this document for any of these products are for illustrative purposes only. PathScale LLC makes no representation nor warranty that such applications are suitable for the specified use without further testing or modification. PathScale LLC assumes no responsibility for any errors that may appear in this document. No part of this document may be copied nor reproduced by any means, nor translated nor transmitted to any magnetic medium without the express written consent of PathScale LLC. In accordance with the terms of their valid PathScale agreements, customers are permitted to make electronic and paper copies of this document for their own exclusive use. Linux is a registered trademark of Linus Torvalds. PathScale, the PathScale logo, and EKOPath are registered trademarks of PathScale, LLC. Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc. SuSE is a registered trademark of SuSE Linux AG. All other brand and product names are trademarks or registered trademarks of their respective owners. © 2007, 2008 PathScale, LLC. All rights reserved. © 2006, 2007 QLogic Corporation. All rights reserved worldwide.
    [Show full text]
  • Code That Performs
    PRODUCT BRIEF High-Performance Computing Intel® Parallel Studio XE 2020 Software Code that Performs Intel® Parallel Studio XE helps developers take their HPC, enterprise, AI, and cloud applications to the max—with fast, scalable, and portable parallel code Intel® Parallel Studio XE is a comprehensive suite of development tools that make it fast and easy to build modern code that gets every last ounce of performance out of the newest Intel® processors. This tool-packed suite simplifies creating code with the latest techniques in vectorization, multi-threading, multi-node, and memory optimization. Get powerful, consistent programming with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions for Intel® Xeon® Scalable processors, plus support for the latest standards and integrated development environments (IDEs). Who Needs It? • C, C++, Fortran, and Python* software developers and architects building HPC, enterprise, AI, and cloud solutions • Developers looking to maximize their software’s performance on current and future Intel® platforms What it Does • Creates faster code1. Boost application performance that scales on current and future Intel® platforms with industry-leading compilers, numerical libraries, performance profilers, and code analyzers. • Builds code faster. Simplify the process of creating fast, scalable, and reliable parallel code. • Delivers Priority Support. Connect directly to Intel’s engineers for confidential answers to technical questions, access older versions of the products, and receive free updates for a year. Paid license required. What’s New • Speed artificial intelligence inferencing. Intel® Compilers, Intel® Performance Libraries and analysis tools support Intel® Deep Learning Boost, which includes Vector Neural Network Instructions (VNNI) in 2nd generation Intel® Xeon® Scalable processors (codenamed Cascade Lake/AP platforms) • Develop for large memories of up to 512GB DIMMs with Persistence.
    [Show full text]