4.X Is the System Default

EasyBuild Documentation
Release 20190816.0
Ghent University
Fri, 16 Aug 2019 18:07:17

Contents

1 Introductory topics
  1.1 What is EasyBuild?
  1.2 Concepts and terminology
    1.2.1 EasyBuild framework
    1.2.2 Easyblocks
    1.2.3 Toolchains
    1.2.4 Easyconfig files
    1.2.5 Extensions
  1.3 Typical workflow example: building and installing WRF
    1.3.1 Searching for available easyconfig files
    1.3.2 Getting an overview of planned installations
    1.3.3 Installing a software stack
2 Getting started
  2.1 Installing EasyBuild
    2.1.1 Requirements
    2.1.2 Bootstrapping EasyBuild
    2.1.3 Advanced bootstrapping options
    2.1.4 Updating an existing EasyBuild installation
    2.1.5 Dependencies
    2.1.6 Sources
    2.1.7 In case of installation issues...
  2.2 Configuring EasyBuild
    2.2.1 Supported configuration types
    2.2.2 Overview of current configuration (--show-config, --show-full-config)
    2.2.3 Available configuration settings
3 Basic usage topics
  3.1 Using the EasyBuild command line
    3.1.1 Specifying builds
    3.1.2 Commonly used command line options
  3.2 Writing easyconfig files: the basics
    3.2.1 What is an easyconfig (file)?
    3.2.2 Available easyconfig parameters
    3.2.3 Mandatory easyconfig parameters
    3.2.4 Common easyconfig parameters
    3.2.5 Tweaking existing easyconfig files
    3.2.6 Dynamic values for easyconfig parameters
    3.2.7 Version-specific documentation relevant to easyconfigs
    3.2.8 Contributing easyconfigs
  3.3 Understanding EasyBuild logs
    3.3.1 Basic information
    3.3.2 Navigating log files
4 Advanced usage topics
  4.1 Archived easyconfigs
    4.1.1 Toolchain deprecation
    4.1.2 Using --consider-archived-easyconfigs
  4.2 Backing up of existing modules (--backup-modules)
    4.2.1 Disabling automatic backup of modules
    4.2.2 Example
  4.3 Common toolchains
    4.3.1 Definition and motivation
    4.3.2 Versioning scheme for common toolchains
    4.3.3 Update cycle for common toolchains
    4.3.4 Overview of common toolchains
    4.3.5 Customizing common toolchains
  4.4 Generating container recipes & images
    4.4.1 Requirements
    4.4.2 Usage
    4.4.3 Configuration
    4.4.4 ‘Stacking’ container images
    4.4.5 Seeding in source files for container build process
  4.5 Contributing
    4.5.1 How to contribute
    4.5.2 Pull requests
    4.5.3 Review process for contributions
  4.6 Controlling compiler optimization flags
    4.6.1 Controlling target architecture specific optimizations via --optarch
  4.7 EasyBuild on Cray
    4.7.1 Test systems
    4.7.2 EasyBuild toolchains
    4.7.3 What works already?
    4.7.4 Required EasyBuild configuration
    4.7.5 Major supported/tested applications
  4.8 Detection of loaded modules
    4.8.1 Motivation
    4.8.2 Detection mechanism
    4.8.3 Action to take if loaded modules are detected
    4.8.4 Allowing particular loaded modules
    4.8.5 Checking of $EBROOT* environment variables
  4.9 Local variables in easyconfig files
    4.9.1 Motivation & context
    4.9.2 Changes in EasyBuild v4.0 w.r.t. local variables in easyconfig files
    4.9.3 Recommended naming scheme for local variables in easyconfig files
    4.9.4 Warning for local variables that do not follow the recommended naming scheme
    4.9.5 Specifying what should be done when non-conforming local variables are found via --local-var-naming-check
    4.9.6 Renaming local variables to match the recommended naming scheme using eb --fix-deprecated-easyconfigs
  4.10 Experimental features
  4.11 Extended dry run
    4.11.1 Important notes
    4.11.2 Overview of dry run mechanism
    4.11.3 Guidelines for easyblocks
    4.11.4 Example output
  4.12 Hooks
    4.12.1 What are hooks?
    4.12.2 Configuring EasyBuild to use your hook implementations
    4.12.3 Available hooks
    4.12.4 Implementing hooks
    4.12.5 Caveats
    4.12.6 Examples of hook implementations
  4.13 Implementing easyblocks
    4.13.1 The basics
    4.13.2 Easyblocks vs easyconfigs
    4.13.3 Naming scheme for easyblocks
    4.13.4 Structure of an easyblock
    4.13.5 Deriving from existing (generic) easyblocks
    4.13.6 Specific aspects of easyblocks
    4.13.7 Using new/custom easyblocks
    4.13.8 Testing easyblocks
    4.13.9 Use case: an easyblock for TensorFlow
  4.14 Including additional Python modules (--include-*)
    4.14.1 General aspects of --include-* options
    4.14.2 Including additional easyblocks (--include-easyblocks)
    4.14.3 Including additional module naming schemes (--include-module-naming-schemes)
    4.14.4 Including additional toolchains (--include-toolchains)
  4.15 Integration with GitHub
    4.15.1 Requirements
    4.15.2 Configuration
    4.15.3 Checking status of GitHub integration (--check-github)
    4.15.4 Using easyconfigs from pull requests (--from-pr)
    4.15.5 Uploading test reports (--upload-test-report)
    4.15.6 Reviewing easyconfig pull requests (--review-pr)
    4.15.7 Merging easyconfig pull requests (--merge-pr)
    4.15.8 Submitting new and updating pull requests (--new-pr, --update-pr)
  4.16 Manipulating dependencies
    4.16.1 Filtering out dependencies using --filter-deps
    4.16.2 Installing dependencies as hidden modules using --hide-deps
    4.16.3 Using minimal toolchains for dependencies
  4.17 Packaging support
    4.17.1 Prerequisites
    4.17.2 Configuration options
    4.17.3 Usage
    4.17.4 Packaging existing installations
  4.18 Partial installations