gem5art Release 0.2.1

Jul 01, 2021

Contents

1 Introduction
  1.1 Installing gem5art

2 Artifacts
  2.1 gem5art artifacts
  2.2 ArtifactDB
  2.3 Artifacts API Documentation

3 Run
  3.1 Introduction
  3.2 SE and FS mode runs
  3.3 Running an experiment
  3.4 Run Already in the Database
  3.5 Searching the Database to find Runs
  3.6 Searching the Database to find Runs with Specific Names
  3.7 Runs API Documentation

4 Tasks
  4.1 Use of Python Multiprocessing
  4.2 Use of Celery
  4.3 Tasks API Documentation

5 Disk Images
  5.1 Introduction
  5.2 Building a Simple Disk Image with Packer

6 Frequently Asked Questions

7 Tutorial: Run Full System Boot Tests
  7.1 Introduction
  7.2 Setting up the environment
  7.3 Building gem5
  7.4 Creating a disk image
  7.5 Compiling the linux kernel
  7.6 gem5 run scripts
  7.7 Database and Celery Server
  7.8 Creating a launch script
  7.9 If Using Celery
  7.10 If Using Python Multiprocessing Library:
  7.11 Results

8 Tutorial: Run NAS Parallel Benchmarks with gem5
  8.1 Introduction
  8.2 Setting up the environment
  8.3 Building gem5
  8.4 Creating a disk image
  8.5 Compiling the linux kernel
  8.6 gem5 run scripts
  8.7 Database and Celery Server
  8.8 Creating a launch script
  8.9 If Using Celery
  8.10 If Using Python Multiprocessing Library:
  8.11 Results

9 Tutorial: Run Microbenchmarks with gem5
  9.1 Introduction
  9.2 Setting up the environment
  9.3 Build gem5
  9.4 Download and compile the microbenchmarks
  9.5 gem5 run scripts
  9.6 Database and Celery Server
  9.7 Creating a launch script

10 Tutorial: Run SPEC CPU 2017 / SPEC CPU 2006 Benchmarks in Full System Mode with gem5art
  10.1 Introduction
  10.2 Setting up the Experiment
  10.3 Running the Experiment
  10.4 Appendix I. Working Status
  10.5 Appendix II. Disk Image Generation Scripts

11 Tutorial: Run PARSEC Benchmarks with gem5
  11.1 Introduction
  11.2 Setting up the environment
  11.3 Building gem5
  11.4 Creating a disk image
  11.5 Compiling the linux kernel
  11.6 gem5 run scripts
  11.7 Database and Celery Server
  11.8 Creating a launch script
  11.9 Results

12 Indices and tables

Python Module Index

Index


Authors: • Jason Lowe-Power

The gem5art project is a set of Python modules that make it easier to run experiments with the gem5 simulator. gem5art contains libraries for Artifacts, Reproducibility, and Testing. You can think of gem5art as a structured protocol for running gem5 experiments. When running an experiment, there are inputs, steps to run the experiment, and outputs. gem5art tracks all of these through Artifacts. An artifact is some object, usually a file, which is used as part of the experiment. The gem5art project contains an interface to store all of these artifacts in a database. This database is mainly used to aid reproducibility, for instance, when you want to go back and re-run an experiment. However, it can also be used to share artifacts with others doing similar experiments (e.g., a disk image with a shared workload). The database is also used to store results from gem5 runs. Given all of the input artifacts, these runs have enough information to reproduce exactly the same experimental output. Additionally, there is metadata associated with each gem5 run (e.g., the experiment name, the script name, script parameters, gem5 binary name, etc.) which is useful for aggregating results from many experiments. These experimental aggregates are useful for testing gem5 as well as conducting research. We will be aggregating the data from 100s or 1000s of gem5 experiments to determine the state of gem5’s codebase at any particular time. For instance, as discussed in the Linux boot tutorial, we can use this data to determine which Linux kernels, Ubuntu versions, and boot types are currently functional in gem5.

One of the underlying themes of gem5art is that you should fully understand each piece of the experiment you’re running. To this end, gem5art requires that every artifact for a particular experiment is explicitly defined. Additionally, we encourage the use of Python scripts at every level of experimentation from the workload and disk image creation to running gem5. By using Python scripts, you can both automate and document the processes for running your experiments. Many of the ideas used to develop gem5art came from our experience using gem5 and the pain points of running complex experiments. Jason Lowe-Power used gem5 extensively during his PhD at University of Wisconsin-Madison. Through this experience, he made many mistakes and lost an untold number of days trying to reproduce experiments or re-creating artifacts that were accidentally deleted or moved. gem5art was designed to reduce the likelihood that this happens to other researchers.

Authors: • Ayaz Akram • Jason Lowe-Power

CHAPTER 1

Introduction

The primary motivation behind gem5art is to provide an infrastructure for running gem5 experiments in a structured way. Particular goals of gem5art include:
• structured gem5 experiments
• easy to use
• resource sharing
• reproducibility
• easy to extend
• documentation of the performed experiments

gem5art is mainly composed of the following components:
• a database to store artifacts (gem5art-artifacts)
• Python objects to wrap gem5 experiments (gem5art-run)
• a Celery worker to manage gem5 jobs (gem5art-tasks)

The process of performing experiments using gem5 can quickly become complicated because multiple components are involved. This can be intimidating for new users, and it can be difficult to manage even for experienced researchers. As an example, the following diagram shows the interaction that takes place among different components (artifacts) while running full-system experiments with gem5.


Figure: Flowchart of gem5 full system mode use case

Each bubble in the diagram represents a different artifact, which is one small part of a gem5 experiment culminating in the results from the gem5 execution. All of the lines show dependencies between artifacts (e.g., the disk image depends on the m5 binary). You can imagine everything in this example to be contained in a base git repository (base repo) artifact, which can keep track of changes in files not tracked by other repositories. Packer is a tool to generate disk images and serves as an input to the disk image artifact. The gem5 source code repo artifact serves as an input to two other artifacts (gem5 binary and m5 utility). The Linux source repository and base repository (specifically kernel config files) are used to build the disk image, and multiple artifacts then generate the final results artifact. gem5art serves as a tool/infrastructure to streamline this entire process and keeps track of things as they change, thus leading to reproducible runs. Moreover, it allows the artifacts used in the above example to be shared among multiple users. Additionally, gem5art tracks results like all other artifacts, so they can be archived and queried later to aggregate many different gem5 experiments.

1.1 Installing gem5art

gem5art is available as a PyPI package and can be installed using pip. Since gem5art requires Python 3, we recommend creating a virtual environment with Python 3 before using gem5art. Run the following commands to create a virtual environment and install gem5art:

virtualenv -p python3 venv
source venv/bin/activate
pip install gem5art-artifact gem5art-run gem5art-tasks

Authors: • Ayaz Akram • Jason Lowe-Power

CHAPTER 2

Artifacts

2.1 gem5art artifacts

All unique objects used during gem5 experiments are termed “artifacts” in gem5art. Examples of artifacts include: gem5 binary, gem5 source code repo, Linux kernel source repo, Linux binary, disk image, and packer binary (used to build the disk image). The goal of this infrastructure is to keep a record of all the artifacts used in a particular experiment and to return the set of used artifacts when the same experiment needs to be performed in the future.

The description of an artifact serves as the documentation of how that artifact was created. One of the goals of gem5art is for these artifacts to be self-contained. With just the metadata stored with the artifact, a third party should be able to perfectly reproduce the artifact. (We are still working toward this goal. For instance, we are looking into using Docker to create artifacts, to separate artifact creation from the host platform it is run on.)

Each artifact is characterized by a set of attributes, described below:
• command: command used to build this artifact
• typ: type of the artifact, e.g., binary, git repo, etc.
• name: name of the artifact
• cwd: current working directory, where the command to build the artifact is run
• path: actual path of the location of the artifact
• inputs: a list of the artifacts used to build the current artifact
• documentation: a docstring explaining the purpose of the artifact and any other useful information that can help to reproduce the artifact

Additionally, each artifact also has the following implicit information:
• hash: an MD5 hash for a binary artifact or a git hash for a git artifact
• time: time of the creation of an artifact
• id: a UUID associated with the artifact
• git: a dictionary containing the origin, current commit, and the repo name for a git artifact (will be an empty dictionary for other types of artifacts)

These attributes are not specified by the user, but are generated by gem5art automatically (when the Artifact object is created for the first time).

An example of how a user would create a gem5 binary artifact using gem5art is shown below. In this example, the type, name, and documentation are up to the user of gem5art. You’re encouraged to use names that are easy to remember when you later query the database. The documentation attribute should be used to completely describe the artifact that you are saving.

from gem5art.artifact import Artifact

# gem5_repo is assumed to be a previously registered Artifact
# for the gem5 source repository.
gem5_binary = Artifact.registerArtifact(
    command='scons build//gem5.opt',
    typ='gem5 binary',
    name='gem5',
    cwd='gem5/',
    path='gem5/build/X86/gem5.opt',
    inputs=[gem5_repo,],
    documentation='''
        Default gem5 binary compiled for the X86 ISA.
        This was built from the main gem5 repo (gem5.googlesource.com) without
        any modifications. We recently updated to the current gem5 master
        which has a fix for memory channel address striping.
    '''
)
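To make the split between user-supplied and automatically generated attributes concrete, here is a minimal sketch using only the Python standard library. The names `ArtifactRecord`, `md5_of_file`, and `make_record` are hypothetical and are not part of gem5art; the sketch assumes a plain file artifact, so the hash is an MD5 digest of the file contents and `git` stays empty.

```python
import hashlib
import time
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class ArtifactRecord:
    """Hypothetical stand-in for the metadata stored with one artifact."""
    # Attributes supplied by the user.
    command: str
    typ: str
    name: str
    cwd: str
    path: str
    documentation: str
    inputs: List[uuid.UUID] = field(default_factory=list)
    # Attributes generated automatically.
    hash: str = ""
    time: float = 0.0
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    git: Dict[str, str] = field(default_factory=dict)


def md5_of_file(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(**user_fields) -> ArtifactRecord:
    """Fill in the implicit fields for a file (non-git) artifact."""
    record = ArtifactRecord(**user_fields)
    record.hash = md5_of_file(Path(record.path))
    record.time = time.time()
    return record
```

A git artifact would instead use the current commit as its hash and populate `git` with the origin, commit, and repo name; that branch is left out of the sketch.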

Another goal of gem5art is to enable sharing of artifacts among multiple users, which is achieved through the use of the centralized database. Whenever a user tries to create a new artifact, the database is searched to check whether the same artifact already exists there. If it does, the user can download the matching artifact for use. Otherwise, the newly created artifact is uploaded to the database for later use. The use of the database also avoids running identical experiments (by generating an error message if a user tries to execute a run that already exists in the database).

2.1.1 Creating artifacts

To create an Artifact, you must use registerArtifact, as shown in the example above. This is a factory method which will initially create the artifact. When calling registerArtifact, the artifact will automatically be added to the database. If it already exists, a pointer to that artifact will be returned. The parameters to the registerArtifact function are meant for documentation, not as explicit directions to create the artifact from scratch. In the future, this feature may be added to gem5art.

Note: While creating new artifacts, warning messages might appear showing that certain attributes (other than hash and id) of two artifacts don’t match (when artifact similarity is checked in the code). Users should make sure that they understand the reasons for any such warnings.

2.1.2 Using artifacts from the database

You can create an artifact with just a UUID if it is already stored in the database. The behavior will be the same as when creating an artifact that already exists. All of the properties of the artifact will be populated from the database.


2.2 ArtifactDB

The particular database used in this work is MongoDB. We use MongoDB since it can easily store large files (e.g., disk images), is tightly integrated with Python through pymongo, and has an interface that is flexible as the needs of gem5art change. Currently, a running database is required to use gem5art. However, we are planning on changing this default to allow gem5art to be used standalone as well.

gem5art allows you to connect to any database, but by default assumes there is a MongoDB instance running on the localhost at mongodb://localhost:27017. You can use the environment variable GEM5ART_DB to specify the default database to connect to when running simple scripts. Additionally, you can specify the location of the database when calling getDBConnection in your scripts.

If no database exists, or if you want your own database, you can create a new one by creating a new directory and running the MongoDB Docker image. See the MongoDB Docker documentation or the MongoDB documentation for more information.

`docker run -p 27017:27017 -v :/data/db --name mongo- -d mongo`

This uses the official MongoDB Docker image to run the database at the default port on the localhost. If the Docker container is killed, it can be restarted with the same command line and the database should be consistent.

2.2.1 Connecting to an existing database

By default, gem5art will assume the database is running at mongodb://localhost:27017, which is MongoDB’s default on the localhost. The environment variable GEM5ART_DB can override this default. Otherwise, to programmatically set a database URI when using gem5art, you can pass a URI to the getDBConnection function. Currently, gem5art only supports MongoDB database backends, but extending this to other databases should be straightforward.
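The order in which the database location is chosen can be sketched as follows. The helper `resolve_db_uri` is hypothetical and only illustrates the precedence described in this section (explicit argument, then GEM5ART_DB, then the built-in default); it does not open a connection.

```python
import os

DEFAULT_URI = "mongodb://localhost:27017"


def resolve_db_uri(uri: str = "") -> str:
    """Pick the database URI: explicit argument, then GEM5ART_DB, then default."""
    if uri:
        return uri
    # An unset or empty environment variable falls back to the default.
    return os.environ.get("GEM5ART_DB") or DEFAULT_URI
```

For example, running a script as `GEM5ART_DB=mongodb://db.example:27017 python launch.py` (host name and script name are placeholders) would select that server without any change to the script itself.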

2.2.2 Searching the Database

gem5art provides a few convenience functions for searching and accessing the database. These functions can be found in artifact.common_queries. Specifically, we provide the following functions:
• getByName: Returns all objects matching name in the database.
• getDiskImages: Returns a generator of disk images (type = disk image).
• getLinuxBinaries: Returns a generator of Linux kernel binaries (type = kernel).
• getgem5Binaries: Returns a generator of gem5 binaries (type = gem5 binary).

2.2.3 Downloading from the Database

You can also download a file associated with an artifact using functions provided by gem5art. A good way to search and download items from the database is by using the Python interactive shell. You can search the database with the functions provided by the artifact module (e.g., getByName, getByType, etc.). Then, once you’ve found the ID of the artifact you’d like to download, you can call downloadFile. See the example below.

$ python
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gem5art.artifact import *
>>> db = getDBConnection()
>>> for i in getDiskImages(db, limit=2): print(i)
...
ubuntu
    id: d4a54de8-3a1f-4d4d-9175-53c15e647afd
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
ubuntu
    id: c54b8805-48d6-425d-ac81-9b1badba206e
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:5bfaab52-7d04-49f2-8fea-c5af8a7f34a8, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
>>> for i in getLinuxBinaries(db, limit=2): print(i)
...
vmlinux-5.2.3
    id: 8cfd9fbe-24d0-40b5-897e-beca3df80dd2
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
vmlinux-5.2.3
    id: 9721d8c9-dc41-49ba-ab5c-3ed169e24166
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: npb:85e6dd97-c946-4596-9b52-0bb145810d68, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
>>> from uuid import UUID
>>> db.downloadFile(UUID('8cfd9fbe-24d0-40b5-897e-beca3df80dd2'), 'linux-stable/vmlinux-5.2.3')

For another example, assume there is a disk image named npb (containing NAS Parallel Benchmarks) in your database and you want to download the disk image to your local directory. You can do the following to download the disk image:

import gem5art.artifact

db = gem5art.artifact.getDBConnection()

disks = gem5art.artifact.getByName(db, 'npb')

for disk in disks:
    if disk.type == 'disk image' and disk.documentation == 'npb disk image created on Nov 20':
        db.downloadFile(disk._id, 'npb')

Here, we assume that there can be multiple disk images/artifacts with the name npb and we are only interested in downloading the npb disk image with a particular documentation (’npb disk image created on Nov 20’). Also, note that there is not a single way to download files from the database (although they will eventually use the downloadFile function). The dual of the downloadFile method used above is upload.

Database schema

Alternatively, you can use the pymongo Python module or the MongoDB command line interface to interact with the database. See the MongoDB documentation for more information on how to query the MongoDB database. gem5art has two collections. artifact_database.artifacts stores all of the metadata for the artifacts and artifact_database.fs is a GridFS store for all of the files. The files in the GridFS use the same UUIDs as the Artifacts as their primary keys. You can list all of the details of all of the artifacts by running the following in Python.

#!/usr/bin/env python3

from pymongo import MongoClient

db = MongoClient().artifact_database

for i in db.artifacts.find():
    print(i)

gem5art also provides a few methods to search the database for artifacts of a particular type or name. For example, to find all disk images in a database you can do the following:

import gem5art.artifact

db = gem5art.artifact.getDBConnection('mongodb://localhost')

for i in gem5art.artifact.getDiskImages(db):
    print(i)

Other similar methods include getLinuxBinaries() and getgem5Binaries(). You can use the getByName() method to search the database for artifacts using the name attribute. For example, to search for artifacts named gem5:

import gem5art.artifact

db = gem5art.artifact.getDBConnection('mongodb://localhost')

for i in gem5art.artifact.getByName(db, "gem5"):
    print(i)

2.3 Artifacts API Documentation

2.3.1 Artifact Module

This is the gem5 artifact package

class gem5art.artifact.Artifact(other: Union[str, uuid.UUID, Dict[str, Any]])
    A base artifact class. It holds the following attributes of an artifact:
    1) name: name of the artifact
    2) command: bash command used to generate the artifact
    3) path: path of the location of the artifact
    4) time: time of creation of the artifact
    5) documentation: a string to describe the artifact
    6) ID: unique identifier of the artifact
    7) inputs: list of the input artifacts used to create this artifact, stored as a list of UUIDs

    classmethod registerArtifact(command: str, name: str, cwd: str, typ: str, path: Union[str, pathlib.Path], documentation: str, inputs: List[Artifact] = []) → gem5art.artifact.artifact.Artifact
        Constructs a new artifact. This assumes either that it is not in the database or that it is exactly the same as when it was added to the database.

gem5art.artifact.getByName(db: gem5art.artifact._artifactdb.ArtifactDB, name: str, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns all objects matching name in the database. Limit specifies the maximum number of results to return.

gem5art.artifact.getDiskImages(db: gem5art.artifact._artifactdb.ArtifactDB, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns a generator of disk images (type = disk image). Limit specifies the maximum number of results to return.

gem5art.artifact.getLinuxBinaries(db: gem5art.artifact._artifactdb.ArtifactDB, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns a generator of Linux kernel binaries (type = kernel). Limit specifies the maximum number of results to return.

gem5art.artifact.getgem5Binaries(db: gem5art.artifact._artifactdb.ArtifactDB, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns a generator of gem5 binaries (type = gem5 binary). Limit specifies the maximum number of results to return.

gem5art.artifact.getDBConnection(uri: str = '') → gem5art.artifact._artifactdb.ArtifactDB
    Returns the database connection.
    uri: a string representing the URI of the database. See _getDBType for details. If no URI is given, we use the default (mongodb://localhost:27017) or the value in the GEM5ART_DB environment variable.
    If the connection has not been established, this will create a new connection. If the connection has been established, this will replace the connection if the uri input is non-empty.

2.3.2 Artifact

File contains the Artifact class and helper functions

class gem5art.artifact.artifact.Artifact(other: Union[str, uuid.UUID, Dict[str, Any]])
    A base artifact class. It holds the following attributes of an artifact:
    1) name: name of the artifact
    2) command: bash command used to generate the artifact
    3) path: path of the location of the artifact
    4) time: time of creation of the artifact
    5) documentation: a string to describe the artifact
    6) ID: unique identifier of the artifact
    7) inputs: list of the input artifacts used to create this artifact, stored as a list of UUIDs

    classmethod registerArtifact(command: str, name: str, cwd: str, typ: str, path: Union[str, pathlib.Path], documentation: str, inputs: List[Artifact] = []) → gem5art.artifact.artifact.Artifact
        Constructs a new artifact. This assumes either that it is not in the database or that it is exactly the same as when it was added to the database.

gem5art.artifact.artifact.getGit(path: pathlib.Path) → Dict[str, str]
    Returns a dictionary with the origin, current commit, and repo name for the base repository for path. An exception is raised if the repo is dirty or doesn’t exist.

gem5art.artifact.artifact.getHash(path: pathlib.Path) → str
    Returns an MD5 hash for the file in self.path.

2.3.3 Artifact

A base artifact class. It holds the following attributes of an artifact:
1) name: name of the artifact
2) command: bash command used to generate the artifact
3) path: path of the location of the artifact
4) time: time of creation of the artifact
5) documentation: a string to describe the artifact
6) ID: unique identifier of the artifact
7) inputs: list of the input artifacts used to create this artifact, stored as a list of UUIDs

2.3.4 Helper Functions for Common Queries

File contains some helper functions with common queries for artifacts in the ArtifactDB.

gem5art.artifact.common_queries.getByName(db: gem5art.artifact._artifactdb.ArtifactDB, name: str, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns all objects matching name in the database. Limit specifies the maximum number of results to return.

gem5art.artifact.common_queries.getDiskImages(db: gem5art.artifact._artifactdb.ArtifactDB, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns a generator of disk images (type = disk image). Limit specifies the maximum number of results to return.

gem5art.artifact.common_queries.getLinuxBinaries(db: gem5art.artifact._artifactdb.ArtifactDB, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns a generator of Linux kernel binaries (type = kernel). Limit specifies the maximum number of results to return.

gem5art.artifact.common_queries.getgem5Binaries(db: gem5art.artifact._artifactdb.ArtifactDB, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]
    Returns a generator of gem5 binaries (type = gem5 binary). Limit specifies the maximum number of results to return.

2.3.5 ArtifactDB

This is mostly internal. This file defines the ArtifactDB type and some common implementations of ArtifactDB. The database interface defined here does not include any schema information. The database “schema” is defined in the artifact.py file based on the types of artifacts stored in the database. Some common queries can be found in common_queries.py.

class gem5art.artifact._artifactdb.ArtifactDB(uri: str)
    Abstract base class for all artifact DBs.

    downloadFile(key: uuid.UUID, path: pathlib.Path) → None
        Download the file with the _id key to the path. Will overwrite the file if it currently exists.

    get(key: Union[uuid.UUID, str]) → Dict[str, str]
        Key can be a UUID or a string. Returns a dictionary to construct an artifact.

    put(key: uuid.UUID, artifact: Dict[str, Union[str, uuid.UUID]]) → None
        Insert the artifact into the database with the key.

    searchByLikeNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some type and a regex name.
        Note: Not all DB implementations will implement this function.

    searchByName(name: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some name.
        Note: Not all DB implementations will implement this function.

    searchByNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some name and type.
        Note: Not all DB implementations will implement this function.

    searchByType(typ: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some type.
        Note: Not all DB implementations will implement this function.

    upload(key: uuid.UUID, path: pathlib.Path) → None
        Upload the file at path to the database with _id of key.

class gem5art.artifact._artifactdb.ArtifactMongoDB(uri: str)
    This is a MongoDB database connector for storing Artifacts (as defined in artifact.py).
    This database stores the data in three collections:
    - artifacts: This stores the JSON-serialized Artifact class
    - files and chunks: These two collections store the large files required for some artifacts. Within the files collection, the _id is the UUID of the artifact.

    downloadFile(key: uuid.UUID, path: pathlib.Path) → None
        Download the file with the _id key to the path. Will overwrite the file if it currently exists.

    get(key: Union[uuid.UUID, str]) → Dict[str, str]
        Key can be a UUID or a string. Returns a dictionary to construct an artifact.

    put(key: uuid.UUID, artifact: Dict[str, Union[str, uuid.UUID]]) → None
        Insert the artifact into the database with the key.

    searchByLikeNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some type and a regex name.

    searchByName(name: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some name.

    searchByNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some name and type.

    searchByType(typ: str, limit: int) → Iterable[Dict[str, Any]]
        Returns an iterable of all artifacts in the database that match some type.

    upload(key: uuid.UUID, path: pathlib.Path) → None
        Upload the file at path to the database with _id of key.

gem5art.artifact._artifactdb.getDBConnection(uri: str = '') → gem5art.artifact._artifactdb.ArtifactDB
    Returns the database connection.
    uri: a string representing the URI of the database. See _getDBType for details. If no URI is given, we use the default (mongodb://localhost:27017) or the value in the GEM5ART_DB environment variable.
    If the connection has not been established, this will create a new connection. If the connection has been established, this will replace the connection if the uri input is non-empty.

Authors: • Ayaz Akram • Jason Lowe-Power

CHAPTER 3

Run

3.1 Introduction

Each gem5 experiment is wrapped inside a run object. These run objects contain all of the information required to execute the gem5 experiments and can optionally be executed via the gem5art tasks library (or manually with the run() function). gem5Run interacts with the Artifact class of gem5art to ensure reproducibility of gem5 experiments, and also stores the current gem5Run object and the output results in the database for later analysis.

3.2 SE and FS mode runs

Next are two methods (for SE (system-emulation) and FS (full-system) modes of gem5) from gem5Run class which give an idea of the required arguments from a user’s perspective to create a gem5Run object:

@classmethod
def createSERun(cls,
                name: str,
                gem5_binary: str,
                run_script: str,
                outdir: str,
                gem5_artifact: Artifact,
                gem5_git_artifact: Artifact,
                run_script_git_artifact: Artifact,
                *params: str,
                timeout: int = 60 * 15) -> 'gem5Run':
    ...

@classmethod
def createFSRun(cls,
                name: str,
                gem5_binary: str,
                run_script: str,
                outdir: str,
                gem5_artifact: Artifact,
                gem5_git_artifact: Artifact,
                run_script_git_artifact: Artifact,
                linux_binary: str,
                disk_image: str,
                linux_binary_artifact: Artifact,
                disk_image_artifact: Artifact,
                *params: str,
                timeout: int = 60 * 15) -> 'gem5Run':
    ...

It is important for the user to understand the different arguments passed to run objects:
• name: name of the run; it can act as a tag to search the database for the required runs (the user is expected to use a unique name for each experiment)
• gem5_binary: path to the actual gem5 binary to be used
• run_script: path to the Python run script that will be used with the gem5 binary
• outdir: path to the directory where gem5 results should be written
• gem5_artifact: gem5 binary artifact object
• gem5_git_artifact: gem5 source git repo artifact object
• run_script_git_artifact: run script artifact object
• linux_binary (only full-system): path to the actual Linux binary to be used (used by the run script as well)
• disk_image (only full-system): path to the actual disk image to be used (used by the run script as well)
• linux_binary_artifact (only full-system): Linux binary artifact object
• disk_image_artifact (only full-system): disk image artifact object
• params: other params to be passed to the run script
• timeout: longest time in seconds for which the current gem5 job is allowed to execute

The artifact parameters (gem5_artifact, gem5_git_artifact, and run_script_git_artifact) are used to ensure that this is a reproducible run. Apart from the parameters mentioned above, the gem5Run class also keeps track of other features of a gem5 run, e.g., the start time, the end time, the current status of the gem5 run, the kill reason (if the run is finished), etc.

While users can write their own run script to use with gem5 (with any command line arguments), currently when a gem5Run object is created for a full-system experiment using the createFSRun method, it is assumed that the paths to the linux_binary and disk_image are passed to the run script on the command line (as arguments of the createFSRun method).
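To illustrate what "passed to the run script on the command line" could look like, the sketch below assembles an argument list for a full-system run. `build_fs_command` is a hypothetical helper, not a gem5art function, and the ordering (kernel path, then disk image path, then the remaining params) is an assumption based on the description above. `--outdir` is gem5's own option for the output directory; the script path and parameter values are placeholders.

```python
from typing import List


def build_fs_command(gem5_binary: str, outdir: str, run_script: str,
                     linux_binary: str, disk_image: str,
                     *params: str) -> List[str]:
    """Assemble an argument list for a full-system gem5 invocation."""
    return [
        gem5_binary,
        f"--outdir={outdir}",   # gem5 option: where results are written
        run_script,             # everything after this goes to the run script
        linux_binary,
        disk_image,
        *params,
    ]


cmd = build_fs_command("gem5/build/X86/gem5.opt", "results/boot-test",
                       "configs/run_exit.py", "linux-stable/vmlinux-5.2.3",
                       "disk-image/ubuntu-image/ubuntu", "kvm", "1")
print(" ".join(cmd))
```

A run script written for this convention would read the kernel and disk image paths from its first two positional arguments.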

3.3 Running an experiment

The gem5Run object has everything needed to run one gem5 execution. Normally, this will be performed by using the gem5art tasks package. However, it is also possible to manually execute a gem5 run. The run function executes the gem5 experiment. It takes two optional parameters: a task associated with the run for bookkeeping and an optional directory to execute the run in.


The run function executes the gem5 binary by using Popen, which creates another process to execute gem5. The run function is blocking and does not return until the child process has completed. While the child process is running, the parent Python process updates the status in the info.json file every 5 seconds. The info.json file is the serialized gem5Run object, which contains all of the run information and the current status.

gem5Run objects have 7 possible status states. These are currently simple strings stored in the status property.
• Created: The run has been created. This is set in the constructor when either createSERun or createFSRun is called.
• Begin run: When run() is called, after the database is checked, the run enters the Begin run state.
• Failed artifact check for ...: The status is set to this when the artifact check fails.
• Spawning: Next, just before Popen is called, the run enters the Spawning state.
• Running: Once the parent process begins spinning while waiting for the child to finish, the run enters the Running state.
• Finished: When the child finishes with exit code 0, the run enters the Finished state.
• Failed: When the child finishes with a non-zero exit code, the run enters the Failed state.
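The spawn-and-poll pattern described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not gem5art's implementation: the function name run_and_track, the layout of the JSON file, and the return_code field are hypothetical, and the timeout handling of a real run is left out.

```python
import json
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_and_track(command, info_path, poll_interval=5.0):
    """Run `command` as a child process and mirror its status into a JSON file.

    Hypothetical helper: the JSON layout is an assumption, not gem5art's format.
    """
    info = {"command": command, "status": "Spawning"}
    info_file = Path(info_path)
    info_file.write_text(json.dumps(info))

    child = subprocess.Popen(command)
    info["status"] = "Running"

    # Block until the child exits, rewriting the JSON file on every poll
    # so that other tools can read the current status.
    while child.poll() is None:
        info_file.write_text(json.dumps(info))
        time.sleep(poll_interval)

    info["return_code"] = child.returncode
    info["status"] = "Finished" if child.returncode == 0 else "Failed"
    info_file.write_text(json.dumps(info))
    return info


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as outdir:
        # A trivial child process stands in for the gem5 binary here.
        result = run_and_track(
            [sys.executable, "-c", "pass"],
            Path(outdir) / "info.json",
            poll_interval=0.1,
        )
        print(result["status"])
```

A short poll interval is used in the demo only to keep it quick; the text above states that gem5art polls every 5 seconds.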

3.4 Run Already in the Database

When starting a run with gem5art, it might complain that the run already exists in the database. Before launching a gem5 job, gem5art checks whether this run matches an existing run in the database. In order to uniquely identify a run, a single hash is made out of:
• the runscript
• the parameters passed to the runscript
• the artifacts of the run object which, for an SE run, include: gem5 binary artifact, gem5 source git artifact, run script (experiments repo) artifact. For an FS run, the list of artifacts also includes the linux binary artifact and the disk image artifact in addition to the artifacts of an SE run.

If this hash already exists in the database, gem5art will not launch a new job based on this run object, as a run with the same parameters would have already been executed. If the user still wants to launch this job, the user will have to remove the existing run object from the database.
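The idea of identifying a run by a single hash can be sketched as follows. This is only an illustration: the function names run_hash and should_launch are hypothetical, and the choice of SHA-256 is an assumption, since this section does not say which hash function gem5art uses or how exactly the fields are combined.

```python
import hashlib


def run_hash(run_script, params, artifact_hashes):
    """Combine the run script, its parameters and the artifact hashes into one digest.

    Hypothetical helper; SHA-256 is an assumption made for this sketch.
    """
    digest = hashlib.sha256()
    for group in ([run_script], list(params), list(artifact_hashes)):
        # Prefix each group and each field with its length so that moving a
        # value from one group to another always changes the digest.
        digest.update(len(group).to_bytes(8, "big"))
        for field in group:
            data = str(field).encode("utf-8")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def should_launch(digest, known_digests):
    """Return True only if no run with the same digest has been recorded."""
    return digest not in known_digests


first = run_hash("configs/run_exit.py", ["kvm", "classic", "4", "init"], ["hash-a", "hash-b"])
known = {first}

# Same script, parameters and artifacts: treated as already executed.
again = run_hash("configs/run_exit.py", ["kvm", "classic", "4", "init"], ["hash-a", "hash-b"])
print(should_launch(again, known))

# Changing any parameter produces a different digest, so the run is launched.
changed = run_hash("configs/run_exit.py", ["kvm", "classic", "8", "init"], ["hash-a", "hash-b"])
print(should_launch(changed, known))
```

Note that the name of the run is not part of the inputs listed above, so in this sketch two runs that differ only in name produce the same digest.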

3.5 Searching the Database to find Runs

3.5.1 Utility script

gem5art provides the utility gem5art-getruns to search the database and retrieve runs. Based on the parameters, gem5art-getruns will dump the results into a file in the json format.

usage: gem5art-getruns [-h] [--fs-only] [--limit LIMIT] [--db-uri DB_URI]
                       [-s SEARCH_NAME]
                       filename

Dump all runs from the database into a json file

positional arguments:
  filename              Output file name

optional arguments:
  -h, --help            show this help message and exit
  --fs-only             Only output FS runs
  --limit LIMIT         Limit of the number of runs to return. Default: all
  --db-uri DB_URI       The database to connect to. Default
                        mongodb://localhost:27017
  -s SEARCH_NAME, --search_name SEARCH_NAME
                        Query for the name field

3.5.2 Manually searching the database

Once you start running the experiments with gem5 and want to know the status of those runs, you can look at the gem5Run artifacts in the database. For this purpose, gem5art provides a method getRuns, which you can use as follows:

import gem5art.run
from gem5art.artifact import getDBConnection

db = getDBConnection()
for i in gem5art.run.getRuns(db, fs_only=False, limit=100):
    print(i)

The documentation on getRuns is available at the bottom of this page.

3.6 Searching the Database to find Runs with Specific Names

As discussed above, while creating an FS or SE mode Run object, the user has to pass a name field to recognize a particular set of runs (or experiments). We expect that the user will take care to use a name string which fully characterizes a set of experiments and can be thought of as a nonce. For example, if we are running experiments to test linux kernel boot on gem5, we can use a name field boot_tests_v1 or boot_tests_[month_year] (where month_year corresponds to the month and year when the experiments were run). Later on, the same name can be used to search for relevant gem5 runs in the database. For this purpose, gem5art provides a method getRunsByName, which can be used as follows:

import gem5art.run
from gem5art.artifact import getDBConnection

db = getDBConnection()
for i in gem5art.run.getRunsByName(db, name='boot_tests_v1', fs_only=True, limit=100):
    print(i)

The documentation on getRunsByName is available in the Runs API Documentation section below.

3.7 Runs API Documentation

3.7.1 Run

This file defines a gem5Run object which contains all information needed to run a single gem5 test.


This class works closely with the artifact module to ensure that the gem5 experiment is reproducible and the output is saved to the database.

class gem5art.run.gem5Run
    This class holds all of the info required to run gem5.

    checkArtifacts(cwd: str) → bool
        Checks to make sure all of the artifacts are up to date.
        This should happen just before running gem5. This function will return False if the artifacts don’t check and True if they are all the same. For the git repos, this checks the git hash; for binary artifacts, this checks the md5 hash.

    checkKernelPanic() → bool
        Returns true if the gem5 instance specified in args has a kernel panic.
        Note: this gets around the problem that gem5 doesn’t exit on panics.

    classmethod createFSRun(name: str, gem5_binary: str, run_script: str, outdir: str, gem5_artifact: gem5art.artifact.artifact.Artifact, gem5_git_artifact: gem5art.artifact.artifact.Artifact, run_script_git_artifact: gem5art.artifact.artifact.Artifact, linux_binary: str, disk_image: str, linux_binary_artifact: gem5art.artifact.artifact.Artifact, disk_image_artifact: gem5art.artifact.artifact.Artifact, *params, timeout: int = 900) → gem5art.run.gem5Run
        name is the name of the run. The name is not necessarily unique. The name could be used to query the results of the run.
        gem5_binary and run_script are the paths to the binary to run and the script to pass to gem5.
        The linux_binary is the kernel to run and the disk_image is the path to the disk image to use.
        Further parameters can be passed via extra arguments. These parameters will be passed in order to the gem5 run script.
        Note: When instantiating this class for the first time, it will create a file info.json in the outdir which contains a serialized version of this class.

    classmethod createSERun(name: str, gem5_binary: str, run_script: str, outdir: str, gem5_artifact: gem5art.artifact.artifact.Artifact, gem5_git_artifact: gem5art.artifact.artifact.Artifact, run_script_git_artifact: gem5art.artifact.artifact.Artifact, *params, timeout: int = 900) → gem5art.run.gem5Run
        name is the name of the run. The name is not necessarily unique. The name could be used to query the results of the run.
        gem5_binary and run_script are the paths to the binary to run and the script to pass to gem5. Full paths are better.
        The artifact parameters (gem5_artifact, gem5_git_artifact, and run_script_git_artifact) are used to ensure that the run is reproducible.
        Further parameters can be passed via extra arguments. These parameters will be passed in order to the gem5 run script.
        timeout is the time in seconds to run the subprocess before killing it.
        Note: When instantiating this class for the first time, it will create a file info.json in the outdir which contains a serialized version of this class.

    dumpJson(filename: str) → None
        Dump all info into a json file.

    dumpsJson() → str
        Like dumpJson, except it returns a string.

    classmethod loadFromDict(d: Dict[str, Union[str, uuid.UUID]]) → gem5art.run.gem5Run
        Returns a new gem5Run instance from the dictionary of values in d.

    classmethod loadJson(filename: str) → gem5art.run.gem5Run

    run(task: Any = None, cwd: str = '.') → None
        Actually run the test.
        Calls Popen with the command to fork a new process. Then, this function polls the process every 5 seconds to check if it has finished or not. Each time it checks, it dumps the json info so other applications can poll those files.
        task is the celery task that is running this gem5 instance.
        cwd is the directory to change to before running. This allows a server process to run in a different directory than the running process. Note that only the spawned process runs in the new directory.

    saveResults() → None
        Zip up the output directory and store the results in the database.

gem5art.run.getRuns(db: gem5art.artifact._artifactdb.ArtifactDB, fs_only: bool = False, limit: int = 0) → Iterable[gem5art.run.gem5Run]
    Returns a generator of gem5Run objects.
    If fs_only is True, then only full system runs will be returned. Limit specifies the maximum number of runs to return.

gem5art.run.getRunsByName(db: gem5art.artifact._artifactdb.ArtifactDB, name: str, fs_only: bool = False, limit: int = 0) → Iterable[gem5art.run.gem5Run]
    Returns a generator of gem5Run objects, which have the field “name” exactly the same as the name parameter. The name used in this query is case sensitive.
    If fs_only is True, then only full system runs will be returned. Limit specifies the maximum number of runs to return.

gem5art.run.getRunsByNameLike(db: gem5art.artifact._artifactdb.ArtifactDB, name: str, fs_only: bool = False, limit: int = 0) → Iterable[gem5art.run.gem5Run]
    Returns a generator of gem5Run objects, which have the field “name” containing the name parameter as a substring. The name used in this query is case sensitive.
    If fs_only is True, then only full system runs will be returned. Limit specifies the maximum number of runs to return.
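The difference between the exact match of getRunsByName and the substring match of getRunsByNameLike can be illustrated without a database. The sketch below works on plain dictionaries; the function filter_runs and the is_fs key are hypothetical stand-ins for real run records, and treating limit=0 as "no limit" is an assumption based on the default of gem5art-getruns.

```python
from itertools import islice


def filter_runs(runs, name, exact=True, fs_only=False, limit=0):
    """Yield run records whose "name" matches, following the documented rules.

    Hypothetical helper operating on dictionaries, not on the gem5art database.
    """
    def matches(run):
        if fs_only and not run.get("is_fs", False):
            return False
        run_name = run.get("name", "")
        # Both == and `in` on Python strings are case sensitive.
        return run_name == name if exact else name in run_name

    selected = (run for run in runs if matches(run))
    # Assumption: a limit of 0 means "return everything".
    return islice(selected, limit) if limit > 0 else selected


runs = [
    {"name": "boot_tests_v1", "is_fs": True},
    {"name": "boot_tests_v2", "is_fs": True},
    {"name": "Boot_tests_v1", "is_fs": True},
    {"name": "micro_boot_tests_v1", "is_fs": False},
]

exact = [r["name"] for r in filter_runs(runs, "boot_tests_v1")]
like = [r["name"] for r in filter_runs(runs, "boot_tests", exact=False)]
print(exact)  # ['boot_tests_v1']
print(like)   # ['boot_tests_v1', 'boot_tests_v2', 'micro_boot_tests_v1']
```

The record named Boot_tests_v1 is excluded from both results because the comparison is case sensitive.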

Authors: • Ayaz Akram

CHAPTER 4

Tasks

This package contains two parallel task libraries for running gem5 experiments. The actual gem5 experiment can be executed with the help of Python multiprocessing support, Celery, or even without using any job manager (a job can be directly launched by calling the run() function of a gem5Run object). This package implicitly depends on the gem5art run package. Please cite the gem5art paper when using the gem5art packages. This documentation can be found on the gem5 website.

4.1 Use of Python Multiprocessing

This is a simple way to run gem5 jobs using the Python multiprocessing library. You can use the following function in your job launch script to execute gem5art run objects:

run_job_pool([a list containing all run objects you want to execute], num_parallel_jobs=[Number of parallel jobs you want to run])

4.2 Use of Celery

A Celery server can run many gem5 tasks asynchronously. Once a user creates a gem5Run object (discussed previously) while using gem5art, this object needs to be passed to the method run_gem5_instance() registered with the Celery app, which is responsible for starting a Celery task to run gem5. The other argument needed by run_gem5_instance() is the current working directory. The Celery server can be started with the following command:

celery -E -A gem5art.tasks.celery worker --autoscale=[number of workers],0

This will start a server with events enabled that will accept gem5 tasks as defined in gem5art. It will autoscale from 0 to desired number of workers.


Celery relies on a message broker, RabbitMQ, for communication between the client and workers. If not already installed, you need to install RabbitMQ on your system (before running celery) using:

apt-get install rabbitmq-server

4.2.1 Monitoring Celery

Celery does not explicitly show the status of the runs by default. flower, a Python package, is a web-based tool for monitoring and administrating Celery. To install the flower package:

pip install flower

You can monitor the celery cluster by doing the following:

flower -A gem5art.tasks.celery --port=5555

This will start a webserver on port 5555.

4.2.2 Removing all tasks

celery -A gem5art.tasks.celery purge

4.2.3 Viewing state of all jobs in celery

celery -A gem5art.tasks.celery events

4.3 Tasks API Documentation

4.3.1 Task

gem5art.tasks.tasks.run_job_pool(job_list, num_parallel_jobs=1)
    Runs gem5 jobs in parallel when Celery is not used. Creates as many parallel jobs as core count if no explicit job count is provided. Receives a list of run objects created by the launch script.

gem5art.tasks.tasks.run_single_job(run)

Authors: • Hoa Nguyen • Ayaz Akram

CHAPTER 5

Disk Images

5.1 Introduction

This section discusses an automated way of creating gem5-compatible disk images with Ubuntu server installed. We make use of Packer to do this which uses .json template files to build and configure a disk image. These template files can be configured to build a disk image with specific benchmarks installed.

5.2 Building a Simple Disk Image with Packer

5.2.1 a. How It Works, Briefly

We use Packer and QEMU to automate the process of disk creation. Essentially, QEMU is responsible for setting up a virtual machine and all interactions with the disk image during the building process. The interactions include installing Ubuntu Server to the disk image, copying files from your machine to the disk image, and running scripts on the disk image after Ubuntu is installed. However, we will not use QEMU directly. Packer provides a simpler way to interact with QEMU using a JSON script, which is more expressive than using QEMU from command line.

5.2.2 b. Install Required Software/Dependencies

If not already installed, QEMU can be installed using:

sudo apt-get install qemu

The packer binary can be downloaded from the official website using the following commands:

wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip
unzip packer_1.4.3_linux_amd64.zip


5.2.3 c. Customize the Packer Script

The default packer script template.json should be modified and adapted according to the required disk image and the resources available for the build process. We will rename the default template to [disk-name].json. The variables that should be modified appear at the end of the [disk-name].json file, in the variables section. The configuration files that we use to build the disk image and their directory structure are shown below:

disk-image/
    experiment-specific-folder/
        [disk-name].json: packer script
        Any experiment-specific post installation script
    shared/
        post-installation.sh: generic shell script that is executed after Ubuntu is installed
        preseed.cfg: pre-seeded configuration to install Ubuntu

i. Customizing the VM (Virtual Machine)

In [disk-name].json, following variables are available to customize the VM (to be used for the disk building process):

ii. Customizing the Disk Image

In [disk-name].json, disk image size can be customized using following variable:

iii. File Transfer

While building a disk image, users would need to move their files (benchmarks, data sets etc.) to the disk image. In order to do this file transfer, in [disk-name].json under provisioners, you could add the following:

{
    "type": "file",
    "source": "shared/post_installation.sh",
    "destination": "/home/gem5/",
    "direction": "upload"
}

The above example copies the file shared/post_installation.sh from the host to /home/gem5/ in the disk image. This method can also copy a folder from the host to the disk image and vice versa. It is important to note that a trailing slash in a path affects the copying process. The following are some notable examples of the effect of using slash at the end of the paths. If direction is download, the files will be copied from the image to the host.

Note: This is a way to run script once after installing Ubuntu without copying to the disk image.

iv. Install Benchmark Dependencies

To install the dependencies, we utilize the bash script shared/post_installation.sh, which will be run after the Ubuntu installation and file copying is done. For example, if we want to install gfortran, add the following to that script:

echo '12345' | sudo -S apt-get install -y gfortran

In the above example, we assume that the user password is 12345; sudo -S reads the password from standard input, and -y lets apt-get run without prompting. Since this is essentially a bash script that is executed on the VM after the file copying is done, you can modify it to fit any purpose.

v. Running Other Scripts on Disk Image

In [disk-name].json, we could add more scripts to provisioners. Note that the files are on the host, but the effects are on the disk image. For example, the following example runs shared/post_installation.sh after Ubuntu is installed:

{
    "type": "shell",
    "execute_command": "echo '{{ user `ssh_password` }}' | {{.Vars}} sudo -E -S bash '{{.Path}}'",
    "scripts":
    [
        "scripts/post-installation.sh"
    ]
}
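When several disk images share the same provisioning steps, the file and shell provisioner entries shown in this section can also be generated with a short script instead of being edited by hand. The sketch below only builds and prints the JSON fragment; the helper names file_provisioner and shell_provisioner are hypothetical, and the field values are copied from the examples above.

```python
import json


def file_provisioner(source, destination, direction="upload"):
    """Build a Packer "file" provisioner entry (hypothetical helper)."""
    return {
        "type": "file",
        "source": source,
        "destination": destination,
        "direction": direction,
    }


def shell_provisioner(scripts):
    """Build a Packer "shell" provisioner entry (hypothetical helper)."""
    return {
        "type": "shell",
        # Same execute_command as in the example above.
        "execute_command": "echo '{{ user `ssh_password` }}' | {{.Vars}} sudo -E -S bash '{{.Path}}'",
        "scripts": list(scripts),
    }


provisioners = [
    file_provisioner("shared/post_installation.sh", "/home/gem5/"),
    shell_provisioner(["scripts/post-installation.sh"]),
]

# This fragment would be merged into the provisioners section of [disk-name].json.
fragment = json.dumps({"provisioners": provisioners}, indent=4)
print(fragment)
```

The printed fragment can be checked with packer validate after it has been placed in the template.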

5.2.4 d. Build the Disk Image

i. Build

In order to build a disk image, the template file is first validated using:

./packer validate [disk-name].json

Then, the template file can be used to build the disk image:

./packer build [disk-name].json

On a fairly recent machine, the building process should not take more than 15 minutes to complete. The disk image with the user-defined name (image_name) will be produced in a folder called [image_name]-image. We recommend using a VNC viewer to inspect the building process.

ii. Inspect the Building Process

While the disk image is being built, packer runs a VNC (Virtual Network Computing) server, and you can watch the building process by connecting to it from a VNC client. There are plenty of choices for VNC clients. When you run the packer script, it will tell you which port is used by the VNC server. For example, if it says qemu: Connecting to VM via VNC (127.0.0.1:5932), the VNC port is 5932. To connect to the VNC server from the VNC client, use the address 127.0.0.1:5932. If you need to forward the VNC port from a remote machine to your local machine, use SSH tunneling:

ssh -L 5932:127.0.0.1:5932 @

This command will forward port 5932 from the host machine to your machine, and then you will be able to connect to the VNC server using the address 127.0.0.1:5932 from your VNC viewer.
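Because the VNC port can differ between builds, it may be convenient to extract it from packer's output and assemble the tunnel command automatically. The following is a small sketch under stated assumptions: the helper names are hypothetical, the log line format is the one quoted above, and the user and host names in the usage lines are placeholders.

```python
import re

# Matches the message quoted above, e.g.
# "qemu: Connecting to VM via VNC (127.0.0.1:5932)".
VNC_PATTERN = re.compile(r"Connecting to VM via VNC \((?P<host>[\d.]+):(?P<port>\d+)\)")


def vnc_endpoint(log_line):
    """Return (host, port) if the line announces the VNC server, else None."""
    match = VNC_PATTERN.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def ssh_tunnel_command(port, user, remote_host):
    """Build the ssh -L command that forwards the VNC port to the local machine."""
    return ["ssh", "-L", f"{port}:127.0.0.1:{port}", f"{user}@{remote_host}"]


endpoint = vnc_endpoint("qemu: Connecting to VM via VNC (127.0.0.1:5932)")
if endpoint is not None:
    host, port = endpoint
    # "alice" and "build-server" are placeholder values for this example.
    print(" ".join(ssh_tunnel_command(port, "alice", "build-server")))
```

The command is returned as a list so that it can be passed directly to subprocess.run without shell quoting.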


Note: While packer is installing Ubuntu, the terminal screen will display “waiting for SSH” without any update for a long time. This is not an indicator of whether the Ubuntu installation produces any errors. Therefore, we strongly recommend using VNC viewer at least once to inspect the image building process.

CHAPTER 6

Frequently Asked Questions

What is gem5art?
gem5art (libraries for artifacts, reproducibility and testing) is a set of Python modules to do experiments with gem5 in a reproducible and structured way.

Do I need celery to run gem5 jobs using gem5art?
Celery is not required to run gem5 jobs with gem5art. You can use any other job scheduling tool or no tool at all. In order to run your job without celery, simply call the run() method of your run object once it is created. For example, assuming the run object created in a launch script is called run, you can do the following:

run.run()

Is there a more user-friendly way to launch gem5 jobs?
You can use the Python multiprocessing library based function calls (provided by gem5art) to launch multiple gem5 jobs in parallel. Specifically, you can call the following function in your gem5art launch script:

run_job_pool([a list containing all run objects you want to execute], num_parallel_jobs=[Number of parallel jobs you want to run])

How to access/search the files/artifacts in the database?
You can use the pymongo API functions to access the files in the database. gem5art also provides methods that make it easy to access the entries in the database. The available methods are described in the API documentation sections of this document.

What if I want to re-run an experiment, using the same artifacts?
As explained in the documentation, when a new run object is created in the launch script, a hash is created out of the artifacts that this run depends on. This hash is used to check if the same run exists in the database. One of the artifacts used to create the hash is the runscript artifact (which is basically the same as the experiments repository artifact, as the gem5 configuration scripts are part of the base experiments repository). The easiest way to re-run an experiment is to update the name field in your launch script and commit the changes in the launch script to the base experiments repository. Make sure to use the new name field to query the results or runs in the database.

How can I monitor the status of jobs launched using gem5art launch script?


Celery does not explicitly show the status of the runs by default. If you are using celery to run your tasks, you can use flower, a Python package that provides a web-based tool for monitoring and administrating Celery. To install the flower package:

pip install flower

Then start it with the following command:

flower -A gem5art.tasks.celery --port=5555

You can access this server in your web browser at http://localhost:5555. Celery also generates some log files in the directory where you are running celery from. You can also look at those log files to know the status of your jobs.

How to contribute to gem5art?
gem5art is open-source. If you want to add a new feature or fix a bug, you can create a PR on the gem5art github repo.

Authors: • Ayaz Akram

CHAPTER 7

Tutorial: Run Full System Linux Boot Tests

7.1 Introduction

This tutorial explains how to use gem5art to run experiments with gem5. The specific experiment we will be doing is to test booting of various linux kernel versions and simulator configurations. The main steps to perform such an experiment using gem5art include: setting up the environment, building gem5, creating a disk image, compiling linux kernels, preparing the gem5 run script, creating a job launch script (which will also register all of the required artifacts) and finally running this script.

We assume the following directory structure in this tutorial:

boot-tests/
|___ gem5/                                   # gem5 source code
|
|___ disk-image/
|      |___ shared/                          # Auxiliary files needed for disk creation
|      |___ boot-exit/
|            |___ boot-exit-image/           # Will be created once the disk is generated
|            |      |___ boot-exit           # The generated disk image
|            |___ boot-exit.json             # The Packer script
|            |___ exit.sh                    # Exits the simulated guest upon booting
|            |___ post-installation.sh       # Moves exit.sh to guest's .bashrc
|
|___ configs
|      |___ system                           # gem5 system config files
|      |___ run_exit.py                      # gem5 run script
|
|___ linux-configs                           # Folder with Linux kernel configuration files
|
|___ linux                                   # Linux source will be downloaded in this folder
|
|___ launch_boot_tests.py                    # script to launch jobs and register artifacts

7.2 Setting up the environment

First, we need to create the primary directory boot-tests which will contain all the created artifacts to run these tests. This directory also needs to be converted into a git repository. Through the use of boot-tests git repo, we will try to keep track of changes in those files which are not an artifact themselves or not a part of any other artifact. An example of such files is gem5 run and config scripts (config-boot-tests). We want to make sure that we can keep record of any changes in these scripts, so that a particular run of gem5 can be associated with a particular snapshot of these files. All such files, which are not part of other artifacts, will be a part of the experiments repo artifact (we will show how to register that artifact later in this tutorial). We also need to add a git remote to this repository pointing to a remote location where we want this repository to be hosted. Create the main directory named boot-tests and turn it into a git repo:

mkdir boot-tests
cd boot-tests
git init
git remote add origin https://your-remote-add/boot-tests.git

We also need to add a .gitignore file in our git repo, to ignore tracking files we don’t care about:

*.pyc
m5out
.vscode
results
venv
disk-image/packer
disk-image/packer_1.4.3_linux_amd64.zip
disk-image/boot-exit/boot-exit-image/boot-exit
disk-image/packer_cache
gem5
linux-stable/

gem5art relies on Python 3, so we suggest creating a virtual environment (inside boot-tests) before using gem5art.

virtualenv -p python3 venv
source venv/bin/activate

gem5art can be installed (if not already) using pip:

pip install gem5art-artifact gem5art-run gem5art-tasks

7.3 Building gem5

Next, we have to clone gem5 and build it. If you want to use the exact gem5 source that was used at the time of creating this tutorial you will have to checkout the relevant commit. If you want to try with the current version of gem5 at the time of reading this tutorial, you can ignore the git checkout command. See the commands below:


git clone https://gem5.googlesource.com/public/gem5
cd gem5
git checkout v20.1.0.0
scons build/X86/gem5.opt -j8

You can also add your changes to the gem5 source before building it. Make sure to commit any changes you make to the gem5 repo and to document them while registering the gem5 artifact in the launch script. We will look at the details of our launch script later on, but the following shows how we can register the gem5 source and binary artifacts that we just created.

gem5_repo = Artifact.registerArtifact(
    command = 'git clone https://gem5.googlesource.com/public/gem5',
    typ = 'git repo',
    name = 'gem5',
    path = 'gem5/',
    cwd = './',
    documentation = 'cloned gem5 from googlesource and checked out v20.1.0.0'
)

gem5_binary = Artifact.registerArtifact(
    command = '''cd gem5;
    git checkout v20.1.0.0;
    scons build/X86/gem5.opt -j8
    ''',
    typ = 'gem5 binary',
    name = 'gem5',
    cwd = 'gem5/',
    path = 'gem5/build/X86/gem5.opt',
    inputs = [gem5_repo,],
    documentation = 'gem5 binary based on v20.1.0.0'
)

Note that the use of the git checkout command in the command field of the gem5_binary artifact (along with the documentation field) will be helpful later on to figure out exactly which gem5 source was used to create this gem5 binary.

Also make sure to build the m5 utility at this point, as it will eventually be moved to the disk image. The m5 utility makes it possible to trigger simulation tasks from inside the simulated system. For example, it can be used to dump simulation statistics when the simulated system requests it. We will mainly need m5 to exit the simulation when the simulated system boots linux.

cd gem5/util/m5/
scons build/x86/out/m5

7.4 Creating a disk image

First create a disk-image folder where we will keep all disk image related files:

mkdir disk-image

We will follow a directory structure similar to the one discussed in the Disk Images section. Add a folder named shared for config files which will be shared among all disk images (and will be kept at their defaults) and one folder named boot-exit which is specific to the disk image needed to run the experiments of this tutorial. Add three files, boot-exit.json, exit.sh and post-installation.sh, in boot-exit/ and preseed.cfg and serial-getty@.service in shared/.

boot-exit.json is our primary configuration file. The provisioners and variables sections of this file configure the files that need to be transferred to the disk and other things like the disk image's name. post-installation.sh (which is a script to run after Ubuntu is installed on the disk image) makes sure that the m5 binary is installed on the system and also moves the contents of our other script (exit.sh, which should already have been transferred to the disk image as configured in boot-exit.json) to .bashrc, as exit.sh contains the commands that we want to be executed as soon as the system boots. exit.sh contains just one command, m5 exit, which will terminate the simulation once the system has booted.

Next, download packer (if not already downloaded) in the disk-image folder:

cd disk-image/
wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip
unzip packer_1.4.3_linux_amd64.zip

Now, to build the disk image, inside the disk-image folder, run:

./packer validate boot-exit/boot-exit.json

./packer build boot-exit/boot-exit.json

Once this process succeeds, the disk image can be found at boot-exit/boot-exit-image/boot-exit. A disk image already created following the above instructions can be found, gzipped, here.

7.5 Compiling the linux kernel

In this tutorial, we want to experiment with different linux kernels to examine the state of gem5's ability to boot different linux kernels. These tests use the following five LTS (long term support) releases of the Linux kernel:
• 4.4.186
• 4.9.186
• 4.14.134
• 4.19.83
• 5.4.49

Let's use kernel v5.4.49 as an example to see how to compile the kernel. First, add a folder linux-configs to store linux kernel config files. The configuration files of interest are available here. Then, we will get the linux source and check out the required linux version (e.g. v5.4.49 in this case).

git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
mv linux linux-stable
cd linux-stable
git checkout v{version-no: e.g. 5.4.49}

Compile the Linux kernel from its source (using an appropriate config file from linux-configs/):

cp ../linux-configs/config.{version-no: e.g. 5.4.49} .config
make -j8
cp vmlinux vmlinux-{version-no: e.g. 5.4.49}

Repeat the above process for the other kernel versions that we want to use in this experiment.

Note: The above instructions are tested with gcc 7.5.0 and the already compiled Linux binaries can be downloaded from the following links:
• vmlinux-4.4.186
• vmlinux-4.9.186
• vmlinux-4.14.134
• vmlinux-4.19.83
• vmlinux-5.4.49

7.6 gem5 run scripts

Next, we need to add gem5 run scripts. We will do that in a folder named configs-boot-tests. Get the run script named run_exit.py from here, and other system configuration files from here. The run script (run_exit.py) takes the following arguments:
• kernel: compiled kernel to be used for simulation
• disk: built disk image to be used for simulation
• cpu_type: gem5 cpu model (KVM, atomic, timing or O3)
• mem_sys: gem5 memory system (classic, MI_example, MESI_Two_Level, MOESI_CMP_directory)
• num_cpus: number of parallel cpus to be simulated
• boot_type: linux kernel boot type (with init or systemd)

An example use of this script is the following:

gem5/build/X86/gem5.opt configs/run_exit.py [path to the Linux kernel] [path to the disk image] kvm classic 4 init
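A boot test typically covers every combination of these arguments, and the combinations can be enumerated with itertools.product. The sketch below only assembles command lines and does not launch gem5. The paths follow the directory layout used in this tutorial, while the lowercase spelling of the cpu types, the cpu counts and the restriction to the classic memory system are assumptions made for illustration.

```python
from itertools import product

GEM5_BINARY = "gem5/build/X86/gem5.opt"
RUN_SCRIPT = "configs/run_exit.py"
DISK_IMAGE = "disk-image/boot-exit/boot-exit-image/boot-exit"

kernel_versions = ["4.4.186", "4.9.186", "4.14.134", "4.19.83", "5.4.49"]
cpu_types = ["kvm", "atomic"]      # assumed spellings; "kvm" is taken from the example above
mem_systems = ["classic"]          # kept to one memory system to use a single gem5 binary
num_cpus = [1, 4]                  # illustrative values
boot_types = ["init", "systemd"]


def build_command(kernel_version, cpu, mem, cpus, boot):
    """Return the gem5 command line for one combination of arguments."""
    kernel = f"linux-stable/vmlinux-{kernel_version}"
    return [GEM5_BINARY, RUN_SCRIPT, kernel, DISK_IMAGE, cpu, mem, str(cpus), boot]


commands = [
    build_command(*combo)
    for combo in product(kernel_versions, cpu_types, mem_systems, num_cpus, boot_types)
]

print(len(commands))          # 5 * 2 * 1 * 2 * 2 = 40 combinations
print(" ".join(commands[0]))
```

In a gem5art launch script, the same cross product would be used to create one run object per combination rather than a raw command line.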

7.7 Database and Celery Server

If not already running/created, you can create a database using:

docker run -p 27017:27017 -v :/data/db --name mongo- -d mongo

in a newly created directory.

If not already installed, install RabbitMQ on your system (before running celery) using:

apt-get install rabbitmq-server

Now, run the celery server using:

celery -E -A gem5art.tasks.celery worker --autoscale=[number of workers],0

Note: Celery is not required to run gem5 jobs with gem5art. You can also use python multiprocessing library based function calls (provided by gem5art) to launch these jobs in parallel (we will show how to do that later in our launch script).

7.8 Creating a launch script

Finally, we will create a launch script with the name launch_boot_tests.py, which will be responsible for registering the artifacts to be used for these tests and then launching gem5 jobs.


The first thing to do in the launch script is to import required modules and classes:

    import os
    import sys
    from uuid import UUID
    from itertools import starmap
    from itertools import product

    from gem5art.artifact import Artifact
    from gem5art.run import gem5Run
    from gem5art.tasks.tasks import run_gem5_instance
    import multiprocessing as mp

Next, we will register artifacts. For example, to register the packer artifact we will add the following lines:

    packer = Artifact.registerArtifact(
        command = '''wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip;
        unzip packer_1.4.3_linux_amd64.zip;
        ''',
        typ = 'binary',
        name = 'packer',
        path = 'disk-image/packer',
        cwd = 'disk-image',
        documentation = 'Program to build disk images. Downloaded sometime in August/19 from hashicorp.'
    )

For our boot-tests repo,

    experiments_repo = Artifact.registerArtifact(
        command = 'git clone https://your-remote-add/boot_tests.git',
        typ = 'git repo',
        name = 'boot_tests',
        path = './',
        cwd = '../',
        documentation = 'main experiments repo to run full system boot tests with gem5 20.1'
    )

Note that the name of the artifact object (returned by the registerArtifact method) is entirely up to the user, as are most of the other attributes of these artifacts. For all other artifacts, add the following lines in launch_boot_tests.py:

    gem5_repo = Artifact.registerArtifact(
        command = 'git clone https://gem5.googlesource.com/public/gem5',
        typ = 'git repo',
        name = 'gem5',
        path = 'gem5/',
        cwd = './',
        documentation = 'cloned gem5 from googlesource and checked out v20.1.0.0'
    )

    m5_binary = Artifact.registerArtifact(
        command = 'scons build/x86/out/m5',
        typ = 'binary',
        name = 'm5',
        path = 'gem5/util/m5/build/x86/out/m5',
        cwd = 'gem5/util/m5',
        inputs = [gem5_repo,],
        documentation = 'm5 utility'
    )

    disk_image = Artifact.registerArtifact(
        command = './packer build boot-exit/boot-exit.json',
        typ = 'disk image',
        name = 'boot-disk',
        cwd = 'disk-image',
        path = 'disk-image/boot-exit/boot-exit-image/boot-exit',
        inputs = [packer, experiments_repo, m5_binary,],
        documentation = 'Ubuntu with m5 binary installed and root auto login'
    )

    gem5_binary = Artifact.registerArtifact(
        command = '''cd gem5;
        git checkout v20.1.0.0;
        scons build/X86/gem5.opt -j8
        ''',
        typ = 'gem5 binary',
        name = 'gem5',
        cwd = 'gem5/',
        path = 'gem5/build/X86/gem5.opt',
        inputs = [gem5_repo,],
        documentation = 'gem5 binary based on v20.1.0.0'
    )

    gem5_binary_MESI_Two_Level = Artifact.registerArtifact(
        command = '''cd gem5;
        git checkout v20.1.0.0;
        scons build/X86_MESI_Two_Level/gem5.opt --default=X86 PROTOCOL=MESI_Two_Level SLICC_HTML=True -j8
        ''',
        typ = 'gem5 binary',
        name = 'gem5',
        cwd = 'gem5/',
        path = 'gem5/build/X86_MESI_Two_Level/gem5.opt',
        inputs = [gem5_repo,],
        documentation = 'gem5 binary based on v20.1.0.0'
    )

    # The path below matches the build directory used in the scons command
    # and the binary path used later in createRun.
    gem5_binary_MOESI_CMP_directory = Artifact.registerArtifact(
        command = '''cd gem5;
        git checkout v20.1.0.0;
        scons build/MOESI_CMP_directory/gem5.opt --default=X86 PROTOCOL=MOESI_CMP_directory -j8
        ''',
        typ = 'gem5 binary',
        name = 'gem5',
        cwd = 'gem5/',
        path = 'gem5/build/MOESI_CMP_directory/gem5.opt',
        inputs = [gem5_repo,],
        documentation = 'gem5 binary based on v20.1.0.0'
    )

    linux_repo = Artifact.registerArtifact(
        command = '''git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git;
        mv linux linux-stable''',
        typ = 'git repo',
        name = 'linux-stable',
        path = 'linux-stable/',
        cwd = './',
        documentation = 'linux kernel source code repo from June 24-2020'
    )

    linuxes = ['5.4.49', '4.19.83', '4.14.134', '4.9.186', '4.4.186']
    linux_binaries = {
        version: Artifact.registerArtifact(
                    name = f'vmlinux-{version}',
                    typ = 'kernel',
                    path = f'linux-stable/vmlinux-{version}',
                    cwd = 'linux-stable/',
                    command = f'''cd linux-stable;
                    git checkout v{version};
                    cp ../linux-configs/config.{version} .config;
                    make -j8;
                    cp vmlinux vmlinux-{version};
                    ''',
                    inputs = [experiments_repo, linux_repo,],
                    documentation = f"Kernel binary for {version} with simple "
                                     "config file",
                )
        for version in linuxes
    }

Once all the artifacts are registered, the next step is to launch all gem5 jobs. To do that, we will first create a method createRun to create gem5art runs based on a few arguments:

    def createRun(linux, boot_type, cpu, num_cpu, mem):

        if mem == 'MESI_Two_Level':
            binary_gem5 = 'gem5/build/X86_MESI_Two_Level/gem5.opt'
            artifact_gem5 = gem5_binary_MESI_Two_Level
        elif mem == 'MOESI_CMP_directory':
            binary_gem5 = 'gem5/build/MOESI_CMP_directory/gem5.opt'
            artifact_gem5 = gem5_binary_MOESI_CMP_directory
        else:
            binary_gem5 = 'gem5/build/X86/gem5.opt'
            artifact_gem5 = gem5_binary

        return gem5Run.createFSRun(
            'boot experiments with gem5-20.1',
            binary_gem5,
            'configs-boot-tests/run_exit.py',
            'results/run_exit/vmlinux-{}/boot-exit/{}/{}/{}/{}'.
                format(linux, cpu, mem, num_cpu, boot_type),
            artifact_gem5, gem5_repo, experiments_repo,
            os.path.join('linux-stable', 'vmlinux' + '-' + linux),
            'disk-image/boot-exit/boot-exit-image/boot-exit',
            linux_binaries[linux], disk_image,
            cpu, mem, num_cpu, boot_type,
            timeout = 10*60*60 # 10 hours
        )
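The if/elif chain in createRun picks a gem5 build according to the memory system. As a sketch of the same selection written as a lookup table (the names GEM5_BINARIES, DEFAULT_GEM5_BINARY and select_gem5_binary are hypothetical and not part of gem5art), adding a further Ruby protocol would then only require a new dictionary entry:

```python
# Build paths as used in createRun above; any memory system without its own
# build (for example 'classic' or 'MI_example') falls back to the X86 build.
GEM5_BINARIES = {
    "MESI_Two_Level": "gem5/build/X86_MESI_Two_Level/gem5.opt",
    "MOESI_CMP_directory": "gem5/build/MOESI_CMP_directory/gem5.opt",
}
DEFAULT_GEM5_BINARY = "gem5/build/X86/gem5.opt"


def select_gem5_binary(mem: str) -> str:
    """Return the gem5 binary path to use for the given memory system."""
    return GEM5_BINARIES.get(mem, DEFAULT_GEM5_BINARY)


for mem in ("classic", "MI_example", "MESI_Two_Level", "MOESI_CMP_directory"):
    print(mem, "->", select_gem5_binary(mem))
```

In the launch script, the matching artifact object would have to be looked up the same way, since createFSRun takes both the binary path and its artifact.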

Next, initialize all the parameters to pass to the createRun method, depending on the configuration space we want to test:

    if __name__ == "__main__":
        boot_types = ['init']
        num_cpus = ['1', '2', '4', '8']
        cpu_types = ['kvm', 'atomic', 'simple', 'o3']
        mem_types = ['MI_example', 'MESI_Two_Level', 'MOESI_CMP_directory']

Then, to run actual jobs depending on if you want to use celery or python multiprocessing library, add the following in your launch script:

7.9 If Using Celery

        # For the cross product of tests, create a run object.
        runs = starmap(createRun, product(linuxes, boot_types, cpu_types, num_cpus, mem_types))
        # Run all of these experiments in parallel
        for run in runs:
            run_gem5_instance.apply_async((run, os.getcwd(),))

7.10 If Using Python Multiprocessing Library:

        def worker(run):
            run.run()
            json = run.dumpsJson()
            print(json)

        jobs = []
        # For the cross product of tests, create a run object.
        runs = starmap(createRun, product(linuxes, boot_types, cpu_types, num_cpus, mem_types))
        # Run all of these experiments in parallel
        for run in runs:
            jobs.append(run)

        with mp.Pool(mp.cpu_count() // 2) as pool:
            pool.map(worker, jobs)

The above lines are responsible for looping through all possible combinations of variables involved in this experiment. For each combination, a gem5Run object is created and eventually passed to run_gem5_instance to be executed asynchronously if using Celery. In the case of the python multiprocessing library, these run objects are pushed to a list and then mapped to a job pool. Look at the definition of createFSRun() here to understand the use of the passed arguments.

Here, we are using a timeout of 10 hours, after which the particular gem5 job will be killed (assuming that gem5 should have completed booting the linux kernel on the given hardware resources by then). You can configure this time according to your settings.

The complete launch script is available here. Finally, make sure you are in the python virtual env and then run the script:

    python launch_boot_tests.py
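To get a feel for how large this configuration space is before launching anything, the cross product can be enumerated on its own. The sketch below replaces createRun with a hypothetical stand-in, describe_run, that only returns the output directory of each run, so it executes without gem5art or any registered artifacts.

```python
from itertools import product, starmap

linuxes = ['5.4.49', '4.19.83', '4.14.134', '4.9.186', '4.4.186']
boot_types = ['init']
num_cpus = ['1', '2', '4', '8']
cpu_types = ['kvm', 'atomic', 'simple', 'o3']
mem_types = ['MI_example', 'MESI_Two_Level', 'MOESI_CMP_directory']


def describe_run(linux, boot_type, cpu, num_cpu, mem):
    """Stand-in for createRun: return only the output directory of the run."""
    return 'results/run_exit/vmlinux-{}/boot-exit/{}/{}/{}/{}'.format(
        linux, cpu, mem, num_cpu, boot_type)


# product() yields tuples in the same order as the parameters of the function,
# with the last list (mem_types) varying fastest; starmap() unpacks each tuple.
outdirs = list(starmap(describe_run,
                       product(linuxes, boot_types, cpu_types, num_cpus, mem_types)))

print(len(outdirs))  # 5 * 1 * 4 * 4 * 3 = 240 runs
print(outdirs[0])
```

Because starmap returns a lazy iterator, the real launch script only calls createRun as the for loop consumes each combination.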

7.11 Results

Once you start running these experiments, you can access the database to check their status or to find results. There are different ways to do this. For example, you can use the getRuns method of gem5art as discussed in the Runs section previously. You can also directly access the database and access the run artifacts as follows:

    #!/usr/bin/env python3
    from pymongo import MongoClient

    db = MongoClient().artifact_database

    linuxes = ['5.4.49', '4.19.83', '4.14.134', '4.9.186', '4.4.186']
    boot_types = ['init']
    num_cpus = ['1', '2', '4', '8']
    cpu_types = ['kvm', 'atomic', 'simple', 'o3']
    mem_types = ['MI_example', 'MESI_Two_Level', 'MOESI_CMP_directory']

    for linux in linuxes:
        for boot_type in boot_types:
            for cpu in cpu_types:
                for num_cpu in num_cpus:
                    for mem in mem_types:
                        for i in db.artifacts.find({'outdir':'/home/username/boot_tests/results/run_exit/vmlinux-{}/boot-exit/{}/{}/{}/{}'.format(linux, cpu, mem, num_cpu, boot_type)}):
                            print(i)
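The nested loops issue one query per configuration. As an alternative sketch, the same output directories can be collected up front and matched in a single query with MongoDB's $in operator. The helper name outdir_query and its base_dir parameter are assumptions made for illustration; only the filter document is built here, so the snippet runs without a database.

```python
from itertools import product


def outdir_query(base_dir, linuxes, boot_types, cpu_types, num_cpus, mem_types):
    """Build one MongoDB filter matching the outdir of every configuration."""
    outdirs = [
        '{}/results/run_exit/vmlinux-{}/boot-exit/{}/{}/{}/{}'.format(
            base_dir, linux, cpu, mem, num_cpu, boot_type)
        for linux, boot_type, cpu, num_cpu, mem
        in product(linuxes, boot_types, cpu_types, num_cpus, mem_types)
    ]
    return {'outdir': {'$in': outdirs}}


query = outdir_query('/home/username/boot_tests',
                     ['5.4.49'], ['init'], ['kvm'], ['1', '2'], ['MI_example'])
for path in query['outdir']['$in']:
    print(path)
```

With a live database, the filter would be passed as db.artifacts.find(query) in place of the innermost loop.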

Note: Update the “outdir” path in the above lines of code to where your results are stored on your system.

The following plots show the status of linux booting based on the results of the experiments of this tutorial:

You can look here for the latest status of these tests on gem5.

Authors: • Ayaz Akram

CHAPTER 8

Tutorial: Run NAS Parallel Benchmarks with gem5

8.1 Introduction

In this tutorial, we will use gem5art to create a disk image for NAS parallel benchmarks (NPB) and then run these benchmarks using gem5. NPB belongs to the category of high performance computing (HPC) workloads and consists of 5 kernels and 3 pseudo applications. Following are their details:

Kernels:
• IS: Integer Sort, random memory access
• EP: Embarrassingly Parallel
• CG: Conjugate Gradient, irregular memory access and communication
• MG: Multi-Grid on a sequence of meshes, long- and short-distance communication, memory intensive
• FT: discrete 3D fast Fourier Transform, all-to-all communication

Pseudo Applications:
• BT: Block Tri-diagonal solver
• SP: Scalar Penta-diagonal solver
• LU: Lower-Upper Gauss-Seidel solver

There are different classes (A, B, C, D, E and F) of the workloads based on the data size that is used with the benchmarks. Detailed discussion of the data sizes is available here. In this tutorial, we will use only class A of these workloads.

We assume the following directory structure in this tutorial:

    npb/
      |___ gem5/                                 # gem5 source code
      |
      |___ disk-image/
      |      |___ shared/                        # Auxiliary files needed for disk creation
      |      |___ npb/
      |            |___ npb-image/               # Will be created once the disk is generated
      |            |      |___ npb               # The generated disk image
      |            |___ npb.json                 # The Packer script to build the disk image
      |            |___ runscript.sh             # Executes a user provided script in simulated guest
      |            |___ post-installation.sh     # Moves runscript.sh to guest's .bashrc
      |            |___ npb-install.sh           # Compiles NPB inside the generated disk image
      |            |___ npb-hooks                # The NPB source (modified to use with gem5).
      |
      |___ config.4.19.83                        # linux kernel configuration file
      |
      |___ configs
      |      |___ system                         # gem5 system config files
      |      |___ run_npb.py                     # gem5 run script to run NPB tests
      |
      |___ linux                                 # Linux source and binary will live here
      |
      |___ launch_npb_tests.py                   # script to launch jobs and register artifacts

8.2 Setting up the environment

First, we need to create the main directory named npb-tests (from where we will run everything) and turn it into a git repository. Through the npb-tests git repo, we will keep track of changes in those files which are not otherwise included in any git repo. An example of such files is the gem5 run and config scripts. We want to keep a record of any changes in these scripts, so that a particular run of NPB benchmarks can be associated with a particular snapshot of these files. We also need to add a git remote to this repo pointing to a remote location where we want this repo to be hosted.

    mkdir npb-tests
    cd npb-tests
    git init
    git remote add origin https://your-remote-add/npb-tests.git

We also need to add a .gitignore file in our git repo, to avoid tracking those files which are not important or will be tracked through other git repos:

    *.pyc
    m5out
    .vscode
    results
    venv
    disk-image/packer
    disk-image/packer_1.4.3_linux_amd64.zip
    disk-image/npb/npb-image/npb
    disk-image/npb/npb-hooks
    disk-image/packer_cache
    gem5
    linux-stable/

gem5art relies on Python 3, so we suggest creating a virtual environment before using gem5art.

    virtualenv -p python3 venv
    source venv/bin/activate

gem5art can be installed (if not already) using pip:

    pip install gem5art-artifact gem5art-run gem5art-tasks

8.3 Building gem5

Next, clone gem5 from googlesource:

    git clone https://gem5.googlesource.com/public/gem5

If you want to use the exact gem5 source that was used at the time of creating this tutorial, you will have to check out the relevant commit. If you want to try with the current version of gem5 at the time of reading this tutorial, you can skip the git checkout command.

    cd gem5
    git checkout v20.1.0.0;
    scons build/X86/gem5.opt -j8

Also make sure to build the m5 utility, which will eventually be moved to the disk image. The m5 utility allows simulation tasks to be triggered from inside the simulated system. For example, it can be used to dump simulation statistics when the simulated system requests it. We will need m5 mainly to exit the simulation when the simulated system is done executing a particular NPB benchmark.

    cd gem5/util/m5/
    scons build/x86/out/m5

8.4 Creating a disk image

First create a disk-image folder where we will keep all disk image related files:

    mkdir disk-image

We will follow a directory structure similar to the one discussed in the Disk Images section. Add a folder named shared for config files which will be shared among all disk images (and will be kept to their defaults) and one folder named npb which will contain files configured for the NPB disk image. Add preseed.cfg and serial-getty@.service in shared/. In npb/ we will add the benchmark source first, which will eventually be transferred to the disk image through our npb.json file.

    cd disk-image/npb
    git clone https://github.com/darchr/npb-hooks.git

This source of NPB has ROI (region of interest) annotations for each benchmark, which gem5 uses to separate the simulation statistics of the important parts of a program from the rest of the program. Basically, gem5 magic instructions are placed before and after the ROI; they exit the guest and transfer control to the gem5 run script, which can then do things like dumping or resetting stats or switching to the cpu of interest.

Next, we will add a few other files in npb/ which will be used to compile NPB inside the disk image and eventually to run these benchmarks with gem5. These files will be moved from the host to the disk image using the npb.json file, as we will soon see.

First, create a file npb-install.sh, which will be executed inside the disk image (once it is built) and will install NPB on the disk image:

    # install build-essential (gcc and g++ included) and gfortran
    echo "12345" | sudo apt-get install build-essential gfortran

    # Compile NPB
    cd /home/gem5/NPB3.3-OMP/
    mkdir bin
    make suite HOOKS=1

The HOOKS=1 flag in the above make command enables the ROI annotations while compiling NPB workloads. We are specifically compiling the OpenMP (OMP) version of classes A, B, C and D of the NPB workloads. To configure the benchmark build process, the source of NPB which we are using relies on modified make.def and suite.def files (build system files). Look here to understand the build process of NAS parallel benchmarks. The suite.def file determines which workloads (and which classes) are compiled when we run the make suite command (as in the above script). You can look at the modified suite.def file here. The make.def file we are using adds OMP flags to the compiler flags to compile the OMP version of the benchmarks. We also add another flag, -DM5OP_ADDR=0xFFFF0000, to the compiler flags, which makes sure that the gem5 magic instructions added to the benchmarks will also work in KVM mode. You can look at the complete file here.

In npb/, create a file post-installation.sh and add the following lines to it:

    #!/bin/bash
    echo 'Post Installation Started'

    mv /home/gem5/serial-getty@.service /lib/systemd/system/

    mv /home/gem5/m5 /sbin
    ln -s /sbin/m5 /sbin/gem5

    # copy and run outside (host) script after booting
    cat /home/gem5/runscript.sh >> /root/.bashrc

    echo 'Post Installation Done'

This post-installation.sh script (which is a script to run after Ubuntu is installed on the disk image) installs m5 and copies the contents of runscript.sh to .bashrc. Therefore, we need to add those things in runscript.sh which we want to execute as soon as the system boots up. Create runscript.sh in npb/ and add following lines to it:

    #!/bin/sh

    m5 readfile > script.sh
    if [ -s script.sh ]; then
        # if the file is not empty, execute it
        chmod +x script.sh
        ./script.sh
        m5 exit
    fi
    # otherwise, drop to the terminal

runscript.sh uses m5 readfile to read the contents of a script, which is how gem5 passes scripts from the host system to the simulated system. The passed script will then be executed and will be responsible for running the benchmark(s), which we will look into later.

Finally, create npb.json and add the following contents:

    {
        "builders":
        [
            {
                "type": "qemu",
                "format": "raw",
                "accelerator": "kvm",
                "boot_command":
                [
                    "{{ user `boot_command_prefix` }}",
                    "debian-installer={{ user `locale` }} auto locale={{ user `locale` }} kbd-chooser/method=us ",
                    "file=/floppy/{{ user `preseed` }} ",
                    "fb=false debconf/frontend=noninteractive ",
                    "hostname={{ user `hostname` }} ",
                    "/install/vmlinuz noapic ",
                    "initrd=/install/initrd.gz ",
                    "keyboard-configuration/modelcode=SKIP keyboard-configuration/layout=USA ",
                    "keyboard-configuration/variant=USA console-setup/ask_detect=false ",
                    "passwd/user-fullname={{ user `ssh_fullname` }} ",
                    "passwd/user-password={{ user `ssh_password` }} ",
                    "passwd/user-password-again={{ user `ssh_password` }} ",
                    "passwd/username={{ user `ssh_username` }} ",
                    "-- "
                ],
                "cpus": "{{ user `vm_cpus`}}",
                "disk_size": "{{ user `image_size` }}",
                "floppy_files":
                [
                    "shared/{{ user `preseed` }}"
                ],
                "headless": "{{ user `headless` }}",
                "http_directory": "shared/",
                "iso_checksum": "{{ user `iso_checksum` }}",
                "iso_checksum_type": "{{ user `iso_checksum_type` }}",
                "iso_urls": [ "{{ user `iso_url` }}" ],
                "memory": "{{ user `vm_memory`}}",
                "output_directory": "npb/{{ user `image_name` }}-image",
                "qemuargs":
                [
                    [ "-cpu", "host" ],
                    [ "-display", "none" ]
                ],
                "qemu_binary": "/usr/bin/qemu-system-x86_64",
                "shutdown_command": "echo '{{ user `ssh_password` }}'|sudo -S shutdown -P now",
                "ssh_password": "{{ user `ssh_password` }}",
                "ssh_username": "{{ user `ssh_username` }}",
                "ssh_wait_timeout": "60m",
                "vm_name": "{{ user `image_name` }}"
            }
        ],
        "provisioners":
        [
            {
                "type": "file",
                "source": "../gem5/util/m5/m5",
                "destination": "/home/gem5/"
            },
            {
                "type": "file",
                "source": "shared/serial-getty@.service",
                "destination": "/home/gem5/"
            },
            {
                "type": "file",
                "source": "npb/runscript.sh",
                "destination": "/home/gem5/"
            },
            {
                "type": "file",
                "source": "npb/npb-hooks/NPB3.3.1/NPB3.3-OMP",
                "destination": "/home/gem5/"
            },
            {
                "type": "shell",
                "execute_command": "echo '{{ user `ssh_password` }}' | {{.Vars}} sudo -E -S bash '{{.Path}}'",
                "scripts":
                [
                    "npb/post-installation.sh",
                    "npb/npb-install.sh"
                ]
            }
        ],
        "variables":
        {
            "boot_command_prefix": "",
            "desktop": "false",
            "image_size": "12000",
            "headless": "true",
            "iso_checksum": "34416ff83179728d54583bf3f18d42d2",
            "iso_checksum_type": "md5",
            "iso_name": "ubuntu-18.04.2-server-amd64.iso",
            "iso_url": "http://old-releases.ubuntu.com/releases/18.04.2/ubuntu-18.04.2-server-amd64.iso",
            "locale": "en_US",
            "preseed" : "preseed.cfg",
            "hostname": "gem5",
            "ssh_fullname": "gem5",
            "ssh_password": "12345",
            "ssh_username": "gem5",
            "vm_cpus": "16",
            "vm_memory": "8192",
            "image_name": "npb"
        }
    }

npb.json is our primary .json configuration file. The provisioners and variables sections of this file configure the files that need to be transferred to the disk and other things like the disk image’s name.

Next, download packer (if not already downloaded) in the disk-image folder:

    cd disk-image/
    wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip
    unzip packer_1.4.3_linux_amd64.zip

Now, to build the disk image inside the disk-image folder, run:

    ./packer validate npb/npb.json
    ./packer build npb/npb.json

Once this process succeeds, the created disk image can be found at npb/npb-image/npb. A disk image already created following the above instructions can be found, gzipped, here.
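A common source of failed builds is a template that references a user variable which is never defined in its variables section. As a rough sketch of a sanity check that could complement packer validate, the function below scans a template for references of the form used in npb.json and reports any without a definition. The function name undefined_user_variables and the small example template are assumptions for illustration only.

```python
import json
import re

# Matches Packer user-variable references such as {{ user `image_name` }}.
USER_VAR = re.compile(r"\{\{\s*user\s+`([^`]+)`\s*\}\}")


def undefined_user_variables(template_text: str) -> set:
    """Return user variables referenced in the template but missing from 'variables'."""
    template = json.loads(template_text)
    defined = set(template.get("variables", {}))
    referenced = set(USER_VAR.findall(template_text))
    return referenced - defined


# A deliberately incomplete template: vm_memory is referenced but not defined.
example = json.dumps({
    "builders": [{"type": "qemu",
                  "vm_name": "{{ user `image_name` }}",
                  "memory": "{{ user `vm_memory`}}"}],
    "variables": {"image_name": "npb"},
})
print(undefined_user_variables(example))
```

For the real file, the text would be read with open('npb/npb.json').read() from the disk-image folder; an empty set means every referenced variable has a definition.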

8.5 Compiling the linux kernel

In this tutorial, we use one of the LTS (long term support) releases of the linux kernel, v4.19.83, with gem5 to run NAS parallel benchmarks. First, get the linux kernel config file from here, and place it in the npb-tests folder. Then, we will get the linux source of version 4.19.83:

    git clone --branch v4.19.83 --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    mv linux linux-stable
    cd linux-stable

Compile the linux kernel from its source (using the already downloaded config file config.4.19.83):

    cp ../config.4.19.83 .config
    make -j8
    cp vmlinux vmlinux-4.19.83

Note: The above instructions are tested with gcc 7.5.0, and an already compiled Linux binary can be downloaded from the following link:
• vmlinux-4.19.83

8.6 gem5 run scripts

Next, we need to add gem5 run scripts. We will do that in a folder named configs-npb-tests. Get the run script named run_npb.py from here, and other system configuration files from here (https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb/configs/system/). The main script run_npb.py expects the following arguments:

kernel: path to the Linux kernel.

disk: path to the npb disk image.

cpu: CPU model (kvm, atomic, timing).

mem_sys: memory system (classic, MI_example, MESI_Two_Level, MOESI_CMP_directory).

benchmark: NPB benchmark to execute (bt.A.x, cg.A.x, ep.A.x, ft.A.x, is.A.x, lu.A.x, mg.A.x, sp.A.x).

Note: By default, the previously written instructions to build the npb disk image will build classes A, B, C and D of NPB in the disk image. We have only tested class A of the NPB. Replace A with any other class in the above listed benchmark names to test with other classes.

num_cpus: number of CPU cores.

8.7 Database and Celery Server

If not already running/created, you can create a database using:

    docker run -p 27017:27017 -v :/data/db --name mongo- -d mongo

in a newly created directory. If not already installed, install RabbitMQ on your system (before running celery) using:

apt-get install rabbitmq-server

Now, run celery server using:

celery -E -A gem5art.tasks.celery worker --autoscale=[number of workers],0

Note: Celery is not required to run gem5 jobs with gem5art. You can also use python multiprocessing library based function calls (provided by gem5art) to launch these jobs in parallel (we will show how to do that later in our launch script).

8.8 Creating a launch script

Finally, we will create a launch script with the name launch_npb_tests.py, which will be responsible for registering the artifacts to be used and then launching gem5 jobs. The first thing to do in the launch script is to import required modules and classes:

    import os
    import sys
    from uuid import UUID
    from itertools import starmap
    from itertools import product

    from gem5art.artifact import Artifact
    from gem5art.run import gem5Run
    from gem5art.tasks.tasks import run_gem5_instance
    import multiprocessing as mp

Next, we will register artifacts. For example, to register packer artifact we will add the following lines:

    packer = Artifact.registerArtifact(
        command = '''wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip;
        unzip packer_1.4.3_linux_amd64.zip;
        ''',
        typ = 'binary',
        name = 'packer',
        path = 'disk-image/packer',
        cwd = 'disk-image',
        documentation = 'Program to build disk images. Downloaded sometime in August/19 from hashicorp.'
    )

For our npb-tests repo,

    experiments_repo = Artifact.registerArtifact(
        command = 'git clone https://your-remote-add/npb-tests.git',
        typ = 'git repo',
        name = 'npb-tests',
        path = './',
        cwd = '../',
        documentation = 'main repo to run npb with gem5'
    )

Note that the name of the artifact object (returned by the registerArtifact method) is entirely up to the user, as are most of the other attributes of these artifacts. For all other artifacts, add the following lines in launch_npb_tests.py:

    gem5_repo = Artifact.registerArtifact(
        command = 'git clone https://gem5.googlesource.com/public/gem5',
        typ = 'git repo',
        name = 'gem5',
        path = 'gem5/',
        cwd = './',
        documentation = 'cloned gem5 from googlesource and checked out v20.1.0.0'
    )

    m5_binary = Artifact.registerArtifact(
        command = 'scons build/x86/out/m5',
        typ = 'binary',
        name = 'm5',
        path = 'gem5/util/m5/build/x86/out/m5',
        cwd = 'gem5/util/m5',
        inputs = [gem5_repo,],
        documentation = 'm5 utility'
    )

    disk_image = Artifact.registerArtifact(
        command = 'packer build npb.json',
        typ = 'disk image',
        name = 'npb',
        cwd = 'disk-image/npb',
        path = 'disk-image/npb/npb-image/npb',
        inputs = [packer, experiments_repo, m5_binary,],
        documentation = 'Ubuntu with m5 binary and NPB (with ROI annotations: darchr/npb-hooks/) installed.'
    )

    gem5_binary = Artifact.registerArtifact(
        command = '''cd gem5;
        git checkout v20.1.0.0;
        scons build/X86/gem5.opt -j8
        ''',
        typ = 'gem5 binary',
        name = 'gem5',
        cwd = 'gem5/',
        path = 'gem5/build/X86/gem5.opt',
        inputs = [gem5_repo,],
        documentation = 'gem5 binary based on v20.1.0.0'
    )

    gem5_binary_MESI_Two_Level = Artifact.registerArtifact(
        command = '''cd gem5;
        git checkout v20.1.0.0;
        scons build/X86_MESI_Two_Level/gem5.opt --default=X86 PROTOCOL=MESI_Two_Level SLICC_HTML=True -j8
        ''',
        typ = 'gem5 binary',
        name = 'gem5',
        cwd = 'gem5/',
        path = 'gem5/build/X86_MESI_Two_Level/gem5.opt',
        inputs = [gem5_repo,],
        documentation = 'gem5 binary based on v20.1.0.0'
    )

    linux_repo = Artifact.registerArtifact(
        command = '''git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git;
        mv linux linux-stable''',
        typ = 'git repo',
        name = 'linux-stable',
        path = 'linux-stable/',
        cwd = './',
        documentation = 'linux kernel source code repo from June 24-2020'
    )

    linux_binary = Artifact.registerArtifact(
        name = 'vmlinux-4.19.83',
        typ = 'kernel',
        path = 'linux-stable/vmlinux-4.19.83',
        cwd = 'linux-stable/',
        command = '''
        cp ../config.4.19.83 .config;
        make -j8;
        cp vmlinux vmlinux-4.19.83;
        ''',
        inputs = [experiments_repo, linux_repo,],
        documentation = "kernel binary for v4.19.83",
    )

Once all the artifacts are registered, the next step is to launch all gem5 jobs. To do that, we will first create a method createRun to create gem5art runs based on a few arguments:

    def createRun(bench, clas, cpu, mem, num_cpu):

        if mem == 'MESI_Two_Level':
            binary_gem5 = 'gem5/build/X86_MESI_Two_Level/gem5.opt'
            artifact_gem5 = gem5_binary_MESI_Two_Level
        else:
            binary_gem5 = 'gem5/build/X86/gem5.opt'
            artifact_gem5 = gem5_binary

        return gem5Run.createFSRun(
            'npb with gem5-20.1',
            binary_gem5,
            'configs-npb-tests/run_npb.py',
            f'''results/run_npb_multicore/{bench}/{clas}/{cpu}/{num_cpu}''',
            artifact_gem5, gem5_repo, experiments_repo,
            'linux-stable/vmlinux-4.19.83',
            'disk-image/npb/npb-image/npb',
            linux_binary, disk_image,
            cpu, mem, bench.replace('.x', f'.{clas}.x'), num_cpu,
            timeout = 240*60*60 # 240 hours
        )
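Two small string manipulations in createRun are easy to misread: the class letter is spliced into the benchmark name, and the output directory is assembled from the run parameters. The sketch below isolates both so they can be tried without gem5art; the helper names benchmark_binary and results_dir are hypothetical.

```python
from itertools import product

benchmarks = ['is.x', 'ep.x', 'cg.x', 'mg.x', 'ft.x', 'bt.x', 'sp.x', 'lu.x']
classes = ['A']


def benchmark_binary(bench: str, clas: str) -> str:
    """Insert the class letter before the '.x' suffix, e.g. 'is.x' -> 'is.A.x'."""
    return bench.replace('.x', f'.{clas}.x')


def results_dir(bench: str, clas: str, cpu: str, num_cpu: str) -> str:
    """Output directory layout used by createRun above."""
    return f'results/run_npb_multicore/{bench}/{clas}/{cpu}/{num_cpu}'


names = [benchmark_binary(b, c) for b, c in product(benchmarks, classes)]
print(names)
print(results_dir('is.x', 'A', 'kvm', '8'))
```

Note that the output directory does not include the memory system, so sweeping more than one value of mem would require adding it to the path to keep the results of different runs apart.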

Next, initialize all the parameters to pass to the createRun method, depending on the configuration space we want to test:

    if __name__ == "__main__":
        num_cpus = ['1', '8']
        benchmarks = ['is.x', 'ep.x', 'cg.x', 'mg.x', 'ft.x', 'bt.x', 'sp.x', 'lu.x']

        classes = ['A']
        mem_sys = ['MESI_Two_Level']
        cpus = ['kvm', 'timing']

Then, to run actual jobs depending on if you want to use celery or python multiprocessing library, add the following in your launch script:

8.9 If Using Celery

        # For the cross product of tests, create a run object.
        runs = starmap(createRun, product(benchmarks, classes, cpus, mem_sys, num_cpus))
        # Run all of these experiments in parallel
        for run in runs:
            run_gem5_instance.apply_async((run, os.getcwd(),))

8.10 If Using Python Multiprocessing Library:

        def worker(run):
            run.run()
            json = run.dumpsJson()
            print(json)

        jobs = []
        # For the cross product of tests, create a run object.
        runs = starmap(createRun, product(benchmarks, classes, cpus, mem_sys, num_cpus))
        # Run all of these experiments in parallel
        for run in runs:
            jobs.append(run)

        with mp.Pool(mp.cpu_count() // 2) as pool:
            pool.map(worker, jobs)

The above lines are responsible for looping through all possible combinations of variables involved in this experiment. For each combination, a gem5Run object is created and eventually passed to run_gem5_instance to be executed asynchronously if using Celery. In the case of the python multiprocessing library, these run objects are pushed to a list and then mapped to a job pool. Look at the definition of createFSRun() here to understand the use of the passed arguments.

Here, we are using a timeout of 240 hours, after which the particular gem5 job will be killed (assuming that gem5 should have finished running the benchmark on the given hardware resources by then). You can configure this time according to your settings.

The complete launch script is available here. Finally, make sure you are in the python virtual env and then run the script:

    python launch_npb_tests.py

8.11 Results

Once you run the launch script, the declared artifacts will be registered by gem5art and stored in the database. Celery will run as many jobs in parallel as allowed by the user (at the time of starting the server). As soon as a gem5 job finishes, a compressed version of the results will be stored in the database as well. Users can also query the database using the methods discussed previously in the Artifacts and Runs sections and in the boot-test tutorial.

The working status of the NAS parallel benchmarks on gem5, based on the results from the experiments of this tutorial, is as follows:

You can look here for the latest status of these tests on gem5.

Authors: • Ayaz Akram • Nadia Etemadi

CHAPTER 9

Tutorial: Run Microbenchmarks with gem5

9.1 Introduction

In this tutorial, we will learn how to run some simple microbenchmarks using gem5art. Microbenchmarks are small benchmarks designed to test a component of a larger system. The particular microbenchmarks we are using in this tutorial were originally developed at the University of Wisconsin-Madison. This microbenchmark suite is divided into different control, execution and memory benchmarks. We will use the system emulation (SE) mode of gem5 to run these microbenchmarks.

This tutorial follows the following directory structure:
• configs-micro-tests: the base gem5 configuration to be used to run SE mode simulations
• gem5: gem5 source code and the compiled binary
• results: directory to store the results of the experiments (generated once gem5 jobs are executed)
• launch_micro_tests.py: gem5 jobs launch script (creates all of the needed artifacts as well)

9.2 Setting up the environment

First, we need to create the main directory named micro-tests (from where we will run everything) and turn it into a git repository like we did in the previous tutorials. Next, add a git remote to this repo pointing to a remote location where we want this repo to be hosted.

    mkdir micro-tests
    cd micro-tests
    git init
    git remote add origin https://your-remote-add/micro-tests.git

We also need to add a .gitignore file in our git repo to leave unnecessary files untracked:

51 gem5art, Release 0.2.1

*.pyc m5out .vscode results gem5 venv

Next, we will create a virtual python3 environment before using gem5art. virtualenv -p python3 venv source venv/bin/activate

This virtual environment needs to be active in order to run experiments with gem5art. You can deactivate the environment at any time with the command deactivate. gem5art can be installed (if not already) using pip:

    pip install gem5art-artifact gem5art-run gem5art-tasks

9.3 Build gem5

First clone gem5 in your micro-tests repo:

    git clone https://gem5.googlesource.com/public/gem5
    cd gem5

Before building gem5, we need to apply a patch to the source repo. As you will later see, we will run gem5 with various memory configs. Inf (SimpleMemory with 0ns latency) and SingleCycle (SimpleMemory with 1ns latency) do not use any caches. Therefore, to implement cacheless SimpleMemory, we need to add support for vector ports in SimpleMemory by applying this patch. This is necessary because we need to connect the cpu's icache and dcache ports to the mem_ctrl port (a vector port). You can download and apply the patch as follows:

    wget https://github.com/darchr/gem5/commit/f0a358ee08aba1563c7b5277866095b4cbb7c36d.patch
    git am f0a358ee08aba1563c7b5277866095b4cbb7c36d.patch --reject

Now, build gem5:

    scons build/X86/gem5.opt -j8

9.4 Download and compile the microbenchmarks

Download the microbenchmarks:

    git clone https://github.com/darchr/microbench.git

Commit the source of the microbenchmarks to the micro-tests repo, so that the current version of the microbenchmarks repo becomes a part of the micro-tests repository.

    git add microbench/
    git commit -m "Add microbenchmarks"

Compile the benchmarks:

    cd microbench
    make

By default, these microbenchmarks are compiled for the x86 ISA, which will be our focus in this tutorial. You can use the following commands to compile these benchmarks for ARM and RISC-V ISAs if you wish to work with them.

make ARM

make RISCV

9.5 gem5 run scripts

Now, we will add the gem5 run and configuration scripts to a new folder named configs-micro-tests. Get the run script named run_micro.py from here, and the other system configuration file from here.

The run script (run_micro.py) takes the following arguments:
• cpu: cpu type [TimingSimple: timing simple cpu model, DerivO3: O3 cpu model]
• memory: memory type [Inf: 0ns latency memory, SingleCycle: 1ns latency memory, SlowMemory: 100ns latency memory. All types have infinite bandwidth. Caches are only enabled for SlowMemory.]
• benchmark: benchmark binary to run with gem5
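To make the role of these arguments concrete, the following minimal sketch builds the command line for a single manual SE-mode run. It assumes that run_micro.py accepts cpu, memory and benchmark as positional arguments in that order; the helper name se_command, the output directory, and the benchmark name CCa are placeholders chosen for illustration and do not come from gem5art.

```python
import shlex


def se_command(gem5_binary, run_script, outdir, cpu, memory, benchmark):
    """Build the argument list for one SE-mode gem5 run.

    Assumption: run_micro.py takes cpu, memory and benchmark as
    positional arguments, in this order.
    """
    return [
        gem5_binary,
        "--outdir={}".format(outdir),  # gem5 option selecting the output folder
        run_script,
        cpu,
        memory,
        benchmark,
    ]


cmd = se_command(
    "gem5/build/X86/gem5.opt",
    "configs-micro-tests/run_micro.py",
    "results/manual/TimingSimple/Inf",   # hypothetical output directory
    "TimingSimple",
    "Inf",
    "microbench/CCa/bench.X86",          # CCa is a placeholder benchmark name
)

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

In the launch script described later, gem5art assembles the equivalent invocation for every combination of cpu, memory and benchmark, so a manual command like this is mainly useful for checking one configuration by hand.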

9.6 Database and Celery Server

If not already running or created, you can create a database using:

    docker run -p 27017:27017 -v :/data/db --name mongo- -d mongo

in a newly created directory. If not already installed, install RabbitMQ on your system (before running celery) using:

apt-get install rabbitmq-server

Now, run the celery server using:

celery -E -A gem5art.tasks.celery worker --autoscale=[number of workers],0

9.7 Creating a launch script

Next, we will create a launch script with the name launch_micro_tests.py, which will register the artifacts to be used and will start gem5 jobs. Like we did in previous tutorials, the first step is to import the required modules and classes:

    import os
    import sys
    from uuid import UUID

    from gem5art.artifact import Artifact
    from gem5art.run import gem5Run
    from gem5art.tasks.tasks import run_gem5_instance

Next, we will register the artifacts:

    experiments_repo = Artifact.registerArtifact(
        command = 'git clone https://your-remote-add/micro-tests.git',
        typ = 'git repo',
        name = 'micro-tests',
        path = './',
        cwd = '../',
        documentation = 'main experiments repo to run microbenchmarks with gem5'
    )

    gem5_repo = Artifact.registerArtifact(
        command = '''git clone https://gem5.googlesource.com/public/gem5;
        cd gem5;
        wget https://github.com/darchr/gem5/commit/38d07ab0251ea8f5181abc97a534bb60157b2b5d.patch;
        git am 38d07ab0251ea8f5181abc97a534bb60157b2b5d.patch --reject;
        ''',
        typ = 'git repo',
        name = 'gem5',
        path = 'gem5/',
        cwd = './',
        documentation = 'git repo with gem5 cloned on Nov 22 from googlesource (patch applied to support mem vector port)'
    )

    gem5_binary = Artifact.registerArtifact(
        command = 'scons build/X86/gem5.opt',
        typ = 'gem5 binary',
        name = 'gem5',
        cwd = 'gem5/',
        path = 'gem5/build/X86/gem5.opt',
        inputs = [gem5_repo,],
        documentation = 'default gem5 x86'
    )

As expected, we need fewer artifacts than in the previous (full-system) tutorials (boot, npb). To run the benchmarks, we will iterate through the possible cpu types, the memory types and all of the microbenchmarks from the microbench repository. We will also register an artifact for each microbenchmark. If you only want to run certain benchmarks, you can list them in the bm_list array.

    if __name__ == "__main__":

        cpu_types = ['TimingSimple', 'DerivO3']
        mem_types = ['Inf', 'SingleCycle', 'Slow']

        bm_list = []

        # iterate through files in microbench dir to
        # create a list of all microbenchmarks
        for filename in os.listdir('microbench'):
            if os.path.isdir(f'microbench/{filename}') and filename != '.git':
                bm_list.append(filename)

        # create an artifact for each single microbenchmark
        for bm in bm_list:
            bm = Artifact.registerArtifact(
                command = '''
                cd microbench/{};
                make X86;
                '''.format(bm),
                typ = 'binary',
                name = bm,
                cwd = 'microbench/{}'.format(bm),
                path = 'microbench/{}/bench.X86'.format(bm),
                inputs = [experiments_repo,],
                documentation = 'microbenchmark ({}) binary for X86 ISA'.format(bm)
            )

        # launch one gem5 run per (benchmark, cpu, memory) combination
        for bm in bm_list:
            for cpu in cpu_types:
                for mem in mem_types:
                    run = gem5Run.createSERun(
                        'microbench_tests',
                        'gem5/build/X86/gem5.opt',
                        'configs-micro-tests/run_micro.py',
                        'results/X86/run_micro/{}/{}/{}'.format(bm, cpu, mem),
                        gem5_binary, gem5_repo, experiments_repo,
                        cpu, mem, os.path.join('microbench', bm, 'bench.X86'))
                    run.run()

Note that, in contrast to the previous tutorials (boot, npb), we are using createSERun this time, as we want to run gem5 in SE mode. The details of the arguments needed by createSERun() can be found here. The full launch script is available here. Once you run this launch script (as shown below), your microbenchmark experiments will start running, simulating the execution of the microbenchmarks on different cpu and memory types.

    python launch_micro_tests.py

Later, you can access the database to see the status of these jobs and further analyze the results of your microbenchmark experiments. Happy experimenting!

Authors: • Hoa Nguyen

CHAPTER 10

Tutorial: Run SPEC CPU 2017 / SPEC CPU 2006 Benchmarks in Full System Mode with gem5art

10.1 Introduction

In this tutorial, we will demonstrate how to use gem5art and gem5-resources to run SPEC CPU 2017 benchmarks in gem5 full system mode. The scripts in this tutorial work with gem5art v1.3.0, gem5 20.1.0.4, and gem5-resources 20.1.0.4. The content of this tutorial is mostly aimed at conducting SPEC CPU 2017 experiments. However, because the SPEC 2006 and SPEC 2017 resources are similar, this tutorial also applies to SPEC 2006 experiments when the src/spec-2006 folder of gem5-resources is used instead of src/spec-2017.

10.1.1 gem5-resources

gem5-resources is an actively maintained collection of commonly used gem5-related resources. The resources include scripts, binaries and disk images for full system simulation of many commonly used benchmarks. This tutorial offers guidance on using gem5-resources for full system simulation.

10.1.2 gem5 Full System Mode

Unlike gem5 SE mode (syscall emulation mode), FS mode (full system mode) uses an actual Linux kernel binary instead of emulating the responsibilities of a typical modern OS, such as managing page tables and handling system calls. As a result, gem5 FS simulation is more realistic than gem5 SE simulation, especially when the interactions between the workload and the OS are a significant part of the simulation.

A typical gem5 full system simulation requires a compiled Linux kernel, a disk image containing compiled benchmarks, and gem5 system configurations. gem5-resources typically provides all of these resources for every supported benchmark, so that one can download the resources and run the experiment without much modification. However, due to licensing issues, gem5-resources does not provide a disk image containing the SPEC CPU 2017 benchmarks. In this tutorial, we will provide a set of scripts that generates a disk image containing the benchmarks, assuming the ISO file of the SPEC CPU 2017 benchmarks is available.


10.1.3 Overall Structure of the Experiment

    spec-2017/
      |___ gem5/                                   # gem5 folder
      |
      |___ disk-image/
      |      |___ shared/
      |      |___ spec-2017/
      |             |___ spec-2017-image/
      |             |      |___ spec-2017          # the disk image will be generated here
      |             |___ spec-2017.json            # the Packer script
      |             |___ cpu2017-1.1.0.iso         # SPEC 2017 ISO (add here)
      |
      |___ configs
      |      |___ system/
      |      |___ run_spec.py                      # gem5 run script
      |
      |___ vmlinux-4.19.83                         # Linux kernel, link to download provided below
      |
      |___ README.md


10.1.4 An Overview of Host System - gem5 Interactions

Figure 1. A visual depiction of how gem5 interacts with the host system.

gem5 is configured to do the following: boot the Linux kernel, run the benchmark, and copy the SPEC outputs to the host system. However, since we are interested in getting the stats only for the benchmark, we configure gem5 to exit after the kernel is booted, and we then reset the stats before running the benchmark. We use the KVM CPU model in gem5 to boot Linux quickly, and after the boot process is complete, we switch to the desired detailed CPU to run the benchmark. Similarly, after the benchmark is complete, gem5 exits to the host, which allows us to get the stats at that point. After that, optionally, we switch the CPU back to KVM, which allows us to quickly write the SPEC output files to the host.

Note: gem5 will output the stats again when the gem5 run is complete. Therefore, we will see two sets of stats in stats.txt. The first part of stats.txt contains the stats of the benchmark, while the second part contains the stats of the benchmark AND of the process of writing the output files back to the host. We are only interested in the first part of stats.txt.
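Since only the first set of stats is of interest, it can be convenient to extract it programmatically. The following is a minimal sketch, assuming that each set of stats in stats.txt is delimited by gem5's "Begin Simulation Statistics" and "End Simulation Statistics" marker lines. The function name first_stats_block and the example path are hypothetical and are not part of gem5 or gem5art.

```python
BEGIN_PREFIX = "---------- Begin Simulation Statistics"
END_PREFIX = "---------- End Simulation Statistics"


def first_stats_block(text):
    """Return {stat_name: value_string} for the first stats dump in text.

    Values are kept as strings; everything after the first
    "End Simulation Statistics" marker is ignored.
    """
    stats = {}
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(BEGIN_PREFIX):
            in_block = True
        elif stripped.startswith(END_PREFIX):
            if in_block:
                break  # the first block is complete
        elif in_block and stripped:
            fields = stripped.split()
            if len(fields) >= 2:
                # first column is the stat name, second column its value
                stats[fields[0]] = fields[1]
    return stats


# Example usage (the path is hypothetical):
# with open("results/o3/test/505.mcf_r/stats.txt") as f:
#     benchmark_stats = first_stats_block(f.read())
```

Keeping the values as strings avoids guessing how vector or percentage stats should be parsed; convert individual entries with float() as needed.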


10.2 Setting up the Experiment

In this part, we have two concurrent tasks: setting up the resources and documenting the process using gem5art. We will structure the SPEC 2017 resources as laid out by gem5-resources. The script launch_spec2017_experiment.py will contain the documentation about the artifacts we create and will also serve as the Python script that launches the experiment.

10.2.1 Acquiring gem5-resources and Setting up the Experiment Folder

First, we clone the gem5-resources repo and check out the stable branch at commit 1fe56ffc94005b7fa0ae5634c6edc5e2cb0b7357, which is the most recent version of gem5-resources that is compatible with gem5 20.1.0.4 as of March 2021.

    git clone https://gem5.googlesource.com/public/gem5-resources
    cd gem5-resources
    git checkout 1fe56ffc94005b7fa0ae5634c6edc5e2cb0b7357

Since all resources related to the SPEC CPU 2017 benchmark suite are in src/spec-2017 and the other folders in src/ are not related to this experiment, we set the root folder of the experiment to the src/spec-2017 folder of the cloned repo. To keep track of changes that are specific to src/spec-2017, we set up a git structure for the folder. The git remote pointing to origin should also be set up, as gem5art will use the origin information. In the gem5-resources folder,

    cd src/spec-2017
    git init
    git remote add origin https://remote-address/spec-experiment.git

We document the root folder of the experiment in launch_spec2017_experiment.py as follows,

    experiments_repo = Artifact.registerArtifact(
        command = '''
            git clone https://gem5.googlesource.com/public/gem5-resources
            cd gem5-resources
            git checkout 1fe56ffc94005b7fa0ae5634c6edc5e2cb0b7357
            cd src/spec-2017
            git init
            git remote add origin https://remote-address/spec-experiment.git
        ''',
        typ = 'git repo',
        name = 'spec2017 Experiment',
        path = './',
        cwd = './',
        documentation = '''
            local repo to run spec 2017 experiments with gem5 full system mode;
            resources cloned from https://gem5.googlesource.com/public/gem5-resources upto commit 1fe56ffc94005b7fa0ae5634c6edc5e2cb0b7357 of stable branch
        '''
    )

We use a .gitignore file to ignore changes to certain files and folders. In this experiment, we will use this .gitignore file,

    *.pyc
    m5out
    .vscode
    results
    gem5art-env
    disk-image/packer
    disk-image/packer_cache
    disk-image/spec-2017/spec-2017-image/spec-2017
    disk-image/spec-2017/cpu2017-1.1.0.iso
    gem5
    vmlinux-4.19.83

In the .gitignore file above, we ignore files and folders that are either tracked by other gem5art Artifact objects or whose presence does not affect the experiment's results. For example, disk-image/packer is the path to the packer binary that generates the disk image, and newer versions of packer probably won't affect the content of the disk image. Another example is vmlinux-4.19.83: we use another gem5art Artifact object to keep track of it, so we put the name of the file in the .gitignore file.

Note: You have probably noticed that there is more than one way of keeping track of the files in the experiment folder: either the git structure of the experiment keeps track of a file, or we create a separate gem5art Artifact object to keep track of that file. The two choices lead to different outcomes. The difference lies in the type of the Artifact object (specified by the typ parameter): for Artifact objects whose typ is git repo, gem5art won't upload the files in the git structure to gem5art's database; instead, it will only keep track of the hash of the HEAD commit of the git structure. However, for Artifact objects whose typ is not git repo, the file specified in the path parameter will be uploaded to the database. Essentially, we tend to keep small files (such as scripts and text files) in a git structure, and to keep large files (such as gem5 binaries and disk images) in Artifact objects of type gem5 binary or binary. In other words, gem5art does not store the individual files of a git Artifact, while it does upload other types of Artifact to its database.
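The distinction between the two kinds of artifacts can be illustrated with a small sketch: a git repository is identified by the hash of its HEAD commit, while any other artifact is identified by the content of the file at its path. This is only an illustration under stated assumptions. The function names are hypothetical, and the use of git rev-parse and of MD5 as the content hash are choices made for this example; gem5art's internal implementation may differ.

```python
import hashlib
import subprocess


def file_md5(path, chunk_size=1 << 20):
    """Hash a file's content in chunks so large disk images fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_head_hash(repo_dir):
    """Return the hash of the HEAD commit of the repository at repo_dir."""
    output = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_dir)
    return output.decode().strip()


def fingerprint(path, typ):
    """Hypothetical helper: identify an artifact by HEAD hash or file hash."""
    if typ == "git repo":
        return git_head_hash(path)
    return file_md5(path)
```

With this view, committing a change in the experiment repo changes the fingerprint of the git repo artifact, whereas rebuilding gem5.opt changes the fingerprint of the gem5 binary artifact.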

10.2.2 Building gem5

In this step, we download the source code and build gem5 v20.1.0.4. In the root folder of the experiment,

    git clone -b v20.1.0.4 https://gem5.googlesource.com/public/gem5
    cd gem5
    scons build/X86/gem5.opt -j8

We have two artifacts: the gem5 source code (the gem5 git repo) and the gem5 binary (gem5.opt). In launch_spec2017_experiments.py, we document this step in Artifact objects as follows,

    gem5_repo = Artifact.registerArtifact(
        command = '''
            git clone -b v20.1.0.4 https://gem5.googlesource.com/public/gem5
            cd gem5
            scons build/X86/gem5.opt -j8
        ''',
        typ = 'git repo',
        name = 'gem5',
        path = 'gem5/',
        cwd = './',
        documentation = 'cloned gem5 v20.1.0.4'
    )

    gem5_binary = Artifact.registerArtifact(
        command = 'scons build/X86/gem5.opt -j8',
        typ = 'gem5 binary',
        name = 'gem5-20.1.0.4',
        cwd = 'gem5/',
        path = 'gem5/build/X86/gem5.opt',
        inputs = [gem5_repo,],
        documentation = 'compiled gem5 v20.1.0.4 binary'
    )

10.2.3 Building m5

m5 is a binary that facilitates communication between the guest system and the simulator (gem5) running on the host. The use of the m5 binary will be demonstrated in the run scripts that we describe later. The m5 binary is copied to the disk image so that the guest can run it during the simulation, so it should be compiled before we build the disk image.

Note: It's important to compile the m5 binary with -DM5_ADDR=0xFFFF0000, as is the default in the SConscript. This address is used by the guest binary to communicate with the simulator. If you change the address in the guest binary, you also have to update the simulator to use the new address. Additionally, when running in KVM, you must use the address form of guest<->simulator communication and not the pseudo instruction form (i.e., using -DM5_ADDR is required when compiling a guest binary that you want to run in KVM mode on gem5).

To compile the m5 binary, in the root folder of the experiment,

    cd gem5/util/m5/
    scons build/x86/out/m5

In launch_spec2017_experiments.py, we document the step in an Artifact object as follows,

    m5_binary = Artifact.registerArtifact(
        command = 'scons build/x86/out/m5',
        typ = 'binary',
        name = 'm5',
        path = 'gem5/util/m5/build/x86/out/m5',
        cwd = 'gem5/util/m5',
        inputs = [gem5_repo,],
        documentation = 'm5 utility'
    )

10.2.4 Building the Disk Image

In this step, we will build the disk image using packer.

Note: If you are interested in modifying the SPEC configuration file, Appendix II describes how the scripts that build the disk image work. Also, more information about using packer and building disk images can be found here.

First, we download the packer binary. The current version of packer as of December 2020 is 1.6.6.

    cd disk-image/
    wget https://releases.hashicorp.com/packer/1.6.6/packer_1.6.6_linux_amd64.zip
    unzip packer_1.6.6_linux_amd64.zip
    rm packer_1.6.6_linux_amd64.zip

In launch_spec2017_experiments.py, we document how we obtain the binary as follows,

    packer = Artifact.registerArtifact(
        command = '''
            wget https://releases.hashicorp.com/packer/1.6.6/packer_1.6.6_linux_amd64.zip;
            unzip packer_1.6.6_linux_amd64.zip;
        ''',
        typ = 'binary',
        name = 'packer',
        path = 'disk-image/packer',
        cwd = 'disk-image',
        documentation = 'Program to build disk images. Downloaded from https://www.packer.io/.'
    )

Second, we build the disk image. The script disk-image/spec-2017/spec-2017.json specifies how the disk image is built. In this step, we assume the SPEC 2017 ISO file is in the disk-image/spec-2017 folder and the ISO file name is cpu2017-1.1.0.iso. The path and the name of the ISO file could be changed in the JSON file. To build the disk image, in the root folder of the experiment,

    cd disk-image/
    ./packer validate spec-2017/spec-2017.json # validate the script, including checking the input files
    ./packer build spec-2017/spec-2017.json

The process should take about an hour to complete on a fairly recent machine with a cable internet connection. The disk image will be in disk-image/spec-2017/spec-2017-image/spec-2017.

Note: Packer will output the URL of a VNC server that you can connect to in order to inspect the build process.

Note: More information about using packer and building disk images is available in the Disk Images section.

Now, in launch_spec2017_experiments.py, we make an Artifact object of the disk image.

    disk_image = Artifact.registerArtifact(
        command = './packer build spec-2017/spec-2017.json',
        typ = 'disk image',
        name = 'spec-2017',
        cwd = 'disk-image/',
        path = 'disk-image/spec-2017/spec-2017-image/spec-2017',
        inputs = [packer, experiments_repo, m5_binary,],
        documentation = 'Ubuntu Server with SPEC 2017 installed, m5 binary installed and root auto login'
    )

10.2.5 Obtaining a Compiled Linux Kernel that Works with gem5

The compiled Linux kernel binaries that are known to work with gem5 can be found here: https://www.gem5.org/documentation/general_docs/gem5_resources/.

The Linux kernel configurations that are used to compile the Linux kernel binaries are documented and maintained in gem5-resources: https://gem5.googlesource.com/public/gem5-resources/+/cee972a1727abd80924dad73d9f3b5cf0f13012d/src/linux-kernel/.

The following command downloads the compiled Linux kernel of version 4.19.83. In the root folder of the experiment,

    wget http://dist.gem5.org/dist/v20-1/kernels/x86/static/vmlinux-4.19.83


Now, in launch_spec2017_experiments.py, we make an Artifact object of the Linux kernel binary.

    linux_binary = Artifact.registerArtifact(
        name = 'vmlinux-4.19.83',
        typ = 'kernel',
        path = './vmlinux-4.19.83',
        cwd = './',
        command = ''' wget http://dist.gem5.org/dist/v20-1/kernels/x86/static/vmlinux-4.19.83''',
        inputs = [experiments_repo,],
        documentation = "kernel binary for v4.19.83",
    )

10.2.6 gem5 System Configurations

The gem5 system configurations can be found in the configs/ folder. The gem5 run script, located at configs/run_spec.py, takes the following parameters:
• --kernel: (required) the path to the vmlinux file.
• --disk: (required) the path to the SPEC disk image.
• --cpu: (required) name of the detailed CPU model. Currently, the following CPU models are supported: kvm, o3, atomic, timing. More CPU models can be added to getDetailedCPUModel() in run_spec.py.
• --benchmark: (required) name of the SPEC CPU 2017 benchmark. The availability of the benchmarks is listed at the end of the tutorial.
• --size: (required) size of the benchmark. There are three options: ref, train, test.
• --no-copy-logs: an optional parameter specifying whether the SPEC log files should be copied to the host system.
• --allow-listeners: an optional parameter specifying whether gem5 should open ports so that gdb or telnet can connect to it. No listeners are allowed by default.

We don't use another Artifact object to document this file. The Artifact repository object of the root folder will keep track of the changes to the script.

Note: The first two parameters of the gem5 run script for full system simulation should always be the path to the Linux binary and the path to the disk image, in that order.

10.3 Running the Experiment

10.3.1 Setting up the Python virtual environment

gem5art code works with Python 3.5 or above. The following command will set up a python3 virtual environment named gem5art-env. In the root folder of the experiment,

    virtualenv -p python3 gem5art-env

To activate the virtual environment, in the root folder of the experiment,

    source gem5art-env/bin/activate

To install the gem5art dependency (this should be done when we are in the virtual environment),


    pip install gem5art-artifact gem5art-run gem5art-tasks

To exit the virtual environment,

    deactivate

Note: the following steps should be done while using the Python virtual environment.

10.3.2 Running the Database Server

The following command will run the MongoDB database server in a docker container.

    docker run -p 27017:27017 -v /path/in/host:/data/db --name mongo-1 -d mongo

The -p 27017:27017 option maps port 27017 in the container to port 27017 on the host. The -v /path/in/host:/data/db option mounts the host folder /path/in/host as the /data/db folder in the docker container. The path of the host folder should be an absolute path, and the database files created by MongoDB will be in that folder. The --name mongo-1 option specifies the name of the docker container. We can use this name to identify the container. The -d option lets the container run in the background. mongo is the name of the official mongo image.
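Before launching any runs, it can help to confirm that the port published by the container is actually reachable from the host. The following minimal sketch uses only the Python standard library to test a TCP connection to port 27017. The function name port_is_open is hypothetical, and a successful connection only shows that something is listening on the port, not that it is a healthy MongoDB server.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host not resolvable
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 27017 is reachable on localhost.")
    else:
        print("Nothing is listening on localhost:27017; is the container running?")
```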

10.3.3 Running Celery Server (optional)

This step is only necessary if you want to use Celery to manage processes. Inside the path in the host specified above,

    celery -E -A gem5art.tasks.celery worker --autoscale=[number of workers],0

10.3.4 Creating the Launch Script Running the Experiment

Now, we can put together the run script! In launch_spec2017_experiments.py, we import the required modules and classes at the beginning of the file,

    import os
    import sys
    from uuid import UUID

    from gem5art.artifact import Artifact
    from gem5art.run import gem5Run
    from gem5art.tasks.tasks import run_job_pool

And then, we put the launch function at the end of launch_spec2017_experiments.py,

    if __name__ == "__main__":
        cpus = ['kvm', 'atomic', 'o3', 'timing']
        benchmark_sizes = {'kvm':    ['test', 'ref'],
                           'atomic': ['test'],
                           'o3':     ['test'],
                           'timing': ['test']
                          }
        benchmarks = ["503.bwaves_r", "507.cactuBSSN_r", "508.namd_r", "510.parest_r", "511.povray_r", "519.lbm_r",
                      "521.wrf_r", "526.blender_r", "527.cam4_r", "538.imagick_r", "544.nab_r", "549.fotonik3d_r",
                      "554.roms_r", "997.specrand_fr", "603.bwaves_s", "607.cactuBSSN_s", "619.lbm_s", "621.wrf_s",
                      "627.cam4_s", "628.pop2_s", "638.imagick_s", "644.nab_s", "649.fotonik3d_s", "654.roms_s",
                      "996.specrand_fs", "500.perlbench_r", "502.gcc_r", "505.mcf_r", "520.omnetpp_r", "523.xalancbmk_r",
                      "525.x264_r", "531.deepsjeng_r", "541.leela_r", "548.exchange2_r", "557.xz_r", "999.specrand_ir",
                      "600.perlbench_s", "602.gcc_s", "605.mcf_s", "620.omnetpp_s", "623.xalancbmk_s", "625.x264_s",
                      "631.deepsjeng_s", "641.leela_s", "648.exchange2_s", "657.xz_s", "998.specrand_is"]

        runs = []
        for cpu in cpus:
            for size in benchmark_sizes[cpu]:
                for benchmark in benchmarks:
                    run = gem5Run.createFSRun(
                        'gem5 v20.1.0.4 spec 2017 experiment', # name
                        'gem5/build/X86/gem5.opt', # gem5_binary
                        'configs/run_spec.py', # run_script
                        'results/{}/{}/{}'.format(cpu, size, benchmark), # relative_outdir
                        gem5_binary, # gem5_artifact
                        gem5_repo, # gem5_git_artifact
                        experiments_repo, # run_script_git_artifact
                        'vmlinux-4.19.83', # linux_binary
                        'disk-image/spec-2017/spec-2017-image/spec-2017', # disk_image
                        linux_binary, # linux_binary_artifact
                        disk_image, # disk_image_artifact
                        cpu, benchmark, size, # params
                        timeout = 10*24*60*60 # 10 days
                    )
                    runs.append(run)

        run_job_pool(runs)

The above launch function will run all the available benchmarks with the kvm, atomic, timing, and o3 cpus. For kvm, both the test and ref sizes will be run, while for the rest, only benchmarks of size test will be run. Note that the line 'results/{}/{}/{}'.format(cpu, size, benchmark), # relative_outdir specifies how the results folder is structured. The results folder should be structured carefully so that no two gem5 runs write to the same place.
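One way to guard against two runs sharing an output folder is to generate all output paths up front and check them for duplicates before creating any run. The sketch below assumes the same 'results/{cpu}/{size}/{benchmark}' layout as the launch function; the helper names outdirs and duplicates are hypothetical, and the benchmark list is shortened for illustration.

```python
from collections import Counter


def outdirs(cpus, sizes_by_cpu, benchmarks):
    """List the relative output directory of every planned run."""
    return [
        'results/{}/{}/{}'.format(cpu, size, benchmark)
        for cpu in cpus
        for size in sizes_by_cpu[cpu]
        for benchmark in benchmarks
    ]


def duplicates(paths):
    """Return the sorted list of paths that occur more than once."""
    return sorted(path for path, count in Counter(paths).items() if count > 1)


if __name__ == "__main__":
    planned = outdirs(
        ['kvm', 'o3'],
        {'kvm': ['test', 'ref'], 'o3': ['test']},
        ['505.mcf_r', '557.xz_r'],  # shortened benchmark list
    )
    clashes = duplicates(planned)
    if clashes:
        raise SystemExit('Output folders used by more than one run: {}'.format(clashes))
    print('{} runs planned, all with distinct output folders'.format(len(planned)))
```

If a new parameter is later added to the runs (for example a memory configuration), including it in the path template keeps the folders distinct.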

10.3.5 Run the Experiment

With the Celery and MongoDB servers running, we can start the experiment. In the root folder of the experiment,

    python3 launch_spec2017_experiment.py

Note: The URI of a remote database server can be specified by setting the environment variable GEM5ART_DB. For example, if the mongo database server is running at localhost123, the command to run the launch script would be,

GEM5ART_DB="mongodb://localhost123" python3 launch_spec2017_experiment.py
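The same lookup can be reproduced in a helper script, for example to log which database an experiment is about to use. The following minimal sketch reads GEM5ART_DB and falls back to a local server. The function name database_uri is hypothetical, and the fallback URI mongodb://localhost:27017 is an assumption based on the docker command used earlier in this tutorial rather than a value confirmed here.

```python
import os

# Assumed fallback: a MongoDB server on the local machine, default port.
DEFAULT_DB_URI = "mongodb://localhost:27017"


def database_uri(environ=None):
    """Return the database URI from GEM5ART_DB, or the assumed default."""
    if environ is None:
        environ = os.environ
    return environ.get("GEM5ART_DB", DEFAULT_DB_URI)


if __name__ == "__main__":
    print("Using database at", database_uri())
```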

10.4 Appendix I. Working Status

Not all benchmarks are compiled in the above set up as of March 2020. The working status of SPEC 2017 workloads is available here: https://www.gem5.org/documentation/benchmark_status/gem5-20#spec-2017-tests.

10.5 Appendix II. Disk Image Generation Scripts

• disk-image/spec-2017/install-spec2017.sh: a Bash script that is executed on the guest machine after Ubuntu Server is installed in the disk image; this script installs the dependencies needed to compile and run SPEC workloads, mounts the SPEC ISO, installs the benchmark suite on the disk image, and creates a SPEC configuration from the gcc42 template.
• disk-image/spec-2017/post-installation.sh: a script that is executed on the guest machine; this script copies the serial-getty@.service file to the systemd folder, copies the m5 binary to /sbin, and appends the content of runscript.sh to the disk image's .bashrc file, which will be executed after the booting process is done.
• disk-image/spec-2017/runscript.sh: a script that is copied to .bashrc on the disk image so that the commands in this script run immediately after the booting process.
• disk-image/spec-2017/spec-2017.json: contains a configuration telling Packer how the disk image should be built.

Authors: • Mahyar Samani

CHAPTER 11

Tutorial: Run PARSEC Benchmarks with gem5

11.1 Introduction

In this tutorial, we will use gem5art to create a disk image for the PARSEC benchmarks (PARSEC) and then run the benchmarks using gem5. PARSEC is mainly designed to represent applications that require a vast amount of shared memory. The workloads are:

Kernels:
• canneal: Simulated cache-aware annealing to optimize routing cost of a chip design
• dedup: Next-generation compression with data deduplication
• streamcluster: Online clustering of an input stream

Pseudo Applications:
• blackscholes: Option pricing with Black-Scholes Partial Differential Equation (PDE)
• bodytrack: Body tracking of a person
• facesim: Simulates the motions of a human face
• ferret: Content similarity search server
• fluidanimate: Fluid dynamics for animation purposes with Smoothed Particle Hydrodynamics (SPH) method
• freqmine: Frequent itemset mining
• raytrace: Real-time raytracing
• swaptions: Pricing of a portfolio of swaptions
• vips: Image processing
• x264: H.264 video encoding

There are different sizes for possible inputs to each workload. Each size is explained below:


• test: very small set of inputs just to test the functionality of the program.
• simdev: small set of inputs intended to generate general behaviour of each program. Mainly used for simulators and development.
• simsmall, simmedium, simlarge: variable size inputs appropriate for testing microarchitectures with simulators.
• native: very large set of inputs intended for native execution.

This tutorial uses the following directory structure (inside the main directory):
• configs-parsec-tests: gem5 run and configuration scripts to run PARSEC
• disk-image: contains the packer script and template files used to build a disk image. The built disk image will be stored in the same folder
• gem5: gem5 source code and the compiled binary
• linux-stable: linux kernel source code used for full-system experiments
• config.4.19.83: linux kernel config file used for its compilation
• results: directory to store the results of the experiments (generated once gem5 jobs are executed)
• launch_parsec_tests.py: gem5 jobs launch script (creates all of the needed artifacts as well)

11.2 Setting up the environment

First, we need to create the main directory named parsec-tests (from where we will run everything) and turn it into a git repository. Through the parsec-tests git repo, we will keep track of changes to files that are not otherwise included in any git repo. Examples of such files are the gem5 run and config scripts (configs-parsec-tests). We want to keep a record of any changes to these scripts, so that a particular run of the PARSEC benchmarks can be associated with a particular snapshot of these files. We also need to add a git remote to this repo pointing to a remote location where we want this repo to be hosted.

    mkdir parsec-tests
    cd parsec-tests
    git init
    git remote add origin https://your-remote-add/parsec-tests.git

We also need to add a .gitignore file in our git repo, to avoid tracking those files which are not important or will be tracked through other git repos:

*.pyc m5out .vscode results venv disk-image/packer disk-image/packer_1.4.3_linux_amd64.zip disk-image/parsec/parsec-image/parsec disk-image/parsec/ disk-image/parsec-benchmark/ disk-image/packer_cache gem5 linux-stable/ gem5art relies on Python 3, so we suggest creating a virtual environment before using gem5art.


virtualenv -p python3 venv
source venv/bin/activate

gem5art can be installed (if not already) using pip:

pip install gem5art-artifact gem5art-run gem5art-tasks
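A quick way to confirm that the virtual environment is ready is to check that the three gem5art packages can be located. This is a minimal sketch; the module names come from the API documentation, while the helper names `module_available` and `check_environment` are hypothetical.

```python
import importlib.util
import sys

# Module names taken from the gem5art API documentation.
GEM5ART_MODULES = ["gem5art.artifact", "gem5art.run", "gem5art.tasks"]


def module_available(name):
    """Return True if the named module can be found on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here, gem5art) is not installed.
        return False


def check_environment():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3,):
        problems.append("Python 3 is required")
    problems += [
        f"{name} is not installed"
        for name in GEM5ART_MODULES
        if not module_available(name)
    ]
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```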

11.3 Building gem5

For instructions on how to build gem5 look here.

11.4 Creating a disk image

First create a disk-image folder where we will keep all disk image related files:

mkdir disk-image

We will follow a directory structure similar to the one discussed in the Disk Images section. Add a folder named shared for config files that are shared among all disk images (and kept at their defaults), and a folder named parsec that contains the files configured for the PARSEC disk image. Add preseed.cfg and serial-getty@.service to shared/. In parsec/ we will add the benchmark source first, which will eventually be transferred to the disk image through our parsec.json file.

cd disk-image/parsec-benchmark
git clone https://github.com/darchr/parsec-benchmark.git

This source of PARSEC has ROI (region of interest) annotations for each benchmark, which gem5 uses to separate the simulation statistics of the important parts of a program from the rest of the program. gem5 magic instructions are placed before and after the ROI; they exit the guest and transfer control to the gem5 run script, which can then do things like dumping or resetting stats or switching to the CPU of interest. Next, we will add a few other files in parsec/ which will be used to compile PARSEC inside the disk image and eventually to run these benchmarks with gem5. These files will be moved from the host to the disk image using the parsec.json file, as we will soon see. First, create a file parsec-install.sh, which will be executed inside the disk image (once it is built) and will install PARSEC on the disk image:

# install build-essential (gcc and g++ included) and gfortran

#Compile PARSEC
cd /home/gem5/
su gem5
echo "12345" | sudo -S apt update

# Allowing services to restart while updating some
# libraries.
sudo apt install -y debconf-utils
sudo debconf-get-selections | grep restart-without-asking > libs.txt
sed -i 's/false/true/g' libs.txt
while read line; do echo $line | sudo debconf-set-selections; done < libs.txt
sudo rm libs.txt
##

# Installing packages needed to build PARSEC
sudo apt install -y build-essential
sudo apt install -y m4
sudo apt install -y git
sudo apt install -y python
sudo apt install -y python-dev
sudo apt install -y gettext
sudo apt install -y libx11-dev
sudo apt install -y libxext-dev
sudo apt install -y xorg-dev
sudo apt install -y unzip
sudo apt install -y texinfo
sudo apt install -y freeglut3-dev
##

# Building PARSEC
echo "12345" | sudo -S chown gem5 -R parsec-benchmark/
echo "12345" | sudo -S chgrp gem5 -R parsec-benchmark/
cd parsec-benchmark
./install.sh
./get-inputs
cd ..
echo "12345" | sudo -S chown gem5 -R parsec-benchmark/
echo "12345" | sudo -S chgrp gem5 -R parsec-benchmark/
##

In parsec/, create a file post-installation.sh and add the following lines to it:

#!/bin/bash
echo 'Post Installation Started'

mv /home/gem5/serial-getty@.service /lib/systemd/system/

mv /home/gem5/m5 /sbin
ln -s /sbin/m5 /sbin/gem5

# copy and run outside (host) script after booting
cat /home/gem5/runscript.sh >> /root/.bashrc

echo 'Post Installation Done'

This post-installation.sh script (which runs after Ubuntu is installed on the disk image) installs m5 and appends the contents of runscript.sh to .bashrc. Therefore, runscript.sh should contain whatever we want to execute as soon as the system boots up. Create runscript.sh in parsec/ and add the following lines to it:

#!/bin/sh

m5 readfile > script.sh
if [ -s script.sh ]; then
    # if the file is not empty, execute it
    chmod +x script.sh
    ./script.sh
    m5 exit
fi
# otherwise, drop to the terminal

runscript.sh uses m5 readfile to read the contents of a script, which is how gem5 passes scripts from the host system to the simulated system. The passed script is then executed and is responsible for running the benchmark(s), which we will look at in more detail later. Finally, create parsec.json and add the following contents:

{
    "builders":
    [
        {
            "type": "qemu",
            "format": "raw",
            "accelerator": "kvm",
            "boot_command":
            [
                "{{ user `boot_command_prefix` }}",
                "debian-installer={{ user `locale` }} auto locale={{ user `locale` }} kbd-chooser/method=us ",
                "file=/floppy/{{ user `preseed` }} ",
                "fb=false debconf/frontend=noninteractive ",
                "hostname={{ user `hostname` }} ",
                "/install/vmlinuz noapic ",
                "initrd=/install/initrd.gz ",
                "keyboard-configuration/modelcode=SKIP keyboard-configuration/layout=USA ",
                "keyboard-configuration/variant=USA console-setup/ask_detect=false ",
                "passwd/user-fullname={{ user `ssh_fullname` }} ",
                "passwd/user-password={{ user `ssh_password` }} ",
                "passwd/user-password-again={{ user `ssh_password` }} ",
                "passwd/username={{ user `ssh_username` }} ",
                "-- "
            ],
            "cpus": "{{ user `vm_cpus`}}",
            "disk_size": "{{ user `image_size` }}",
            "floppy_files":
            [
                "shared/{{ user `preseed` }}"
            ],
            "headless": "{{ user `headless` }}",
            "http_directory": "shared/",
            "iso_checksum": "{{ user `iso_checksum` }}",
            "iso_checksum_type": "{{ user `iso_checksum_type` }}",
            "iso_urls": [ "{{ user `iso_url` }}" ],
            "memory": "{{ user `vm_memory`}}",
            "output_directory": "parsec/{{ user `image_name` }}-image",
            "qemuargs":
            [
                [ "-cpu", "host" ],
                [ "-display", "none" ]
            ],
            "qemu_binary": "/usr/bin/qemu-system-x86_64",
            "shutdown_command": "echo '{{ user `ssh_password` }}'|sudo -S shutdown -P now",
            "ssh_password": "{{ user `ssh_password` }}",
            "ssh_username": "{{ user `ssh_username` }}",
            "ssh_wait_timeout": "60m",
            "vm_name": "{{ user `image_name` }}"
        }
    ],
    "provisioners":
    [
        {
            "type": "file",
            "source": "../gem5/util/m5/m5",
            "destination": "/home/gem5/"
        },
        {
            "type": "file",
            "source": "shared/serial-getty@.service",
            "destination": "/home/gem5/"
        },
        {
            "type": "file",
            "source": "parsec/runscript.sh",
            "destination": "/home/gem5/"
        },
        {
            "type": "file",
            "source": "parsec/parsec-benchmark/",
            "destination": "/home/gem5/"
        },
        {
            "type": "shell",
            "execute_command": "echo '{{ user `ssh_password` }}' | {{.Vars}} sudo -E -S bash '{{.Path}}'",
            "scripts":
            [
                "parsec/post-installation.sh",
                "parsec/parsec-install.sh"
            ]
        }
    ],
    "variables":
    {
        "boot_command_prefix": "[boot key sequence not recoverable from the source]",
        "desktop": "false",
        "image_size": "12000",
        "headless": "true",
        "iso_checksum": "34416ff83179728d54583bf3f18d42d2",
        "iso_checksum_type": "md5",
        "iso_name": "ubuntu-18.04.2-server-amd64.iso",
        "iso_url": "http://old-releases.ubuntu.com/releases/18.04.2/ubuntu-18.04.2-server-amd64.iso",
        "locale": "en_US",
        "preseed" : "preseed.cfg",
        "hostname": "gem5",
        "ssh_fullname": "gem5",
        "ssh_password": "12345",
        "ssh_username": "gem5",
        "vm_cpus": "16",
        "vm_memory": "8192",
        "image_name": "parsec"
    }
}

parsec.json is our primary .json configuration file. The provisioners and variables sections of this file configure the files that need to be transferred to the disk and other things like the disk image's name. Next, download packer (if not already downloaded) in the disk-image folder:

cd disk-image/
wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip
unzip packer_1.4.3_linux_amd64.zip

Now, to build the disk image inside the disk-image folder, run:

./packer validate parsec/parsec.json

./packer build parsec/parsec.json
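A mistyped path in the provisioners section only surfaces late in a long packer build, so it can help to check beforehand that every host-side file the template references exists. The following is a minimal sketch, assuming the template is plain JSON and that paths are relative to the directory packer is run from; the helper names `referenced_paths` and `missing_paths` are hypothetical and are not part of packer or gem5art.

```python
import json
from pathlib import Path


def referenced_paths(template):
    """Collect the host-side paths named by 'file' and 'shell' provisioners."""
    paths = []
    for prov in template.get("provisioners", []):
        if prov.get("type") == "file":
            paths.append(prov["source"])
        elif prov.get("type") == "shell":
            paths.extend(prov.get("scripts", []))
    return paths


def missing_paths(template_file, base_dir="."):
    """Return the referenced paths that do not exist relative to base_dir."""
    with open(template_file) as f:
        template = json.load(f)
    base = Path(base_dir)
    return [p for p in referenced_paths(template) if not (base / p).exists()]


if __name__ == "__main__":
    # Assumed to be run from inside the disk-image folder.
    for path in missing_paths("parsec/parsec.json"):
        print(f"missing: {path}")
```

This only checks that files exist; `./packer validate` remains the check for the template syntax itself.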

11.5 Compiling the linux kernel

Follow the instructions here to compile your linux kernel

11.6 gem5 run scripts

Next, we need to add gem5 run scripts. We will do that in a folder named configs-parsec-tests. Get the run script named run_parsec.py from here, and other system configuration files from here. The run script (run_parsec.py) takes the following arguments:

• kernel: compiled kernel to be used for simulation
• disk: built disk image to be used for simulation
• cpu: the cpu model to use (e.g. kvm or atomic)
• benchmark: PARSEC workload to run (e.g. blackscholes, bodytrack, facesim, etc.)
• num_cpus: number of parallel cpus to be simulated
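To make the interface concrete, the sketch below shows one way a run script could accept these arguments and build the script that runscript.sh later obtains through m5 readfile. It is an illustration under stated assumptions, not the contents of run_parsec.py: the function names are hypothetical, the size argument is included because the launch script in this tutorial passes an input size, and the guest commands (the env.sh file and the parsecmgmt invocation with the gcc-hooks configuration) are assumptions about how the PARSEC source is set up on the disk image.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the positional arguments described above (plus an input size)."""
    parser = argparse.ArgumentParser(description="Hypothetical PARSEC run script")
    parser.add_argument("kernel", help="path to the compiled kernel")
    parser.add_argument("disk", help="path to the built disk image")
    parser.add_argument("cpu", choices=["kvm", "atomic", "timing"], help="CPU model")
    parser.add_argument("benchmark", help="PARSEC workload, e.g. blackscholes")
    parser.add_argument("size", help="input size, e.g. simsmall")
    parser.add_argument("num_cpus", type=int, help="number of simulated CPUs")
    return parser.parse_args(argv)


def bench_script(benchmark, size, num_cpus):
    """Build the text of the script that the guest reads via `m5 readfile`."""
    return "\n".join([
        "cd /home/gem5/parsec-benchmark",  # assumed install location
        "source env.sh",                   # assumed PARSEC environment file
        f"parsecmgmt -a run -p {benchmark} -c gcc-hooks -i {size} -n {num_cpus}",
        "m5 exit",
        "",
    ])


if __name__ == "__main__":
    args = parse_arguments()
    print(bench_script(args.benchmark, args.size, args.num_cpus))
```

In a real run script, the generated text would be written to a file on the host and handed to the simulated system, so that runscript.sh finds a non-empty script.sh at boot.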

11.7 Database and Celery Server

To create a database and start a celery server follow the instructions here.


11.8 Creating a launch script

Finally, we will create a launch script with the name launch_parsec_tests.py, which will be responsible for registering the artifacts to be used and then launching gem5 jobs. The first thing to do in the launch script is to import required modules and classes:

import os
import sys
from uuid import UUID

from gem5art.artifact import Artifact
from gem5art.run import gem5Run
from gem5art.tasks.tasks import run_gem5_instance

Next, we will register artifacts. For example, to register the packer artifact we will add the following lines:

packer = Artifact.registerArtifact(
    command = '''wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip;
    unzip packer_1.4.3_linux_amd64.zip;
    ''',
    typ = 'binary',
    name = 'packer',
    path = 'disk-image/packer',
    cwd = 'disk-image',
    documentation = 'Program to build disk images. Downloaded sometime in August from hashicorp.'
)

For our parsec-tests repo,

experiments_repo = Artifact.registerArtifact(
    command = 'git clone https://your-remote-add/parsec-tests.git',
    typ = 'git repo',
    name = 'parsec-tests',
    path = './',
    cwd = '../',
    documentation = 'main repo to run parsec with gem5'
)

Note that the name given to each artifact object (returned by the registerArtifact method), as well as most of the other attributes of these artifacts, is entirely up to the user. For all other artifacts, add the following lines in launch_parsec_tests.py:

parsec_repo = Artifact.registerArtifact(
    command = '''mkdir parsec-benchmark/;
    cd parsec-benchmark;
    git clone https://github.com/darchr/parsec-benchmark.git;''',
    typ = 'git repo',
    name = 'parsec_repo',
    path = './disk-image/parsec-benchmark/parsec-benchmark/',
    cwd = './disk-image/',
    documentation = 'main repo to copy parsec source to the disk-image'
)

gem5_repo = Artifact.registerArtifact(
    command = '''
        git clone https://gem5.googlesource.com/public/gem5;
        cd gem5;
        git remote add darchr https://github.com/darchr/gem5;
        git fetch darchr;
        git cherry-pick 6450aaa7ca9e3040fb9eecf69c51a01884ac370c;
        git cherry-pick 3403665994b55f664f4edfc9074650aaa7ddcd2c;
    ''',
    typ = 'git repo',
    name = 'gem5',
    path = 'gem5/',
    cwd = './',
    documentation = 'cloned gem5 master branch from googlesource (Nov 18, 2019) and cherry-picked 2 commits from darchr/gem5'
)

m5_binary = Artifact.registerArtifact(
    command = 'make -f Makefile.x86',
    typ = 'binary',
    name = 'm5',
    path = 'gem5/util/m5/m5',
    cwd = 'gem5/util/m5',
    inputs = [gem5_repo,],
    documentation = 'm5 utility'
)

disk_image = Artifact.registerArtifact(
    command = './packer build parsec/parsec.json',
    typ = 'disk image',
    name = 'parsec',
    cwd = 'disk-image',
    path = 'disk-image/parsec/parsec-image/parsec',
    inputs = [packer, experiments_repo, m5_binary, parsec_repo,],
    documentation = 'Ubuntu with m5 binary and PARSEC installed.'
)

gem5_binary = Artifact.registerArtifact(
    command = 'scons build/X86/gem5.opt',
    typ = 'gem5 binary',
    name = 'gem5',
    cwd = 'gem5/',
    path = 'gem5/build/X86/gem5.opt',
    inputs = [gem5_repo,],
    documentation = 'gem5 binary'
)

linux_repo = Artifact.registerArtifact(
    command = '''git clone --branch v4.19.83 --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git;
    mv linux linux-stable''',
    typ = 'git repo',
    name = 'linux-stable',
    path = 'linux-stable/',
    cwd = './',
    documentation = 'linux kernel source code repo'
)

linux_binary = Artifact.registerArtifact(
    name = 'vmlinux-4.19.83',
    typ = 'kernel',
    path = 'linux-stable/vmlinux-4.19.83',
    cwd = 'linux-stable/',
    command = '''
    cp ../config.4.19.83 .config;
    make -j8;
    cp vmlinux vmlinux-4.19.83;
    ''',
    inputs = [experiments_repo, linux_repo,],
    documentation = "kernel binary for v4.19.83"
)
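The inputs lists above define a dependency graph between artifacts: an artifact can only be built once everything in its inputs exists. The sketch below restates that graph with plain strings and derives a valid build order from it. It is an illustration only, using the standard-library graphlib module (Python 3.9 or later) rather than anything in gem5art; the helper name `build_order` is hypothetical.

```python
from graphlib import TopologicalSorter  # available from Python 3.9

# Each artifact maps to the artifacts listed in its `inputs` argument above.
ARTIFACT_INPUTS = {
    "packer": [],
    "experiments_repo": [],
    "parsec_repo": [],
    "gem5_repo": [],
    "linux_repo": [],
    "m5_binary": ["gem5_repo"],
    "gem5_binary": ["gem5_repo"],
    "disk_image": ["packer", "experiments_repo", "m5_binary", "parsec_repo"],
    "linux_binary": ["experiments_repo", "linux_repo"],
}


def build_order(inputs):
    """Return artifact names ordered so that every input precedes its users."""
    return list(TopologicalSorter(inputs).static_order())


if __name__ == "__main__":
    for name in build_order(ARTIFACT_INPUTS):
        print(name)
```

For example, gem5_repo always appears before m5_binary, which in turn appears before disk_image, matching the order in which the artifacts are registered in the launch script.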

Once all of the artifacts are registered, the next step is to launch all gem5 jobs. To do that, add the following lines in your script:

if __name__ == "__main__":
    num_cpus = ['1']
    benchmarks = ['blackscholes', 'bodytrack', 'canneal', 'dedup', 'facesim', 'ferret',
                  'fluidanimate', 'freqmine', 'raytrace', 'streamcluster', 'swaptions',
                  'vips', 'x264']

    sizes = ['simsmall', 'simlarge', 'native']
    cpus = ['kvm', 'timing']

    for cpu in cpus:
        for num_cpu in num_cpus:
            for size in sizes:
                if cpu == 'timing' and size != 'simsmall':
                    continue
                for bm in benchmarks:
                    run = gem5Run.createFSRun(
                        'parsec_tests',
                        'gem5/build/X86/gem5.opt',
                        'configs-parsec-tests/run_parsec.py',
                        f'''results/run_parsec/{bm}/{size}/{cpu}/{num_cpu}''',
                        gem5_binary, gem5_repo, experiments_repo,
                        'linux-stable/vmlinux-4.19.83',
                        'disk-image/parsec/parsec-image/parsec',
                        linux_binary, disk_image,
                        cpu, bm, size, num_cpu,
                        timeout = 24*60*60 #24 hours
                    )
                    run_gem5_instance.apply_async((run, os.getcwd(), ))

The above lines loop through all possible combinations of the variables involved in this experiment. For each combination, a gem5Run object is created and eventually passed to run_gem5_instance to be executed asynchronously using Celery. Note that with the TimingSimpleCPU model only the simsmall size is used, because the other sizes take more than 24 hours to simulate. Finally, make sure you are in the Python virtual environment and then run the script:


python launch_parsec_tests.py
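Before submitting jobs, it can be worth checking how many runs the launch loop will create and where their results will go. The sketch below reproduces only the loop logic from the launch script, with no gem5art calls; the function names `planned_runs` and `output_dir` are hypothetical.

```python
from itertools import product

BENCHMARKS = ["blackscholes", "bodytrack", "canneal", "dedup", "facesim",
              "ferret", "fluidanimate", "freqmine", "raytrace",
              "streamcluster", "swaptions", "vips", "x264"]
SIZES = ["simsmall", "simlarge", "native"]
CPUS = ["kvm", "timing"]
NUM_CPUS = ["1"]


def planned_runs():
    """Yield (cpu, num_cpu, size, benchmark) in the launch script's loop order."""
    for cpu, num_cpu, size, bm in product(CPUS, NUM_CPUS, SIZES, BENCHMARKS):
        # The timing CPU is only run with the simsmall inputs.
        if cpu == "timing" and size != "simsmall":
            continue
        yield cpu, num_cpu, size, bm


def output_dir(cpu, num_cpu, size, bm):
    """Mirror the results path used in the createFSRun call."""
    return f"results/run_parsec/{bm}/{size}/{cpu}/{num_cpu}"


if __name__ == "__main__":
    runs = list(planned_runs())
    print(f"{len(runs)} runs planned")
    for run in runs:
        print(output_dir(*run))
```

With the lists above this gives 39 KVM runs (13 benchmarks times 3 sizes) and 13 timing runs, 52 in total, each with its own output directory.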

11.9 Results

Once you run the launch script, the declared artifacts will be registered by gem5art and stored in the database. Celery will run as many jobs in parallel as allowed by the user (at the time of starting the server). As soon as a gem5 job finishes, a compressed version of the results will be stored in the database as well. Users can also query the database using the methods discussed previously in the Artifacts and Runs sections and in the boot-test tutorial. Here is the status of each workload after simulation:

[Figure: WorkingStatusKVM]

[Figure: WorkingStatusTiming]

Below are the simulation times for the KVM and TimingSimple CPU models.

[Figure: SimTimeKVM]

[Figure: SimTimeTiming]

The number of instructions run on each CPU model is shown below:

[Figure: InstCountKVM]

[Figure: InstCountTiming]
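Simulation time and instruction counts like those plotted above can be collected from the stats.txt file that gem5 writes into each run's output directory. The following is a minimal sketch of such a parser. The statistic names sim_seconds, sim_insts and host_seconds are assumptions that depend on the gem5 version in use, and the helper name `read_stats` is hypothetical.

```python
import re

# A stats.txt line has the form: <name> <value> ... # <description>
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)")


def read_stats(text, wanted=("sim_seconds", "sim_insts", "host_seconds")):
    """Return the first value of each wanted statistic found in the text."""
    found = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match and match.group(1) in wanted and match.group(1) not in found:
            found[match.group(1)] = float(match.group(2))
    return found


if __name__ == "__main__":
    # Path follows the output directory pattern used in the launch script.
    path = "results/run_parsec/blackscholes/simsmall/kvm/1/stats.txt"
    with open(path) as f:
        print(read_stats(f.read()))
```

Only the first occurrence of each statistic is kept, so a file containing several stat dumps reports the values from the first dump.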

