MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Virtualization and repository management in academic environment

DIPLOMA THESIS

Bc. Jakub Hadvig

Brno, Spring 2013

Declaration

Hereby I declare that this thesis is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed with complete reference to the due source.

Advisor: Mgr. Marek Grác


Acknowledgement

I would like to thank Mgr. Marek Grác for supervising this master's thesis. I mainly want to thank my colleague Ivan Nečas for all his advice and feedback during the application development. I would also like to thank the whole Katello and Foreman development team and community for their effort and work. Furthermore, I would like to thank Marek Mahut, Jozef Žigmund, Lukáš Votýpka and my whole family for their help and support during the writing of this thesis. Finally, I would like to thank the Red Hat company for the internship during which I was able to write this thesis.


Abstract

This thesis deals with the usage of virtualization and repository management in an academic environment. Its goal is to create an application that provides virtual machines and their management to students and teachers in their class courses. System virtualization and its management are handled by the open-source projects Katello and Foreman, which are dedicated to system provisioning and management. Their REST APIs are used at the back-end of the application to provide all the necessary services.


Keywords

REST API, virtualization, libvirt, provisioning, smart proxy, ruby, foreman, katello, open-source


Contents

1 Introduction ...... 1
   1.1 Goals ...... 1
   1.2 Why virtualization in laboratory IT courses ...... 2
   1.3 Structure of work ...... 3
2 Virtualization ...... 5
   2.1 Types of virtualization ...... 5
   2.2 Virtualization APIs ...... 7
      2.2.1 Libvirt ...... 8
3 Puppet ...... 11
   3.1 Agent-Master model ...... 11
   3.2 How Puppet works ...... 12
   3.3 Puppet architecture ...... 14
      3.3.1 Puppet modules ...... 14
      3.3.2 Module structure ...... 16
      3.3.3 Puppet environments ...... 16
4 Project Foreman ...... 19
   4.1 Foreman architecture ...... 20
      4.1.1 Puppet ...... 20
      4.1.2 Smart proxy ...... 21
      4.1.3 Foreman REST API ...... 23
   4.2 Provisioning ...... 26
5 Project Katello ...... 29
   5.1 Katello components ...... 29
      5.1.1 Pulp ...... 30
      5.1.2 Candlepin ...... 31
   5.2 System registration and basic management ...... 33
6 Katello Scholar ...... 35
   6.1 Project Specification ...... 35
      6.1.1 Used Technologies ...... 35
      6.1.2 Project configuration ...... 36
   6.2 Structure of project ...... 40
      6.2.1 Project workflow ...... 41
         6.2.1.1 Preparation of application environment ...... 41
         6.2.1.2 Application usage ...... 42
      6.2.2 Model Mapping ...... 44
   6.3 Present application limitations ...... 45
7 Conclusion ...... 47
   7.1 The theoretical part ...... 47
   7.2 The practical part ...... 47
   7.3 Personal experiences ...... 48
   7.4 Future development plans ...... 48
8 Terminology ...... 49
A Application source code ...... 51

List of Figures

2.1 Implementing an application without and with libvirt[10] ...... 9
3.1 Difference between system configuration without (left) and with (right) puppet master ...... 12
3.2 How Puppet works[5] ...... 13
4.1 Foreman architecture[15] ...... 20
4.2 Smart Variable example ...... 21
4.3 Smart proxy distributed services scenario ...... 22
4.4 Smart proxy all-in-one scenario ...... 23
4.5 Foreman REST API example of a JSON input for creating a Hostgroup object ...... 24
4.6 Available REST API calls for the Foreman Hostgroup object ...... 24
4.7 Foreman REST API call example together with returned values ...... 25
5.1 Project Katello features and dependencies[7] ...... 30
5.2 Pulp diagram[3] ...... 31
5.3 Candlepin diagram ...... 32
6.1 Katello Scholar entity-relationship model ...... 40


Chapter 1

Introduction

In today's education environment there is a lack of resources that could give students an opportunity to experiment with operating systems and their tools, so that they can study the advantages and disadvantages of a system without worrying about breaking some part of it, understand it thoroughly, and master its administration. Another desirable idea is that students come to their lab lessons with all the needed lesson resources and tools pre-installed and up and running, so they do not have to worry about any missing dependencies associated with the lesson's curriculum, and in case something goes wrong, everything can be restored to the initial state in a few minutes. They just log into the system and practice the lesson curriculum, even if that means learning by trial and error.

This chapter is meant to provide a basic overview of what this thesis is about and to specify its goals and expected outcomes. The text part of this thesis gives an overview of the individual parts which the practical part consists of. The practical output of this thesis is an application used in an academic environment for virtual machine provisioning and management.

1.1 Goals

The main goal is to create an open-source, web-based application capable of providing fully configured virtual machines and their management to lectors and their students in laboratory IT courses. In its basic form, the application will be able to provide virtual machines with operating systems from the Red Hat family (Red Hat Enterprise Linux, Fedora, CentOS, Scientific Linux), with settings specified by the lector and with software packages pre-installed. In addition, the application will be able to assign the needed repositories to the virtual machines of the desired lesson.

The application will be built upon two open-source projects, whose RESTful APIs will be used at the back-end of the application to provide all the necessary services. These two projects are the Foreman project, which will be responsible for virtual machine provisioning, configuration and management, and the Katello project, a system life cycle management tool developed by the Red Hat company, which will be responsible for the repository management. The text part of this thesis will describe virtualization and introduce these two projects, their functionality and usage. The tools which these projects use will also be presented and described because of their important role. The text part will therefore cover four main fields:

• Virtualization

• Puppet

• Project Foreman

• Project Katello

1.2 Why virtualization in laboratory IT courses

Virtualization in laboratory IT courses has a lot of advantages compared to the classic approach, where each class has a predefined operating system, or a collection of operating systems, on all the class computers and students log in with their credentials. In that case students have to prepare their environment at the beginning of each lesson whenever they need extra packages as lesson requirements. They also have to be careful not to disrupt their environment. On the other hand, virtualization and virtual machine pre-configuration in these courses come with a number of benefits:

• All students in the class have identical, easily pre-set environments when using virtual machines.

• All students will have a clean virtual machine for each lesson, without any old dependencies that could cause problems (this kind of approach is also used in software development environments).

• All students can be given administrator access to the virtual machines, where they won't affect the physical host machine.


• In case something goes wrong, due to the students’ administrator rights or other circumstances, the virtual machine can be easily restored (re-installed) to the initial state.

• Every student can be given multiple virtual machines to work with.

• The lector can prepare environments with the necessary operating system and software packages for each course lesson, so no further package installation and configuration will be needed during the lesson.

• Virtual machines will install themselves automatically before the lesson begins, so when the lesson starts everything is prepared and set up for the lesson curriculum.

1.3 Structure of work

The second chapter summarizes basic types and aspects of virtualization. The third chapter describes Puppet, a tool designed to manage system configuration, which is utilized in both projects. The fourth and fifth chapters describe the two used projects and their usage. Chapter six discusses the practical part of this thesis and describes the design and architecture of the created web-based application.


Chapter 2

Virtualization

Virtualization has its roots in the early 1960s, when the IBM corporation came up with a time-sharing solution so that its users and customers could use their expensive computer resources more efficiently. Virtualization is nowadays a common technique in information technology used to create a virtual version of a device or resource, such as a server, a storage device, an operating system or a network. It is a middle-ground solution between centralized and decentralized deployments. Instead of purchasing and maintaining an entire computer for one application, each application can be given its own operating system, and all those operating systems can reside on a single piece of hardware. This provides the benefits of decentralization, like security and stability, while making the most of a machine's resources.[8] On the other hand, virtualization comes with a few disadvantages. It can be efficiently deployed only on processors with hardware virtualization support, otherwise it will be slow, and the more the virtualization is used, the more compute resources it needs.

2.1 Types of virtualization

There are different types of virtualization. Each of these types is used for different kinds of tasks and use cases, and they are typically named after the virtualized resource: memory virtualization, network virtualization, storage virtualization, device virtualization, hardware virtualization, operating system virtualization, etc. This thesis treats virtualization as an option to create a virtual laboratory with arbitrary operating systems and the sort of software and tools that are vital for laboratory lessons, so hardware virtualization is the most important one[11][20]. To understand hardware virtualization properly, one must first understand several important terms[10]:

• Host OS is the original operating system installed and running on a physical machine. It allocates physical resources, such as processor or memory, to the virtual operating systems (guest OSes) running on the host OS as needed.

• Guest OS is an operating system that runs in a virtual environment. A guest OS may be a client desktop or server operating system that runs inside the host OS, or concurrently with it, depending on the virtualization method. The guest OS uses hardware resources allocated dynamically through a hypervisor or similar intermediary software.

• Virtual machine is a runtime computing environment that a guest OS runs inside, upon the host OS. Some virtual machines can emulate an arbitrary hardware configuration, while others merely provide a way to communicate with the underlying hardware using a hypervisor.

• Hypervisor (or Virtual Machine Monitor, VMM) is a piece of software used for creating and running virtual machines. The hypervisor provides virtual machines with means of communicating with the host OS or directly with the physical hardware.

This section introduces the most common methods of virtualization. The industry sometimes uses different terms to describe the same virtualization methods. These are the most commonly used[9][10][19][17]:

• Hardware emulation

• Full Virtualization

• Paravirtualization

• Operating system-level virtualization

Hardware emulation is used to simulate the required hardware, together with all its features (CPU pipelining, caching, etc.). This method is very useful in certain circumstances, like testing operating system behaviour on different hardware types, but its performance can be very slow. One of the best-known implementations of hardware emulation is QEMU. QEMU together with KVM offers a great performance improvement when compared to other hardware emulators, like Bochs.

Full virtualization uses a middle communication layer between the host and the guest operating system, which is the hypervisor. Some privileged instructions from the guest OS (for example I/O) are trapped and handled by the hypervisor, while non-privileged instructions are allowed to run directly on the CPU. The fact that the hypervisor traps and programmatically translates some instructions at runtime (this process is called binary translation[19, p. 4]) results in a significant performance drop when compared to an operating system running directly on the hardware. Despite this fact, full virtualization is much faster than hardware emulation. Examples of hypervisors that support full virtualization are VirtualBox1 or Microsoft Virtual Server2.

In the paravirtualization method, the guest operating system needs to know that it is being run paravirtualized and has to be appropriately modified to work in a paravirtualized environment. The guest operating system communicates directly with the hypervisor (hypercall communication) to execute privileged instructions. The direct communication with the hypervisor increases the speed in comparison with trapping instructions in full virtualization. Due to this fact, paravirtualized systems are almost as fast as non-virtualized systems. One of the most used implementations of paravirtualization is the Xen hypervisor3.

In contrast to the virtualization methods presented so far, there is operating system-level virtualization. The host OS, with a modified kernel, controls multiple isolated instances of the system running on top of the host. This technique requires a modified kernel and is not very flexible. On the other hand, it offers nearly native performance. Examples of tools which implement this type of virtualization are OpenVZ4 or Linux-VServer5.

2.2 Virtualization APIs

Because of slightly different functions and parameters, there is no common API for the wide variety of hypervisors, even though they provide the same functionality. Nevertheless, there are several tools that provide a common interface for the different APIs:

1. http://www.virtualbox.org/
2. http://www.microsoft.com/hyper-v-server/
3. http://www.xen.org/
4. http://wiki.openvz.org/Main_Page
5. http://linux-vserver.org/


• UVAPI(Universal Virtualization API)6

• Ganeti7

• Libvirt

UVAPI is a project that provides a unified object model and API for interfacing with popular hypervisors and hosted virtual solutions. UVAPI is a simple Java library that provides communication with different hypervisors, such as VMware, Microsoft Hyper-V and Xen.

Ganeti is a cluster virtual server management software tool developed by Google, built on top of existing virtualization technologies such as Xen or KVM and other open-source software.

Libvirt is an open-source library written in C that provides platform virtualization and allows managing virtual machines hosted on any supported back-end. The Libvirt library is used in the practical part of this thesis to provide virtual machine provisioning. Libvirt is described in the next subsection.

2.2.1 Libvirt

As mentioned in the previous section, Libvirt is a library that provides platform virtualization. It is written in C, but there are bindings for several programming languages like Python, Ruby, Java or Perl. Libvirt supports a large range of hypervisors like KVM/QEMU, Xen, LXC, OpenVZ, VirtualBox, VMware, Microsoft Hyper-V, IBM PowerVM and others.

For a basic Libvirt description, new terms have to be defined:

• Node is a physical machine.

• Domain is a guest OS running inside a virtual machine.

On each node that runs Libvirt, there can be multiple hypervisors running. Each hypervisor can also run multiple domains (virtual machines). The library provides a common API to manipulate all these resources for the different hypervisors (figure 2.1)[16]. Some of the major Libvirt functionality is[4]:

6. http://uvapi.sourceforge.net/index.html
7. http://code.google.com/p/ganeti/


Figure 2.1: Implementing an application without and with libvirt[10]

• Virtual machine management: Various domain lifecycle operations such as start, stop, pause, save, restore, and migrate. Hotplug operations for many device types including disk and network interfaces, memory, and CPUs.

• Remote machine support: All libvirt functionality is accessible on any machine running the libvirt daemon, including remote machines. A variety of network transports are supported for connecting remotely, with the simplest being SSH, which requires no extra explicit configuration.

• Storage management: Any host running the libvirt daemon can be used to manage various types of storage: create file images of various formats, mount NFS shares, enumerate existing LVM volume groups, create new LVM volume groups and logical volumes, partition raw disk devices, mount iSCSI shares, and much more. Since libvirt works remotely, all these options are available for remote hosts as well.

• Network interface management: Any host running the libvirt daemon can be used to manage physical and logical network interfaces. Enumerate existing interfaces, as well as configure (and create) interfaces, bridges, VLANs, and bond devices.

• Virtual NAT and route-based networking: Any host running the libvirt daemon can manage and create virtual networks. Libvirt virtual networks use firewall rules to act as a router, providing VMs transparent access to the host machine's network.
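To illustrate what the library manages underneath, below is a minimal, hypothetical libvirt domain definition in the XML format libvirt uses to describe virtual machines; the domain name, image path and resource sizes are example values only, not part of this thesis' setup:

```xml
<domain type='kvm'>
  <name>lab-vm-01</name>
  <memory unit='MiB'>1024</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <!-- a single virtio disk backed by a file image -->
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/lab-vm-01.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <!-- NIC attached to the default NAT network -->
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>
```

Such a definition would typically be passed to virDomainDefineXML() (or virsh define) on the node, and libvirt then translates it for whichever hypervisor driver is in use.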


Chapter 3

Puppet

This chapter briefly describes the Puppet tool and draws heavily on its online documentation, which is available at http://docs.puppetlabs.com/. Puppet[6] is an open-source configuration management tool for system administrators that helps them automate the management of their compute resources. It is designed to configure Unix-like systems as well as Microsoft Windows systems. Puppet is written in the Ruby programming language and uses a declarative language for expressing system configuration[1]. For the Puppet description, new terms have to be defined:

• Node is a physical machine.

• Manifest is a Puppet program file with the .pp extension, which defines resources and their desired states.

• Catalog is a directed acyclic graph which represents resources and the order in which they need to be applied and synced.

• Puppet Agent is a daemon which runs on a client node.

• Puppet Master is the server which maintains the configuration for its agents.
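To make the catalog definition above concrete, here is a small Ruby sketch (a simplified model for illustration, not Puppet's actual implementation) that orders resources by topologically sorting their dependency graph, using Ruby's standard TSort module; the resource names mirror the NTP example later in this chapter:

```ruby
require 'tsort'

# A toy "catalog": resources mapped to the resources they depend on.
class Catalog
  include TSort

  def initialize(deps)
    @deps = deps  # { resource => [dependencies] }
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

catalog = Catalog.new(
  "Package[ntp]"   => [],
  "File[ntp.conf]" => ["Package[ntp]"],
  "Service[ntp]"   => ["File[ntp.conf]"]
)

# tsort emits dependencies before dependents; TSort::Cyclic would be
# raised if the graph were not acyclic.
puts catalog.tsort.inspect
# => ["Package[ntp]", "File[ntp.conf]", "Service[ntp]"]
```

In real Puppet the catalog also carries the resources' attributes; only the ordering aspect is modelled here.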

3.1 Agent-Master model

For distributing system configuration, Puppet uses a client-server model over a REST API; in Puppet terms this is called the agent-master model. In the case of a standalone puppet agent without a puppet master, there are two possibilities for getting the system into the desired state. The system configuration described in a manifest can either be applied directly to the system, or multiple manifests can be compiled into a catalog which will enforce the required, defined state on the system.


The Puppet agent-master model works on similar principles. The difference is that the manifests and their compilation take place on the puppet master server. Agents do not have to see any manifest files at all, and have no access to configuration information that is not in their own catalog. Figure 3.1 shows the differences between these two approaches.


Figure 3.1: Difference between system configuration without (left) and with (right) a puppet master.

3.2 How Puppet works

Puppet typically uses the agent-master model, where all of the agents communicate with one or more centralized servers. Each agent contacts the server periodically (by default, every half hour), downloads the latest configuration, and makes sure it is synchronized with it. Figure 3.2 visualizes how Puppet enforces the desired state on the node, and each step is briefly described below.


Figure 3.2: How Puppet works.[5]

• Each agent contacts the server periodically (by default, every half hour) and sends facts, or data about its state, to the puppet master server.

• Using the facts, the puppet master server compiles manifests into a catalog, or detailed data about how the node should be configured, and sends this back to the puppet agent.

• Once the agent is configured, it sends a complete report to the puppet master, indicating whether everything has been successfully configured or an error has occurred.

• The reports are fully accessible via open APIs for integration with other IT systems.

3.3 Puppet architecture

To bring nodes to the desired state, Puppet uses self-contained bundles of code and data called modules.

3.3.1 Puppet modules

Puppet modules are just directories that contain classes and types, which are automatically loaded from the module and used as source code that defines the node state. Each class describes the configuration by using types. Each type has its own set of attributes, each of which takes a parameter. Puppet modules are stored based on the environment in which they should take effect; in this case they should be stored at /etc/puppet/$environment/, where $environment is the desired environment. If a module is not put in any of the environment directories and is stored at /etc/puppet/modules/ instead, the puppet master assumes that this module is common, and it will be visible and usable for all agents regardless of their environment. Below is an example of a Puppet class which configures the NTP protocol for the Linux CentOS, RHEL, Debian and Ubuntu distributions.

class ntp {
  case $operatingsystem {
    centos, redhat: {
      $service_name = 'ntpd'
      $conf_file    = 'ntp.conf.el'
    }
    debian, ubuntu: {
      $service_name = 'ntp'
      $conf_file    = 'ntp.conf.debian'
    }
  }
  package { 'ntp':
    ensure => installed,
  }
  service { 'ntp':
    name      => $service_name,
    ensure    => running,
    enable    => true,
    subscribe => File['ntp.conf'],
  }
  file { 'ntp.conf':
    path    => '/etc/ntp.conf',
    ensure  => file,
    require => Package['ntp'],
    source  => "/root/learning-manifests/${conf_file}",
  }
}

This is an example of a simple Puppet class. First, it determines which operating system and distribution is running on the node, so that the appropriate configuration file and service name can be selected. Next, the class defines three types: package, service and file.

• Package - Manages software packages
   – Ensure - Ensures the state of the given package
• Service - Manages services running on the node
   – Name - Name of the service
   – Ensure - Desired status of the service
   – Enable - Whether the service should be started on boot
   – Subscribe - Defines a relationship between two entities, so if an action is triggered, the listener can react accordingly (e.g. restart a service every time a file changes)
• File - Manages local files
   – Path - Fully qualified path to the file
   – Ensure - Whether the file should exist and what it should be (file, directory, etc.)
   – Require - Sets a dependency
   – Source - Where to download the file from

There are 48 defined types. The full list of stable types can be found in the Puppet documentation at http://docs.puppetlabs.com/references/stable/type.html.


3.3.2 Module structure

As mentioned before, a module is simply a directory tree with a specific structure:

• Module name - The outermost directory's name matches the name of the module
   – Manifests - Contains all of the manifests in the module
   – Files - Contains static files, which managed nodes can download
   – Templates - Contains templates, which the module's manifests can use
   – Lib - Contains plugins, like custom facts and custom resource types
   – Tests - Contains examples showing how to declare the module's classes and defined types
   – Spec - Contains spec tests for any plugins in the lib directory

3.3.3 Puppet environments

Each puppet agent and master is configured to have one or several environments. An environment is simply a short label specified by the environment setting in the puppet.conf file. Whenever an agent node makes a request, the puppet master is informed of the node's environment and applies the appropriate configuration to it. If the environment is not specified, the agent will use the default "production" environment[14][18]. The puppet master uses that environment in several ways:

• If the master's puppet.conf file has a config block for this agent's environment, those settings will override the master's normal settings when serving that agent.

• The $environment variable can be interpolated into other settings, such as manifest or modulepath, so each environment resolves to its own directories.

• Different requests can be allowed or denied based on the configuration of Puppet's REST API access in the auth.conf file.

• The agent’s environment will also be accessible in puppet manifests.

An example of a puppet.conf file:


# /etc/puppet/puppet.conf
[main]
server = puppet.example.com
environment = production
confdir = /etc/puppet
[production]
manifest = $confdir/environments/production/manifests/site.pp
modulepath = $confdir/environments/production/modules
[testing]
manifest = $confdir/environments/testing/manifests/site.pp
modulepath = $confdir/environments/testing/modules
[development]
manifest = $confdir/environments/development/manifests/site.pp
modulepath = $confdir/environments/development/modules

In this Puppet configuration file, three environments are defined (production, testing, development), each in its own block. All these environments define where the Puppet modules are stored and also the directory of the manifest file. The main block defines the default environment and basic configuration (in this case the server address and the configuration files directory).
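As an illustration of how an environment selects its own settings, the following Ruby sketch (a hypothetical helper written for this text, not part of Puppet) parses a configuration in the format above and resolves a setting for a given environment, falling back to the main block and interpolating $confdir:

```ruby
# Minimal, illustrative resolver for the puppet.conf format shown above.
def resolve(conf_text, environment, setting)
  sections = Hash.new { |h, k| h[k] = {} }
  current = nil
  conf_text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    if line =~ /\A\[(\w+)\]\z/
      current = $1                       # entering a new [section]
    elsif line =~ /\A(\w+)\s*=\s*(.+)\z/ && current
      sections[current][$1] = $2         # key = value inside a section
    end
  end
  # environment-specific value wins, [main] is the fallback
  value = sections[environment][setting] || sections['main'][setting]
  confdir = sections['main']['confdir'] || '/etc/puppet'
  value && value.gsub('$confdir', confdir)
end

conf = <<~CONF
  [main]
  server = puppet.example.com
  environment = production
  confdir = /etc/puppet
  [testing]
  manifest = $confdir/environments/testing/manifests/site.pp
  modulepath = $confdir/environments/testing/modules
CONF

puts resolve(conf, 'testing', 'modulepath')
# => /etc/puppet/environments/testing/modules
```

Real Puppet performs considerably richer interpolation and validation; the sketch only demonstrates the per-environment lookup with fallback to [main].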


Chapter 4

Project Foreman

Foreman is an open-source project for virtual machine life cycle management. It serves for provisioning on bare metal and in public or private clouds. Foreman is written in the Ruby programming language and uses the Ruby on Rails framework for the Foreman web interface. For virtual machine configuration management, Foreman uses Puppet. Foreman also uses Puppet for collecting reports and facts, monitoring host configuration, and reporting statuses, distributions and trends. For provisioning, Foreman can select from multiple virtualization services, whose APIs are used at the Foreman back-end to provision the virtual machine either on bare metal or to a cloud. These services are:

• Libvirt - Platform virtualization management tool, described in the second chapter.

• oVirt - Open-source web application for platform virtualization based on Libvirt, focused mainly on KVM virtualization infrastructure for the Linux kernel.

• EC2 - Cloud computing platform that allows provisioning of virtual machines into the Amazon cloud.

• VMware - Bare-metal virtualization platform.

• OpenStack - Open-source cloud operating system for control over large pools of compute, storage and networking resources.

• Rackspace - Cloud computing platform that allows provisioning of virtual machines into the Rackspace cloud.

On top of all this, Foreman provides a web front-end and a robust REST API. Foreman's REST API is used in the practical part of this thesis to communicate with Foreman and to provide provisioning of virtual machines and their configuration.

4.1 Foreman architecture

Foreman[13] consists of several services which together provide virtual machine provisioning and management capabilities. Figure 4.1 describes the Foreman project architecture and how these services work together.

Figure 4.1: Foreman architecture.[15]

The name Host is, in the context of the Foreman application, a bit misleading, because it does not actually represent a system that is hosting a virtual machine, but is in fact the virtual machine itself.

4.1.1 Puppet

Foreman cooperates with Puppet, which brings the virtual machine configuration to the desired state. Puppet environments are mapped directly into Foreman. They can be used at various levels through the Foreman interface. Puppet environments are generally used to separate Puppet classes for different types of hosts, so that a class can be developed or tested in one environment (e.g. development, testing) before being pushed to another (e.g. production), where it can use a different configuration.

Puppet environments and Puppet classes can be created in Foreman by two methods. One way is that Foreman itself offers the possibility to create a new environment and also new Puppet classes. The other way is that Foreman can detect all the Puppet environments and Puppet classes contained in the Puppet master and import them automatically into the system. Created or imported environments and classes can then be assigned to hosts or groups of hosts (hostgroups) that will be provisioned by Foreman. As mentioned in the Puppet chapter, the purpose of environments is to group a desired collection of Puppet classes, from which the user then chooses those which will be applied to the host or hostgroup.

Puppet classes also provide an option of Smart Variables. Smart Variables are a tool that can provide additional logic to Puppet classes a user may wish to apply. They may have multiple values, depending on hierarchical context or various conditions. An example of using Smart Variables can be a simple change for a single host: all our hosts use server.foo for DNS, but we need the host aneta.domain.com to use server.bar. The example below (figure 4.2) shows how this Smart Variable would look in Foreman.

Name: dns-server
Description: The target server to talk to
Default Value: server.foo
Validator Type: string
Validator Constraint: (none)
Order: fqdn hostgroup os domain
Match: fqdn = aneta.domain.com
Value: server.bar

Figure 4.2: Smart Variable example
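The matching logic can be illustrated with a short Ruby sketch (a simplified model written for illustration only, not Foreman's actual code): matchers are tried in the configured order, and the default value applies when none of them matches the host's facts.

```ruby
# Illustrative resolution of a Smart Variable: each matcher pairs a
# "fact = value" condition with an override; the first matcher whose
# condition equals the host's fact wins, otherwise the default applies.
def resolve_smart_variable(host_facts, matchers, default)
  hit = matchers.find { |m| host_facts[m[:fact]] == m[:value] }
  hit ? hit[:result] : default
end

# The matcher from the example above: override dns-server for one host.
matchers = [{ fact: :fqdn, value: 'aneta.domain.com', result: 'server.bar' }]

puts resolve_smart_variable({ fqdn: 'aneta.domain.com' }, matchers, 'server.foo')
# => server.bar
puts resolve_smart_variable({ fqdn: 'other.domain.com' }, matchers, 'server.foo')
# => server.foo
```

Foreman additionally validates the value against the configured validator type and constraint, which this sketch omits.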

4.1.2 Smart proxy

Smart Proxy[12] is an independent Foreman component which provides multiple services with a REST API to various subsystems. Its main purpose is to provide an API for tools like Foreman with a higher level of orchestration, so that new subsystems can be added to the system and managed, or existing subsystems can be extended.

Smart Proxy is placed on the host system so that its provisioned virtual machines can be managed throughout their life cycle, including the commissioning and decommissioning process, and it performs the necessary services within the used subsystem. Services that are supported by Smart Proxies:


• DHCP - Smart Proxy supports two types of DHCP servers
   – ISC DHCP
   – MS DHCP
• DNS - Smart Proxy supports two types of DNS servers
   – Bind
   – MS DNS
• TFTP - Smart Proxy supports any UNIX-based TFTP server
• Puppet - Smart Proxy supports Puppet server 0.24.0 and higher

• Puppet CA - Puppet Certificate Authority, responsible for certificate signing, cleaning and autosigning

Each of these services may exist on a separate machine, or several of them may be hosted on the same machine. Figure 4.3 represents a scenario where each of these services is deployed on a different host, and that host is responsible for it. After the smart proxy is registered under its host, Foreman automatically detects which of the services it provides.

Figure 4.3: Smart proxy distributed services scenario

As each smart proxy instance is capable of managing all of these services, there is only a need for one proxy per host. Figure 4.4 shows a use case where all the smart proxy services are present on one single host.


Figure 4.4: Smart proxy all in one scenario

Due to the nature of smart proxy services, at least one smart proxy is needed per subnet. This is because each subnet has its own network address range that needs to be handled by DHCP, DNS, TFTP and Puppet. For this reason, every additional subnet that is added to the system requires a smart proxy in that subnet. The latest stable source code of the Smart Proxy project is available on GitHub at https://github.com/theforeman/smart-proxy.
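The feature detection mentioned above can be illustrated with a short sketch. The Smart Proxy advertises its enabled services over its REST API as a JSON array (the GET /features call); the snippet below parses a canned response instead of contacting a live proxy, so the response body here is an assumed example, not real output from a deployment.

```ruby
require 'json'

# Canned stand-in for the body returned by GET /features on a Smart Proxy
# that runs all services; a real deployment may enable only a subset.
response_body = '["dhcp","dns","tftp","puppet","puppetca"]'

features = JSON.parse(response_body)

# Foreman uses this list to decide which subsystems the proxy can manage.
puts "TFTP available" if features.include?('tftp')
```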

4.1.3 Foreman Rest API

Foreman ships with a RESTful API which provides the capability to use Foreman as a back-end service. The Foreman API uses the JSON format for communication. This format represents an object with its attributes. Figure 4.5 shows an example of a JSON input used for creating a Hostgroup object.


{"hostgroup" => { "name" => "example_hostgroup_1",
                  "domain_id" => 1,
                  "subnet_id" => 4,
                  "environment_id" => 2,
                  "operatingsystem_id" => 3,
                  "architecture_id" => 2,
                  "medium_id" => 1,
                  "ptable_id" => 2,
                  "root_pass" => "scholar"}}

Figure 4.5: Foreman REST API example of a JSON input, for creating a Hostgroup object

Because the API is RESTful, CRUD methods can be used on almost all Foreman objects, together with the corresponding HTTP method and the correct set of attributes. Figure 4.6 shows an example of the calls provided by the Foreman API for the Hostgroup object.

HTTP method  API call              Description
GET          /api/hostgroups       List all hostgroups.
GET          /api/hostgroups/:id   Show a hostgroup.
POST         /api/hostgroups       Create a hostgroup.
PUT          /api/hostgroups/:id   Update a hostgroup.
DELETE       /api/hostgroups/:id   Delete a hostgroup.

Figure 4.6: Available REST API calls for Foreman Hostgroup object
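One of the calls from Figure 4.6 can be assembled with Ruby's standard Net::HTTP library. As a hedged sketch, the request below is only constructed, never sent; the Foreman hostname and the admin credentials are placeholders, and only one of the possible hostgroup attributes is set.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build (but do not send) the POST /api/hostgroups call.
# Host and credentials are placeholder values.
uri = URI('https://foreman.example.com/api/hostgroups')

request = Net::HTTP::Post.new(uri)
request.basic_auth('admin', 'changeme')
request['Content-Type'] = 'application/json'
request.body = JSON.generate(
  'hostgroup' => { 'name' => 'example_hostgroup_1', 'environment_id' => 2 }
)

# Sending would be:
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(request) }
puts request.method   # "POST"
puts request.path     # "/api/hostgroups"
```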

Each API call also has return values, which contain the HTTP return code together with the set of attributes that were provided with the call and are required by it, as well as those which were not provided and are optional, so their values are filled in by the system or left blank. As mentioned before, the API call attributes are in JSON format, and so are the return attributes of the call. Figure 4.7 shows an example of an API call, together with the HTTP method, input attributes in JSON format, return code and returned values, also in JSON format.


POST /api/architectures
{
  "architecture": {
    "name": "x86_64"
  }
}

200
{
  "architecture": {
    "name": "x86_64",
    "id": 1,
    "updated_at": "2013-04-13T23:24:43Z",
    "operatingsystem_ids": [],
    "created_at": "2013-04-13T23:24:43Z"
  }
}

Figure 4.7: Foreman REST API call example together with returned values

There are various Foreman objects that are included in the Foreman API. The complete list of these objects is given below.

1. Architectures
2. Audits
3. Auth source ldaps
4. Bookmarks
5. Common parameters
6. Compute resource
7. Dashboard
8. Domain
9. Environment
10. Fact values
11. Home
12. Hostgroups
13. Hosts
14. Images
15. Lookup keys
16. Media
17. Models
18. Operating systems
19. Partition tables
20. Puppetclasses
21. Reports
22. Roles
23. Settings
24. Smart proxies
25. Subnets
26. Template kinds
27. Usergroups
28. Users

The Foreman API documentation can be found at http://theforeman.org/api/apidoc.html, together with a list of all available API calls for each Foreman object. Objects that are mandatory for provisioning a virtual machine will be briefly described in the next section, together with their relations.

4.2 Provisioning

To be able to provision a virtual host with Foreman, several prerequisites need to be taken care of. First, a Hostgroup needs to be created; it is a kind of template that can be deployed on a large variety of systems which provide the capability of provisioning a virtual machine. The Hostgroup requires several pieces of information that need to be defined upon creation. These are:

• Environment - defines which puppet modules should be taken into account when the user is choosing from the available puppet classes.

• Puppet CA - points to the smart proxy that runs the Puppet Certificate Authority server. Every agent node in the dedicated network must set the CA server hostname in its puppet.conf file.

• Puppet Master - points to the smart proxy that runs the Puppet Master server. The Puppet Master hostname also needs to be present in the puppet.conf file.

• Puppet Classes - define which puppet classes should be installed on the provisioned machine. Using Smart Variables, these classes can be configured according to the desired configuration of the provisioned host or group of hosts. The puppet classes that can be chosen from depend on the environment in which the host or hostgroup is defined; according to that environment, the puppet modules registered in the same or in the general environment will be available.

• Network - defines network attributes of the virtual machine.

– Domain - defines the domain in which the host will take place and sets the host's hostname (e.g. katello-scholar.org)
– Subnet - defines the subnet from whose IP address pool the smart proxy DHCP server will choose the IP addresses for provisioned hosts (e.g. 192.168.100.0/24, which represents a network with 254 usable host IP addresses).
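The subnet arithmetic in the example above can be checked with Ruby's standard IPAddr library; this small sketch just enumerates the example range and is not part of the application.

```ruby
require 'ipaddr'

# The 192.168.100.0/24 subnet from the example above.
subnet = IPAddr.new('192.168.100.0/24')

# A /24 spans 256 addresses; the network and broadcast addresses are
# reserved, leaving 254 usable host addresses for the DHCP pool.
all_addresses = subnet.to_range.to_a
usable_hosts  = all_addresses.size - 2

puts usable_hosts          # 254
puts all_addresses.first   # 192.168.100.0 (network address)
puts all_addresses.last    # 192.168.100.255 (broadcast address)
```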


• Operating System - fully defines the type of operating system, the path to its location and its installation process; therefore it consists of numerous attributes that need to be set. Those are:

– Architecture - defines the architecture of the operating system.
– Installation Media - defines the path to the medium, which can be a URL or a valid NFS server (e.g. http://download.eng.scholar.com/released/RHEL/$version/os/$arch). The example URL demonstrates that variables can be put into the path, which will be filled in from the operating system attributes (architecture; version - major and minor; for Solaris and Debian installation media, $release may also be used).
– Partition Table - defines how the virtual host disk drive layout should be divided into partitions. Foreman offers a set of predefined partition tables to choose from.
– Provisioning Templates - define the installation phases (pre, post) of the provisioned virtual machine, to deploy the correct operating system with the correct options. There are various types of these templates:
  ∗ PXELinux - deployed to the TFTP server to ensure the virtual machine installation boots the correct installer with the correct kernel options
  ∗ Provision - a Kickstart or Preseed file template that takes care of the whole unattended installation
  ∗ Finish - after the main provisioning process is completed, there is an option of a post-install script to be used, which defines custom actions upon the provisioned virtual machine.
  ∗ Script - an arbitrary script, not used by default, useful for certain custom tasks
  ∗ gPXE - used in gPXE and iPXE environments in place of PXELinux
The provisioning templates are associated with the operating system object, and Foreman has pre-created provisioning templates for the more common operating systems.
– Root Password - an optional attribute which sets the root password for the virtual machines of a host or group of hosts. If the root password is not defined, Foreman uses a default password that is pre-set in the Foreman settings.
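The variable substitution in the installation media path can be sketched in a few lines. This is a simplified illustration of the mechanism, not Foreman's actual implementation; the helper name and the substitution rules are mine, and real Foreman handles more variables and edge cases.

```ruby
# Simplified sketch of expanding $-variables in a media path.
MEDIA_PATH = 'http://download.eng.scholar.com/released/RHEL/$version/os/$arch'

def expand_media_path(path, version:, arch:)
  # Plain string substitution; Foreman's real logic is richer.
  path.gsub('$version', version).gsub('$arch', arch)
end

url = expand_media_path(MEDIA_PATH, version: '6.3', arch: 'x86_64')
puts url   # http://download.eng.scholar.com/released/RHEL/6.3/os/x86_64
```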


The existence of a hostgroup is not mandatory; however, the existence of all components that a hostgroup should contain is. So if a hostgroup is not defined, all its constituent parts have to be defined before the host creation process and used within it. Besides the components that can be defined by a hostgroup or during the host creation process, there is a group of host attributes that are mandatory for creating a host. Those are:

• Compute Resource - defines the type of virtualization provider on the deployed machine and the URL of the machine together with a port (e.g. qemu://master.katello-scholar.org:16509/system).

• IP address - defines the IP address of the host within the defined subnet. The IP address is automatically generated by the smart proxy DHCP server: based on the subnet in which the host is supposed to be provisioned, an IP address is taken from the subnet pool.

• MAC address - defines the MAC address of the host. The MAC address is automatically generated by Foreman.

• Compute Attributes - defines the hardware attributes of the virtual machine. Those are:

– Network interface - defines the virtual machine network interface, to which the generated IP and MAC addresses will be assigned.
– Number of CPUs - defines the number of CPUs reserved for the virtual machine.
– Amount of memory - defines the amount of memory reserved for the virtual machine.
– Disk capacity - defines the amount of disk capacity reserved for the virtual machine.

• Ownership - defines the ownership of the host virtual machine by a user, and with it access to all the CRUD methods that can be used on the virtual machine by the defined owner.

There are more attributes that can be defined upon creation of a virtual machine, but unlike those listed and briefly described above, they are not mandatory.
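The mandatory host attributes described in this section can be gathered into the JSON payload of a POST /api/hosts call. The sketch below only builds and prints the payload; the host name, all IDs, the MAC/IP values and the compute attribute keys are illustrative placeholders following Foreman 1.1 API conventions, and the exact attribute names may differ between Foreman versions.

```ruby
require 'json'

# Illustrative payload for creating a host via Foreman's POST /api/hosts.
# All IDs and values below are placeholders, not a real deployment.
host_payload = {
  'host' => {
    'name'                => 'lesson-vm-01',
    'hostgroup_id'        => 1,      # bundles environment, OS, subnet, ...
    'compute_resource_id' => 2,      # the libvirt host system
    'mac'                 => '52:54:00:ab:cd:ef',
    'ip'                  => '192.168.100.10',
    'owner_id'            => 3,      # the owning user
    'compute_attributes'  => {
      'cpus'   => 1,
      'memory' => 1_073_741_824      # 1 GiB in bytes
    }
  }
}

puts JSON.pretty_generate(host_payload)
```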

Chapter 5

Project Katello

Project Katello is an open-source project developed by the Red Hat company. The idea behind this project is to create a unified tool for system administrators that is able to control and manage a large number of virtual machines, together with their software content, repository management and subscription management. Project Katello provides a web interface written in the Ruby language, using the Ruby on Rails framework. Besides the web interface, Katello also offers a command line interface written in Python. On its back-end, Katello uses three applications that serve as Katello component services. Those are:

• Pulp

• Candlepin

• Foreman

Figure 5.1 shows the dependencies between Katello and all its features and components. Katello communicates with its components through the REST API that each of them provides. Each of these components is briefly described in the following section. The development repository can be found on GitHub at https://github.com/Katello/katello.

5.1 Katello components

This section briefly describes the Project Katello components Pulp and Candlepin. The Foreman component was described in its own chapter 4.


Figure 5.1: Project Katello features and dependencies.[7]

5.1.1 Pulp

Pulp is an open source community project supported by the Red Hat company. Pulp is a platform for repository content management, such as software packages, errata, and distributions, that can provide this content to a large number of consumers. Pulp collects software packages from repositories, organizes them into custom content repositories and distributes that content to desired systems at custom destinations. One of its features is that Pulp is capable of replicating software repositories from a variety of supported sources to local or remote repositories. Supported sources are:

• HTTP/HTTPS

• file system

• ISO

• RHN - Red Hat Network

Pulp provides a well-documented REST API and a command line interface for management. It is written in the Python programming language. Figure 5.2 briefly shows a diagram of how Pulp works. Pulp supports a variety of platforms on which it can be deployed. All of them are RPM-based Linux distributions.

• Fedora 17, 18, 19

• RHEL 5, 6


Figure 5.2: Pulp diagram.[3]

RPM Support for Pulp allows users to create and publish repositories of RPM packages (including RPM, SRPM, DRPM, errata, distributions, etc.). The source code of the Pulp project can be found on GitHub at https://github.com/pulp/pulp. There is also a Ruby binding for Pulp in the form of a Gem, available at http://rubygems.org/gems/pulp.

5.1.2 Candlepin

Candlepin is a service which allows consumers to manage their software subscriptions. It tracks the software products to which a consumer is subscribed and allows the consumer to use these subscriptions based on a set of configurable rules. It also gives customers the possibility to manage their products in a disconnected, on-premise solution. Candlepin also consists of two subprojects:

• Headpin - is an open source front-end for Candlepin, written using the Ruby on Rails framework.

• Thumbslug - is an open source proxy for Candlepin, written using the client-server Java framework Netty.

Candlepin can be enhanced with Headpin's web interface and a proxy which allows local content requests to be made against package delivery tools. Figure 5.3 shows a diagram of how Candlepin works together with Headpin and Thumbslug.


Candlepin also provides a REST API, which gives third-party vendors the opportunity to create their own tailored software for their own subscription purposes.


Figure 5.3: Candlepin diagram.

A hosted Candlepin can provide data to a remote Candlepin installation. This allows larger customers to manage their subscriptions in a secure fashion, but within their own networks. The design goal is to allow Candlepin instances to make this data available to consumers which are interested in it or demand it. So, if a central group manages purchasing for a large company, it should be able to download the subscription data from the software vendor environment and send parts of it to different departments, who can then manage their own entitlements. Candlepin client (consumer) lifecycle[2]:

• Clients register with Candlepin. They are given identity certificates which con- tain their UUID. This identity certificate can be used for future communication.

• Clients can search for pools of subscriptions.

• Clients consume a subscription. This is also called binding to a subscription or creating an entitlement (the right to use a product). This results in entitlement certificates being provided to the client.

• Clients can retrieve updated certificates to handle cases where data has changed server side.

• Clients can unbind, or stop consuming certificates.

• Clients can unregister, or delete themselves from the system.

5.2 System registration and basic management

This section briefly describes how to register client systems into Katello infrastructure.

Before the system registration, an appropriate organization with environments needs to be created, to which the user can be associated. There are two ways to register a system into Katello. The first uses the user credentials that are entered into the subscription manager. The second way is to use an activation key, which enables the user to register his system into Katello without credentials. This activation key is generated in Katello prior to registration. After system registration, Katello provides several features:

• Repository management

– An infrastructure framework provided by Katello to associate the needed group of repositories with the desired system.
– Repositories are grouped based on the product to which they belong.
– Provided by the Pulp component.

• System errata

– Provides the management of security and bug fixes, as well as new releases (new software versions, enhancements, etc.).
– Each system has its own set of associated errata.
– Provided by Pulp.

• System facts and trends

– A database of individual system characteristics.
– A flexible tool providing real-time data for system administrators.
– Provided by Foreman.

These are only a few of the major features of the Katello project.


Chapter 6

Katello Scholar

This chapter describes the practical outcome of this thesis, its structure and usage.

The practical output of this thesis is an application named Katello Scholar. Its main purpose is to automate unattended virtual machine provisioning on computers in school laboratories for education purposes, as discussed in section 1.2. The provisioned virtual machines should have the desired operating system and a software state that will suit the needs of the given course lesson.

6.1 Project Specification

6.1.1 Used Technologies

Katello Scholar is an open source project started by me as a part of my master thesis. It is written in the Ruby language using the Ruby on Rails framework. Katello Scholar uses the APIs of both the Foreman and Katello projects at its back-end, in the form of Ruby Gem wrappers. Both projects' APIs, in the form of Gems, are available at:

• http://rubygems.org/gems/foreman_api/

• http://rubygems.org/gems/katello_api/

Katello Scholar uses Ruby version 1.9.2 and the Ruby on Rails framework version 3.2.9. Because of the different Ruby versions between the projects, I used the Ruby Version Manager (RVM) during the development of Katello Scholar to easily manage these differences. For the relational database I chose the MySQL database, with the Ruby Gem wrapper mysql2, which is available at http://rubygems.org/gems/mysql2. To be able to create virtual machines on the desired host system with a defined software collection, a running instance of the Foreman project service is needed on the deployed network. Because project Katello consists of and depends on its three components (Pulp, Candlepin, Foreman), its installation also covers the installation and configuration of these components, so it is only necessary to install Katello. For the installation of project Katello, a Linux-based operating system is needed, either on a bare-metal machine or virtualized. The distributions supported at the time of writing this thesis are:

• Fedora 16, 17, 18

• RHEL 6, RHEL 6 Server

A detailed instruction manual for the installation is available on the Fedora community domain at:

• https://fedorahosted.org/katello/wiki/AdvancedInstallation

Because project Katello is still under development, the installation instructions may change in the future, so it is good to stay up to date. Also, thanks to the open-source nature of the project, any discovered bug can be reported, and one can even contribute and fix the bug. If needed, project Foreman can be installed as a standalone application. The instruction manual is available at:

• http://theforeman.org/manuals/1.1/index.html

The manual covers everything from the Foreman installation and its configuration to virtual machine provisioning and management. Both manuals provide information about the set of repositories on the host system that are necessary for the projects' installation.

6.1.2 Project configuration

To be able to create virtual machines on the desired host system with a defined software collection, a running Foreman service is needed on the deployed network. Katello Scholar also needs to be running on the deployed network, and there are two scenarios for how these applications can be deployed so they can communicate together.

• All the necessary applications will be running on the same server.

• All the applications will be located on separate servers.


This is possible due to the nature of the Katello and Foreman APIs, which besides the demanded API call also take the address of the server where the service runs. This address needs to be defined in the configuration file katello-scholar.yaml located in the config/ folder in the root path of the project. In the configuration file, the admin name and password for access to the Foreman API calls also need to be set. To deploy Katello Scholar, a clone of my GitHub repository needs to be downloaded. After downloading the repository, make sure that the appropriate version of Ruby is running on the server machine. After that, the needed dependencies have to be installed. Those are Ruby Gems that will be automatically downloaded and installed after running the bundle install command in the Katello Scholar root folder. After the installation of the dependencies, the script set_foreman.sh needs to be run, which will prepare Foreman for Katello Scholar usage. Then the application has to be configured. The configuration file is located at /Katello-Scholar/config/katello-scholar.yaml. Below is an example of the katello-scholar.yaml configuration file with all the attributes that need to be set. In this example, the Katello and Foreman services are running on the same server as Katello Scholar.

app_config:
  app_mode: development
  app_name: katello-scholar
  domain: katello-scholar.org
  url_prefix: /katello-scholar
  host: 127.0.0.1
  port: 3000
  use_ssl: false

foreman:
  url: https://localhost/foreman
  admin_name:
  admin_password:

katello:
  url: https://localhost/katello
  admin_name:
  admin_password:

scholar:
  semester_dates:
    spring:
      start: 18-2-2013
      end: 17-5-2013
    winter:
      start: 17-9-2012
      end: 21-12-2012

  provisioning_start: 30
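The application can read this configuration with Ruby's standard YAML library. As a self-contained sketch, an inline copy of a few keys stands in for the file here; in the application this would be a YAML.load_file call on config/katello-scholar.yaml.

```ruby
require 'yaml'

# In the application: config = YAML.load_file('config/katello-scholar.yaml')
# Here an inline copy of the relevant keys keeps the sketch self-contained.
raw = <<~YAML
  app_config:
    domain: katello-scholar.org
    port: 3000
  foreman:
    url: https://localhost/foreman
  scholar:
    provisioning_start: 30
YAML

config = YAML.safe_load(raw)

puts config['app_config']['domain']          # katello-scholar.org
puts config['scholar']['provisioning_start'] # 30
```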

To be able to provision virtual machines on a host system, the host itself has to run the Libvirt service with an active daemon. Katello Scholar comes with a set of scripts that set up the host system environment, so that after running them the host is able to create, run and manage virtual machines. These scripts:

• Set the hostname and IP address of the host in its /etc/hosts file

• Install necessary libvirt virtualization packages

– libvirt
– kvm
– foreman-proxy-installer
– foreman-libvirt

• Configure Libvirt dependencies on the host

– define the virtual network that will be used for communication between Foreman and the host.
– define the virtual storage pool type and capacity.
– define a virtual interface and assign an IP address to it.

• Configure host firewall so specific ports can be forwarded.

– 80, 443 – HTTP and HTTPS ports
– 53, 953 – DNS ports
– 67, 7911 – DHCP ports
– 69 – TFTP port
– 9090 – smart proxy communication port
– 16509 – libvirt communication port

• Configure smart proxy on a host.

If all the scripts are applied on a host, it will be capable of working as a smart proxy with all its features (DNS, DHCP, TFTP, Puppet) in the network. For a host that does not need to be a smart proxy, only some of these scripts are required to run. Both of these scenarios are covered by one script. The script takes two parameters. The first tells whether the host machine will serve as a proxy or just as a client. The second one sets the desired IP address on the machine. In the smart proxy case, the second parameter is not necessary, because only one smart proxy is needed in the network, and so its IP address is fixed as 192.168.100.1.

The examples below show the usage of this script with appropriate comments.

./client_proxy.sh client 2
# Sets up a client with IP address 192.168.100.2

./client_proxy.sh proxy
# Sets up a smart proxy host with IP address 192.168.100.1

All the mentioned scripts are located in the /Katello-Scholar/scripts/ folder. Right after the application is deployed and the configuration file katello-scholar.yaml is configured, three rake tasks need to be run. Those are:

• rake db:create - creates database

• rake db:migrate - migrates database tables

• rake db:seed - prepares the database with canned data

The canned data are defined in the /Katello-Scholar/db/seed.rb file. There is also a need to create a cron job that runs, at the beginning of each day, a rake task named daily_provisioning_jobs.rake, which collects all the lessons that will take place on that day and puts them into the queue of a Gem called Delayed Job. Delayed Job will then start the lesson system guest provisioning N minutes before the lesson begins, in the background of the application. This N variable can be chosen according to the hardware parameters and set in the katello-scholar.yaml file. In the background, Katello Scholar gathers the information about the desired virtual machine profiles and, via the Foreman and Katello APIs, arranges that Foreman creates the virtual machines and runs Puppet on them, which will manage their state and software collection, register those machines into Katello and subscribe them to the appropriate repositories. To run provisioning in the background, the Delayed Job daemon, which is located in /Katello-Scholar/scripts/, must be started. To start the daemon, run script/delayed_job start, and script/delayed_job stop to stop it. The source code of the project, together with all the scripts, is available at my GitHub page http://github.com/jhadvig/Katello-Scholar.
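The "N minutes before the lesson" scheduling reduces to simple time arithmetic. The sketch below is hypothetical: the method name is mine, the constant stands in for the provisioning_start value read from the configuration file, and in the real application the resulting time would be handed to Delayed Job as the job's run-at time.

```ruby
require 'time'

# The N from the provisioning_start attribute in katello-scholar.yaml.
PROVISIONING_START_MINUTES = 30

# Hypothetical helper: when should provisioning be enqueued for a lesson?
def provisioning_run_at(lesson_start)
  lesson_start - PROVISIONING_START_MINUTES * 60
end

lesson_start = Time.parse('2013-04-15 10:00:00')
run_at = provisioning_run_at(lesson_start)

puts run_at.strftime('%H:%M')   # 09:30
```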

6.2 Structure of project

This section describes the models which were created for this project, how they interact, and their purpose. The application itself is designed to hold information about the models and functionality of the project. Figure 6.1 shows the entity relationships between the models of the application.

[Entity-relationship diagram of the application models: Role, User, Course, Puppet Class, Puppet Class Group, Seminar, Template, Architecture, Operating System, Lab, Repository, Repository Group, Lesson, System Host, System Guest]

Figure 6.1: Katello Scholar entity-relationship model


6.2.1 Project workflow

This and the following subsections theoretically describe the steps that need to be taken so that the application can be used during the semester to provide course lessons with virtual machines.

6.2.1.1 Preparation of application environment

As mentioned at the end of the last section, the seed.rb script has to be run before starting the application for the first time. This first initial run will create objects that are vital for the application to run. Those are:

• Roles
• Admin User
• Domain
• Subnet
• Architectures
• Smart proxy

First, the roles for users are created. There are three basic roles:

• Admin

• Lector

• Student

Each of these roles has different permissions. The admin user can execute all the CRUD actions on any object which the GUI manipulates, except those created by the run of the seed.rb script. After the creation of roles, the admin user is defined, so that they can prepare the application environment for lectors and their students and add their accounts based on their credentials. Next, a domain and subnet are created that define the network of all compute resources that will be registered in Katello Scholar. Then two types of architectures are defined:

• i386

• x86_64

At the end of the script, a smart proxy is created to provide the necessary DNS, DHCP, TFTP and Puppet services on the deployed network. Before using the application during the semester, it is mandatory for the admin user to register certain objects into the system so that the lector users are capable of working with them and applying them to other objects. Those are:


• Courses
• Laboratories
• System Hosts
• Proxy
• Templates (optional)
• Operating Systems (optional)
• Puppet Classes (optional)
• Repository Groups and Repositories (optional)

Courses represent the real-life entity of a school course. A course consists of a course name and a code that uniquely identifies the course against other courses. The admin has to create these courses and add the lectors that will have access to them. Next, the admin has to register laboratories into the application. Laboratories are also defined by their name and network IP address, by which they are uniquely identified. Laboratories also take a network mask attribute to determine the IP address range. After the laboratories are registered, the admin can register system hosts, which represent real-life compute resources (e.g. desktop computers), with their name, IP and MAC address. On the smart proxy which is available on the network, the admin has to add environment folders into Puppet. They should be located in /etc/puppet/environments/ and should be named after the course code. Then the administrator has to load puppet modules into /etc/puppet/modules/, which will ensure that all the environments (courses) have access to all puppet modules. If there is a need to add a puppet module to just one specific environment, such modules have to be loaded into that specific environment. The logic of the puppet modules is explained in section 3.3. Next, operating systems need to be registered into the system. For the creation of an operating system, the admin has to enter its name, version (e.g. RHEL 6.3), architecture and the full path to the medium, which can be a URL or a valid NFS server. At last, repository groups with the appropriate repositories need to be registered into the system by their name and path.

6.2.1.2 Application usage

After the admin prepares the application environment for usage, lector users can start their lesson management. For that, a lector is also able to create some application objects which he will use afterwards. Those are:

• Seminar groups
• Lessons
• Repository Groups and Repositories
• Templates
• Operating Systems
• System Guests

Every lector can create a seminar group in the desired course, where he defines the laboratory, day and time of the week when the seminar group takes place. He also registers the set of students which will attend his seminar group. A lector can also create lesson templates inside the course, which will be used when defining each lesson of a seminar group. These templates consist of three dependencies.

• Operating System - defines the operating system that will be installed on the provisioned virtual machine

• Puppet classes - define the state of the virtual machine and what software collection should be installed after provisioning

• Repositories Group - defines the set of repositories which will be associated with the given template (and then with the lesson)

Templates define how the curriculum of a seminar group's lessons should look. In them, the lector can prepare the desired lesson environment to suit the lesson's purposes. After at least one template is created, the lector can create each lesson, where he picks one of the available course templates and confirms the time and the laboratory where the lesson takes place. The confirmation of time and place serves for cases when the lesson has to take place in a different laboratory or on a different day and time than usual. When the lesson is created without any date editing, the date when the lesson takes place will be automatically assigned to it, based on the lesson number. A lesson can also be cloned in case no special editing of the last one is needed. When the lesson is created, for each system host that is present in the selected lesson laboratory, a system guest object will be created which represents a virtual machine. This system guest is not actually a virtual machine; it is just a dummy object, which takes the necessary parameters of the virtual machine it should represent. The provisioning itself begins N minutes before the lesson starts. The time is supposed to be variable and can be easily adjusted in katello-scholar.yaml under the provisioning_start attribute. During the provisioning, Foreman will take care of installing and setting up Puppet in the virtual machine, and after the installation it will take care of running Puppet with the modules defined for the current lesson. After all virtual machines are provisioned and configured, their IP addresses will be mailed to the appropriate students, together with their generated passwords for root access. At the end of the lesson, all the virtual machines will destroy themselves.

6.2.2 Model Mapping

Because Katello Scholar uses the APIs of the Katello and Foreman projects, it naturally maps its own models onto those used in Katello and Foreman. This subsection describes how the models in Katello Scholar are mapped onto those in Katello and Foreman. Foreman and Katello model names are capitalized in this subsection.

Courses are mapped into Foreman as Environments. Environments are named after the course code, which should ensure the Environment's uniqueness and simplicity. Seminar and template models are not reflected into Foreman; they only serve to gather the necessary data and prepare it for the lesson model.
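For illustration, creating such an Environment through the Foreman REST API could look like the sketch below. The endpoint and JSON payload follow the Foreman 1.1 API; the host name and course code are placeholder values, and the sketch only builds the request instead of sending it.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a Foreman API request that would create an
# Environment named after a course code.
def environment_request(foreman_url, course_code)
  uri = URI("#{foreman_url}/api/environments")
  req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  # Naming the Environment after the course code keeps it unique.
  req.body = { environment: { name: course_code } }.to_json
  req
end

req = environment_request('https://foreman.example.org', 'PB173')
puts req.body   # prints {"environment":{"name":"PB173"}}
```

In the real application the request would also carry authentication credentials and the response would be checked for errors before the course is considered created.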

The lesson model represents a Hostgroup model, which is described in section 4.2. The Katello Scholar architecture model is mapped into Foreman as the Architecture model. Katello Scholar's operating system model is mapped into Foreman as an Installation Medium object, which after creation serves to create an Operating System object, to which a basic partition table and a provisioning template for the Red Hat operating system family are automatically associated. Puppet groups and classes are imported from Foreman, in which they are named similarly, except for the puppet groups, which are created just for the use of Katello Scholar to gather the puppet classes of the same puppet module at the smart proxy. Repository groups are mapped into Katello as Products, and their repositories as Katello Repositories. Template objects carry information about the operating system, puppet classes and repositories, but are not directly mapped into Foreman. In Katello, the templates are represented as Activation Keys. Labs are an abstract group that associates system host models placed in the same laboratory; they are mapped into Foreman as the Compute Resource model and into Katello as the System Group model. By the time a lesson is created, it should contain all the necessary data about the lab and its system hosts, and about the template with its operating system, puppet classes and repositories. When the system guests in the desired lab are provisioned, they are registered into Katello as a System Group, to which an activation key with the appropriate repository groups and their repositories will then be associated. After that, when the repositories are available

to the provisioned system guests, Puppet will ensure that they are configured based on the chosen puppet classes.
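The mapping described above can be summarized as a simple lookup table. The constant below is only a summary of this subsection for the reader's convenience; Katello Scholar itself does not necessarily define such a structure.

```ruby
# Katello Scholar model => its Foreman/Katello counterpart,
# as described in this subsection.
MODEL_MAPPING = {
  'course'           => 'Foreman Environment',
  'lesson'           => 'Foreman Hostgroup',
  'architecture'     => 'Foreman Architecture',
  'operating system' => 'Foreman Installation Medium / Operating System',
  'repository group' => 'Katello Product',
  'repository'       => 'Katello Repository',
  'template'         => 'Katello Activation Key',
  'lab'              => 'Foreman Compute Resource / Katello System Group'
}.freeze

puts MODEL_MAPPING['lesson']   # prints Foreman Hostgroup
```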

6.3 Present application limitations

Because this application is still in a development version, a lot of its functionality is reduced to a minimum.

All the compute resources should be divided based on the labs in which they are located. Each of these labs should represent a virtual network. In the current version of the application, all the compute resources are on the same virtual network and share one smart proxy.

Puppet classes have to be loaded into the smart proxy manually, and lectors cannot edit them in any way other than directly at the smart proxy server.

Repository management is not fully implemented, due to the agile form of development and problems with the Katello API.


Chapter 7

Conclusion

This chapter concludes the thesis and discusses the goals it reached.

7.1 The theoretical part

Throughout the theoretical part, I focused on virtualization, unattended virtual machine provisioning and configuration, and repository management. Several tools and projects that deal with these topics were presented.

7.2 The practical part

The practical outcome of this thesis is Katello Scholar, a web application designed to provide fully configured virtual machines and their management to lectors and their students in laboratory IT courses in an academic environment. Katello Scholar meets these goals:

• Katello Scholar is made using open source software.

• Katello Scholar provides use case logic that can be deployed in almost any technically oriented educational environment.

• Katello Scholar provides a user interface for preparing course templates of operating systems, their configuration, and software collection management for each lesson.

• Katello Scholar is able to provision operating systems from the Red Hat operating system family.

• Katello Scholar is able to provision virtual machines on a desired set of compute resources, together with requested software.

7.3 Personal experiences

This work has given me a great deal of experience in the fields of virtualization, programming and software design. My Linux system administration and installation skills have also progressed significantly, and I have gained important knowledge of open source software development.

7.4 Future development plans

The next step in development is to keep contributing to the Katello and Foreman projects and their unfinished APIs, so that Katello Scholar can be further developed. Even though not all of its features are implemented, I decided to make Katello Scholar an open source project on my GitHub and to release it under the GNU GPL license. This decision will hopefully create a small community of people passionate about open source software development who will contribute to this project, and also to the Katello and Foreman projects, to extend their functionality and help with development. As the next step in the development of the application, I would like to concentrate on these unfinished aspects of the Katello Scholar project:

• Finish the implementation of repository synchronization handled by the Katello API.

• Write more tests and test cases for the project.

• Write documentation and a user manual for the project's GitHub page.

• Help with the Katello and Foreman projects and their APIs.

• Implement oVirt-based virtualization as a virtualization provider.

• Add more functionality into the project.

• Get feedback on the GUI functionality and make it even more transparent and user-friendly.

I hope that, thanks to the time and effort invested in this project, it will find its place in the educational environment, and that lectors and their students will benefit from it and also get involved in its development.

Chapter 8

Terminology

• Provisioning - the process of creating a new guest virtual machine on a physical host machine.

• Orchestration - the process of coordinating processes and exchanging information between them by using web services.

• QEMU - open-source hardware virtualization software.

• KVM - Kernel-based Virtual Machine - virtualization infrastructure for the Linux kernel.

• NFS - Network File System - a distributed file system protocol.

• LVM - Logical Volume Management - a system for managing logical volumes, or filesystems.

• iSCSI - Internet Small Computer System Interface. An Internet Protocol (IP)-based storage networking standard for linking data storage facilities.

• Unix-like systems - operating systems that behave in a similar manner to a Unix system (e.g. Debian, Fedora, Mandriva, RHEL, Mac OS X, etc.).

• NTP - Network Time Protocol, a networking protocol for clock synchronization between computer systems over packet-switched, variable-latency data networks.

• Life-cycle - in terms of a virtual machine, describes the period from the initial request and VM provisioning, through setting and enforcing configurations, until it reaches its expiration and shuts down.

• Gem - the standard format for distributing Ruby programs and libraries.


• JSON - a text-based standard for data structures that represent an object.

• RHEL - Red Hat Enterprise Linux. An operating system developed by the Red Hat company, targeted at the commercial market.

• RHN - Red Hat Network. A family of systems management services for RHEL and RHL that makes system updates, patches and fixes available to subscribing customers. It is operated by the Red Hat company.

• RPM - Red Hat Package Manager. A package management system.

• PXELinux - a bootloader that serves for booting Linux from a network server.

• Rake - Ruby Make, a standalone Ruby utility that replaces the Unix utility ‘make’ and uses a ‘Rakefile’ and .rake files to build up a list of tasks.

• REST API - a style of software architecture for distributed systems that defines a set of functions to which requests can be performed, and responses received, via the HTTP protocol.

• GitHub - a web-based hosting service for software development projects that use the Git revision control system.

• GUI - Graphical User Interface.

Appendix A

Application source code

Because the application is still in development and without any stable release version, its source code, together with all the scripts, is available at my GitHub page:

http://github.com/jhadvig/Katello-Scholar


Bibliography

[1] Dan Bode and Nan Liu. Puppet Types and Providers. O’Reilly Media, December 2012.

[2] Candlepin Community. Candlepin - Software subscription manager [computer program, online]. https://fedorahosted.org/candlepin/wiki/.

[3] Pulp Community. Pulp - Content management system [computer program, online]. http://www.pulpproject.org/.

[4] Libvirt Documentation. Major libvirt functionality. http://wiki.libvirt.org/page/FAQ#What_is_some_of_the_major_functionality_provided_by_libvirt.3F.

[5] Puppet Labs Inc. Puppet documentation. http://docs.puppetlabs.com.

[6] Puppet Labs Inc. Puppet - Automating Configuration Management [computer program, online]. http://github.com/puppetlabs/puppet, 2005. [quoted 2013-05-21].

[7] Red Hat Inc. Katello - Systems life cycle management tool [computer program, online]. http://www.katello.org/.

[8] Red Hat Inc. What is Virtualization? Technical report. http://www.redhat.com/f/pdf/virtualization/gunner_virtual_paper2.pdf.

[9] M. Tim Jones. Virtual linux. http://www.ibm.com/developerworks/linux/library/l-linuxvirt/index.html, 2006.

[10] Bohuslav Kabrda. Virtual Labs Module for MEDUSY. Masaryk University, Faculty of Informatics, 2011. http://is.muni.cz/th/207673/fi_m/main.pdf.

[11] Dan Kusnetzky. Virtualization: A Manager's Guide. O’Reilly Media, Gravenstein Highway North, Sebastopol, CA 95472, 2011.


[12] Ohad Levy. Smart Proxy [computer program, online]. http://github.com/theforeman/smart-proxy, 2011. [quoted 2013-05-21].

[13] Ohad Levy and Paul Kelly. The Foreman Version 1.1 [computer program, online]. http://github.com/theforeman/foreman, 2009. [quoted 2013-05-21].

[14] James Loope. Managing Infrastructure with Puppet. O’Reilly Media, June 2011.

[15] Foreman Manual. Foreman manual. http://theforeman.org/manuals/1.1/index.html.

[16] J. Russell and R. Cohn. Libvirt. Book on Demand, 2012. http://books.google.cz/books?id=vCagMQEACAAJ.

[17] Prashant Shenoy. Types of virtualization, 2006. lass.cs.umass.edu/˜shenoy/courses/spring07/lectures/Lec05.pdf.

[18] James Turnbull. Pulling Strings with Puppet. Springer-Verlag New York Incorporated, May 2008.

[19] VMware, Inc. Understanding Full Virtualization, Paravirtualization, and Hardware Assist. Technical report, 2007.

[20] Chris Wolf and Erick M. Halte. Virtualization: From the Desktop to the Enterprise. Apress, 2005.
