iaas Documentation
Release 0.1.0
NorCAMS
Sep 14, 2021
CONTENTS
1 Contents
  1.1 Getting started
    1.1.1 For end users of NREC
    1.1.2 For NREC developers and operators
    1.1.3 For team and project status information
    1.1.4 Customers and local administrators
  1.2 Design
    1.2.1 Locations
    1.2.2 Physical hardware
    1.2.3 Networking overview
    1.2.4 Virtual machines
    1.2.5 Development hardware requirements
    1.2.6 Node overview
  1.3 Security
    1.3.1 [2021] System documentation
    1.3.2 [2021] Management
    1.3.3 [2021] Secure communication
    1.3.4 [2021] API endpoints
    1.3.5 [2021] Identity
    1.3.6 [2021] Dashboard (FIXME)
    1.3.7 [2021] Compute
    1.3.8 [2021] Block Storage
    1.3.9 [2021] Image Storage
    1.3.10 [2021] Shared File Systems
    1.3.11 [2019] Networking
    1.3.12 [2019] Object Storage
    1.3.13 Message queuing
    1.3.14 [2019] Data processing
    1.3.15 Databases
    1.3.16 Tenant data privacy
    1.3.17 [2019] Instance security management
  1.4 Howtos and guides
    1.4.1 Build docs locally using Sphinx
    1.4.2 Git in the real world
    1.4.3 Install KVM on CentOS 7 from minimal install
    1.4.4 Configure a Dell S55 FTOS switch from scratch
    1.4.5 Install cumulus linux on ONIE enabled Dell S4810
    1.4.6 Create Cumulus VX vagrant boxes for himlar dev
    1.4.7 Routed, virtual network interfaces for guest VMs on controllers
    1.4.8 Configure iDRAC-settings on Dell 13g servers with USB stick
    1.4.9 Using vncviewer to access the console
    1.4.10 Building puppet-agent for PPC-based Cumulus Linux
    1.4.11 How to create the designate-dashboard RPM package
  1.5 Team operations
    1.5.1 Getting started
    1.5.2 Development
    1.5.3 Operations
    1.5.4 Installation
    1.5.5 Tips and tricks
  1.6 Status
    1.6.1 Teamdeltakere og roller
    1.6.2 Vakt
    1.6.3 Navnekonvensjon
    1.6.4 Kontaktpunkter
    1.6.5 List historie
    1.6.6 Rapporter og referat
    1.6.7 Aktiviteter
    1.6.8 Arkiv
    1.6.9 Hardware overview
  1.7 Kundeinformasjon
    1.7.1 Priser og flavors
    1.7.2 Prosjektinformasjon
This documentation is intended for the team working on UH-IaaS and people involved in the project. End user documentation is found at http://docs.uh-iaas.no. In addition to information about development and operations, we also have a section in Norwegian about the current status.
1 Contents
1.1 Getting started
1.1.1 For end users of NREC
• Documentation at https://docs.nrec.no
• Status at https://status.nrec.no
1.1.2 For NREC developers and operators
• Team operations
1.1.3 For team and project status information
This will be in Norwegian only.

• Status
1.1.4 Customers and local administrators
This will be in Norwegian only.

• Kundeinformasjon
1.2 Design
High-level documents describing the IaaS platform design.
1.2.1 Locations
UH-IaaS is located in Bergen, as the OpenStack region BGO, and in Oslo, as the OpenStack region OSL. For the most part, the services in the two regions are logically identical.
1.2.2 Physical hardware
The following illustration shows the physical hardware in broad terms. The number of compute hosts and storage hosts is horizontally scalable and will vary from region to region.
The illustration shows these types of physical components:

Management switch
An Ethernet switch used for internal networking, i.e. non-routed RFC 1918 addresses. These are only used for management tasks.

Public switch
A switch that has access to the Internet. These switches also perform layer 3 routing, and are used to provide access to the public services in UH-IaaS.

Controller hosts
Servers that run virtual machines managed manually with libvirt (i.e. not managed by OpenStack). All OpenStack components, such as the dashboard and the API services, run as virtual machines on these hosts.

Compute hosts
Servers that are used as compute hosts in OpenStack. Customers' virtual machines run on these servers.

Storage hosts
Servers that are part of the Ceph storage cluster. They provide storage services to OpenStack (e.g. storage volumes).
1.2.3 Networking overview
Physical networking connections of each site:

BGO
OSL
1.2.4 Virtual machines
The illustration below shows the various virtual machines running on the controller hosts.
Some of the virtual machines have a purely administrative purpose, some provide internal infrastructure services, and some run OpenStack components. Some virtual machines are scaled out horizontally, typically one on each controller host; this mostly applies to the OpenStack services. This is done for efficiency and redundancy.
OpenStack components
These VMs are purely running OpenStack components.

image-01
Runs the OpenStack Image component, Glance.

dashboard-01
Runs the OpenStack Dashboard component, Horizon.

novactrl-[01..n]
Usually three VMs in a redundant setup; runs the controller part (e.g. API) of the OpenStack Compute component, Nova.

volume-01
Runs the OpenStack Volume component, Cinder.

telemetry-01
Runs the OpenStack metering component, Ceilometer.
network-[01..n]
Usually three VMs in a redundant setup; runs the OpenStack Network component, Neutron.

identity-01
Runs the OpenStack Identity component, Keystone.
Infrastructure services
These VMs run various infrastructure services that are used by the OpenStack components, by other infrastructure or administrative services, or both.

proxy-01
Proxy service for Internet access for the infrastructure nodes that are not on any public network.

ns-01
Authoritative DNS server. Available publicly as ns.
Administrative services
These VMs run on a separate controller host, because they need to be up and running during maintenance on the other VMs.

admin-01
Runs Foreman for e.g. provisioning tasks, and functions as Puppet master for all hosts.

monitor-01
Runs Sensu for monitoring tasks.

logger-01
Log receiver for all hosts.

builder-01
Runs our builder service, for building OpenStack images.
1.2.5 Development hardware requirements
A key point is that each location is built from the same hardware specification. This is done to simplify the setup and to limit the influence of external variables as much as possible while building the base platform. The spec represents a minimal baseline for one site/location.
Networking
4x layer 3 routers/switches
• Connected as a routed leaf-spine fabric (OSPF)
• At least 48 ports 10Gb SFP+ / 4 ports 40Gb QSFP
• Support for ONIE/OCP preferred

1x L2 management switch
• 48 ports 1GbE, VLAN capable
• Remote management possible

Cabling and optics
• 48x 10GBase-SR SFP+ transceivers
• 8x 40GBase-SR4 QSFP+ transceivers
• 4x QSFP+ to QSFP+ 40GbE passive copper direct attach cables, 0.5 meter
• 4x 3 or 5 meter QSFP+ to QSFP+ OM3 MTP fiber cables
Servers
3x management nodes
• 1U, 1x 12-core CPU, 128 GB RAM
• 2x 10Gb SFP+ and 2x 1GbE
• 2x SSD drives in RAID1
• Room for more disks
• Redundant PSUs

3x compute nodes
• 1U, 2x 12-core CPU, 512 GB RAM
• 2x 10Gb SFP+ and 2x 1GbE
• 2x SSD drives in RAID1
• Room for more disks
• Redundant PSUs

5x storage nodes
• 2U, 1x 12-core CPU, 128 GB RAM
• 2x 10Gb SFP+ and 2x 1GbE
• 8x 3.5" 2 TB SATA drives
• 4x 120 GB SSD drives
• No RAID, only JBOD
• Room for more disks (12x 3.5"?)
• Redundant PSUs

Comments
• Management and compute nodes could very well be the same chassis with different specs. Even higher density, such as half width, could be considered, but not blade chassis (it would mean non-standard cabling/connectivity).
• An important key attribute for the SSD drives is sequential write performance. SSDs might be PCIe connected.
• 2 TB disks for the storage nodes to speed up recovery times with Ceph.
1.2.6 Node overview
Warning: This page is OBSOLETE

node = a virtual machine running on a physical controller box with libvirt

This overview shows the different nodes, which networks the nodes have access to, and where the OpenStack and other services are running.
1.3 Security
Warning: This document is currently under review/construction.
This document is an attempt to write up all the security measures that can, will or should be implemented. The basis is the OpenStack Security Guide on openstack.org. We use the sections in the security guide, and try to answer the following questions:

1. Is this security measure implemented? And if not:
2. What is the potential security impact?
3. Other concerns?
4. Should this be implemented?

For each recommendation, there is at least one check that can have one of the following values:

• [PASS] This check has been passed
• [FAIL] This check has failed
• [----] This check has not been considered yet
• [N/A] This check is not applicable
• [DEFERRED] This check has been postponed and will be considered at a later time
1.3.1 [2021] System documentation
REVISION 2021-01-26
Contents
• [2021] System documentation
  – System Inventory
Impact: Low
Implemented: 75% (3/4)
System Inventory
From OpenStack Security Guide: System documentation: Documentation should provide a general description of the OpenStack environment and cover all systems used (production, development, test, etc.). Documenting system components, networks, services, and software often provides the bird’s-eye view needed to thoroughly cover and consider security concerns, attack vectors and possible security domain bridging points. A system inventory may need to capture ephemeral resources such as virtual machines or virtual disk volumes that would otherwise be persistent resources in a traditional IT system.
The UH-IaaS infrastructure is, from the hardware and up, managed completely by the UH-IaaS group, and is therefore independent of each institution. Except for networking interfaces and physical hardware management, there are no dependencies on the institutions.

Links to infrastructure documentation:

[PASS] Hardware inventory
A high-level view of the hardware inventory is outlined in the document Physical hardware.

[PASS] Software inventory
A high-level view of the software inventory is outlined in the document Virtual machines.

[PASS] Network topology
A high-level view of the network topology is outlined in the document Networking overview.

[DEFERRED] Services, protocols and ports
FIXME
1.3.2 [2021] Management
REVISION 2021-01-26
Contents
• [2021] Management
  – Continuous systems management
    * Vulnerability management
    * Configuration management
    * Secure backup and recovery
    * Security auditing tools
  – Integrity life-cycle
    * Secure bootstrapping
    * Runtime verification
    * Server hardening
  – Management interfaces
    * Dashboard
    * OpenStack API
    * Secure shell (SSH)
    * Management utilities
    * Out-of-band management interface
Impact: Medium
Implemented: 76% (13/17)
Continuous systems management
From OpenStack Security Guide: Management - Continuous systems management: A cloud will always have bugs. Some of these will be security problems. For this reason, it is critically important to be prepared to apply security updates and general software updates. This involves smart use of configuration management tools, which are discussed below. This also involves knowing when an upgrade is necessary.
Vulnerability management
Updates are announced on the OpenStack Announce mailing list. The security notifications are also posted through the downstream packages, for example through Linux distributions that you may be subscribed to as part of the package updates.

[PASS] Triage
When we are notified of a security update, it is discussed at the next morning meeting. We then assess the impact of the update on our environment, and take proper action.

[PASS] Testing the updates
We have test clouds in each location (currently OSL and BGO) which in most respects are identical to the production clouds. This allows for easy testing of updates.

[PASS] Deploying the updates
When testing is completed, the update is verified, and we are satisfied with any performance impact, stability, application impact etc., the update is deployed in production. This is done via the Patching policy.
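As a sketch of what this triage-and-deploy loop looks like on a single node (assuming CentOS repositories that ship updateinfo security metadata), pending errata can be listed and applied like this:

    # List pending updates and security errata on a node
    yum check-update
    yum updateinfo list security

    # Apply security updates only
    yum update --security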
Configuration management
Deployment of both physical and virtual nodes in NREC is done using Ansible playbooks, which are maintained on GitHub. Configuration management is completely automated via Puppet. The Puppet code and hieradata are maintained on GitHub. All changes are tracked via Git.

[PASS] Policy changes
Policy changes are tracked in Git and/or our Kanban board.
Secure backup and recovery
If we at some point decide to take backups of the infrastructure or instances, we should include the backup procedures and policies in the overall security plan.

[PASS] Backup procedure and policy
We do not take regular, incremental backups. Important data is replicated within the NREC infrastructure to mitigate information loss.
Security auditing tools
We should consider using SCAP or similar security auditing tools in combination with configuration management.

[FAIL] Security auditing tools
Security auditing tools such as SCAP add complexity and significant delays to the pipeline. Therefore, this is not a priority at this time.
Integrity life-cycle
From OpenStack Security Guide: Management - Integrity life-cycle: We define integrity life cycle as a deliberate process that provides assurance that we are always running the expected software with the expected configurations throughout the cloud. This process begins with secure bootstrapping and is maintained through configuration management and security monitoring.
Secure bootstrapping
The Security Guide recommends having an automated provisioning process for all nodes in the cloud. This includes compute, storage, network, service and hybrid nodes. The automated provisioning process also facilitates security patching, upgrades, bug fixes, and other critical changes. Software that runs with the highest privilege levels in the cloud needs special attention.

[PASS] Node provisioning
We use PXE for provisioning, which is recommended. We also use a separate, isolated network within the management security domain for provisioning. The provisioning process is handled by Ansible.

[FAIL] Verified boot
It is recommended to use secure boot via a TPM chip to boot the infrastructure nodes in the cloud. TPM adds unwanted complexity and we don't use it.

[PASS] Node hardening
We do general node hardening via a security baseline which we maintain via Puppet. The security baseline is based on best practice from the OS vendor, as well as our own experience. All nodes use Mandatory Access Control (MAC) via SELinux.
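A quick manual spot check of the SELinux part of the security baseline (the baseline itself is enforced via Puppet) could look like this:

    # Verify that SELinux is enforcing now and across reboots
    getenforce                              # should print "Enforcing"
    grep '^SELINUX=' /etc/selinux/config    # should print "SELINUX=enforcing"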
Runtime verification
From OpenStack Security Guide:

Once the node is running, we need to ensure that it remains in a good state over time. Broadly speaking, this includes both configuration management and security monitoring. The goals for each of these areas are different. By checking both, we achieve higher assurance that the system is operating as desired.

[FAIL] Intrusion detection system
We are not running an intrusion detection system (IDS).
Server hardening
This mostly concerns file integrity management.

[FAIL] File integrity management (FIM)
We should consider a FIM tool to ensure that files such as sensitive system or application configuration files are not corrupted or changed to allow unauthorized access or malicious behaviour.

• While we don't run a specific FIM tool, our configuration management system (Puppet) functions as a watchdog for the most important files.
Management interfaces
From OpenStack Security Guide: Management - Management interfaces:

It is necessary for administrators to perform command and control over the cloud for various operational functions. It is important these command and control facilities are understood and secured.

OpenStack provides several management interfaces for operators and tenants:

• OpenStack dashboard (horizon)
• OpenStack API
• Secure shell (SSH)
• OpenStack management utilities such as nova-manage and glance-manage
• Out-of-band management interfaces, such as IPMI
Dashboard
[PASS] Capabilities
The dashboard is configured via Puppet, and shows only capabilities that are known to work properly. Buttons, menu items etc. that don't work, or that provide capabilities NREC doesn't offer, are disabled in the dashboard.

[PASS] Security considerations
There are a few things that need to be considered (from OpenStack Security Guide: Management - Management interfaces):

• The dashboard requires cookies and JavaScript to be enabled in the web browser.
  – (FIXME FIXME FIXME) The cookies are only used for the dashboard and are not used for tracking the user's activities beyond NREC.
• The web server that hosts the dashboard should be configured for TLS to ensure data is encrypted.
  – (pass) TLS v1.2 or later is enforced.
• Both the horizon web service and the OpenStack API it uses to communicate with the back end are susceptible to web attack vectors such as denial of service and must be monitored.
  – (pass) We have monitoring in place.
• It is now possible (though there are numerous deployment/security implications) to upload an image file directly from a user's hard disk to OpenStack Image service through the dashboard. For multi-gigabyte images it is still strongly recommended that the upload be done using the glance CLI.
  – (pass) Image uploading is done directly to Glance via a redirect in the dashboard.
• Create and manage security groups through the dashboard. The security groups allow L3-L4 packet filtering for security policies to protect virtual machines.
  – (pass) The default security group blocks everything. Users can edit security groups through the dashboard.
OpenStack API
[PASS] Security considerations
There are a few things that need to be considered (from OpenStack Security Guide: Management - Management interfaces):

• The API service should be configured for TLS to ensure data is encrypted.
  – (pass) TLS v1.2 or later is enforced.
• As a web service, the OpenStack API is susceptible to familiar web site attack vectors such as denial of service attacks.
  – (pass) We have monitoring in place.
Secure shell (SSH)
[N/A] Host key fingerprints
Host key fingerprints should be stored in a secure and queryable location. One particularly convenient solution is DNS using SSHFP resource records as defined in RFC-4255. For this to be secure, it is necessary that DNSSEC be deployed.

• Host keys are wiped periodically to avoid conflicts and ensure that reinstalled hosts function correctly. SSH access is done through a single entry point and host keys are not important.
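For reference, should SSHFP records ever be revisited: they can be generated directly from a host's keys with ssh-keygen. The host name below is hypothetical.

    # Print SSHFP resource records for the local host keys, ready to be
    # pasted into the DNS zone (only secure when DNSSEC is deployed)
    ssh-keygen -r login.example.com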
Management utilities
[PASS] Security considerations
There are a few things that need to be considered (from OpenStack Security Guide: Management - Management interfaces):

• The dedicated management utilities (*-manage) in some cases use a direct database connection.
  – (pass) We don't use the dedicated management utilities unless strictly necessary.
• Ensure that the .rc file which has your credential information is secured.
  – (pass) Credential information is stored securely.
Out-of-band management interface
[PASS] Security considerations
There are a few things that need to be considered (from OpenStack Security Guide: Management - Management interfaces):

• Use strong passwords and safeguard them, or use client-side TLS authentication.
  – (pass) We have strong passwords that are stored securely.
• Ensure that the network interfaces are on their own private (management or separate) network. Segregate management domains with firewalls or other network gear.
  – (pass) OOB interfaces are on a private network.
• If you use a web interface to interact with the BMC/IPMI, always use the TLS interface, such as HTTPS or port 443. This TLS interface should NOT use self-signed certificates, as is often default, but should have trusted certificates using the correctly defined fully qualified domain names (FQDNs).
  – (n/a) OOB interfaces are on a closed network and a trusted CA is not necessary.
• Monitor the traffic on the management network. The anomalies might be easier to track than on the busier compute nodes.
  – (n/a) Not necessary due to closed network.
1.3.3 [2021] Secure communication
REVISION 2021-01-27
Contents
• [2021] Secure communication
  – Certification authorities
  – TLS libraries
  – Cryptographic algorithms, cipher modes, and protocols
Impact: High
Implemented: 83% (5/6)
From OpenStack Security Guide: Secure communication:

There are situations where there is a security requirement to assure the confidentiality or integrity of network traffic in an OpenStack deployment. This is generally achieved using cryptographic measures, such as the Transport Layer Security (TLS) protocol. In a typical deployment all traffic transmitted over public networks is secured, but security best practice dictates that internal traffic must also be secured. It is insufficient to rely on security domain separation for protection. If an attacker gains access to the hypervisor or host resources, compromises an API endpoint, or any other service, they must not be able to easily inject or capture messages, commands, or otherwise affect the management capabilities of the cloud.

All domains should be secured with TLS, including the management domain services and intra-service communications. TLS provides the mechanisms to ensure authentication, non-repudiation, confidentiality, and integrity of user communications to the OpenStack services and between the OpenStack services themselves. Due to the published vulnerabilities in the Secure Sockets Layer (SSL) protocols, we strongly recommend that TLS is used in preference to SSL, and that SSL is disabled in all cases, unless compatibility with obsolete browsers or libraries is required.

There are a number of services that need to be addressed:

• Compute API endpoints
• Identity API endpoints
• Networking API endpoints
• Storage API endpoints
• Messaging server
• Database server
• Dashboard
Certification authorities
The security guide recommends that we use separate PKI deployments for internal systems and public facing services. In the future, we may want to use separate PKI deployments for different security domains.

[PASS] Customer facing interfaces using trusted CA
All customer facing interfaces should be provisioned using Certificate Authorities that are installed in the operating system certificate bundles by default. It should just work without the customer having to accept an untrusted CA, or having to install some third-party software. We need certificates signed by a widely recognized public CA.

• We use the Digicert Terena CA on all customer facing interfaces.

[FAIL] Internal endpoints use non-public CA
As described above, it is recommended to use a private CA for internal endpoints.

• Database connections between regions use a non-public CA
• Internal connections within regions use private networks and are not secured via a CA (private or otherwise)
TLS libraries
From OpenStack Security Guide:

The TLS and HTTP services within OpenStack are typically implemented using OpenSSL which has a module that has been validated for FIPS 140-2.

We need to make sure that we're using an updated version of OpenSSL.

[PASS] Ensure updated OpenSSL
NREC is based on CentOS, and uses the OpenSSL library from that distro. We need to make sure that OpenSSL is up-to-date.

• OpenSSL and all other packages are manually updated once a month.
Cryptographic algorithms, cipher modes, and protocols
The security guide recommends using TLS 1.2, as previous versions are known to be vulnerable:

When you are using TLS 1.2 and control both the clients and the server, the cipher suite should be limited to ECDHE-ECDSA-AES256-GCM-SHA384. In circumstances where you do not control both endpoints and are using TLS 1.1 or 1.2 the more general HIGH:!aNULL:!eNULL:!DES:!3DES:!SSLv3:!TLSv1:!CAMELLIA is a reasonable cipher selection.

[PASS] Ensure TLS 1.2
Make sure that only TLS 1.2 is used. Previous versions of TLS, as well as SSL, should be disabled completely.

[PASS] Limit cipher suite on public endpoints
Limit the cipher suite on public facing endpoints to the general HIGH:!aNULL:!eNULL:!DES:!3DES:!SSLv3:!TLSv1:!CAMELLIA.

[N/A] Limit cipher suite on internal endpoints
Limit the cipher suite on internal endpoints to ECDHE-ECDSA-AES256-GCM-SHA384.

• We are not using an internal CA, so this doesn't apply in our case.
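Both checks can be verified from the outside with the openssl command line tool. This is a sketch; the endpoint host name below is illustrative, not an actual NREC endpoint.

    # A TLS 1.1 handshake should be rejected by the endpoint...
    openssl s_client -connect api.example.com:443 -tls1_1 < /dev/null

    # ...while TLS 1.2 restricted to the allowed cipher string should succeed
    openssl s_client -connect api.example.com:443 -tls1_2 \
        -cipher 'HIGH:!aNULL:!eNULL:!DES:!3DES:!SSLv3:!TLSv1:!CAMELLIA' < /dev/null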
1.3.4 [2021] API endpoints
REVISION 2021-01-27
Contents
• [2021] API endpoints
  – API endpoint configuration recommendations
    * Internal API communications
    * Paste and middleware
    * API endpoint process isolation and policy
    * API endpoint rate-limiting
Impact: High
Implemented: 85% (6/7)
From OpenStack Security Guide: API endpoints: The process of engaging an OpenStack cloud is started through the querying of an API endpoint. While there are different challenges for public and private endpoints, these are high value assets that can pose a significant risk if compromised.
API endpoint configuration recommendations
Internal API communications
From OpenStack Security Guide:

OpenStack provides both public facing and private API endpoints. By default, OpenStack components use the publicly defined endpoints. The recommendation is to configure these components to use the API endpoint within the proper security domain. Services select their respective API endpoints based on the OpenStack service catalog. These services might not obey the listed public or internal API end point values. This can lead to internal management traffic being routed to external API endpoints.

[PASS] Configure internal URLs in the Identity service catalog
The guide recommends that our Identity service catalog be aware of our internal URLs. This feature is not utilized by default, but may be leveraged through configuration. See API endpoint configuration recommendations for details.

• All services have configured admin, internal and public endpoints.

[PASS] Configure applications for internal URLs
It is recommended that each OpenStack service communicating with the API of another service is explicitly configured to access the proper internal API endpoint. See API endpoint configuration recommendations. All service-to-service communication uses internal endpoints within a region. This includes:

• volume to identity
• image to identity
• network to identity
• compute to identity
• compute to image
• compute to volume
• compute to network
Paste and middleware
From OpenStack Security Guide:

Most API endpoints and other HTTP services in OpenStack use the Python Paste Deploy library. From a security perspective, this library enables manipulation of the request filter pipeline through the application's configuration. Each element in this chain is referred to as middleware. Changing the order of filters in the pipeline or adding additional middleware might have unpredictable security impact.

[N/A] Document middleware
We should be careful when implementing non-standard software in the middleware, and this should be thoroughly documented.

• We are not using any non-standard middleware.
API endpoint process isolation and policy
From OpenStack Security Guide:

API endpoint processes, especially those that reside within the public security domain, should be isolated as much as possible. Where deployments allow, API endpoints should be deployed on separate hosts for increased isolation.

[N/A] Namespaces
Linux supports namespaces to assign processes into independent domains.

• All service backends run on different virtual hosts.

[PASS] Network policy
We should pay special attention to API endpoints, as they typically bridge multiple security domains. Policies should be in place and documented, and we can use firewalls, SELinux, etc. to enforce proper compartmentalization in the network layer.

• The API endpoints are protected via a load balancer and strict firewalls. SELinux is running in enforcing mode.

[PASS] Mandatory access controls
API endpoint processes should be as isolated from each other as possible. This should be enforced through Mandatory Access Controls (e.g. SELinux), not just Discretionary Access Controls.

• SELinux is running in enforcing mode on all nodes (virtual and physical) that are involved in API endpoints.
API endpoint rate-limiting
From OpenStack Security Guide:

Within OpenStack, it is recommended that all endpoints, especially public, are provided with an extra layer of protection, by means of either a rate-limiting proxy or web application firewall.

[DEFERRED] Rate-limiting on API endpoints
FIXME: Add rate-limiting to HAProxy
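A minimal sketch of what such rate-limiting could look like in HAProxy (1.8 or newer); the frontend name, bind line and threshold are assumptions, not our actual configuration:

    frontend api_public
        bind :443 ssl crt /etc/haproxy/certs/api.pem
        # Track per-source-IP HTTP request rate over a 10 second window
        stick-table type ip size 100k expire 30s store http_req_rate(10s)
        http-request track-sc0 src
        # Reject clients exceeding 100 requests per 10 seconds
        http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }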
1.3.5 [2021] Identity
REVISION 2021-01-28
Contents
• [2021] Identity
  – Authentication
    * Invalid login attempts
    * Multi-factor authentication
  – Authentication methods
  – Authorization
    * Establish formal access control policies
    * Service authorization
    * Administrative users
  – Policies
  – Checklist
Impact: High
Implemented: 95% (17/18)
From OpenStack Security Guide: Identity: Identity service (keystone) provides identity, token, catalog, and policy services for use specifically by services in the OpenStack family. Identity service is organized as a group of internal services exposed on one or many endpoints. Many of these services are used in a combined fashion by the frontend, for example an authenticate call will validate user/project credentials with the identity service and, upon success, create and return a token with the token service.
Authentication
Ref: OpenStack Security Guide: Identity - Authentication
Invalid login attempts
[PASS] Prevent or mitigate brute-force attacks
A pattern of repetitive failed login attempts is generally an indicator of brute-force attacks. This is important to us, as ours is a public cloud. We need to figure out if our user authentication service has the possibility to block an account after some configured number of failed login attempts. If not, describe policies around reviewing access control logs to identify and detect unauthorized attempts to access accounts.

• Users are automatically banned from logging in after a number of failed authentication requests.
Multi-factor authentication
[PASS] Multi-factor authentication for privileged accounts
We should employ multi-factor authentication for network access to privileged user accounts. This will provide insulation from brute force, social engineering, and both spear and mass phishing attacks that may compromise administrator passwords.

• While authentication to service accounts is possible from the "outside", administrative actions are not possible unless connecting from the "inside". In order to access the "inside", 2-factor authentication is required.
Authentication methods
Ref: OpenStack Security Guide: Identity - Authentication methods

[N/A] Document authentication policy requirements
We should document (or provide a link to external documentation for) the authentication policy requirements, such as password policy enforcement (password length, diversity, expiration etc.).

• Regular users are set up after authentication through Dataporten. Their passwords are auto-generated and random; the logic used is currently only documented in code (github:nocams-himlar-db-prep).
Authorization
Ref: OpenStack Security Guide: Identity - Authorization

The Identity service supports the notion of groups and roles. Users belong to groups while a group has a list of roles. OpenStack services reference the roles of the user attempting to access the service. The OpenStack policy enforcer middleware takes into consideration the policy rule associated with each resource then the user's group/roles and association to determine if access is allowed to the requested resource.
Establish formal access control policies
[PASS] Describe formal access control policies
The policies should include the conditions and processes for creating, deleting, disabling, and enabling accounts, and for assigning privileges to the accounts.

• Enabling accounts and granting account privileges (such as access to projects, flavors, and images) is done automatically using the self-service portal, or by the NREC administrators.

[PASS] Describe periodic review
We should periodically review the policies to ensure that the configuration is in compliance with approved policies.

• The policy is reviewed in this document. Compliance is reviewed often, during regular daily meetings.
Service authorization
[PASS] Don't use "tempAuth" file for service auth
Compute and Object Storage can be configured to use the Identity service to store authentication information. The "tempAuth" file method displays the password in plain text and should not be used.

• tempAuth is not used.

[DEFERRED] FIXME Use client authentication for TLS
The Identity service supports client authentication for TLS, which may be enabled. TLS client authentication provides an additional authentication factor, in addition to the user name and password, that provides greater reliability on user identification.
[PASS] Protect sensitive files
The cloud administrator should protect sensitive configuration files from unauthorized modification. This can be achieved with mandatory access control frameworks such as SELinux, including for /etc/keystone/keystone.conf and X.509 certificates.

• SELinux is running in enforcing mode.
Administrative users
We recommend that admin users authenticate using the Identity service and an external authentication service that supports 2-factor authentication, such as a certificate. This reduces the risk from passwords that may be compromised. This recommendation is in compliance with NIST 800-53 IA-2(1) guidance in the use of multi-factor authentication for network access to privileged accounts.

[PASS] Use 2-factor authentication for administrative access
Administrative access is provided via a login service that requires 2-factor authentication.
Policies
Ref: OpenStack Security Guide: Identity - Policies

[PASS] Describe policy configuration management
Each OpenStack service defines the access policies for its resources in an associated policy file. A resource, for example, could be API access, the ability to attach to a volume, or to fire up instances. The policy rules are specified in JSON format and the file is called policy.json. Ensure that any changes to the access control policies do not unintentionally weaken the security of any resource.

• We are using default policies, with overrides to disable certain capabilities.
Checklist
Ref: OpenStack Security Guide: Identity - Checklist

See the above link for info about these checks.

[PASS] Check-Identity-01: Is user/group ownership of config files set to keystone? Ownership is set to root:keystone or keystone:keystone.
[PASS] Check-Identity-02: Are strict permissions set for Identity configuration files? Not all files in the check list exist; the rest are OK.
[N/A] Check-Identity-03: Is TLS enabled for Identity? The endpoint runs on the load balancer.
[PASS] Check-Identity-04: Does Identity use strong hashing algorithms for PKI tokens? Yes, set to bcrypt.
[PASS] Check-Identity-05: Is max_request_body_size set to default (114688)? Yes.
[N/A] Check-Identity-06: Disable admin token in /etc/keystone/keystone.conf. Enabled in keystone.conf, but the service itself is disabled.
[PASS] Check-Identity-07: insecure_debug false in /etc/keystone/keystone.conf? Yes.
[PASS] Check-Identity-08: Use fernet token in /etc/keystone/keystone.conf? Yes.
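Check-Identity-01 and Check-Identity-02 can be spot-checked on an identity node in the same way as the other checklists in this document; the expected output shown is a sketch:

    # Verify ownership and permissions of the Identity configuration
    stat -L -c "%U %G %a" /etc/keystone/keystone.conf
    # expected: "keystone keystone 640" or "root keystone 640"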
1.3.6 [2021] Dashboard (FIXME)
REVISION 2021-02-05
Contents
• [2021] Dashboard (FIXME)
  – Domain names, dashboard upgrades, and basic web server configuration
    * Domain names
    * Basic web server configuration
    * Allowed hosts
    * Horizon image upload
  – HTTPS, HSTS, XSS, and SSRF
    * Cross Site Scripting (XSS)
    * Cross Site Request Forgery (CSRF)
    * Cross-Frame Scripting (XFS)
    * HTTPS
    * HTTP Strict Transport Security (HSTS)
  – Front-end caching and session back end
    * Front-end caching
    * Session back end
  – Static media
  – Secret key
  – Cookies
  – Cross Origin Resource Sharing (CORS)
  – Debug
  – Checklist
Impact: High
Implemented: 67% (23/34)
From OpenStack Security Guide: Dashboard: The Dashboard (horizon) is the OpenStack dashboard that provides users a self-service portal to provision their own resources within the limits set by administrators. These include provisioning users, defining instance flavors, uploading virtual machine (VM) images, managing networks, setting up security groups, starting instances, and accessing the instances through a console.
Domain names, dashboard upgrades, and basic web server configuration
Ref: OpenStack Security Guide: Dashboard - Domain names, dashboard upgrades, and basic web server configuration
Domain names
From OpenStack Security Guide:

We strongly recommend deploying dashboard to a second-level domain, such as https://example.com, rather than deploying dashboard on a shared subdomain of any level, for example https://openstack.example.org or https://horizon.openstack.example.org. We also advise against deploying to bare internal domains like https://horizon/. These recommendations are based on the limitations of browser same-origin-policy.

[FAIL] Use second-level domain
We are not given our own second-level domain. The dashboard is available as "dashboard.nrec.no".

[DEFERRED] Employ HTTP Strict Transport Security (HSTS)
If not using a second-level domain, we are advised to avoid a cookie-backed session store and employ HTTP Strict Transport Security (HSTS).

• We need to revisit this as soon as possible.
Basic web server configuration
From OpenStack Security Guide:

The dashboard should be deployed as a Web Services Gateway Interface (WSGI) application behind an HTTPS proxy such as Apache or nginx. If Apache is not already in use, we recommend nginx since it is lightweight and easier to configure correctly.

[PASS] Is dashboard deployed as a WSGI application behind an HTTPS proxy?
Yes, the dashboard is deployed using mod_wsgi on an Apache server.
Allowed hosts
From OpenStack Security Guide:

Configure the ALLOWED_HOSTS setting with the fully qualified host name(s) that are served by the OpenStack dashboard. Once this setting is provided, if the value in the "Host:" header of an incoming HTTP request does not match any of the values in this list an error will be raised and the requestor will not be able to proceed. Failing to configure this option, or the use of wild card characters in the specified host names, will cause the dashboard to be vulnerable to security breaches associated with fake HTTP Host headers.

[FAIL] Is ALLOWED_HOSTS configured for dashboard?
The NREC dashboard should be available on the Internet. As such, using ALLOWED_HOSTS would defeat the purpose of the dashboard.
Horizon image upload
It is recommended that we disable HORIZON_IMAGES_ALLOW_UPLOAD unless we have a plan to prevent resource exhaustion and denial of service.

[N/A] Is HORIZON_IMAGES_ALLOW_UPLOAD disabled?
Image upload works through the Glance API and not through the dashboard. The API has rate-limiting turned on.
HTTPS, HSTS, XSS, and SSRF
Ref: OpenStack Security Guide: Dashboard - HTTPS, HSTS, XSS, and SSRF
Cross Site Scripting (XSS)
From OpenStack Security Guide:

Unlike many similar systems, the OpenStack dashboard allows the entire Unicode character set in most fields. This means developers have less latitude to make escaping mistakes that open attack vectors for cross-site scripting (XSS).

[N/A] Audit custom dashboards
Audit any custom dashboards, paying particular attention to use of the mark_safe function, use of is_safe with custom template tags, the safe template tag, anywhere auto escape is turned off, and any JavaScript which might evaluate improperly escaped data.

• We are not using custom dashboards.
Cross Site Request Forgery (CSRF)
From OpenStack Security Guide:

Dashboards that utilize multiple instances of JavaScript should be audited for vulnerabilities such as inappropriate use of the @csrf_exempt decorator.

[N/A] Audit custom dashboards
We are not using custom dashboards.
Cross-Frame Scripting (XFS)
From OpenStack Security Guide:

Legacy browsers are still vulnerable to a Cross-Frame Scripting (XFS) vulnerability, so the OpenStack dashboard provides an option DISALLOW_IFRAME_EMBED that allows extra security hardening where iframes are not used in deployment.

[PASS] Disallow iframe embed
DISALLOW_IFRAME_EMBED is set.
HTTPS
From OpenStack Security Guide:

Deploy the dashboard behind a secure HTTPS server by using a valid, trusted certificate from a recognized certificate authority (CA).

[PASS] Use trusted certificate for dashboard
We are using a trusted CA.

[PASS] Redirect to fully qualified HTTPS URL
HTTP requests to the dashboard domain are configured to redirect to the fully qualified HTTPS URL.
HTTP Strict Transport Security (HSTS)
It is highly recommended to use HTTP Strict Transport Security (HSTS).

[DEFERRED] Use HSTS
FIXME: Revisit this ASAP
Front-end caching and session back end
Ref: OpenStack Security Guide: Dashboard - Front-end caching and session back end
Front-end caching
[PASS] Do not use front-end caching tools
We are not using front-end caching.
Session back end
It is recommended to use django.contrib.sessions.backends.cache as our session back end with memcache as the cache. This is as opposed to the default, which saves user data in signed, but unencrypted, cookies stored in the browser.

[PASS] Consider using caching back end
Memcache is used as the caching back end.
Static media
Ref: OpenStack Security Guide: Dashboard - Static media

The dashboard's static media should be deployed to a subdomain of the dashboard domain and served by the web server. The use of an external content delivery network (CDN) is also acceptable. This subdomain should not set cookies or serve user-provided content. The media should also be served with HTTPS.

[FAIL] Static media via subdomain
The amount of static media served from the NREC dashboard is next to nothing. We don't see any need to move this to a subdomain.

[N/A] Subdomain not serving cookies or user-provided content
Not using a subdomain.

[N/A] Subdomain via HTTPS
Not using a subdomain.
Secret key
Ref: OpenStack Security Guide: Dashboard - Secret key

The dashboard depends on a shared SECRET_KEY setting for some security functions. The secret key should be a randomly generated string at least 64 characters long, which must be shared across all active dashboard instances. Compromise of this key may allow a remote attacker to execute arbitrary code. Rotating this key invalidates existing user sessions and caching. Do not commit this key to public repositories.

[DEFERRED] Randomly generated string at least 64 characters long
Randomly generated, but much shorter than 64 chars. (FIXME - TODO)

[PASS] Not in public repo
We have internal stores for secret keys.
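A key of sufficient length can be generated with, for example, a Python one-liner; this is illustrative, and distribution to all dashboard instances goes through our internal secret store:

    # Generates a URL-safe random string of roughly 86 characters
    python3 -c 'import secrets; print(secrets.token_urlsafe(64))'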
Cookies
Ref: OpenStack Security Guide: Dashboard - Cookies

[PASS] Session cookies should be set to HTTPONLY
Configured in /etc/openstack-dashboard/local_settings:

OPENSTACK_SESSION_COOKIE_HTTPONLY = True

[PASS] Never configure CSRF or session cookies to have a wild card domain with a leading dot
Configured in /etc/openstack-dashboard/local_settings:

CSRF_COOKIE_SECURE = True

[PASS] Horizon's session and CSRF cookie should be secured when deployed with HTTPS
Configured in /etc/openstack-dashboard/local_settings:

SESSION_COOKIE_SECURE = True
Cross Origin Resource Sharing (CORS)
Ref: OpenStack Security Guide: Dashboard - Cross Origin Resource Sharing (CORS)

Configure your web server to send a restrictive CORS header with each response, allowing only the dashboard domain and protocol.

[DEFERRED] Restrictive CORS header
FIXME - TODO
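When this item is picked up, an Apache mod_headers directive along these lines would be the likely shape of the fix (a sketch; the origin must match the dashboard URL exactly):

    # Allow cross-origin requests only from the dashboard's own origin
    Header always set Access-Control-Allow-Origin "https://dashboard.nrec.no"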
Debug
It is recommended to set debug to false in production environments.

[PASS] Disable the debug flag
Configured in /etc/openstack-dashboard/local_settings:

DEBUG = False
Checklist
Ref: OpenStack Security Guide: Dashboard - Checklist

See the above link for info about these checks.

[FAIL] Check-Dashboard-01: Is user/group of config files set to root/horizon? The "horizon" group does not exist in our case; we're using the group "apache". The local_settings file has user/group "apache apache" (FIXME - TODO):

# ls -l /etc/openstack-dashboard/local_settings
-rw-r-----. 1 apache apache 32004 Dec  3 13:21 /etc/openstack-dashboard/local_settings
[PASS] Check-Dashboard-02: Are strict permissions set for horizon configuration files? The "horizon" group does not exist in our case; we're using the group "apache". The local_settings file has mode 0640:

# ls -l /etc/openstack-dashboard/local_settings
-rw-r-----. 1 apache apache 32004 Dec  3 13:21 /etc/openstack-dashboard/local_settings
[PASS] Check-Dashboard-03: Is DISALLOW_IFRAME_EMBED parameter set to True? Yes.
[PASS] Check-Dashboard-04: Is CSRF_COOKIE_SECURE parameter set to True? Yes.
[PASS] Check-Dashboard-05: Is SESSION_COOKIE_SECURE parameter set to True? Yes.
[PASS] Check-Dashboard-06: Is SESSION_COOKIE_HTTPONLY parameter set to True? Yes.
[PASS] Check-Dashboard-07: Is PASSWORD_AUTOCOMPLETE set to False? Yes.
[PASS] Check-Dashboard-08: Is DISABLE_PASSWORD_REVEAL set to True? Yes.
[PASS] Check-Dashboard-09: Is ENFORCE_PASSWORD_CHECK set to True? Yes.
[N/A] Check-Dashboard-10: Is PASSWORD_VALIDATOR configured? We use external authentication.
[FAIL] Check-Dashboard-11: Is SECURE_PROXY_SSL_HEADER configured? FIXME - TODO
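For Check-Dashboard-11, the usual fix is the standard Django setting below, assuming the TLS-terminating load balancer in front of the dashboard sets X-Forwarded-Proto; this is a sketch, not our current configuration:

    # Trust the load balancer's X-Forwarded-Proto header to detect HTTPS
    SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')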
1.3.7 [2021] Compute
REVISION 2021-02-28
Contents
• [2021] Compute – Hypervisor selection – Hardening the virtualization layers
* Physical hardware (PCI passthrough) * Minimizing the QEMU code base * Compiler hardening * Mandatory access controls – How to select virtual consoles – Checklist
Impact: High
Implemented: 78% (7/9)
From OpenStack Security Guide: Compute: The OpenStack Compute service (nova) is one of the more complex OpenStack services. It runs in many locations throughout the cloud and interacts with a variety of internal services. The OpenStack Compute service offers a variety of configuration options which may be deployment specific. In this chapter we will call out general best practice around Compute security as well as specific known configurations that can lead to security issues. In general, the nova.conf file and the /var/lib/nova locations should be secured. Controls like centralized logging, the policy.json file, and a mandatory access control framework should be implemented. Additionally, there are environmental considerations to keep in mind, depending on what functionality is desired for your cloud.
Hypervisor selection
Ref: OpenStack Security Guide: Compute - Hypervisor selection

We are using KVM.
Hardening the virtualization layers
Ref: OpenStack Security Guide: Compute - Hardening the virtualization layers
Physical hardware (PCI passthrough)
Many hypervisors offer a functionality known as PCI passthrough. This allows an instance to have direct access to a piece of hardware on the node. For example, this could be used to allow instances to access video cards or GPUs offering the compute unified device architecture (CUDA) for high performance computation. This feature carries two types of security risks: direct memory access and hardware infection.

[N/A] Ensure that the hypervisor is configured to utilize IOMMU
Not applicable, as PCI passthrough is disabled.

[PASS] Disable PCI passthrough
PCI passthrough is disabled. We may enable PCI passthrough for special compute nodes with GPUs etc., but these will be confined in specialized availability zones and not generally available.
Minimizing the QEMU code base
Does not apply. We are using precompiled QEMU.
Compiler hardening
Does not apply. We are using precompiled QEMU.
Mandatory access controls
[PASS] Ensure SELinux / sVirt is running in Enforcing mode
SELinux is running in enforcing mode on all hypervisor nodes.
How to select virtual consoles
Ref: OpenStack Security Guide: Compute - How to select virtual consoles

[PASS] Is the VNC service encrypted?
Yes. Communication between the customer and the public facing VNC service is encrypted.
Checklist
Ref: OpenStack Security Guide: Compute - Checklist

See the above link for info about these checks.

[PASS] Check-Compute-01: Is user/group ownership of config files set to root/nova? Yes, except for /etc/nova which has "root root":

# stat -L -c "%U %G" /etc/nova/{,nova.conf,api-paste.ini,policy.json,rootwrap.conf}
root root
root nova
root nova
root nova
root nova
[PASS] Check-Compute-02: Are strict permissions set for configuration files? Yes:

# stat -L -c "%a" /etc/nova/{nova.conf,api-paste.ini,policy.json,rootwrap.conf}
640
640
640
640
[PASS] Check-Compute-03: Is keystone used for authentication? Yes.
[FAIL] Check-Compute-04: Is a secure protocol used for authentication? Communication happens entirely on the inside on a private network, which we consider to be an acceptable risk.
[FAIL] Check-Compute-05: Does Nova communicate with Glance securely? Communication happens entirely on the inside on a private network, which we consider to be an acceptable risk.
1.3.8 [2021] Block Storage
REVISION 2021-03-06
Contents
• [2021] Block Storage
  – NREC block storage description
  – Checklist
Impact: High
Implemented: 55% (5/9)
From OpenStack Security Guide: Block Storage: OpenStack Block Storage (cinder) is a service that provides software (services and libraries) to self- service manage persistent block-level storage devices. This creates on-demand access to Block Storage resources for use with OpenStack Compute (nova) instances. This creates software-defined storage via abstraction by virtualizing pools of block storage to a variety of back-end storage devices which can be either software implementations or traditional hardware storage products. The primary functions of this is to manage the creation, attaching and detaching of the block devices. The consumer requires no knowledge of the type of back-end storage equipment or where it is located.
NREC block storage description
We have deployed a Cinder backend based on Ceph, the clustered storage system. Every compute node is given read/write access to a pool where instance block volumes are stored. The connection is made with the Ceph RBD client.
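A minimal sketch of such a backend definition in cinder.conf; the backend, pool and user names below are illustrative, not our actual values:

    [DEFAULT]
    enabled_backends = rbd

    [rbd]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret UUID>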
Checklist
Ref: OpenStack Security Guide: Block Storage - Checklist See the above link for info about these checks. [PASS] Check-Block-01: Is user/group ownership of config files set to root/cinder? Yes, except for /etc/cinder which has “root root”:
# stat -L -c "%U %G" /etc/cinder/{,cinder.conf,api-paste.ini,policy.json,rootwrap.
˓→conf} root root root cinder root cinder stat: cannot stat ‘/etc/cinder/policy.json’: No such file or directory root cinder
[PASS] Check-Block-02: Are strict permissions set for configuration files? Yes:

# stat -L -c "%a" /etc/cinder/{cinder.conf,api-paste.ini,policy.json,rootwrap.conf}
640
640
stat: cannot stat '/etc/cinder/policy.json': No such file or directory
640
[N/A] Check-Block-03: Is keystone used for authentication? Deprecated as of the Stein release.
[FAIL] Check-Block-04: Is TLS enabled for authentication? Communication happens entirely on the inside on a private network, which we consider to be an acceptable risk.
[FAIL] Check-Block-05: Does cinder communicate with nova over TLS? Communication happens entirely on the inside on a private network, which we consider to be an acceptable risk.
[FAIL] Check-Block-06: Does cinder communicate with glance over TLS? Communication happens entirely on the inside on a private network, which we consider to be an acceptable risk.
[N/A] Check-Block-07: Is NAS operating in a secure environment? We do not have a NAS in our environment.
[PASS] Check-Block-08: Is max size for the body of a request set to default (114688)? Yes.
[FAIL] Check-Block-09: Is the volume encryption feature enabled? We do not offer encrypted volumes at this time.
1.3.9 [2021] Image Storage
REVISION 2021-05-12
Contents
• [2021] Image Storage
  – Checklist
Impact: Medium
Implemented: 80% (4/5)
From OpenStack Security Guide: Image Storage:

OpenStack Image Storage (glance) is a service where users can upload and discover data assets that are meant to be used with other services. This currently includes images and metadata definitions. Image services include discovering, registering, and retrieving virtual machine images. Glance has a RESTful API that allows querying of VM image metadata as well as retrieval of the actual image.
Checklist
Ref: OpenStack Security Guide: Image Storage - Checklist

See the above link for info about these checks.

[PASS] Check-Image-01: Is user/group ownership of config files set to root/glance? Yes, except for /etc/glance which has "root root":
# stat -L -c "%U %G" /etc/glance/{,glance-api-paste.ini,glance-api.conf,glance-cache.conf,glance-manage.conf,glance-registry-paste.ini,glance-registry.conf,glance-scrubber.conf,glance-swift-store.conf,policy.json,schema-image.json,schema.json}
root root
stat: cannot stat '/etc/glance/glance-api-paste.ini': No such file or directory
root glance
root glance
stat: cannot stat '/etc/glance/glance-manage.conf': No such file or directory
stat: cannot stat '/etc/glance/glance-registry-paste.ini': No such file or directory
root glance
root glance
stat: cannot stat '/etc/glance/glance-swift-store.conf': No such file or directory
root glance
root glance
stat: cannot stat '/etc/glance/schema.json': No such file or directory
[FAIL] Check-Image-02: Are strict permissions set for configuration files? All existing files have permissions 640, but the /etc/glance directory itself has mode 755:
# stat -L -c "%a" /etc/glance/{,glance-api-paste.ini,glance-api.conf,glance-cache.conf,glance-manage.conf,glance-registry-paste.ini,glance-registry.conf,glance-scrubber.conf,glance-swift-store.conf,policy.json,schema-image.json,schema.json}
755
stat: cannot stat '/etc/glance/glance-api-paste.ini': No such file or directory
640
640
stat: cannot stat '/etc/glance/glance-manage.conf': No such file or directory
stat: cannot stat '/etc/glance/glance-registry-paste.ini': No such file or directory
640
640
stat: cannot stat '/etc/glance/glance-swift-store.conf': No such file or directory
640
640
stat: cannot stat '/etc/glance/schema.json': No such file or directory
[N/A] Check-Image-03: Is keystone used for authentication? Deprecated as of the Stein release.
[FAIL] Check-Image-04: Is TLS enabled for authentication? Communication happens entirely on the inside on a private network, which we consider to be an acceptable risk.
[N/A] Check-Image-05: Are masked port scans prevented? The Glance v1 API is disabled.
1.3.10 [2021] Shared File Systems
REVISION 2021-03-06
Contents
• [2021] Shared File Systems
From OpenStack Security Guide: Shared File Systems:

The Shared File Systems service (manila) provides a set of services for management of shared file systems in a multi-tenant cloud environment, similar to how OpenStack provides for block-based storage management through the OpenStack Block Storage service project. With the Shared File Systems service, you can create a remote file system, mount the file system on your instances, and then read and write data from your instances to and from your file system.
Note: Does not apply. We are not using Manila.
1.3.11 [2019] Networking
REVISION 2019-03-14
Contents
• [2019] Networking – Networking services
* L2 isolation using VLANs and tunneling * Network services – Networking services security best practices
* OpenStack Networking service configuration – Securing OpenStack networking services
* Networking resource policy engine * Security groups * Quotas – Checklist
Impact High Implemented percent 85% (12/14)
From OpenStack Security Guide: Networking: OpenStack Networking enables the end-user or tenant to define, utilize, and consume networking resources. OpenStack Networking provides a tenant-facing API for defining network connectivity and IP addressing for instances in the cloud in addition to orchestrating the network configuration. With the transition to an API-centric networking service, cloud architects and administrators should take into consideration best practices to secure physical and virtual network infrastructure and services.
Networking services
Ref: OpenStack Security Guide: Networking - Networking services
L2 isolation using VLANs and tunneling
Does not apply. We’re using Calico, in which L2 isn’t employed at all.
Network services
[PASS] Use Neutron for security groups The calico neutron network plugin provides a rich security feature set. Calico uses neutron security groups and implements the rules with iptables on the compute hosts. Thus, security rulesets can be described down to instance level.
Networking services security best practices
Ref: OpenStack Security Guide: Networking - Networking services security best practices
[PASS] Document how Calico is used in UH-IaaS infrastructure We enable the calico plugin as the neutron core plugin system wide. Thus, no L2 connectivity is provided between instances, and, as a design feature, no project isolation on L3 connectivity. In other words, there is no such thing as a private network, even for RFC 1918 address spaces. This design relies on security groups to provide isolation and per-project security.
[N/A] Document which security domains have access to OpenStack network node As a consequence of our network design, no network nodes are deployed.
[N/A] Document which security domains have access to SDN services node We do not use SDN service nodes.
OpenStack Networking service configuration
[PASS] Restrict bind address of the API server: neutron-server The Neutron API server is bound to the internal network only.
Securing OpenStack networking services
Ref: OpenStack Security Guide: Networking - Securing OpenStack networking services
Networking resource policy engine
From OpenStack Security Guide: A policy engine and its configuration file, policy.json, within OpenStack Networking provides a method to provide finer grained authorization of users on tenant networking methods and objects. The OpenStack Networking policy definitions affect network availability, network security and overall OpenStack security.
[PASS] Evaluate network policy User creation of networks and virtual routers is prohibited by policy. Only administrator created networking resources are available for projects and users.
Security groups
``nova.conf`` should always disable built-in security groups and proxy all security group calls to the OpenStack Networking API when using OpenStack Networking.
[PASS] Set firewall_driver option in nova.conf firewall_driver is set to nova.virt.firewall.NoopFirewallDriver so that nova-compute does not perform iptables-based filtering itself.
[FAIL] Set security_group_api option in nova.conf It is recommended that security_group_api is set to neutron so that all security group requests are proxied to the OpenStack Networking service. We do not set the security_group_api option at all.
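As a hedged sketch (not a dump of our deployed config), the two settings discussed above would look roughly like this in nova.conf; in the releases this checklist targets, both live in the [DEFAULT] section:

[DEFAULT]
# hand all filtering over to Neutron; nova-compute does no iptables filtering itself
firewall_driver = nova.virt.firewall.NoopFirewallDriver
# proxy all security group requests to the Networking service
security_group_api = neutron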
Quotas
[N/A] Document choices wrt. networking quotas As users can not create networking resources, no quotas apply.
Checklist
Ref: OpenStack Security Guide: Networking - Checklist See the above link for info about these checks.
[PASS] Check-Neutron-01: Is user/group ownership of config files set to root/neutron? Yes
[PASS] Check-Neutron-02: Are strict permissions set for configuration files? Yes
[PASS] Check-Neutron-03: Is keystone used for authentication? Yes
[PASS] Check-Neutron-04: Is secure protocol used for authentication? Yes
[FAIL] Check-Neutron-05: Is TLS enabled on Neutron API server? The negative implications for the user experience of implementing this are considered to outweigh the extra security gained.
1.3.12 [2019] Object Storage
REVISION 2019-03-14
Contents
• [2019] Object Storage
From OpenStack Security Guide: Object Storage: OpenStack Object Storage (swift) is a service that provides software that stores and retrieves data over HTTP. Objects (blobs of data) are stored in an organizational hierarchy that offers anonymous read-only access, ACL defined access, or even temporary access. Object Store supports multiple token-based authentication mechanisms implemented via middleware.
Note: Does not apply. We are not using Swift.
1.3.13 Message queuing
Last changed: 2021-09-14
Contents
• Message queuing – Messaging security
* Messaging transport security * Queue authentication and access control * Message queue process isolation and policy
Impact High Implemented percent 0% (0/8)
From OpenStack Security Guide: Message queuing: Message queues effectively facilitate command and control functions across OpenStack deployments. Once access to the queue is permitted no further authorization checks are performed. Services accessible through the queue do validate the contexts and tokens within the actual message payload. However, you must note the expiration date of the token because tokens are potentially re-playable and can authorize other services in the infrastructure. OpenStack does not support message-level confidence, such as message signing. Consequently, you must secure and authenticate the message transport itself. For high-availability (HA) configurations, you must perform queue-to-queue authentication and encryption.
Note: We are using RabbitMQ as message queuing service back end.
Messaging security
Ref: OpenStack Security Guide: Message queuing - Messaging security
Messaging transport security
From OpenStack Security Guide: We highly recommend enabling transport-level cryptography for your message queue. Using TLS for the messaging client connections provides protection of the communications from tampering and eavesdropping in-transit to the messaging server.
[DEFERRED] Ensure TLS is used for RabbitMQ • TLS is NOT used for the messaging service. Should be considered.
[DEFERRED] Use an internally managed CA • No CA as TLS is not used
[DEFERRED] Ensure restricted file permissions on certificate and key files
• No CA as TLS is not used
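Should TLS for RabbitMQ be implemented later, a minimal server-side sketch in the ini-style rabbitmq.conf format (RabbitMQ 3.7+) could look like the following; the certificate paths are placeholders, not our actual layout:

# AMQP over TLS listener
listeners.ssl.default = 5671
ssl_options.cacertfile = /etc/rabbitmq/ca-chain.cert.pem
ssl_options.certfile = /etc/rabbitmq/server.cert.pem
ssl_options.keyfile = /etc/rabbitmq/server.key.pem
# require and verify client certificates
ssl_options.verify = verify_peer
ssl_options.fail_if_no_peer_cert = true

The OpenStack services would then need TLS enabled on their side as well, e.g. the ssl option (formerly rabbit_use_ssl) in the [oslo_messaging_rabbit] section, and the CA chain distributed to all nodes.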
Queue authentication and access control
From OpenStack Security Guide: We recommend configuring X.509 client certificates on all the OpenStack service nodes for client connections to the messaging queue and where possible (currently only Qpid) perform authentication with X.509 client certificates. When using user names and passwords, accounts should be created per-service and node for finer grained auditability of access to the queue.
[DEFERRED] Configure X.509 client certificates on all OpenStack service nodes • Currently no TLS/user certificates set up
[DEFERRED] Any user names and passwords are per-service and node • Currently common password. ?????
Message queue process isolation and policy
[----] Use network namespaces Network namespaces are highly recommended for all services running on OpenStack Compute Hypervisors. This will help prevent against the bridging of network traffic between VM guests and the management network. • FIXME: Ensure and document
[DEFERRED] Ensure queue servers only accept connections from management network FIXME: Ensure and document
[DEFERRED] Use mandatory access controls FIXME: SELinux in enforcing mode on all nodes
1.3.14 [2019] Data processing
REVISION 2019-03-14
Contents
• [2019] Data processing
From OpenStack Security Guide: Data processing: The Data processing service for OpenStack (sahara) provides a platform for the provisioning and management of instance clusters using processing frameworks such as Hadoop and Spark. Through the OpenStack dashboard or REST API, users will be able to upload and execute framework applications which may access data in object storage or external providers. The data processing controller uses the Orchestration service to create clusters of instances which may exist as long-running groups that can grow and shrink as requested, or as transient groups created for a single workload.
Note: Does not apply. We are not using Sahara.
1.3.15 Databases
Last changed: 2021-09-14
Contents
• Databases – Database back end considerations – Database access control
* Database authentication and access control * Require user accounts to require SSL transport * Authentication with X.509 certificates * Nova-conductor – Database transport security
* Database server IP address binding * Database transport
Impact High Implemented percent 44% (4/9)
From OpenStack Security Guide: Databases: The choice of database server is an important consideration in the security of an OpenStack deployment. Multiple factors should be considered when deciding on a database server, however for the scope of this book only security considerations will be discussed. OpenStack supports a variety of database types (see OpenStack Cloud Administrator Guide for more information). The Security Guide currently focuses on PostgreSQL and MySQL.
Note: We are using MariaDB 10.1 with packages directly from upstream repo.
Database back end considerations
Ref: OpenStack Security Guide: Databases - Database back end considerations
[DEFERRED] Evaluate existing MySQL security guidance See link above for details. • FIXME: Evaluate and document
Database access control
Ref: OpenStack Security Guide: Databases - Database access control
Database authentication and access control
From OpenStack Security Guide: Given the risks around access to the database, we strongly recommend that unique database user accounts be created per node needing access to the database.
[PASS] Unique database user accounts per node Each service runs on a different host, and each host has a unique user.
[PASS] Separate database administrator account The root user is only used to provision new databases and users.
[DEFERRED] Database administrator account is protected FIXME: Document this
Require user accounts to require SSL transport
[DEFERRED] The database user accounts are configured to require TLS All databases support TLS, but only DB replication between locations requires TLS.
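A sketch of what enforcing this per account could look like in MariaDB; the account name and host pattern are made up for illustration:

-- require TLS for all connections by this account
GRANT USAGE ON *.* TO 'nova'@'10.0.0.%' REQUIRE SSL;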
Authentication with X.509 certificates
[DEFERRED] The database user accounts are configured to require X.509 certificates FIXME: Document this
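Requiring a client certificate would be the REQUIRE X509 variant of the same statement, again with a hypothetical account:

-- additionally require a valid X.509 client certificate
GRANT USAGE ON *.* TO 'nova'@'10.0.0.%' REQUIRE X509;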
Nova-conductor
[PASS] Consider turning off nova-conductor OpenStack Compute offers a sub-service called nova-conductor which proxies database connections over RPC. We use nova-conductor, and nova-compute has access to it over the message bus. The RPC message bus is not encrypted, but runs on a private network. This is an acceptable risk.
Database transport security
Ref: OpenStack Security Guide: Databases - Database transport security
Database server IP address binding
[PASS] Database access only over an isolated management network Database replication is done over the public network, with TLS and a firewall to restrict access.
Database transport
[DEFERRED] The database requires TLS All databases support TLS transport, but only DB replication between locations requires TLS.
1.3.16 Tenant data privacy
Last changed: 2021-09-14
Contents
• Tenant data privacy – Data privacy concerns
* Data residency * Data disposal · Data not securely erased · Instance memory scrubbing · Cinder volume data · Image service delay delete feature · Compute soft delete feature · Compute instance ephemeral storage – Data encryption
* Volume encryption * Ephemeral disk encryption * Block Storage volumes and instance ephemeral filesystems * Network data – Key management
Impact High Implemented percent 0% (0/?)
From OpenStack Security Guide: Tenant data privacy: OpenStack is designed to support multitenancy and those tenants will most probably have different data requirements. As a cloud builder and operator you need to ensure your OpenStack environment can address various data privacy concerns and regulations.
Data privacy concerns
Ref: OpenStack Security Guide: Tenant data privacy - Data privacy concerns
Data residency
From OpenStack Security Guide: Numerous OpenStack services maintain data and metadata belonging to tenants or reference tenant information. Tenant data stored in an OpenStack cloud may include the following items:
• Object Storage objects
• Compute instance ephemeral filesystem storage
• Compute instance memory
• Block Storage volume data
• Public keys for Compute access
• Virtual machine images in the Image service
• Machine snapshots
• Data passed to OpenStack Compute’s configuration-drive extension
Metadata stored by an OpenStack cloud includes the following non-exhaustive items:
• Organization name
• User’s “Real Name”
• Number or size of running instances, buckets, objects, volumes, and other quota-related items
• Number of hours running instances or storing data
• IP addresses of users
• Internally generated private keys for compute image bundling
Data disposal
From OpenStack Security Guide: OpenStack operators should strive to provide a certain level of tenant data disposal assurance. Best practices suggest that the operator sanitize cloud system media (digital and non-digital) prior to disposal, release out of organization control or release for reuse. Sanitization methods should implement an appropriate level of strength and integrity given the specific security domain and sensitivity of the information. The security guide states that the cloud operators should do the following:
[DEFERRED] Track, document and verify media sanitization and disposal actions • OSL: Media are shredded before being disposed • BGO: unknown
[DEFERRED] Test sanitation equipment and procedures to verify proper performance • OSL: Equipment has been properly tested • BGO: unknown
[PASS] Sanitize portable, removable storage devices prior to connecting such devices to the cloud infrastructure • Portable, removable media are never connected to the cloud infrastructure
[DEFERRED] Destroy cloud system media that cannot be sanitized • OSL: Media are destroyed using a shredder • BGO: unknown
Data not securely erased
Regarding erasure of metadata, the security guide suggests using database and/or system configuration for auto vacuuming and periodic free-space wiping.
[DEFERRED] Periodic database vacuuming Not implemented at this time. We will revisit this at a later time.
[FAIL] Periodic free-space wiping of ephemeral storage We’re not doing this, as we consider this to be an acceptable risk.
Instance memory scrubbing
As we’re using KVM, which relies on Linux page management, we need to consult the KVM documentation about memory scrubbing.
[----] Consider automatic/periodic memory scrubbing FIXME: Consult KVM doc, consider if this is needed and document
Cinder volume data
From OpenStack Security Guide: Use of the OpenStack volume encryption feature is highly encouraged. This is discussed in the Data Encryption section below. When this feature is used, destruction of data is accomplished by securely deleting the encryption key.
[DEFERRED] Consider volume encryption Nice to have, but adds complexity. We will revisit this.
[FAIL] Secure erasure of volume data We’re not doing this, as we consider this to be an acceptable risk.
Image service delay delete feature
From OpenStack Security Guide: OpenStack Image service has a delayed delete feature, which will pend the deletion of an image for a defined time period. It is recommended to disable this feature if it is a security concern.
[PASS] Consider disabling delayed delete Considered; we don’t think this is a security concern.
Compute soft delete feature
From OpenStack Security Guide: OpenStack Compute has a soft-delete feature, which enables an instance that is deleted to be in a soft-delete state for a defined time period. The instance can be restored during this time period.
[PASS] Consider disabling compute soft delete Considered; we don’t think this is a security concern.
Compute instance ephemeral storage
From OpenStack Security Guide: The creation and destruction of ephemeral storage will be somewhat dependent on the chosen hypervisor and the OpenStack Compute plug-in.
[DEFERRED] Document ephemeral storage deletion FIXME: Document how this works in our environment
Data encryption
From OpenStack Security Guide: Tenant data privacy - Data encryption: The option exists for implementers to encrypt tenant data wherever it is stored on disk or transported over a network, such as the OpenStack volume encryption feature described below. This is above and beyond the general recommendation that users encrypt their own data before sending it to their provider.
Volume encryption
[DEFERRED] Consider volume encryption Postponed.
Ephemeral disk encryption
[PASS] Consider ephemeral disk encryption Considered.
Block Storage volumes and instance ephemeral filesystems
[DEFERRED] Consider which options we have available FIXME: Document
[PASS] Consider adding encryption Considered.
Network data
[PASS] Consider encrypting tenant data over IPsec or other tunnels Considered. Not a security concern in our case.
Key management
From OpenStack Security Guide: Tenant data privacy - Key management: The volume encryption and ephemeral disk encryption features rely on a key management service (for example, barbican) for the creation and secure storage of keys. The key manager is pluggable to facilitate deployments that need a third-party Hardware Security Module (HSM) or the use of the Key Management Interchange Protocol (KMIP), which is supported by an open-source project called PyKMIP. [DEFERRED] Consider adding Barbican FIXME: Consider and document
1.3.17 [2019] Instance security management
REVISION 2019-03-14
Contents
• [2019] Instance security management – Security services for instances
* Entropy to instances * Scheduling instances to nodes * Trusted images * Instance migrations * Monitoring, alerting, and reporting
Impact High Implemented percent 67% (4/6)
From OpenStack Security Guide: Instance security management: One of the virtues of running instances in a virtualized environment is that it opens up new opportunities for security controls that are not typically available when deploying onto bare metal. There are several technologies that can be applied to the virtualization stack that bring improved information assurance for cloud tenants. Deployers or users of OpenStack with strong security requirements may want to consider deploying these technologies. Not all are applicable in every situation, indeed in some cases technologies may be ruled out for use in a cloud because of prescriptive business requirements. Similarly some technologies inspect instance data such as run state which may be undesirable to the users of the system.
Security services for instances
Ref: OpenStack Security Guide: Instance security management - Security services for instances
Entropy to instances
From OpenStack Security Guide: The Virtio RNG is a random number generator that uses /dev/random as the source of entropy by default, however can be configured to use a hardware RNG or a tool such as the entropy gathering daemon (EGD) to provide a way to fairly and securely distribute entropy through a distributed system.
[PASS] Consider adding hardware random number generators (HRNG) We do not consider HRNG necessary for a deployment of this scale. This may be revisited in the future.
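For reference, should we later decide to expose a virtio RNG to guests, the libvirt device definition is small; a sketch, assuming the host's /dev/urandom as the backend:

<!-- virtio RNG device fed from the host -->
<rng model='virtio'>
  <backend model='random'>/dev/urandom</backend>
</rng>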
Scheduling instances to nodes
From OpenStack Security Guide: Before an instance is created, a host for the image instantiation must be selected. This selection is performed by the nova-scheduler which determines how to dispatch compute and volume requests.
[PASS] Describe which scheduler and filters are used For normal workloads, we use the default nova scheduling filters, and all compute hosts are considered equal in features and performance. For specialized resources such as HPC workloads we have different filters.
Trusted images
From OpenStack Security Guide: In a cloud environment, users work with either pre-installed images or images they upload themselves. In both cases, users should be able to ensure the image they are utilizing has not been tampered with. [PASS] Maintain golden images We provide updated upstream cloud images for popular linux distributions, as well as the latest Windows Server versions. [FAIL] Enable instance signature verification This is not something that we will prioritize at this time. It also requires the setup and management of additional services.
Instance migrations
[FAIL] Disable live migration While live migration has its risks, the benefits of live migration far outweigh the disadvantages. We have live migration enabled.
Monitoring, alerting, and reporting
[PASS] Aggregate logs, e.g. to ELK Compute host logs are sent to an ELK stack.
1.4 Howtos and guides
This is a collection of howtos and documentation bits with relevance to the project.
1.4.1 Build docs locally using Sphinx
This describes how to build the documentation from norcams/iaas locally
RHEL, CentOS, Fedora
You’ll need the python-virtualenvwrapper package from EPEL:

sudo yum -y install python-virtualenvwrapper
# Restart shell
exit
Ubuntu (trusty)

sudo apt-get -y install virtualenvwrapper make
# Restart shell
exit
Build docs
# Make a virtual Python environment
# This env is placed in .virtualenv in $HOME
mkvirtualenv docs

# activate the docs virtualenv
workon docs
# install sphinx into it
pip install sphinx sphinx_rtd_theme

# Compile docs
cd iaas/docs
make html

# Open in modern internet browser of choice
xdg-open _build/html/index.html

# Deactivate the virtualenv
deactivate
1.4.2 Git in the real world
Fix and restore a “messy” branch http://push.cwcon.org/learn/stay-updated#oops_i_was_messing_around_on_
1.4.3 Install KVM on CentOS 7 from minimal install
See http://mwiki.yyovkov.net/index.php/Linux_KVM_on_CentOS_7
1.4.4 Configure a Dell S55 FTOS switch from scratch
This describes how to configure a Dell PowerConnect S55 switch from scratch as a management switch for our iaas.
Initial config
You will need a laptop with a serial console cable. Connect the cable to the RS-232 port on the front of the switch. Open a console to ttyUSBx using screen, tmux, putty or other usable software. Then power on the switch. After the switch has booted, you can enter the enable state:
> enable
The switch will default to jumpstart mode, trying to get a config from a central repository. We will disable it by typing:
# reload-type normal
Now we need to provide an IP address, create a user with a password and set the enable password in order to provide ssh access:
# configure
(conf)# interface managementethernet 0/0
(conf-if-ma-0/0)# ip address 10.0.0.2 /32
(conf-if-ma-0/0)# no shutdown
(conf-if-ma-0/0)# exit
(conf)# management route 0.0.0.0 /0 10.0.0.1
(conf)# username mylocaluser password 0 mysecretpassword
(conf)# enable password 0 myverysecret
(conf)# exit
# write
# copy running-config startup-config
Now you can ssh to the switch using your new user from a computer with access to the switch’s management network.
Configure the switch itself
Let’s configure the rest! We start by shutting down all ports:
> enable
# configure
(conf)# interface range gigabitethernet 0/0-47
(conf-if-range-gi-0/0-47)# switchport
(conf-if-range-gi-0/0-47)# shutdown
(conf-if-range-gi-0/0-47)# exit
If you want to use a port channel (with LACP) for redundant uplink to core you can create one. If you don’t, omit all references to it later in the document:
(conf)# interface port-channel 1
(conf-if-po-1)# switchport
(conf-if-po-1)# no shutdown
(conf-if-po-1)# exit
Assign interfaces to the port channel group:
(conf)# interface range gigabitethernet 0/42-43
(conf-if-range-gi-0/42-43)# no switchport
(conf-if-range-gi-0/42-43)# port-channel-protocol LACP
(conf-if-range-gi-0/42-43)# port-channel 1 mode active
(conf-if-range-gi-0/42-43)# no shutdown
(conf-if-range-gi-0/42-43)# exit
Define in-band and out-of-band VLANs:
(conf)# interface vlan 201
(conf-if-vl-201)# description "iaas in-band mgmt"
(conf-if-vl-201)# no ip address
(conf-if-vl-201)# untagged GigabitEthernet 0/22-33,38-41
(conf-if-vl-201)# tagged Port-channel 1
(conf-if-vl-201)# exit
(conf)# interface vlan 202
(conf-if-vl-202)# description "iaas out-of-band mgmt"
(conf-if-vl-202)# no ip address
(conf-if-vl-202)# untagged GigabitEthernet 0/0-10
(conf-if-vl-202)# tagged Port-channel 1
(conf-if-vl-202)# exit
(conf)# exit
Congratulations! Save the config and happy server provisioning:
# write
# copy running-config startup-config
1.4.5 Install cumulus linux on ONIE enabled Dell S4810
The project will be using Dell PowerConnect S4810 switches with ONIE installer enabled by default instead of FTOS. This enables easy installation of cumulus linux to the switches.
Configure dhcpd and http server
You will need a running http server with a copy of the cumulus image:
# ls /var/www/html
CumulusLinux-2.5.0-powerpc.bin  onie-installer-powerpc
“onie-installer-powerpc” is a symlink to the bin-file. The symlink is used by ONIE to identify an image to download. Read here about the order ONIE tries to download the install file: http://opencomputeproject.github.io/onie/docs/user-guide/
Now, for the dhcp server to serve out an IP address and URL for ONIE to download from, dhcp option 114 (URL) is used. This example utilizes ISC dhcpd:

option default-url="http://192.168.0.1/onie-installer-powerpc";
This option can be host, group, subnet or system wide. Read more about different dhcp servers and other methods here:
https://support.cumulusnetworks.com/hc/en-us/articles/203771426-Using-ONIE-to-Install-Cumulus-Linux
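A hypothetical host-scoped dhcpd.conf example (the MAC address and fixed address are placeholders); if the dhcp server does not already know option 114, it must be declared first:

# declare option 114 once, globally
option default-url code 114 = text;

host s4810-01 {
  hardware ethernet 00:11:22:33:44:55;
  fixed-address 192.168.0.42;
  option default-url "http://192.168.0.1/onie-installer-powerpc";
}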
When you power up the switch, it will by default be a dhcp client and accept an offered IP address, after which you can ssh to the ONIE installer as user root without a password. However, if option 114 is specified, it will download the image and immediately install it, and then reboot the switch. When the installation is complete, you can ssh to the switch using the default cumulus login.
1.4.6 Create Cumulus VX vagrant boxes for himlar dev
This describes how to create (or update) the norcams/net vagrant box which is based on the Cumulus VX test appliance.
Requirements
• An account with access to the norcams organisation on the Hashicorp Atlas system at https://atlas.hashicorp.com/norcams
• An account on cumulusnetworks.com to download the vagrant appliance from https://cumulusnetworks.com/cumulus-vx/download/
• A current vagrant installation with virtualbox and libvirt providers working
Prepare virtualbox and libvirt box files
Install the vagrant-mutate plugin:

vagrant plugin install vagrant-mutate
Download and rename the cumulus vagrant box, then add and convert it:

mv Downloads/CumulusVX*virtualbox.box /path/to/norcams-net-2.5.6-virtualbox.box
vagrant box add norcams/net /path/to/norcams-net-2.5.6-virtualbox.box
vagrant mutate norcams/net libvirt
Verify that the box is available for both providers:

vagrant box list
Repackage the libvirt box (this command takes a while to complete):

vagrant box repackage norcams/net libvirt 0
mv package.box norcams-net-2.5.6-libvirt.box
You should now have two box files, one for libvirt and one for virtualbox:

ls *.box
norcams-net-2.5.6-libvirt.box  norcams-net-2.5.6-virtualbox.box
Publish to Atlas
In order for vagrant autoupdate to work we need to publish both these files on a webserver somewhere and point to their locations from a provider and version configuration on Atlas.
• Publish both box files somewhere where they can be downloaded from a public URL.
• Log in at https://atlas.hashicorp.com/norcams
• Find the norcams/net box at https://atlas.hashicorp.com/norcams/boxes/net
• Add a new version, if needed
• Create providers for “virtualbox” and “libvirt”. The URL should point at the location of the respective box file, e.g. http://somewhere/files/norcams-net-2.5.6-virtualbox.box
1.4.7 Routed, virtual network interfaces for guest VMs on controllers
This describes how to set up a routed network interface for a guest VM running on a controller host. This is an adaptation of the general calico way of setting up networks from neutron data and some information from https://jamielinux.com/docs/libvirt-networking-handbook/custom-routed-network.html
Requirements
• BIRD running on the controller host pointed at one or more route reflector instances and a bird.conf similar to the one on the compute nodes • A VM running in libvirt on the controller host with eth0 connected to the br0 host bridge (mgmt network).
Prepare the outgoing default gateway interface
Traffic originating from inside the guest needs to have a gateway to send packets to. This will be a dummy interface with the same IP on each of the controller hosts. In this example we’ll generate a random MAC address in the format libvirt expects and use that to create a dummy dev01 service network IP interface on the host that we will later route to from within the guest:

modprobe dummy
mac=$(hexdump -vn3 -e '/3 "52:54:00"' -e '/1 ":%02x"' -e '"\n"' /dev/urandom)
ip link add virgw-service address $mac type dummy
ip addr add 172.31.16.1/24 dev virgw-service
ip link set dev virgw-service up
This will bring up a virtual gateway interface that will be able to receive traffic from inside the guest instances on this controller host and deliver it to the kernel to be routed. However, we only want this interface to be used for outgoing traffic FROM the guests. But there is a problem - when we “up” the interface in the last step above an entry for the 172.31.16.0/24 network will be made in the kernel routing table:
[root@dev01-controller-03 ~]# ip route | grep virgw-service
172.31.16.0/24 dev virgw-service proto kernel scope link src 172.31.16.1
This leads to any and all traffic to that network being routed back over the virgw-service interface, which we don’t want. To fix this (and this is what Calico does, too) we remove the route that was created:

ip route del 172.31.16.0/24
We’ve now prepared the virgw-service interface on the controller host to act as a dummy gateway for the service network on guest instances.
Add a tunnel interface connecting the host with a guest VM
First we make a tap interface on the controller host and give it a recognizable name - it seems like only a single dash is allowed in the name. The settings for the device are derived from what calico does on the compute nodes:

ip tuntap add dev tap-dev01db02 mode tap one_queue vnet_hdr
# list tap devices / show usage
ip tuntap
ip tuntap help
Next, we need to define this tap device in the libvirt domain config for the guest VM. Make sure the domain is not running first:

virsh shutdown dev01-db-02
Generate an xml block describing the new guest network interface with a new random mac address - the target device should be the tap device we just created on the host.
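The xml block itself did not survive in this copy of the page; a sketch of what such a definition typically looks like for a pre-created tap device (the MAC address is only an example, generate your own as shown earlier):

<interface type='ethernet'>
  <mac address='52:54:00:12:34:56'/>
  <!-- must match the tap device created on the host -->
  <target dev='tap-dev01db02'/>
  <model type='virtio'/>
  <!-- older libvirt expects a no-op script for type='ethernet' -->
  <script path='/bin/true'/>
</interface>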
Copy and paste this xml block below the current interface definition in the domain xml:

virsh edit dev01-db-02
Make configuration changes to libvirt to allow this interface type
This is already documented in step 1) of the Calico compute node documentation at http://docs.projectcalico.org/en/stable/redhat-opens-install.html?highlight=cgroup_device_acl#compute-node-install
Boot the guest and set up the interface
On the controller host you should now be able to boot the guest with the new interface added. We also need to create the host route to the new service IP that will now be available, and bring up the tap device:

virsh start dev01-db-02
ip link set dev tap-dev01db02 up
ip route add 172.31.16.18/32 dev tap-dev01db02
Log in to the VM from the mgmt network and set up the new interface manually, then verify that it works:

sudo ssh iaas@dev01-db-02
sudo -i
ip addr
ip addr add 172.31.16.18/24 dev eth1
ip link
ip link set dev eth1 up
# switch default gw
ip route del default
ip route add default via 172.31.16.1
To make configuring services that use the new interface easier, use the new service IP interface as the default gw for the guest. You should now be able to ping the outside dummy gateway using the new interface:

ping -I eth1 172.31.16.1
On the controller host, verify that bird knows about the host route, e.g.:
[root@dev01-controller-03 ~]# birdcl show route
BIRD 1.4.5 ready.
0.0.0.0/0        via 172.31.1.1 on br0 [kernel1 21:38:10] * (10)
172.31.16.18/32  dev tap-dev01db02 [kernel1 21:41:33] * (10)
172.31.34.0/24   dev eth1.912 [direct1 21:38:10] * (240)
172.31.35.0/24   dev eth1.913 [direct1 21:38:10] * (240)
In order for a VM to reach an address on its own subnet, proxy-arp has to be enabled on the tap interface. The host computer with the router will then offer its own MAC address from the tap interface and route the traffic.
[root@dev01-controller-03 ~]# echo 1 > /proc/sys/net/ipv4/conf/tap-dev01db02/proxy_arp
1.4.8 Configure iDRAC-settings on Dell 13g servers with USB stick
With Dell PowerEdge 13g servers, the iDRAC base management controller can be configured automatically by reading settings from an xml file located on a USB stick. The USB port to be used is labelled with a wrench icon. By default, Dell PE 13g servers will auto-apply config in this manner if the default username and/or password has not been changed, so typically new servers are prime targets.
Create USB stick and copy files to it
You will need a USB stick formatted with fat32 and a directory called:
System_Configuration_XML
Two files are needed:

config.xml
control.xml
These xml files can be exported from an already configured server, or better still, git cloned from https://github.com/norcams/dell-idracdirect
Apply profile to server iDRAC
Provide power to the server, but do not insert the USB stick just yet. Power on the server, and wait for the POST process to finish. After POST has finished, insert the USB stick into the port on the front of the server with the wrench label. If the server provides a display, it will first show importing, then applying. After some ten-odd seconds the server will reboot. You will notice, as all lights will go out. Remove the USB stick and proceed to the next server.
1.4.9 Using vncviewer to access the console
We configure the bmc (baseboard management controller) on our servers to enable a VNC server feature. Accessing the console through VNC is easier and faster than using the Java-based console available through the bmc web interface. On CentOS/RedHat/Fedora, install the needed VNC client packages:
yum -y install tigervnc tigervnc-server-minimal
vncpasswd
# -> enter the idrac password and confirm
vncviewer -passwd ~/.vnc/passwd 1.2.3.4:5901
The tigervnc-server-minimal package is needed in order to get the vncpasswd utility. This creates a passwd file that is used for providing a password when connecting to the VNC server. The VNC server on the bmcs listens on port 5901. Only a single connection is allowed by the server.
1.4.10 Building puppet-agent for PPC-based Cumulus Linux
Puppet uses its own build tool, called Vanagon, for puppet-agent. It is run against a remote target, in our case a Debian Wheezy installation for PowerPC. A version that can be run in qemu can be downloaded from http://folk.uib.no/ava009/debian_wheezy_ppc.img.tar.gz (0c27128c6ea2dad8f6d9cb8364e378a7). If you have a physical PowerPC-based machine available (e.g. an old Mac), the build will run considerably faster there. Vanagon attempts to SSH to the target as root; the image above allows this (with a password). You can also create an SSH key and add it to the root user on the build box. Then set VANAGON_SSH_KEY=
1.4.11 How to create the designate-dashboard RPM package
1. Install tools:
yum install rpm-build
2. Get designate-dashboard from GitHub:
git clone https://github.com/openstack/designate-dashboard.git
cd designate-dashboard
git checkout stable/pike
3. Build RPM:
python setup.py bdist_rpm
1.5 Team operations
This is internal information about development and operations for the IaaS team.
1.5.1 Getting started
This is information for new team members. Every team member should be familiar with this information.
Work with source code on github
Source code
When we speak of the source code, we typically refer to norcams/himlar on Github. This is the puppet code, hieradata and bootstrap scripts to get every component up and running the way we need. First make sure you have a Github account and then fork the norcams/himlar repo. You should make your fork the origin and then add another remote for norcams/himlar, as sketched below.
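A sketch of that initial setup; the remote name norcams is only a suggestion:

# clone your fork as origin
git clone git@github.com:<your-username>/himlar.git
cd himlar
# add norcams/himlar as an extra remote
git remote add norcams https://github.com/norcams/himlar.git
git fetch norcams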
Policy
When you need to make a change, the rule of thumb is the following:
• Minor changes can be done directly to master on norcams/himlar, e.g. hieradata for one location, or a minor code change for non-critical components.
• All other changes will need a PR (pull request) on Github
• All code changes should also be deployed first on a development location (e.g. dev01)
2FA on jumphosts (login nodes)
As an extra security measure, two-factor authentication is now implemented for access to the login nodes, and thus the management network. The two components used are
• SSH keys and
• TOTP
The latter uses the Google Authenticator mobile (or compatible) app on the client side.
Basic procedure for access
To get access to the internal infrastructure one has to first go through any of the login nodes:
• osl-login-01.iaas.uio.no
• bgo-login-01.iaas.uib.no
All user logins must be authenticated by providing two independent components:
1. SSH key exchange
2. TOTP verification code
The public part of the user SSH key is provisioned using Puppet by publishing the key in hieradata: common/modules/accounts.yaml
In addition the user must be allocated membership of the wheel group (same file). The TOTP setup is done by executing /usr/bin/google-authenticator. From this point on any login for this account requires a verification code (in addition to the automatic exchange of relevant SSH keys). [1]
Step-by-step setup
• add SSH public key to hieradata:
  file: common/modules/accounts.yaml
  key: accounts::ssh_keys
• set up user to become member of the wheel group:
  file: as above
  key: accounts::users
• ensure the account is created on hosts with the login role:
  file: common/roles/login.yaml
  key: accounts::accounts
• on login nodes, as the user after the account is created:
  1. execute google-authenticator and reply to the questions. Recommended answers:
     – Do you want authentication tokens to be time-based (y/n) y
     – Do you want me to update your “/home/i
[1] In the initial set up phase - to enable existing users to convert to 2FA - access through SSH keys only is allowed. The “switch” for this is the availability of the user configuration file. To disable this behaviour remove the option nullok from any line in /etc/pam.d/google-authenticator-wheel-only (through hieradata: common/roles/login.yaml and key googleauthenticator::pam::mode::modes:).
– If the computer that you are logging into isn’t hardened against brute-force login attempts, you can enable rate-limiting for the authentication module. By default, this limits attackers to no more than 3 login attempts every 30s. Do you want to enable rate-limiting (y/n) y
2. Provide the user with the secret key which is printed, the QR code drawn or the URL displayed - all shown after the initial question.
3. Install a TOTP client application on any of the compatible user devices (mobile phone, tablets etc). The recommended application is Google Authenticator.
4. In the TOTP application set up a new account following its instructions. The easiest method is if the app provides a means to configure it by scanning a QR code; the user can then be shown the QR code drawn during server initialization, or alternatively use the URL. Otherwise enter the secret key printed.
Login procedure
1. log in to a login node from an account with access to the private part of the SSH key provided
2. when prompted, start the TOTP/2FA application on the user device and enter the 6 digits displayed
3. login should be successful
Important: It is paramount that the user device and the login node are in sync with regards to time!
Transfer to new device
If the previous device is lost, then the setup procedure should be repeated to configure a new code. But in those cases where a new device (mainly a phone) is purchased etc., and one still has full control of the old, it is possible to recreate the required QR code like this: 1. username=
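The listing above is truncated; assuming qrencode is installed and the secret sits in the default location, one way to recreate the QR code is roughly:

username=<username>
# the shared secret is the first line of the user's config file
secret=$(head -1 /home/$username/.google_authenticator)
qrencode -t ANSIUTF8 "otpauth://totp/$username?secret=$secret"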
Note: Remember to do this on both login nodes!
Emergency code
If the situation should arise where the user does not have access to the device where the 2FA application is installed, he/she can log in using any of the one-time passcodes created during setup:
• someone with access to the user configuration file [2] must retrieve one of the passcodes listed at the bottom
• log in as usual but enter the retrieved emergency code in place of a proper verification code
• after a successful login this particular emergency code is rendered invalid
[2] Default $HOME/.google_authenticator
Secret repository
For data that should not be publicly available (on GitHub or elsewhere) there is an in-house repository named secrets. It contains these areas:
• hieradata Data used by the himlar code. This area is controlled by git
• nodes For files only used by or on specific systems
• common Non-host specific files, like licenses and proprietary binaries
hieradata
Data in here is accessed and controlled by git at git@git.iaas.uio.no:
git clone git@git.iaas.uio.no:hieradata/secrets
cd secrets
# ... edit ...
git commit
git push
For activation consult the Deployment section of this documentation.
nodes and common
Data under these sections is stored manually under the respective directory on any of the login nodes. Afterwards data should be synced between the locations using the script described below. Data under common can be stored arbitrarily. For nodes, create a directory named by the short form of the hostname (i.e. non-FQDN). For data utilized by ansible driven jobs the structure might be dictated by the playbook used. One of those is the SSL certificate distribution.
Synchronization of data between repositories
Note: The script described here is not yet fully functional!
The login node in OSL is defined as master, the implication being:
• data which exists in several locations is set to the content of the osl data
• data which exists solely on the slave is copied to the master
To synchronize, run as yourself after any change:
cd /opt/repo/secrets
./secret-sync.sh [delete]
The delete option removes any files on slave(s) which do not exist on the master. Use this with caution! The requirement for this to work is that all files and directories are owned and writeable by group wheel!
1.5.2 Development
Puppet design policy
This is the policy for himlar puppet code.
Definitions
• himlar: puppet code at https://github.com/norcams/himlar
• module: upstream module listed in Puppetfile
• profile: a puppet module with classes for norcams adaptations. Found in himlar under profile/
• hieradata: hierarchical yaml files with config data under hieradata/
Profile
• hieradata should include profile classes
• all modules should be included in a profile class (never in hieradata)
• profile classes should have boolean options to enable a feature with default value set to false
• enabling of profile features should be done either in
  – hieradata/common/modules/profile.yaml for global settings
  – hieradata/common/role/ for enabling for only one role
• each openstack role should have one class named after itself that will include feature classes
• location hieradata override should always be done in the same module/role file as in the common version of module/role
• module hieradata should be grouped by module classes
Hieradata
• global hiera variables referenced in other hiera files should have generic names and never full class names (e.g. openstack_version and not cinder::db::mysql::password)
• profile hashes that need to be merged should use the same naming as autoloaded input class variables (e.g. profile::openstack::designate::bind_servers)
Vagrant
Development in vagrant
Last changed: 2021-09-14
Before you start you need to set up vagrant with either virtualbox or libvirt. Then following these steps will get you up and running with vagrant.
Source code from git
First make a fork of https://github.com/norcams/himlar and clone that repo to a local folder called himlar. This will be referred to as $himlar in this documentation:

mkdir $himlar
cd $himlar
git clone git@github.com:
Generate CA files
Last changed: 2021-09-14
Warning: If you have problems with the CA: delete provision/ca, check out all files tracked in git and rerun bootstrap.sh
You will need to generate a CA key pair with openssl to sign the certificate used in vagrant to test TLS for the endpoints. First make sure openssl is installed on your host computer (if not, run the scripts and copy all the .pem files back to your host):

cd $himlar
cd provision/ca
echo "YOUR_SECRET" > passfile
./bootstrap.sh
NB! You must run the script from the provision/ca directory! The CA chain .pem file can be found in:
$himlar/provision/ca/certs/intermediate/ca-chain.cert.pem
If you trust that no one will have access to your passfile, you could add $himlar/provision/ca/certs/intermediate/intermediate.cert.pem to your browser to avoid warnings.
Use in puppet
In puppet, this CA is used to generate certificates defined in the hash: profile::application::openssl::certs
Nodeset
Last changed: 2021-09-14 There are different sets of nodes to use in vagrant. The node set can be changed by setting the environment variable called HIMLAR_NODESET.
Default nodeset
The default nodeset uses the vagrant location. Here we have added all the important roles into one node called vagrant-api-01. The rest of the nodes are optional (like dashboard, access and compute).
Full nodeset
The full nodeset uses the dev location. Here all roles have nodes matching the test and production locations. This will require more resources on the vagrant host (16GB+ RAM, 4+ cores). To use the full nodeset:

export HIMLAR_NODESET=full
Other nodeset
There are also other special case nodesets. To see all nodesets and change them, edit $himlar/nodes.yaml.
Vagrant up
The nodes in vagrant should be started in stages. Each stage should complete before the next one is started.
First stage:
• db-01
• mq-01
• api-01
• dashboard-01 (optional)
• access-01 (optional)
• monitor-01 (optional)
• logger-01 (optional)
• proxy-01 (optional)
• admin-01 (optional)
Second stage:
• identity-01
Main stage:
• novactrl-01
• image-01
• volume-01
• network-01
• console-01 (optional)
• metric-01 (optional)
• telemetry-01 (optional)
Last stage:
• compute-01
Final fixes
A few final steps are needed before you can start an instance in vagrant.
Host aggregate and AZ
After running vagrant up compute you will need to run vagrant provision novactrl to add the newly created compute node to a host aggregate and the correct availability zone (AZ).
Metadata api
We need to restart openstack-nova-metadata-api on compute-01. This can be done with ansible:

ansible-playbook -e "myhosts=vagrant-compute name=openstack-nova-metadata-api.service" lib/systemd_restart.yaml
Flavors
Flavors are missing. m1 flavors can be added with himlarcli/flavor.py or the openstack cli.
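With the openstack cli, a minimal flavor could be created like this; the name and sizing here are arbitrary examples:

openstack flavor create --vcpus 1 --ram 2048 --disk 10 m1.small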
Image
You will need a public cirros image to test with. One way to quickly fix this is to use himlarcli/image.py: edit config/images/cirros.yaml and set the image to be public. You can then just run:
./image.py update -i cirros.yaml
Dataporten
See more about setting up dataporten in vagrant below. After running destroy/up, only himlarcli/dataporten.py will be needed. To create a dataporten user in vagrant after setting up the dashboard, we can use himlarcli/access.py to add a user request to the queue, then process the request and add the user.
Working with web services in vagrant
Last changed: 2021-09-14
Here are some tips for working with the different web services like dashboard, access and API in vagrant.
/etc/hosts
You will need to update /etc/hosts on the machine where you run your browser or API calls. Look in common.yaml for the location you are working with, and add all public addresses. Example for the full nodeset:
172.31.24.56 access.dev.iaas.intern
172.31.24.51 dashboard.dev.iaas.intern
172.31.24.51 status.dev.iaas.intern
172.31.16.81 identity.trp.dev.iaas.intern
172.31.24.86 api.dev.iaas.intern
172.31.24.86 compute.api.dev.iaas.intern
172.31.24.86 network.api.dev.iaas.intern
172.31.24.86 image.api.dev.iaas.intern
172.31.24.86 identity.api.dev.iaas.intern
172.31.24.86 volume.api.dev.iaas.intern

sshuttle
If you work on a remote vagrant host you will need to have access to vagrant’s public net. This can be done with sshuttle:

sshuttle -r
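A typical invocation, assuming the vagrant public net is 172.31.0.0/16 as in the /etc/hosts example above:

sshuttle -r <user>@<vagrant-host> 172.31.0.0/16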
Vagrant with virtualbox
Last changed: 2021-09-14
You first will need to install the virtualbox and vagrant packages for your operating system. This has been tested on Ubuntu and OSX and works without any other configuration.
Vagrant with libvirt
Last changed: 2021-09-14
Contents
• Vagrant with libvirt – Requirements – Setting up the Vagrant environment – Tips and tricks
Requirements
In order to deploy the virtual machines in Vagrant, the host running the VMs must meet the following requirements:
Operating system: libvirt capable (tested on RHEL/Fedora)
Memory: 16 GB minimum, 32 GB recommended
Disk space: 8 GB minimum on /var/lib/libvirt
Setting up the Vagrant environment
In order to deploy the Vagrant environment, follow this guide.
1. Make sure that the requirements are met
2. Ensure that CPU virtualization extensions are enabled on the host. You’ll probably need to enter BIOS setup for this.
3. Create a file /etc/polkit-1/rules.d/10-libvirt.rules with the following contents:
polkit.addRule(function(action, subject) {
  if ((action.id == "org.libvirt.unix.manage" ||
       action.id == "org.libvirt.unix.monitor") &&
      subject.isInGroup("wheel")) {
    return polkit.Result.YES;
  }
});
4. Install Vagrant and libvirt. In this case, it is assumed that you’re running Fedora and have the RPMFusion repositories available:
dnf -y install vagrant vagrant-libvirt libvirt-daemon-kvm
If you are running RedHat, you may also need to install:
yum install libvirt-devel
5. Install Vagrant plugins for libvirt:
vagrant plugin install vagrant-libvirt
6. Start the libvirtd service, and make sure that it is started at boot:
systemctl start libvirtd.service
systemctl enable libvirtd.service
7. Add the user that will be running Vagrant to the wheel group:
usermod -a -G wheel <username>
Tips and tricks
Add the following in your ~/.bashrc or similar to always use the same nodeset (example for nodeset “dns”):
# Himlar
export HIMLAR_NODESET=dns
Script to provision a node, and keep the output:
#!/bin/bash
host=$1

[ -d /tmp/himlar ] || mkdir /tmp/himlar
vagrant rsync $host | tee /tmp/himlar/$host
vagrant provision $host | tee -a /tmp/himlar/$host
Script to take all nodes up, in the correct order (uses the script provision.sh above):
#!/bin/bash
declare -a nodes=( 'db' 'api' 'identity' 'mq' 'dashboard' 'ns' 'resolver-01' 'resolver-02' 'admin' 'image' 'network' 'compute' 'novactrl' 'volume' 'dns' )

for node in "${nodes[@]}"; do
  vagrant up $node
  provision.sh $node
  provision.sh $node
done

for node in "${nodes[@]}"; do
  provision.sh $node
done
Testing in Vagrant
Last changed: 2021-09-14
Warning: This might be outdated.
Connecting to Horizon
Horizon is the web GUI component in OpenStack. If you’ve followed the Setting up the Vagrant environment guide earlier, you should now start the nodes:

vagrant up api
vagrant up dashboard
Connect a browser to the Horizon GUI: https://172.31.24.51/
If the VMs are running on a remote host, the best approach will be to use an SSH tunnel. Create an SSH tunnel with:

ssh -L 8443:172.31.24.51:443
After creating the SSH tunnel, point your browser to: https://localhost:8443/
Note that authentication through Feide Connect (aka “Dataporten”) uses redirection and is not possible when connecting through an SSH tunnel.
Setting up local user and tenant
Logging into the VMs is fairly simple. In order to set up a demo user and tenant, log into the master VM:

vagrant ssh master
Become root:

sudo -i
API authentication configuration
The norcams/himlar repo is available from within the vagrant VM as /opt/himlar. Run the 00-credentials_setup.sh script:

/opt/himlar/tests/00-credentials_setup.sh
This will create 3 files in your home directory:
openstack.config: Defines the demo username etc. Used by other tests
keystonerc_admin: Sets environment variables for the administrator
keystonerc_demo: Sets environment variables for the demo user
In order to “become” the OpenStack administrator, you then only need to source the ~/keystonerc_admin file:
. ~/keystonerc_admin
To switch to the demo user, source the ~/keystonerc_demo file:
. ~/keystonerc_demo
Create demo user and project (tenant)
This can be accomplished simply by running:
/opt/himlar/tests/01-keystone-create_demo_user.sh
But for the sake of learning, you may want to do this manually as shown below: 1. Source the file that defines the administrator environment:
source ~/keystonerc_admin
2. Create a demo tenant (project):
openstack project create --or-show demoproject
3. Create a demo user and set the password:
openstack user create --or-show --password himlar0pen demo
4. Associate the demo user with the demo tenant:
openstack user set --project demoproject demo
5. Show the demo user:
openstack user show demo
Upload an image to Glance
This can be accomplished simply by running:
/opt/himlar/tests/02-glance-import_cirros_image.sh
But for the sake of learning, you may want to do this manually as shown below: 1. Source the file that defines the administrator environment:
source ~/keystonerc_admin
2. Download CirrOS image:
curl -o /tmp/cirros.img http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
3. Upload and create the image in Glance:
openstack image create"CirrOS test image"--disk-format qcow2--public--file/
˓→tmp/cirros.img
Note: This can also be accomplished by using Glance directly:
glance image-create --name "CirrOS test image" \
  --disk-format qcow2 --container-format bare \
  --visibility public --file /tmp/cirros.img
4. List images:
openstack image list
Optionally, list images using the Nova API:
nova image-list
Create a network security group
This can be accomplished simply by running:
/opt/himlar/tests/03-neutron-create_security_group_and_rules.sh
But for the sake of learning, you may want to do this manually as shown below: 1. Source the file that defines the administrator environment:
source ~/keystonerc_admin
2. Create a network security group called “test_sec_group”:
openstack security group create test_sec_group
3. Add a rule which allows incoming SSH:
openstack security group rule create --proto tcp --dst-port 22 test_sec_group
4. Add a rule which allows incoming ICMP:
openstack security group rule create --proto icmp test_sec_group
5. Show the newly created security group:
openstack security group show test_sec_group --max-width 70
Note: This could have been done using the Neutron API instead of the generic openstack command:
neutron security-group-create test_sec_group
neutron security-group-rule-create --direction ingress --protocol tcp \
  --port_range_min 22 --port_range_max 22 test_sec_group
neutron security-group-rule-create --protocol icmp --direction ingress test_sec_group
neutron security-group-show test_sec_group
Running himlarcli in vagrant
You will need access to both the public and transport net on the host you plan to run himlarcli. This should work on the same host where you run vagrant.
Himlarcli source code
Clone the repo from https://github.com/norcams/himlarcli and follow the instructions in the README.
config.ini
You will need a working config.ini file for himlarcli. You can either copy the one from the vagrant/dev-proxy-01 node or from /opt/himlarcli/ on login. Make sure that the following elements in config.ini are correct:
• auth_url
• password (see hieradata/vagrant/common.yaml)
• region
• keystone_cachain
The openstack endpoints used in himlarcli must also resolve. This can be done by editing /etc/hosts on the host.
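A minimal sketch of what such a config.ini could look like; the section name and all values here are assumptions, so consult the actual file on dev-proxy-01 for the real layout:
[openstack]
auth_url=https://identity.api.vagrant.iaas.intern:5000/v3
password=<from hieradata/vagrant/common.yaml>
region=vagrant
keystone_cachain=/path/to/keystone-ca-chain.pem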
Testing
instance.py is good for testing both the keystone and nova APIs, and update_images.py for testing the glance API.
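For example, both scripts support the -h option for a list of available actions (see the Himlar CLI section below):
./instance.py -h
./update_images.py -h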
Setting up dataporten in vagrant
First make sure that the access and proxy nodes are part of your HIMLAR_NODESET. There should be at least one nodeset with access and proxy in nodes.yaml. To use Dataporten to authenticate a user in access and keystone, you will first need to set up two applications at https://dashboard.dataporten.no More help can be found at https://docs.dataporten.no/ Log in to the Dataporten Dashboard and register a new application. The redirect URIs for access should be:
https://access.vagrant.iaas.intern/login https://access.vagrant.iaas.intern/reset
and for dashboard:
https://identity.api.vagrant.iaas.intern:5000/v3/auth/OS-FEDERATION/websso/openid/redirect
You need to add the following scope in Permissions for each application:
email
userid-feide
profile
openid
Also make sure dashboard.vagrant.iaas.intern and access.vagrant.iaas.intern are in /etc/hosts on the machine where you are running your browser (a sketch of these entries is shown at the end of this section). Then copy Client ID and Client Secret from Oauth details to:
hieradata/secrets/nodes/vagrant-access-01.secrets.yaml
hieradata/secrets/nodes/vagrant-identity-01.secrets.yaml
Reference hieradata/secrets/nodes on the other locations for exact content. To allow Dataporten login in horizon, run the dataporten script once in himlarcli as root:
./dataporten.py
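For reference, the /etc/hosts entries mentioned above could look like this; the addresses are placeholders for the actual VM IPs in your environment:
172.31.24.51  dashboard.vagrant.iaas.intern
172.31.24.xx  access.vagrant.iaas.intern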
Galera DB cluster
Some tips for setting up galera in vagrant:
Setup
Warning: If you have running db-global or db-regional nodes these must be halted or destroyed first!
• use the db nodeset
• you will need 3 nodes: db-global-01, db-global-02, ha-01
• start db-global-01 first
• uncomment the marked hieradata:
hieradata/nodes/vagrant/vagrant-db-global-01.yaml hieradata/nodes/vagrant/vagrant-db-global-02.yaml hieradata/nodes/vagrant/vagrant-ha-01.yaml hieradata/vagrant/roles/db-global.yaml
• Provision db-global-01 (this will fail), run galera_new_cluster to start the database service, then run provision once more (some errors may remain)
• Start ha-01 and make sure garbd is running. You should now have a cluster of size 2 (see below)
• Start db-global-02 and start mariadb.service. If you have problems starting the database, try stopping iptables first
Check status
Check the current galera cluster status. If everything is working you should have a size 3 cluster:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';
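An illustrative excerpt of the output for a healthy cluster (the actual values will vary):
| wsrep_cluster_size   | 3       |
| wsrep_cluster_status | Primary |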
Host overview
Address plan for individual hosts/nodes. Also look at the IP addressing plan for information about location-specific networks.
Network hosts
node           inf  net   addr
leaf-01             mgmt  x.x.x.1
leaf-02             mgmt  x.x.x.2
leaf-03             mgmt  x.x.x.3
leaf-04             mgmt  x.x.x.4
mgmt-01             priv  x.x.x.1
controller-00       mgmt  x.x.x.99
Management hosts
Compute resource (controller) and profile (resources) for virtual nodes are defined under config/nodes/ in himlarcli.
node         inf   net   addr
login-01     eth0  mgmt  x.x.x.10
admin-01     eth0  mgmt  x.x.x.11
proxy-01     eth0  mgmt  x.x.x.12
logger-01    eth0  mgmt  x.x.x.13
monitor-01   eth0  mgmt  x.x.x.14
builder-01   eth0  mgmt  x.x.x.15
ns-01        eth0  mgmt  x.x.x.16
resolver-01  eth0  mgmt  x.x.x.17
resolver-02  eth0  mgmt  x.x.x.18
OOB hosts
FOR NOW ONLY APPLIES TO OSL
Most hosts have their OOB interface (iDRAC, ILO, etc.) set up with the same last octet as their trp counterpart. They are also registered in DNS with an -lc suffix to the normal host name. Two exceptions to those rules are listed here:
node           inf       net  addr     note
controller-04  em3.3378  oob  —        No address assigned/necessary
admin-01       eth1      oob  x.x.x.9  Interface attached to br2 on host
em3.3378 on controller-04 is connected to a bridge interface (br2), whose sole purpose is to bridge the OOB network to admin-01. This allows admin-01 to control the power interface on all physical nodes (in the same vein as it controls the virtual power interface on all VMs).
Note: This only applies to the production environments (BGO and OSL) where we have control of the management switches.
Openstack nodes
Management net (mgmt) should have the same last octet as the transport net (trp).
node               inf   net  addr
status-01          eth1  trp  x.x.x.21
report-01          eth1  trp  x.x.x.22
nat-linux-01       eth1  trp  x.x.x.26
nat-linux-02       eth1  trp  x.x.x.27
mq-01              eth1  trp  x.x.x.31
mq-02              eth1  trp  x.x.x.32
mq-03              eth1  trp  x.x.x.33
dns-01             eth1  trp  x.x.x.34
dns-02             eth1  trp  x.x.x.35
image-01           eth1  trp  x.x.x.36
image-02           eth1  trp  x.x.x.37
image-03           eth1  trp  x.x.x.38
dns-03             eth1  trp  x.x.x.39
db-01              eth1  trp  x.x.x.41
db-02              eth1  trp  x.x.x.42
db-03              eth1  trp  x.x.x.43
volume-01          eth1  trp  x.x.x.46
volume-02          eth1  trp  x.x.x.47
volume-03          eth1  trp  x.x.x.48
dashboard-01       eth1  trp  x.x.x.51
dashboard-02       eth1  trp  x.x.x.52
dashboard-03       eth1  trp  x.x.x.53
dashboard-mgmt-01  eth1  trp  x.x.x.54
cephmds-01         eth1  trp  x.x.x.55
access-01          eth1  trp  x.x.x.56
access-02          eth1  trp  x.x.x.57
access-03          eth1  trp  x.x.x.58
cephmds-02         eth1  trp  x.x.x.59
cephmds-03         eth1  trp  x.x.x.60
console-01         eth1  trp  x.x.x.61
console-02         eth1  trp  x.x.x.62
console-03         eth1  trp  x.x.x.63
coordinator-01     eth1  trp  x.x.x.64
novactrl-01        eth1  trp  x.x.x.66
novactrl-02        eth1  trp  x.x.x.67
novactrl-03        eth1  trp  x.x.x.68
network-01         eth1  trp  x.x.x.71
network-02         eth1  trp  x.x.x.72
network-03         eth1  trp  x.x.x.73
telemetry-01       eth1  trp  x.x.x.76
telemetry-02       eth1  trp  x.x.x.77
telemetry-03       eth1  trp  x.x.x.78
identity-01        eth1  trp  x.x.x.81
identity-02        eth1  trp  x.x.x.82
identity-03        eth1  trp  x.x.x.83
rgw-01             eth1  trp  x.x.x.84
rgw-02             eth1  trp  x.x.x.85
api-01             eth1  trp  x.x.x.86
api-02             eth1  trp  x.x.x.87
api-03             eth1  trp  x.x.x.88
cephmon-object-01  eth1  trp  x.x.x.89
cephmon-object-02  eth1  trp  x.x.x.90
cephmon-01         eth1  trp  x.x.x.91
cephmon-02         eth1  trp  x.x.x.92
cephmon-03         eth1  trp  x.x.x.93
cephmon-object-03  eth1  trp  x.x.x.94
rgw-03             eth1  trp  x.x.x.95
metric-01          eth1  trp  x.x.x.96
metric-02          eth1  trp  x.x.x.97
metric-03          eth1  trp  x.x.x.98
Openstack hosts
node           inf   net  addr
controller-01  eth1  trp  x.x.x.100
controller-02  eth1  trp  x.x.x.101
controller-03  eth1  trp  x.x.x.102
controller-04  eth1  trp  x.x.x.114
compute-01*    eth1  trp  x.x.x.103
compute-02*    eth1  trp  x.x.x.104
compute-03*    eth1  trp  x.x.x.105
compute-04*    eth1  trp  x.x.x.111
compute-05*    eth1  trp  x.x.x.112
compute-06*    eth1  trp  x.x.x.113
compute-07*    eth1  trp  x.x.x.115
compute-08*    eth1  trp  x.x.x.116
storage-01*    eth1  trp  x.x.x.106
storage-02*    eth1  trp  x.x.x.107
storage-03*    eth1  trp  x.x.x.108
storage-04*    eth1  trp  x.x.x.109
storage-05*    eth1  trp  x.x.x.110
Ephemeral hostnames
For Puppet to work and to know which node to configure, we have the certname (or clientcert) constructed the following way:
To make it easier to know what certname a node uses, we also set the hostname equal to the certname for all nodes. When some nodes, e.g. compute, object or storage, change certname, we need a way to keep track of the node other than the certname. All physical nodes have one permanent name and mgmt IP that will follow the machine from start to end:
This will be used for A and PTR records for mgmt as well. When a node is used in a variant/subrole we use a CNAME to map the ephemeral hostname to the permanent one. This is only done in the location where the subrole/variant is present, and removed when no longer needed.
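As an illustration, with purely hypothetical names and addresses, the records could look like this:
; permanent name: A (and matching PTR) record for mgmt
compute-03.mgmt.osl.iaas.intern.      A      x.x.x.103
; ephemeral subrole name mapped with a CNAME while in use
compute-hpc-01.mgmt.osl.iaas.intern.  CNAME  compute-03.mgmt.osl.iaas.intern.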
Physical naming
When we add name tags to machines in the datacenter we use the permanent name:
The location part can be omitted to save space.
Rack overview and power consumption
Rack placement and power consumption planning.
Region OSL
Rack 1
Estimated power consumption:
Host               Vendor/Model             Typical  Maximum
osl-login-01       Dell PowerEdge R610      200 W    300 W
osl-torack-01      ?                        200 W    370 W
osl-torack-02      ?                        200 W    370 W
osl-compute-01     Dell PowerEdge R630      354 W    541 W
osl-compute-02     Dell PowerEdge R630      354 W    541 W
osl-compute-03     Dell PowerEdge R630      354 W    541 W
osl-compute-04     Dell PowerEdge R630      354 W    541 W
osl-compute-05     Dell PowerEdge R630      354 W    541 W
osl-compute-06     Dell PowerEdge R630      354 W    541 W
osl-compute-07     Dell PowerEdge R640      499 W    702 W
osl-compute-08     Dell PowerEdge R640      499 W    702 W
osl-compute-09     Dell PowerEdge R640      499 W    702 W
osl-compute-10     Dell PowerEdge R640      499 W    702 W
osl-compute-42     Supermicro AS-4023S-TRT  300 W    500 W
osl-controller-01  Dell PowerEdge R630      232 W    380 W
osl-controller-02  Dell PowerEdge R630      232 W    380 W
osl-controller-03  Dell PowerEdge R630      232 W    380 W
osl-controller-04  Dell PowerEdge R620      242 W    398 W
Total                                       5958 W   9132 W
Rack 2
Estimated power consumption:
Host             Vendor/Model           Typical  Maximum
osl-leaf03       ?
osl-leaf04       ?
osl-mgmt-01      ?
osl-mgmt-opx-01  Dell S3048-ON
osl-storage-01   Dell PowerEdge R730xd  368 W    572 W
osl-storage-02   Dell PowerEdge R730xd  368 W    572 W
osl-storage-03   Dell PowerEdge R730xd  368 W    572 W
osl-storage-04   Dell PowerEdge R730xd  368 W    572 W
osl-storage-05   Dell PowerEdge R730xd  368 W    572 W
osl-storage-06   Dell PowerEdge R740xd  388 W    602 W
osl-storage-07   Dell PowerEdge R740xd  388 W    602 W
osl-storage-08   Dell PowerEdge R730xd  476 W    680 W
osl-storage-09   Dell PowerEdge R730xd  476 W    680 W
osl-storage-10   Dell PowerEdge R730xd  476 W    680 W
osl-storage-11   Dell PowerEdge R730xd  476 W    680 W
osl-storage-12   Dell PowerEdge R730xd  476 W    680 W
Total                                   4996 W   7464 W
Rack 3
Estimated power consumption:
Host                Vendor/Model              Typical  Maximum
osl-mgmt-opx-02     Dell S3048-ON
osl-nfs-01          Dell PowerEdge R710       300 W    400 W
osl-controller-01   Dell PowerEdge R640       355 W ?  557 W ?
osl-controller-02   Dell PowerEdge R640       355 W ?  557 W ?
osl-controller-03   Dell PowerEdge R640       355 W ?  557 W ?
osl-controller-04   Dell PowerEdge R640       355 W ?  557 W ?
osl-compute-11      Dell PowerEdge R7425      579 W    889 W
osl-compute-12      Dell PowerEdge R7425      579 W    889 W
osl-compute-13      Dell PowerEdge R7425      579 W    889 W
osl-compute-14      Dell PowerEdge R7425      579 W    889 W
osl-compute-15      Dell PowerEdge R7425      579 W    889 W
osl-compute-16      Dell PowerEdge R7425      579 W    889 W
osl-compute-21..24  Supermicro AS-2123BT-HTR  2000 W   3000 W
osl-compute-25..28  Supermicro AS-2123BT-HTR  2000 W   3000 W
Total                                         9194 W   13962 W
Rack 4
Estimated power consumption:
Host                Vendor/Model          Typical  Maximum
osl-mgmt-opx-03     Dell S3048-ON
osl-spine-01        Dell S5232F-ON        360 W    635 W
osl-spine-02        Dell S5232F-ON        360 W    635 W
osl-compute-18      Dell PowerEdge R6525  354 W    1061 W
osl-compute-43      Huawei                500 W    ?
osl-compute-44      Huawei                500 W    ?
osl-compute-45      Huawei                500 W    ?
osl-compute-46      Huawei                500 W    ?
osl-compute-47      Huawei                500 W    ?
osl-compute-48      Huawei                500 W    ?
osl-compute-49      Huawei                500 W    ?
osl-compute-50      Huawei                500 W    ?
osl-compute-29..32  Supermicro            2000 W   3000 W
osl-compute-33..36  Supermicro            2000 W   3000 W
osl-compute-37..40  Supermicro            2000 W   3000 W
Total                                     ?        ?
Region BGO
Rack 1
Rack 2
Rack 3
Rack 4 (planned)
Rack 5 (planned)
1.5.3 Operations
DB
Galera management
The galera cluster consists of three nodes:
• bgo-db-01
• osl-db-01
• uib-ha-02 (quorum node only)
To check the current status, as root on db-01:
mysql
SHOW STATUS LIKE 'wsrep_cluster_size';
Cluster size must be 2 or greater.
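Illustrative output when all three members are connected:
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+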
Start stop quorum node
From bgo-login-01:
sudo ssh iaas@uib-ha-02
sudo -i
systemctl status garbd
systemctl stop garbd
systemctl start garbd
Resetting the Quorum
If one of the nodes in the cluster has wsrep_cluster_status non-Primary, we will need to reset the quorum. On the node you want to make the new master, run this in mysql:
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
Read more on how to fix this: http://galeracluster.com/documentation-webpages/quorumreset.html
Bootstrap cluster
You will need to bootstrap the cluster if systemctl start mysqld fails on bgo-db-01 for some reason.
Warning: If there is new data on osl-db-01, it will be lost unless we have a DB dump and restore it on bgo-db-01 after mysqld has been started.
First stop mysqld on db-01 and garbd on uib-ha-02. On bgo-db-01 edit /var/lib/mysql/grastate.dat and make sure:
safe_to_bootstrap: 1
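For reference, a complete grastate.dat typically looks something like this (illustrative values; the uuid and seqno will differ on your node):
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 1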
Bootstrap the cluster on bgo-db-01:
galera_new_cluster
Verify that mysqld is running and do a restore if necessary:
systemctl status mysqld
Start mysqld on osl-db-01 and garbd on uib-ha-02. Verify that the cluster size is now 3.
How to bootstrap Himlar
This document describes the procedure for initializing a new environment from a single login node. The systems to be used are all physically installed (including configuration of BIOS/iDRAC) but otherwise untouched.
loc=[bgo|osl|test01|test02|local1|local2|local3|...]
Prerequisites
• A login node (with an up-to-date /opt/[himlar|repo] hierarchy) which is maintained by Puppet
• No management node installed (controller)
• hieradata/${loc}/common.yaml, hieradata/common/common.yaml, hieradata/nodes/${loc}/... etc. are populated with relevant data
• Puppet is disabled on new nodes: ensure $loc/modules/puppet.yaml includes puppet::runmode: ‘none’
• All commands are run as the admin user (root) unless noted (consult the document 2FA on jumphosts (login nodes))
• The new controller node (and all further controller and compute nodes) must have CPU virtualization extensions enabled in BIOS
Important: When doing a complete reinstall, make sure peerdns: ‘no’ is in the network configuration for the nodes controller-01 and admin-01. Also make sure gateway and DNS point to the login node or wherever there is a connection out and/or a reachable resolver. This might require toggling of data in ‘common.yaml’ or the relevant node files. This should be manipulated in the code activated on the login node from where the bootstrap process is initialized, before the run. Changes after installation of the controller node should be activated on the node itself (“/opt/himlar/hieradata”).
Procedure
1. Make sure the dhcp and tftp services are allowed through the firewall; on RHEL 7/CentOS 7: iptables -I INPUT 1 -i
Note: The error message “curl: (33) HTTP server doesn’t seem to support byte ranges. Cannot resume.” is harmless when the script has been previously run. If so, this is just an indication that the files to be fetched are already in place. But please make sure the files are nevertheless recent!
4. Boot the relevant physical node using the web GUI on the iDrac or with this command on the login node: idracadm -r
Note: Make sure the system is configured to PXE boot on the relevant (mgmt) interface on first attempt! Might require BIOS setup.
Important: When the new controller is fully installed, the script started in 1) must be quit if the new system is set to primarily attempt PXE boot, otherwise it will enter an endless installation loop!
5. Log on to the freshly installed controller node: (sudo) ssh iaas@$loc-controller-01
6. Run puppet in bootstrap mode: bash /root/puppet_bootstrap.sh
7. Run puppet normally: /opt/himlar/provision/puppetrun.sh
8. Punch a hole in the firewall for traffic to port 8000: iptables -I INPUT 1 -p tcp --dport 8000 -j ACCEPT
9. Initiate installation of the admin node/Foreman: /usr/local/sbin/bootstrap-$loc-admin.sh
   1. virsh list should now report the foreman instance as running
   2. The install can be monitored with vncviewer $loc-controller-01, virt-manager connected to $loc-controller-01, or your preferred VNC viewer application
   3. When the message “Domain creation completed. Restarting guest.” is written to the terminal from where the script was started, the system is installed and ready for use.
   4. The new admin node can be logged on to from the login node: ssh iaas@$loc-admin-01 ...
10. When controller node installation is complete, the firewall can be restored: iptables -D INPUT 1, repeated until all newly inserted rules are removed. Check with iptables -L -n
11. Log on to the new admin system from the login node, optionally check the install log: /root/install.post.log
12. Ensure the system time is correct
13. Put the following in hieradata/
• Using the himlarcli command, the nodes will automatically be set up according to the nodes file for the environment.
• Recommended sequence:
  a. leaf nodes if applicable (make sure puppet is run afterwards)
  b. proxy-01 (make sure puppet is run afterwards)
  c. Remaining controller nodes (make sure puppet is run afterwards)
  d. Remaining nodes; may be done by executing: node.py -c config.ini.$loc xxx full
     This will install all nodes in the list
Important: DO NOT run puppet on any of the nodes unless explicitly specified!
Note: Physical hosts may have to be rebooted or powered on manually. Make sure they are configured to PXE boot on the management interface on their first boot.
Note: As long as we have common login nodes shared between test and production environments, some additional steps must be performed until successful install of proxy-01:
1) admin-01 must have the login node configured as resolver
2) the login node must have a hole punched in the firewall for domain traffic (port 53) on the relevant management interface
3) the login node must be set up to NAT outgoing traffic coming in on the relevant management interface (hint: “/root/test02_enable_nat.sh”)
4) admin-01 must have the login node configured as its default gateway
When proxy-01 is up and running, all can be set back to normal.
1. Execute puppet on the nodes in this sequence:
a. mq-01, logger-01
b. db-global-01, db-regional-02, dashboard-01, monitor-01
   For dashboard-01 the certificates must first be distributed.
c. cephmon-0[1-]
d. identity-01, access-01
   For access-01 the certificates must first be distributed. For identity-01, it is important that the openrc file is absent while bootstrapping keystone. Remove the necessary include in the node file before the first puppet run.
e. storage-0[1-]
f. volume-01, image-01, network-01, novactrl-01, console-01
   For console-01 the certificates must first be distributed.
g. compute-0[1-]
2. Enable regular puppet execution by removing puppet::runmode: ‘none’ from $loc/modules/puppet.yaml.
Deployment of new code
With ansible
To deploy with ansible:
cd $ANSIBLE
bin/deploy.sh
Caution: Sometimes the r10k used in provision/puppetmodules.sh will not deploy a new version of a puppet module. Check deployed module version in /etc/puppetlabs/code/modules/
Manual deployment
Deployment is done on the admin-01 node. From login you should reach it by running ssh iaas@
Hieradata and profile
sudo -i
cd /opt/himlar
git pull
Puppet modules
Active puppet modules reside in /etc/puppetlabs/code/modules. For minor changes in Puppetfile this should update the active modules from source:
cd /opt/himlar
HIMLAR_PUPPETFILE=deploy provision/puppetmodules.sh
To rebuild a module from source: rm -Rf /etc/puppetlabs/code/modules/
To rebuild all modules from source:
rm -Rf /etc/puppetlabs/code/modules/* cd /opt/himlar provision/puppetmodules.sh
Secrets
Secrets are stored at git@git.
cd /opt/himlar provision/puppetsecrets.sh
Compute management
Note all AZ names will have a region prefix not used in this document. E.g. default-1 will be bgo-default-1 in bgo. Updated information about active aggregate and compute host lists can be found in Trello.
Organization
Each compute host should only be in one host aggregate and one availability zone.
Availability zone
We have 3 different AZ:
• default-1
• legacy-1
• iaas-team-only-1 (*)
Host aggregate
Each AZ has one or more host aggregates with hosts:
• central1 (default-1)
• group1 (legacy-1)
• group2 (legacy-1)
• group3 (legacy-1)
• placeholder1 (iaas-team-only-1)
(*) This is only available to limited projects (e.g. iaas-team). Note that everybody can see it, but they will get a “No valid host” error if they try to use it. This is also the default AZ for new compute hosts.
Disk setup
Each compute host can either be set up with local disk for instances or use Ceph. This should not be changed without a full reinstall. To see the disk setup for a compute host, look in the hieradata node file.
Aggregate management
All aggregates in legacy-1 will be managed by aggregate.py in himlarcli. This includes activation (enable/disable hosts), migration and notification. Adding and removing compute hosts in aggregates is manual and can be done with the Openstack CLI.
Workflow example
If we need to use a standby compute host (called compute-XY) from placeholder1 in central1, we will need to do the following:
• Reinstall compute-XY with the correct disk setup
• Disable the other hosts in iaas-team-only-1 and enable compute-XY
• Test compute-XY by starting an instance in iaas-team-only-1
• With the Openstack CLI remove compute-XY from the placeholder1 aggregate and add it to central1 (see the sketch below)
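A sketch of the last step with the OpenStack CLI, using the aggregate and host names from the example above:
openstack aggregate remove host placeholder1 compute-XY
openstack aggregate add host central1 compute-XY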
Compute reinstall
For compute host (hypervisor) management use himlarcli and hypervisor.py. This script has sub-commands for enable/disable and for moving hosts between aggregates. Before you start, make sure the compute host is empty and disabled.
Ansible script
NB! Delete the running instances or migrate them to another compute node before you start! To reinstall a compute host with ansible run: cd
Testing
To test the compute host after a reinstall, move it to the placeholder1 host aggregate and test with the iaas-team-only-1 AZ and the iaas-team only project.
HPC Compute nodes setup and management
Hypervisor hardware
• 2x AMD EPYC 7551 32-Core Processor • 512 GB RAM These hosts have 8 NUMA nodes:
# numactl -H
...
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  16  28  28  22  28
  1:  16  10  16  16  28  28  28  22
  2:  16  16  10  16  22  28  28  28
  3:  16  16  16  10  28  22  28  28
  4:  28  28  22  28  10  16  16  16
  5:  28  28  28  22  16  10  16  16
  6:  22  28  28  28  16  16  10  16
  7:  28  22  28  28  16  16  16  10
Flavors
All flavors have the following properties:
hw_rng:allowed='True'
hw_rng:rate_bytes='1000000'
hw_rng:rate_period='60'
In addition, we have set a type (either “alice” or “atlas”):
type='alice'
We have the following flavors for HPC workloads:
Name              RAM (GB)  vCPU  Properties
alice.large       8         2
alice.xlarge      16        4
alice.2xlarge     24        8
atlas.large       8         2     hw:cpu_policy=’dedicated’ hw:cpu_thread_policy=’require’
atlas.xlarge      16        4     hw:cpu_policy=’dedicated’ hw:cpu_thread_policy=’require’
atlas.2xlarge     24        8
atlascpu.2xlarge  24        8     hw:cpu_policy=’dedicated’ hw:cpu_thread_policy=’require’
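As an illustration, a dedicated-CPU flavor like atlas.large could be created with the OpenStack CLI roughly as follows; names and property values are taken from the tables above, the remaining flags are a sketch:
openstack flavor create --ram 8192 --vcpus 2 \
  --property hw_rng:allowed='True' \
  --property hw_rng:rate_bytes='1000000' \
  --property hw_rng:rate_period='60' \
  --property hw:cpu_policy='dedicated' \
  --property hw:cpu_thread_policy='require' \
  atlas.large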
Hypervisor OS
The following parameters are set via the grub boot loader, in /etc/sysconfig/grub:
hugepagesz=2M hugepages=245760 transparent_hugepage=never isolcpus=4-127
Nova configuration
Only configuration that is special or relevant to the HPC compute nodes is listed here. On novactrl hosts:
enabled_filters=...,NUMATopologyFilter,...
On compute hosts:
vcpu_pin_set="4-127"
reserved_host_memory_mb=4096
cpu_allocation_ratio=1
ram_allocation_ratio=1.5
Network node management
Network node reinstall
With Ansible
To reinstall a network host with ansible run: cd
Where
Testing
Run etcdctl cluster-health on one of the network nodes; everything should be OK here.
Logging
We have a node called logger where we run rsyslog-server, Logstash, Kibana and Elasticsearch. The other nodes can be set up to ship logs to logger.
rsyslog
You can find the logs for each client under /opt/log/
Kibana
Access to Kibana is limited to the mgmt network on each site. You will need to run sshuttle or use SSH port forwarding to the login node to gain access. Point your browser to:
bgo: http://bgo-logger-01.mgmt.iaas.intern:5601/
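A hedged sketch of the two access routes; the user name and mgmt network CIDR are placeholders:
sshuttle -r <user>@bgo-login-01 <mgmt-network-cidr>
or, with plain SSH port forwarding, after which the browser is pointed at http://localhost:5601/:
ssh -L 5601:bgo-logger-01.mgmt.iaas.intern:5601 <user>@bgo-login-01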
Foreman
Runs on
Kickstart templates
The templates used to install new nodes are found on the master branch of: https://github.com/norcams/community-templates.git
Included here are only the templates that diverge from upstream. If this repo is updated, you will need to update all admin nodes manually. Run as root on the admin server:
foreman-rake templates:sync \
  repo="https://github.com/norcams/community-templates.git" \
  branch="master" associate="always" prefix="norcams"
or run:
/opt/himlar/provision/foreman-settings.sh
Upgrade
Foreman is upgraded by changing the foreman_version in himlar and running the foreman upgrade playbook in ansible. Also remember to rebase our community templates fork against upstream. Upstream has tags corresponding to the Foreman version.
Ansible
The Ansible playbooks we use are found on Github at https://github.com/norcams/ansible
In the documentation, $ANSIBLE refers to the directory where you have cloned the ansible repo. By default this is $HOME/ansible. You should clone this repo to your home directory on each login node:
git clone git@github.com:norcams/ansible
You will need access to your SSH-key (tip: use ssh -A
Setup
Please refer to README.md
Simple tasks
Read task.md for examples of useful simple tasks to perform with ansible.
Patching
Last changed: 2021-08-26
Table of Contents
• Patching
  – Update repo
  – Before we start
  – Normal OS patching
  – Non-disruptive patching
  – Compute (dedicated compute resources/HPC)
  – Firmware
  – Testing
Update repo
Repo lists (test and/or prod) are updated during the planning phase of an upgrade. Repos we will need to update for el7 and el8:
• centos-*
• ceph-*
• epel
• mariadb-*
• rdo-*
• sensu
• puppetlabs5
Important: Do NOT update calico-repo without extra planned testing and repackaging.
Avoid updating management repos at the same time as normal patching.
Before we start
Important: Before you start, make sure the repo is up to date with the snapshot you wish to use.
Update the ansible inventory for both OSL and BGO:
$himlarcli/ansible_hosts.py
Set the location variable according to the environment which is going to be patched:
location=bgo
or:
location=osl
Make sure all nodes will autostart with:
sudo ansible-playbook -e "myhosts=${location}-controller" lib/autostart_nodes.yaml
Normal OS patching
Important: When we patch BGO and OSL at the same time, make sure to keep one NS node up at all time!
For each of the production regions, BGO and OSL, do the following:
Patching controller-04
The node controller-04 is usually running virtual nodes that are not critical to the operation of Openstack, and controller-04 can therefore be patched and rebooted outside of a maintenance window. The controller node and all virtual nodes running on the controller can be patched with a single Ansible playbook:
sudo ansible-playbook -e "myhosts=${location}-controller-04" lib/yum_update_controller.yaml
This playbook takes extra options, if needed:
Option             Effect
async=1            will run yum and puppet in parallel on the VMs
no_reboot=1        will not reboot the controller (VMs will still be turned off)
exclude="package"  will not update package with yum
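For example, combining options from the table above (the exclude value is purely illustrative):
sudo ansible-playbook -e "myhosts=${location}-controller-04 async=1 no_reboot=1 exclude=kernel*" lib/yum_update_controller.yaml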
Also, consider patching Firmware.
Patching other controller nodes
1. Upgrade virtual nodes, while excluding the httpd, mariadb and mod_ssl packages. This is usually safe to do outside of a scheduled maintenance window:
sudo ansible-playbook-e"myhosts=$ {location}-nodes exclude=httpd*,MariaDB*,mod_ ˓→ssl,nfs-utils" lib/yumupdate.yaml
2. While in a scheduled maintenance window, upgrade virtual nodes:
sudo ansible-playbook-e"myhosts=$ {location}-nodes" lib/yumupdate.yaml
3. Check if all virtual nodes are updated:
sudo ansible-playbook-e"myhosts=$ {location}-nodes" lib/checkupdate.yaml
4. Upgrade controller nodes:
sudo ansible-playbook-e"myhosts=$ {location}-controller" lib/yumupdate.yaml
5. Check if all controller nodes are updated:
sudo ansible-playbook-e"myhosts=$ {location}-controller" lib/checkupdate.yaml
For each controller do the following. Make sure cephmon is running without error before starting on the next controller. 1. Check ceph status on cephmon:
sudo ssh iaas@${location}-cephmon-01 'sudo ceph status'
Or, alternatively:
for i in $(seq 1 3); do sudo ssh iaas@${location}-cephmon-0$i 'sudo ceph status' ; done
In addition, check “cephmon-object” in BGO:
for i in $(seq 1 3); do sudo ssh iaas@${location}-cephmon-object-0$i 'sudo ceph status' ; done
2. Turn off the nodes on the controller before reboot:
sudo ansible-playbook-e"myhosts=$ {location}-controller-
˓→manage_nodes.yaml
Monitor through virt-manager or virsh list that all virtual nodes are shut down before proceeding with rebooting the controller.
3. Consider patching Firmware.
4. Reboot the controller node:
sudo ansible-playbook-e"myhosts=$ {location}-controller-
Tip: Check that things work before rebooting controller-04, as error analysis etc. often depends on the virtual nodes running on controller-04.
Non-disruptive patching
These steps can be done without notification and can be done later than normal patching.
Storage
1. Before you begin, you can avoid automatic rebalancing of the ceph cluster during maintenance. Run this command on a cephmon or storage node:
ceph osd set noout
2. Run ceph status continuously in another window on one of the cephmon nodes:
watch ceph status
Before rebooting a node, check that all OSDs are up, e.g.:
osd: 30 osds: 30 up, 30 in
3. Upgrade storage:
sudo ansible-playbook-e"myhosts=$ {location}-storage" lib/yumupdate.yaml
4. Check if the storage nodes are upgraded:
sudo ansible-playbook-e"myhosts=$ {location}-storage" lib/checkupdate.yaml
5. Consider patching Firmware.
6. Reboot one storage node at a time:
sudo ansible-playbook-e"myhosts=$ {location}-
NB! Check ceph status, see above.
7. After all nodes are rebooted, ensure that automatic rebalancing is enabled:
ceph osd unset noout
Compute
Non-disruptive patching is only possible for compute nodes running in the host aggregate central1. Before you start, check the documentation for compute reinstall.
1. You will need an empty compute node first. There will usually be one in the AZ iaas-team-only. Reinstall this first and test it. Disable all other compute nodes and enable the new one.
2. For each compute node, migrate all instances to the enabled compute node (the empty one). Use himlarcli/migrate.py. Then reinstall the newly empty compute node, and start over with the next one.
3. The last compute node will now be empty and can be reinstalled, disabled and added back to the AZ iaas-team-only. Update the trello status for Availability zone / Host aggregate.
Leaf
Only reboot one node at a time, and never if one node is a single point of failure.
Warning: Never patch Cumulus VX (virtual appliance), only physical hardware. Cumulus VX is only used in testing/development.
Upgrade node:
apt-get update
apt-get dist-upgrade
Reboot node.
Compute (dedicated compute resources/HPC)
1. Before we start (3-5 days before) we should notify all users in the aggregate (e.g. hpc1)
himlarcli/mail.py aggregate -s 'Scheduled maintenance 2021-03-13' -t notify/maintenance/hpc.txt --date '2021-03-13 12:00-16:00' hpc1 --debug [--dry-run]
Aggregate to consider patching on second Tuesday of every month:
Aggregate  Region  Template
hpc1       osl     notify/maintenance/hpc.txt
robin1     osl     notify/maintenance/dedicated.txt
shpc_cpu1  bgo     notify/maintenance/shpc.txt
shpc_ram1  bgo     notify/maintenance/shpc.txt
vgpu1      bgo     notify/maintenance/dedicated.txt
vgpu1      osl     notify/maintenance/dedicated.txt
1. Purge state database (once per region):
himlarcli/state.py purge instance
2. Check instance status:
himlarcli/aggregate.py instances
3. Stop instances:
himlarcli/aggregate.py stop-instance
4. Upgrade compute HPC:
sudo ansible-playbook-e"myhosts=$ {location}-compute-hpc" lib/yumupdate.yaml
5. Check if the nodes are upgraded:
sudo ansible-playbook-e"myhosts=$ {location}-compute-hpc" lib/checkupdate.yaml
6. Reboot nodes. Always check inventory to make sure the target of myhosts match the intended targets for reboot. Some hosts might be running in other aggregates:
sudo ansible-playbook-e"myhosts=$ {location}-compute-hpc" lib/reboot.yaml
7. Start the instances:
himlarcli/aggregate.py start-instance
Firmware
For physical nodes it might be worth considering firmware patching.
Dell
1. Install DSU on the node:
sudo ansible-playbook-e"myhosts=$ {location}-
2. Upgrade firmware:
sudo ansible-playbook-e"myhosts=$ {location}-
˓→yaml
3. Reboot:
sudo ansible-playbook-e"myhosts=$ {location}-
Workaround for problematic r740/r740xd BIOS update
BIOS update for PowerEdge r740/r740xd might fail with the message “BIOS File is Corrupt”, and you have to press F1 to boot and then reflash the BIOS. A robust workaround is to flash the BIOS via the iDRAC. First, flash firmware (only) normally:
dsu -n -q --component-type=FRMW
Download the latest BIOS file for the Windows platform from the Dell website to a login node and upload it to the iDRAC, scheduling a BIOS upgrade at next boot:
/opt/dell/srvadmin/bin/idracadm7 -r [bmc_address] -u [username] -p [password] update -f /tmp/BIOS_NVGR9_WN64_2.10.0.EXE
Then reboot.
Supermicro
Supermicro does not recommend flashing firmware unless it is necessary. Also, there is no automated way to do it. If needed, though, download the necessary firmware from the vendor’s website and upload the BIOS or firmware files via the BMC’s update feature. When finished, the server must do a full reset, so it is best to flash the firmware while the server is down (for example sitting in the grub boot menu).
Warning: If the BIOS is flashed, the settings will be lost! Be sure to adjust the settings after flashing, otherwise the server won’t boot.
Testing
After patching, we should test the following (a sketch of these steps is shown after the list):
• install a new instance
• ssh to the new instance
• create a volume and attach it to the instance
• detach the volume
• destroy the volume
• destroy the instance
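A hedged sketch of this smoke test using the OpenStack client; the image, flavor, key and user names are placeholders:
openstack server create --image <image> --flavor <flavor> --key-name <key> patch-test
ssh <user>@<instance-ip>                           # verify login works
openstack volume create --size 1 patch-test-vol
openstack server add volume patch-test patch-test-vol
openstack server remove volume patch-test patch-test-vol
openstack volume delete patch-test-vol
openstack server delete patch-test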
Only in test01 and test02
Reinstall a compute node and repeat the tests above.
Migrate running instances
Last changed: 2019-07-01
This document describes migrating an instance from one compute node (source) to another (target) while the instance is running (i.e. NOT legacy). The aggregate cannot have any NUMA-aware instances (e.g. Alice/Atlas workloads).
Before you start
• Reinstall and test the target • Make sure the target and source are in the same aggregate • Make sure cpu_model (see nova.conf) is the same on target and source (e.g. Haswell-noTSX)
Migrate
We use himlarcli and migrate.py migrate to do the migration.
Tips
• Test first with --limit 1 to see that everything is working
• Use --large on the first run to migrate the instances with lots of RAM first
Help
This document describes using the openstack CLI to migrate an instance to the target and what to do when a migration times out: https://docs.openstack.org/nova/latest/admin/live-migration-usage.html
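For reference, live migration of a single instance can also be done directly with the OpenStack client; a hedged sketch (the exact flags depend on the client version, and the host and UUID values are placeholders):
openstack server migrate --live-migration --host <target-host> <instance-uuid>
openstack server migration list --server <instance-uuid>   # monitor progress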
Himlar CLI
Himlar CLI is a command line tool written in Python to manage our Openstack installation through the API. Source code: https://github.com/norcams/himlarcli
Setup
Location
Himlar CLI can be found under /opt/himlarcli on login-01 and proxy-01 nodes. Config file is distributed with puppet under /etc/himlarcli/config.ini.
Running on login-01
When running scripts on login, you will be able to use some scripts, but not access the keystone admin API. So you will not be able to run the scripts for user or project management here. Since login nodes are used for both test and production locations, there are several config files under /opt/himlarcli/ named config.ini.$loc.
Deployment
There is an ansible playbook and script to update to the latest version on all nodes:
cd
Script overview
Help
All scripts have updated help that can be accessed with the -h option. Most of the script actions also have their own help description, e.g.:
./image.py update -h
Options
Common options for all scripts:
• --debug
• --dry-run
• -c /path/to/config.ini
• --format text|json
Scripts
Simple overview with some of the most important actions. More actions might exist, so please refer to the help text.
script      actions  notes
project.py  list     list all projects
            show     list project details and roles in project
            create   create new project
            delete   delete project and instances
            grant    grant access for user to project
image.py    update   update images (gold or private)
            usage    show/check image usage
            purge    purge unused images
quota.py    update   update default quota
            show     show default quota
usage.py    volume   show volume usage and quota in GB
migrate.py  migrate  migrate all instances from a compute host with shared storage
            list     list all instances with vm state on a compute host
Publish status messages to Slack, Twitter and UH-IaaS status API
Help
Publishing to all our communication channels is done by using a simple script called publish_status.py. Get usage info with the -h option, e.g.:
./publish_status.py -h
./publish_status.py important -h
./publish_status.py info -h
./publish_status.py event -h
There are currently three commands:
• important will publish to Slack, Twitter and the status database displayed on status.uh-iaas.no. Use for important messages about events that will directly or indirectly affect users, like planned outages.
• info will publish to Twitter and the status database. Use for informal messages of general interest, like when there is a new service or image available.
• event will publish to the event database, which is mostly intended for internal usage. Use for technical details about events that affected availability, stability, user experience etc. Publicly available via API but not displayed on our status pages.
Example usage
The following example will publish an important status message to Slack, Twitter and status API. It will be tagged ‘important’ in the status API, meaning it will show up under “New issues and warnings” in our Grafana dashboard. The -l option will add a single whitespace and a hardcoded message linking to our status dashboard on messages going to Slack and Twitter:
$ ./publish_status.py important -m "We're currently having some issues. Sorry for the inconvenience." -l
Which will return:
The following message will be published: We're currently having some issues. Sorry for the inconvenience. For live updates visit https://status.uh-iaas.no
Are you sure you want to publish? (yes|no)?
This has to be interactively confirmed by typing yes. Notice that you should always use punctuation after the last sentence in your message or template. This example uses a pre-made template instead of a command-line argument as message, replacing the variables $date and $region with -d and -r:
./publish_status.py important -t ./misc/notify_maintenance.txt -r bgo -d 'September 20 between 15:00 and 16:00'
Which will return:
The following message will be published: Services will be unavailable on September 20 between 15:00 and 16:00 in BGO due to maintenance. Running instances will not be affected.
Are you sure you want to publish? (yes|no)?
This example will publish an info message about a new image on Twitter and status API. It will be tagged ‘info’ in the status API, meaning it will show up under “News” in our Grafana dashboard:
./publish_status.py info -m 'New image Fedora 28 is available.'
Repository server
Introduction
For local caching of external repositories, and to facilitate a repository for packages created by the NREC team etc., a server system is installed. Because the production environment has to be carefully managed, this setup attempts to resolve the following issues:
• Production servers must only run well-known versions and combinations of software (which is supposed to be tested and approved before deployment)
• Possible to check the state of code at any date in the past for debugging
• Possible to test new code without disturbing the production environment
• Ability to maintain our software in the same manner as any external RPM repository
• Means to distribute all kinds of data files without versioning
To accomplish all of this we have implemented a system for versioned/snapshotted mirrors of any ‘external’ repo (whatever the location), a local ordinary RPM repository and a general distribution point.
• Hostname: iaas-repo.uio.no
• Alias (used in code): download.iaas.uio.no
• Access: as for normal infrastructure nodes (iaas-user from one of the login nodes)
• Repo root directory: /var/www/html/nrec
• Available protocols: https
Note: The implementor accepts the fact that the naming scheme for these directories is misleading! Please read the description before assuming anything related to the role of the directory!
Executive Summary
repo is a locally hosted mirror of a set of external repositories. A snapshot is taken of each repo every night, and these snapshots reside inside the snapshot directory, stamped by date. In the test and prod directories, every repository utilized by the UH-IaaS infrastructure has a pointer to one of those snapshots. Those pointers are never moved without consideration or testing, especially the links in the prod directory. The upshot is thus: packages and files can be trusted not to be updated or altered in an uncontrolled fashion, and are available locally at all times. It is possible to set up further such repos, in case a certain installation requires packages from a very specific date (other than in test and prod).
nrec-internal and aptrepo should be treated like any other external repositories, just that these external repositories happen to be managed by the NREC team. Data configured into these is then available for consumption in the same controlled manner as any other external repository which is mirrored locally.
rpm, nonfree, nonfree/yum-nonfree, nrec-resources and ports are unmanaged repositories without the aforementioned snapshotting and consistency control. Data located here is available instantly, but outside of any version control and without any kind of metadata.
Diagram of setup
Directory description
• repo: Mirror hierarchy. This is where all defined repositories are mirrored. Content is mirrored nightly.
• snapshots: Nightly snapshot of all mirrors under repo. Each snapshot is named by the date and time of creation.
• prod: For each repository a pointer (symbolic link) to a snapshot of the same.
• test: As for prod, but separate links (usually for a more recent snapshot which is supposed to be used for production next).
• nrec-internal (prev. yumrepo): Locally maintained RPM repository. Mirrored under repo as any external repository is. Available for el7 and el8.
• aptrepo: Locally maintained APT repository. Mirrored under repo as any external repository is (named nrec-internal-apt).
• rpm: Generic file distribution. No metadata, versioning, mirroring or snapshotting.
• nonfree: Generic file distribution. No metadata, versioning, mirroring or snapshotting. Only accessible from login and proxy nodes!
• nrec-resources Generic file distribution. No metadata, versioning, mirroring or snapshotting. Only accessible from NREC allocated IP ranges (incl. user instances)! • nonfree/yum-nonfree RPM repository. No versioning, mirroring or snapshotting. Only accessible from login and proxy-nodes! Available for el7 and el8. • gem: Local Ruby Gem distribution. No metadata, versioning, mirroring or snapshotting. • ports: For FreeBSD packages. No metadata, versioning, mirroring or snapshotting.
Common attributes and requirements
Packages built locally by the project are made available for use by storing them in one of the prepared directories, depending on whether the package is to be part of a yum repository or is a stand-alone package or file, and whether it should be exposed to the world or only internally. The iaas group owns all files and directories under the repository root directory; the hierarchy is configured with the set-group-ID bit. Accordingly, all relevant repo operations can (and should) be done as the iaas user. NOTE: Make sure new packages and files have the correct SELinux label:
sudo restorecon
or:
sudo restorecon -R
Detailed descriptions
In addition to the mirror service of true external repositories, the service contains and offers several proper local ones. Each of these serves different purposes, and thus has to be handled and maintained using distinct procedures. This section describes each and how to add and update packages and files.
YUM repository
Directory name: nrec-internal/el[78] (world wide availability), nonfree/yum-nonfree/el[78] (internally available)
For local RPM packages which are maintained in the same way as any external RPM packages from ordinary repositories, there are YUM repos located in nrec-internal and nonfree/yum-nonfree. The former has world wide availability and is versioned/snapshotted, while the latter is only available locally and is additionally not versioned. These files/packages are considered and consumed exactly like any other, external, repository used by the project/code! IMPORTANT: After all file operations, update the repository metadata:
sudo /usr/bin/createrepo <repo-directory>/el[78]
URL: https://download.iaas.uio.no/nrec/nrec-internal
URL: https://download.iaas.uio.no/nrec/nonfree/yum-nonfree
Note: NREC-INTERNAL: This repository is mirrored and snapshotted just like any external repository. As such it can be reached through the test and prod interfaces described elsewhere.
Client configuration (example)
Example of client configuration in a yum repo file under /etc/yum.repos.d/:
[nrec-internal]
name=NREC internal repo
baseurl=https://download.iaas.uio.no/nrec/prod/nrec-internal/el7
enabled=1
gpgcheck=0
priority=10
For the internal (nonfree) repository:
[nrec-nonfree]
name=Internal NREC repository
baseurl=https://download.iaas.uio.no/uh-iaas/nonfree/yum-nonfree/el7
enabled=1
gpgcheck=0
priority=10
APT repository
Directory name: aptrepo
For local APT packages which belong in an ordinary DEB-based repository there is a similar setup as for the above mentioned YUM repository. This is located in aptrepo. These files/packages are then considered and consumed exactly like any other, external, repository used by the project/code. The architectures and codenames supported are described in the distribution file located in the apt subdirectory of the repo-admin GIT repository.
Steps to import packages
1. Save the new package to the incoming subdirectory inside aptrepo
2. Execute the deb repo tool inside the aptrepo directory:
reprepro -b . --confdir /etc/kelda/prod/apt includedeb wheezy incoming/*
(replace wheezy with whatever codename is applicable)
3. Remove the package(s) from the incoming directory
URL: https://download.iaas.uio.no/nrec/aptrepo
Note: This repository is mirrored and snapshotted just like any external repository (named nrec-internal-apt). As such it can be reached through the test and prod interfaces described elsewhere.
Client configuration (example)
Example of client configuration in /etc/apt/sources.list:
deb [trusted=yes] https://download.iaas.uio.no/nrec/prod/nrec-internal-apt wheezy main
Ruby Gem repository
Directory name: gem
Gems which are locally produced or adapted might be installed into this repository. The gems might then be installed through the ‘sensu_gem’ puppet provider or using the --source parameter for gem install.
Steps to import gems
• upload the package into the gems subdirectory
• remove all files named ‘*specs*’ (should be 6 in all)
• remove the quick subdirectory recursively
• run as the iaas user: gem generate_index --update --directory . (ignoring errors)
For the upload procedure, see below.
Standalone file archives
Directory name: rpm, nrec-resources and nonfree
Files (RPM packages or other types) which are needed by the project but which should not or cannot use the local YUM repository can be distributed from the generic archive located under the rpm, nrec-resources or nonfree subdirectory. No additional operations are required, other than ensuring the correct SELinux label as described above.
URL: https://download.iaas.uio.no/nrec/rpm
URL: https://download.iaas.uio.no/nrec/nonfree
URL: https://download.iaas.uio.no/nrec/nrec-resources
The distinction between these is that nonfree is only accessible from a restricted set of IP addresses (at the time of writing the login and proxy nodes), nrec-resources from all NREC allocated ranges (infra and instances), whereas rpm is reachable from the world. The access lists for the restricted areas are maintained in the repo-admin gitolite repository, in the httpd subdirectory.
Upload procedure
Probably the simplest way to upload a file to the rpm (or nonfree) archive is to first place the file on an available web site and then download it into the archive on download:
1. upload the file to a web archive (for instance https://folk.uio.no for UiO affiliated personnel)
2. log in to download from one of the login nodes in the usual manner:
sudo ssh iaas@download.iaas.uio.no
3. cd /var/www/html/nrec/rpm
4. download the file with wget, curl or something similar
Local mirror and snapshot service
To facilitate tight control of the code and files used in our environment, and to ensure availability in case of network or external system outages etc., a local mirror and snapshot service is implemented. Content and description of the included subdirectories:
Short name  Long name        Description                                                      URL
repo        Repository       Latest sync from external sources                                https://download.iaas.uio.no/nrec/repo
snapshots   Snapshots        Regular (usually daily) snapshots of data in repo                https://download.iaas.uio.no/nrec/snapshots
test        Test repo        Pointer to a specific snapshot in time, usually newer than prod  https://download.iaas.uio.no/nrec/test
prod        Production repo  Pointer to a specific snapshot in time with well-tested data, used in production environments  https://download.iaas.uio.no/nrec/prod
Usage is normally as follows:
repo       for development or other use of the most up-to-date code
test       code which is aimed for the next production release
prod       production code
snapshots  can be used to test against code from any specific date in the past
Mirror
Directory: repo
Each mirrored repository is located directly beneath the repo folder. Which “external” repository (which might actually be located locally) is to be mirrored is defined by data in the internal repo-admin git repo (see below for access details). All repositories listed in the file repo.config are attempted accessed and synced. The type of repository - as defined in the configuration file for the appropriate listing - determines what actions are taken on the data. As this is mainly YUM repositories, the appropriate metadata commands are executed to create a proper local repository. Any YUM repo defined in the configuration must have a corresponding repo definition in a suitable file in the yum.repos.d subdirectory (in the git repo!). The mirroring is done once every night by a root cron job. To access the most current data in the mirror, use this URL: https://download.iaas.uio.no/nrec/repo/
This repository also contains the access list configuration for the restricted areas like nonfree and nrec-resources.
Snapshots
Directory: snapshots
Every night a cron job runs to create snapshots of all mirrored repositories (of all kinds). A snapshot subdirectory is created, named by the current date and time. Under this, all repos can be accessed. This way data can be retrieved from any point in the past at which a snapshot was taken.
current: In the snapshots directory there is always a special “snapshot” named current. This entry is at any time linked to the most current snapshot. To access the snapshot library:
https://download.iaas.uio.no/nrec/snapshots/
Note: The snapshot data are created using a system of hardlinks. This way unaltered data is not duplicated, which conserves space considerably.
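The idea can be illustrated with the classic hardlink copy; a minimal sketch of the mechanism (the actual snapshot script may work differently):
# create a snapshot directory where unchanged files are hardlinks into the mirror
cp -al /var/www/html/nrec/repo /var/www/html/nrec/snapshots/$(date +%Y-%m-%d-%H%M)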
Test and prod
Directories: test, prod
All mirrored repos used by UH-IaaS can be accessed through a static and well-known historic version using the test and prod interfaces. By configuring the appropriate files in the internal repo-admin git repo, each repo might have a test and prod pointer linking to a specific snapshot of this repository. NB: each and every mirrored repo can be set up to link to a separate snapshot!
Important: This is the access point to use in the production and test environments!
Configuration
Configuration for the repositories is stored in the internal git repo:
git@git.iaas.uio.no:repo-admin
The iaas user has READ permissions and should be used to pull the configuration to the repository server.
Files
config       Generic configuration (for now the location of the repo root only)
repo.config  Definition of the external repositories to mirror
test.config  Which snapshots and local repositories to point to in test
prod.config  Which snapshots and local repositories to point to in prod
Considerations
• test should never point to a snapshot older than what the corresponding prod is linking to
• Pointers in prod must also exist in test, the rationale being that this somewhat ensures that prod has already been tested. Links in the prod configuration which do not also exist in the test configuration will not be activated (and are removed if they exist)!
• If there is more than one link listed for the same repo, the most current is always the one activated.
• Existing links not listed in the current configuration will be removed!
Update procedure
1. Clone or pull the git repo locally:
git@git.iaas.uio.no:repo-admin
This must be done on a node inside the setup (like the login nodes) due to access restrictions on the local git repo.
2. Edit one or both of prod.config and/or test.config (or any of the other config files), entering or changing entries to reflect the date required (consult the web page for the exact timestamp to use).
3. Commit and push to the central git repo.
4. On osl-login-01 run the ansible job update_repo.yaml:
sudo ansible-playbook -e "myhosts=download" lib/update_repo.yaml
This action pulls the latest config and updates the pointers in test and prod.
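Putting the steps together, a typical update session might look like this (the commit message is illustrative):

# on a login node (inside the setup)
git clone git@git.iaas.uio.no:repo-admin && cd repo-admin
$EDITOR prod.config        # set the desired snapshot timestamp for the repo
git commit -am 'prod: update snapshot pointer'
git push
# then, on osl-login-01
sudo ansible-playbook -e "myhosts=download" lib/update_repo.yaml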
Publicizing procedure
Normal (automatic)
rpm, nonfree, nrec-resources and gem: Files placed inside these locations are instantly accessible, provided correct SELinux labeling. No snapshotting provided! Access lists are set up via the configuration and scripts in the httpd subdirectory of the repo-admin repo documented above.
nrec-internal and aptrepo: Files placed inside these locations are instantly accessible, provided correct SELinux labeling. No snapshotting is provided through this interface! For this, use the SNAPSHOT, TEST or PROD interfaces instead.
repo: Any repositories which are mirrored (including YUMREPO) have new files accessible here after the mirroring job has run during night time. The version available is always the most recent!
snapshots: Every night after mirror job completion a snapshot of the current mirrors is taken. Any of these snapshots are available through this interface below a directory named by the timestamp [YYYY-MM-DD-hhmm]. The most current snapshot is additionally presented as "current".
test, prod: These interfaces should be seen as a static representation of data from specific dates/times. Each mirrored repository (if configured to be listed here) is listed with a link to a specific snapshot of the repo in question. The PROD repository is what is used in the production environment and should never be more recent than TEST (this is actually prohibited by the setup routine for these pointers). Data is available concurrently with the snapshots it is linked to.
Manual routine for instant publicizing
rpm, nonfree (incl. yum-nonfree), gem and ports: Nothing required!
nrec-internal and aptrepo: New files are available through the ordinary interfaces after mirroring and snapshotting. This is usually done nightly, but the routines may be run manually if necessary:
1. sudo /opt/kelda/repoadmin.sh -e prod sync
2. sudo /opt/kelda/repoadmin.sh -e prod snapshot
Caveats
• Any changes in the local YUM or APT repository (nrec-internal resp. aptrepo) are not accessible through the mirror interface (repo) until after the next mirror job (usually during the next night; check the crontab on the mirror server for details). After this, the data should be accessible under the repo link.
• Newly mirrored data is available under the snapshot link only after the next snapshot run (check crontab for details). This is normally scheduled some time after the nightly mirror job.
• Data stored in any of the local repositories is instantly accessible when accessed using the direct URLs as listed above.
Purging of old/unused data
To conserve disk space there is a janitor script which may be used to remove (purge) snapshots which are no longer used:
/usr/local/sbin/snapshot_cleanup.sh
Note: Only snapshots older than the oldest snapshot still referenced by any test or prod pointers may be deleted.
Invocation:

[ sudo ] /usr/local/sbin/snapshot_cleanup.sh [-d|-u] [ -t <timestamp> | -r <repository> ]

-u: print usage text and exit
-d: dry-run (just print what would otherwise be deleted)
-t: purge snapshots older than the timestamp provided. The timestamp format equals the format used by kelda (config fields and snapshot directory naming)
-r: expunge the named repository, complete with mirror and every snapshot of it (but only snapshots of this particular mirror)
NB: -t and -r are mutually exclusive!
If no -t or -r argument is provided, all snapshots older than the oldest still in use are removed! For now there is no automatic invocation, and any cleanup should be done manually. User confirmation is requested. If running as the iaas user, sudo is required.
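For example, a dry run that would purge everything older than a given snapshot (the timestamp is illustrative, in the [YYYY-MM-DD-hhmm] format described above):

# show which snapshots would be purged, without deleting anything
sudo /usr/local/sbin/snapshot_cleanup.sh -d -t 2021-01-01-0300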
SSL certificates
Generation
Generation for *.iaas.uio.no and *.iaas.uib.no is done following each organization's normal process. *.uh-iaas.no is done at UNINETT. For self-signed certificates we use our own Root CA git repo on login: git@git.iaas.uib.no:ca_setup
This repo should also contain all keys, certs and config files for all certs in use.
Naming conventions
Type     Filename
cachain  /etc/pki/tls/certs/cachain.pem
key      /etc/pki/tls/private/
Storage
All three files should be stored on login under their full path, with the root set at /opt/repo/secrets/nodes/
Distribution
There is an ansible playbook to push secrets to nodes (see tasks.md for more info). This requires a YAML config file describing the files and modes. Example of inventory/host_vars/bgo-db-01.yaml:

secret_files:
  cert:
    path: '/etc/pki/tls/certs/db01.bgo.uhdc.no.cert.pem'
    mode: '0644'
    owner: 'root'
    group: 'root'
  key:
    path: '/etc/pki/tls/private/db01.bgo.uhdc.no.key.pem'
    mode: '0644'
    owner: 'root'
    group: 'root'
  cafile:
    path: '/etc/pki/tls/certs/cachain.pem'
    mode: '0640'
    owner: 'root'
    group: 'mysql'
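Pushing the secrets would then follow the same ansible pattern used elsewhere in this document. A sketch only, with a hypothetical playbook name (see tasks.md for the real job):

# playbook name hypothetical; consult tasks.md for the actual job
sudo ansible-playbook -e "myhosts=bgo-db-01" lib/push_secrets.yaml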
SSL overview
Last changed: 2020-02-17

nrec.no

Domain                      Type   Termination         Certificate
access.nrec.no              A      bgo-api-01          *.nrec.no
access-bgo.nrec.no          A      bgo-api-01          *.nrec.no
access-osl.nrec.no          A      osl-api-01          *.nrec.no
report.nrec.no              CNAME  bgo-api-01          *.nrec.no
report-osl.nrec.no          A      bgo-api-01          *.nrec.no
report-bgo.nrec.no          A      osl-api-01          *.nrec.no
status.nrec.no              CNAME  bgo-api-01          *.nrec.no
status-bgo.nrec.no          A      bgo-api-01          *.nrec.no
status-osl.nrec.no          A      osl-api-01          *.nrec.no
request.nrec.no             A      bgo-api-01          *.nrec.no
api.nrec.no                 A      bgo-api-01          *.nrec.no
api-osl.nrec.no             A      osl-api-01          *.nrec.no
docs.nrec.no                CNAME  sslproxy.ha.uib.no  docs.nrec.no
www.nrec.no                 CNAME  sslproxy.ha.uib.no  www.nrec.no
nrec.no                     A      129.177.6.241       nrec.no
dashboard.nrec.no           A      bgo-dashboard-01    dashboard.nrec.no
dashboard-osl.nrec.no       A      osl-dashboard-01    NA
dashboard-bgo.nrec.no       A      bgo-dashboard-01    NA
console.osl.nrec.no         A      osl-api-01          console.osl.nrec.no
compute.api.osl.nrec.no     A      osl-api-01          *.api.osl.nrec.no
identity.api.osl.nrec.no    A      osl-api-01          *.api.osl.nrec.no
network.api.osl.nrec.no     A      osl-api-01          *.api.osl.nrec.no
image.api.osl.nrec.no       A      osl-api-01          *.api.osl.nrec.no
volume.api.osl.nrec.no      A      osl-api-01          *.api.osl.nrec.no
placement.api.osl.nrec.no   A      osl-api-01          *.api.osl.nrec.no
metric.api.osl.nrec.no      A      osl-api-01          *.api.osl.nrec.no
dns.api.osl.nrec.no         A      osl-api-01          *.api.osl.nrec.no
resolver.osl.nrec.no        A      NA                  NA
console.bgo.nrec.no         A      bgo-api-01          console.bgo.nrec.no
compute.api.bgo.nrec.no     A      bgo-api-01          *.api.bgo.nrec.no
identity.api.bgo.nrec.no    A      bgo-api-01          *.api.bgo.nrec.no
network.api.bgo.nrec.no     A      bgo-api-01          *.api.bgo.nrec.no
image.api.bgo.nrec.no       A      bgo-api-01          *.api.bgo.nrec.no
volume.api.bgo.nrec.no      A      bgo-api-01          *.api.bgo.nrec.no
placement.api.bgo.nrec.no   A      bgo-api-01          *.api.bgo.nrec.no
metric.api.bgo.nrec.no      A      bgo-api-01          *.api.bgo.nrec.no
dns.api.bgo.nrec.no         A      bgo-api-01          *.api.bgo.nrec.no
object.api.bgo.nrec.no      A      bgo-api-01          *.api.bgo.nrec.no
resolver.bgo.nrec.no        A      NA                  NA
DNS
We use the domains uh-iaas.no and uhdc.no.
Architecture
The illustration below shows the architecture of the DNS infrastructure. This is within a single location. In our case, with two or more locations, the NS nodes act as master/slave for each other. One NS node is the master for a given zone, and the others act as slaves for that zone.
We have two resolving DNS nodes in each location. They are set up in a redundant fashion where anycast is used to achieve redundancy. The idea is that instances in BGO will use the BGO resolver as primary and the OSL resolver as secondary, and vice versa for OSL instances.
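As a concrete sketch of what this means on an instance, a BGO instance's resolver configuration would look roughly like this, using the resolver addresses from the zone tables below (the search domain is an assumption):

# /etc/resolv.conf on a BGO instance (sketch; search domain assumed)
search bgo.uh-iaas.no
nameserver 158.39.77.252   # resolver.bgo.uh-iaas.no, local region first
nameserver 158.37.63.252   # resolver.osl.uh-iaas.no, remote region fallback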
Public zone: uh-iaas.no
This zone is delegated to: • ns1.uh-iaas.no (master - located in OSL) • ns2.uh-iaas.no (slave - located in BGO) Editing this zone is done via Puppet hieradata. Since the OSL NS-host (ns1.uh-iaas.no) is master for this zone, editing this zone should be done in hieradata/osl/common.yaml. In this zone we have the following records:
RECORD                    TYPE   VALUE
access.uh-iaas.no         A      158.39.77.250
access-bgo.uh-iaas.no     A      158.39.77.250
access-osl.uh-iaas.no     A      158.37.63.250
api.uh-iaas.no            A      158.39.77.250
dashboard.uh-iaas.no      A      158.39.77.254
dashboard-bgo.uh-iaas.no  A      158.39.77.254
dashboard-osl.uh-iaas.no  A      158.37.63.254
ns1.uh-iaas.no            A      158.37.63.251
ns2.uh-iaas.no            A      158.39.77.251
report.uh-iaas.no         A      158.39.77.250
report-bgo.uh-iaas.no     A      158.39.77.250
report-osl.uh-iaas.no     A      158.37.63.250
request.uh-iaas.no        A      158.39.77.250
status.uh-iaas.no         A      158.39.77.250
status-bgo.uh-iaas.no     A      158.39.77.250
status-osl.uh-iaas.no     A      158.37.63.250
ns1.uh-iaas.no            AAAA   2001:700:2:82ff::251
ns2.uh-iaas.no            AAAA   2001:700:2:83ff::251
docs.uh-iaas.no           CNAME  uh-iaas.readthedocs.io
www.uh-iaas.no            CNAME  norcams.github.io
uh-iaas.no                MX     10 uninett-no.mx1.staysecuregroup.com
uh-iaas.no                MX     20 uninett-no.mx2.staysecuregroup.net
Internally delegated zone: osl.uh-iaas.no
This zone is delegated to: • ns1.uh-iaas.no (master - located in OSL) • ns2.uh-iaas.no (slave - located in BGO) Editing this zone is done via Puppet hieradata. Since the OSL NS-host (ns1.uh-iaas.no) is master for this zone, editing this zone should be done in hieradata/osl/common.yaml. In this zone we have the following records:
RECORD                        TYPE  VALUE
compute.api.osl.uh-iaas.no    A     158.37.63.250
identity.api.osl.uh-iaas.no   A     158.37.63.250
image.api.osl.uh-iaas.no      A     158.37.63.250
metric.api.osl.uh-iaas.no     A     158.37.63.250
network.api.osl.uh-iaas.no    A     158.37.63.250
placement.api.osl.uh-iaas.no  A     158.37.63.250
volume.api.osl.uh-iaas.no     A     158.37.63.250
console.osl.uh-iaas.no        A     158.37.63.250
resolver.osl.uh-iaas.no       A     158.37.63.252
resolver.osl.uh-iaas.no       AAAA  2001:700:2:82ff::252
Internally delegated zone: bgo.uh-iaas.no
This zone is delegated to: • ns1.uh-iaas.no (slave - located in OSL) • ns2.uh-iaas.no (master - located in BGO) Editing this zone is done via Puppet hieradata. Since the BGO NS-host (ns2.uh-iaas.no) is master for this zone, editing this zone should be done in hieradata/bgo/common.yaml. In this zone we have the following records:
RECORD                        TYPE  VALUE
compute.api.bgo.uh-iaas.no    A     158.39.77.250
identity.api.bgo.uh-iaas.no   A     158.39.77.250
image.api.bgo.uh-iaas.no      A     158.39.77.250
metric.api.bgo.uh-iaas.no     A     158.39.77.250
network.api.bgo.uh-iaas.no    A     158.39.77.250
placement.api.bgo.uh-iaas.no  A     158.39.77.250
volume.api.bgo.uh-iaas.no     A     158.39.77.250
console.bgo.uh-iaas.no        A     158.39.77.250
resolver.bgo.uh-iaas.no       A     158.39.77.252
resolver.bgo.uh-iaas.no       AAAA  2001:700:2:83ff::252
Internal zone: uhdc.no
The following zones in uhdc.no have been delegated to us:
ZONE            NS
bgo.uhdc.no     ns1.uh-iaas.no (slave), ns2.uh-iaas.no (master)
osl.uhdc.no     ns1.uh-iaas.no (master), ns2.uh-iaas.no (slave)
test01.uhdc.no  alf.uib.no, begonia.uib.no
test02.uhdc.no  ns1.uio.no, ns2.uio.no
Editing these zones is done via Puppet hieradata. Most of the records should be present in hieradata/common/common.yaml.
Test domains
For the public domain in test we have delegated subdomains:
REGION  DOMAIN
test01  test.iaas.uib.no
test02  test.iaas.uio.no
OpenStack Designate (DNS)
Warning: This document is under construction. Designate is not yet in production.
Managing top-level domains (TLDs)
We manage a list of legal top-level domains in which users can create their zones. With such a list in place, zones may only be created under the TLDs listed there. Without such a list, any TLD would be legal, and users could also register zones such as "com", thus preventing any other user from creating a domain under the ".com" TLD. A list of all valid top-level domains is maintained by the Internet Assigned Numbers Authority (IANA) and is updated from time to time. The list is available here:
• http://data.iana.org/TLD/tlds-alpha-by-domain.txt
This list can be imported into Designate via the himlarcli command dns.py:
# download the TLD list from IANA
curl http://data.iana.org/TLD/tlds-alpha-by-domain.txt -o /tmp/tlds-alpha-by-domain.txt

# import the TLD list into designate
./dns.py tld_import --file /tmp/tlds-alpha-by-domain.txt
The tld_import action will add any TLDs which aren't already registered, with a special comment that marks the TLD as being bulk imported. Any registered TLDs that aren't on the import list, and also have the special bulk import comment, will be deleted. This means that we can add our own TLDs if needed, as they will not be deleted by the bulk import. The bulk import as shown above is designed to be run automatically by cron etc. The himlarcli command also has actions to create, delete and update a TLD, and to view the list or a specific TLD.
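A minimal sketch of how the nightly bulk import could be wired up in cron; the schedule and the himlarcli path are illustrative:

# root crontab entry (time and paths illustrative)
30 5 * * *  curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt -o /tmp/tlds-alpha-by-domain.txt && /opt/himlarcli/dns.py tld_import --file /tmp/tlds-alpha-by-domain.txt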
Managing blacklists
Blacklisting is done to prevent users from creating domains that we don't want them to create. This is mostly done to protect a domain from possible future use. To prevent any user from creating the domain foo.com and any subdomains:
./dns.py blacklist_create \
    --comment 'Protect domain foo.com including any subdomains' \
    --pattern '^([A-Za-z0-9_\-]+\.)*foo\.com\.$'
If we want to only protect the domain itself, but allow users to create subdomains in the domain, we can use a simpler pattern:
130 Chapter 1. Contents iaas Documentation, Release 0.1.0
./dns.py blacklist_create \
    --comment 'Protect domain foo.com while allowing subdomains' \
    --pattern '^foo\.com\.$'
Listing the blacklists is done via the blacklist_list action (the option --pretty formats the output in a table). Example:
$ ./dns.py blacklist_list --pretty
+----------------------------------+---------------------------------------------------+--------------------------------------+
| pattern                          | description                                       | id                                   |
+----------------------------------+---------------------------------------------------+--------------------------------------+
| ^([A-Za-z0-9_\-]+\.)*foo\.com\.$ | Protect domain foo.com including any subdomains   | 8958bf52-8e64-4a86-87ea-2087b7bc6d60 |
| ^bar\.net\.$                     | Protect domain bar.net while allowing subdomains  | b3f7fc9f-67a8-4d07-aabc-444f0e4d67c4 |
+----------------------------------+---------------------------------------------------+--------------------------------------+
Updating a blacklist entry is done via the blacklist_update action. Updating the comment (example):
$ ./dns.py blacklist_update --comment 'new comment' \
    --id b3f7fc9f-67a8-4d07-aabc-444f0e4d67c4
And updating the pattern (example):
$ ./dns.py blacklist_update --new-pattern '^tralala\.org\.$' \
    --id b3f7fc9f-67a8-4d07-aabc-444f0e4d67c4
Deleting a blacklist entry is done via the blacklist_delete action:
./dns.py blacklist_delete --id <id>
Blacklisted domains
The following domains should be blacklisted in production:
DOMAIN       PATTERN                              COMMENT
uh-iaas.no   ^([A-Za-z0-9_\-]+\.)*uh-iaas\.no\.$  Official UH-IaaS domain, managed outside of Openstack Designate
uhdc.no      ^([A-Za-z0-9_\-]+\.)*uhdc\.no\.$     Official UH-IaaS domain, managed outside of Openstack Designate
uio.no       ^([A-Za-z0-9_\-]+\.)*uio\.no\.$      Domain belonging to UiO. Instances in UH-IaaS are not allowed to have UiO addresses
uib.no       ^uib\.no\.$                          Domain belonging to UiB. The domain itself is forbidden, but subdomains are allowed
uiocloud.no  ^uiocloud\.no\.$                     Domain belonging to UiO. The domain itself is forbidden, but subdomains are allowed
uninett.no   ^([A-Za-z0-9_\-]+\.)*uninett\.no\.$  Domain belonging to Uninett. Reserved for possible future use
ntnu.no      ^([A-Za-z0-9_\-]+\.)*ntnu\.no\.$     Domain belonging to NTNU. Reserved for possible future use
uia.no       ^([A-Za-z0-9_\-]+\.)*uia\.no\.$      Domain belonging to UiA. Reserved for possible future use
These are added with these commands:
./dns.py blacklist_create --pattern '^([A-Za-z0-9_\-]+\.)*uh-iaas\.no\.$' \
    --comment 'Official UH-IaaS domain, managed outside of Openstack Designate'
./dns.py blacklist_create --pattern '^([A-Za-z0-9_\-]+\.)*uhdc\.no\.$' \
    --comment 'Official UH-IaaS domain, managed outside of Openstack Designate'
./dns.py blacklist_create --pattern '^([A-Za-z0-9_\-]+\.)*uio\.no\.$' \
    --comment 'Domain belonging to UiO. Instances in UH-IaaS are not allowed to have UiO addresses'
./dns.py blacklist_create --pattern '^uib\.no\.$' \
    --comment 'Domain belonging to UiB. The domain itself is forbidden, but subdomains are allowed'
./dns.py blacklist_create --pattern '^uiocloud\.no\.$' \
    --comment 'Domain belonging to UiO. The domain itself is forbidden, but subdomains are allowed'
./dns.py blacklist_create --pattern '^([A-Za-z0-9_\-]+\.)*uninett\.no\.$' \
    --comment 'Domain belonging to Uninett. Reserved for possible future use'
./dns.py blacklist_create --pattern '^([A-Za-z0-9_\-]+\.)*ntnu\.no\.$' \
    --comment 'Domain belonging to NTNU. Reserved for possible future use'
./dns.py blacklist_create --pattern '^([A-Za-z0-9_\-]+\.)*uia\.no\.$' \
    --comment 'Domain belonging to UiA. Reserved for possible future use'
ETCD
How to fix etcd cluster
The etcd cluster needs to be reconfigured from time to time, typically when a network node is reinstalled. In order to include a newly installed (or otherwise reset) node into an existing cluster, some action is required.
With Ansible
There’s an Ansible playbook available that automates all the steps required to bring a reinstalled node back into the cluster. Check the following before you run it: • The node must be fully provisioned by Puppet after reinstall. etcd will fail, which is why we have this playbook, but the node must otherwise be properly installed. • The node needs a script that we deploy with Puppet located in /usr/local/sbin/bootstrap-etcd-member • There should be no empty variables in the script. If there are, check for missing hieradata. The playbook needs two variables: • ‘member’ is the hostname of the node we’re bringing back into the cluster. • ‘manage_from’ is the hostname of another node in a functioning cluster where we delete the old member and re-add the newly installed one. The playbook will figure out the rest. This example re-adds bgo-network-02 into the cluster, managed from bgo-network-01: sudo ansible-playbook-e'member=bgo-network-02 manage_from=bgo-network-01' lib/
˓→reconfigure_etcd_cluster.yaml
If successful, all tasks should run without errors except ‘TASK [Bootstrap member]’ which is supposed to fail. This task runs etcd in the foreground asynchronously, which is needed for bootstrapping, and exits after 30 seconds. Then it runs Puppet, which will start etcd as a systemd service.
Expanding from a single-node cluster
The above playbook will bring a reinstalled node back into an already existing multi-node cluster. However, expanding from a single-node cluster into a multi-node one requires a slightly different procedure. First, the initial-cluster parameter must contain all the nodes present when bootstrapping. This is configured by Puppet, including the script we use for bootstrapping. Expand the cluster node by node: starting with the initial single-node cluster, add a second node in the configuration, then use the Ansible playbook called expand_etcd_cluster:

sudo ansible-playbook -e 'member=bgo-network-02 manage_from=bgo-network-01 member_ip=172.18.0.72' lib/expand_etcd_cluster.yaml
We need to provide the IP address of the new member since we cannot fetch it from etcdctl. If successful, all tasks should run without errors except 'TASK [Bootstrap member]', which is supposed to fail. This task runs etcd in the foreground asynchronously, which is needed for bootstrapping, and exits after 30 seconds. Then it runs Puppet, which will start etcd as a systemd service. Check the etcd cluster by running the following command on one of the nodes:

etcdctl cluster-health

which should report both nodes as healthy. Then proceed with the next node.
Fix compute nodes after etcd outage
If the etcd cluster running on the network nodes has been unresponsive for an extended period of time, the following services need to be restarted on the compute nodes, after verifying that the cluster is healthy:
• etcd
• etcd-grpc-proxy
• calico-dhcp-agent
• calico-felix
• openstack-nova-compute
• openstack-nova-metadata-api
We have an Ansible playbook for this task:

sudo ansible-playbook -e 'myhosts=bgo-compute' lib/restart_etcd_compute.yaml <- Fix!
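If the playbook is unavailable, the same restarts can be done by hand on a single compute node. A sketch only, using the service list above:

# restart the etcd-related and nova services, in the order listed above
for svc in etcd etcd-grpc-proxy calico-dhcp-agent calico-felix \
           openstack-nova-compute openstack-nova-metadata-api; do
    sudo systemctl restart "$svc"
done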
Manually
If Ansible fails for some reason, you can try to do the procedure manually. Following the official docs is usually sufficient, available here: https://coreos.com/etcd/docs/latest/v2/runtime-configuration.html
Keep the following in mind:
• Delete the old member from the cluster first, then add the new one.
• When you add the new member, etcdctl returns three environment variables you must set on the node you're bootstrapping.
• Make sure the etcd service is stopped and /var/lib/etcd is empty on the node you're bootstrapping.
• Disable the puppet agent until you're done.
• The etcd commands on the node you're bootstrapping must be run as the etcd user.
Bootstrapping example (bgo-network-01):

etcd --listen-client-urls http://0.0.0.0:2379,http://127.0.0.1:4001 \
     --advertise-client-urls http://172.18.0.71:2379 \
     --listen-peer-urls http://0.0.0.0:2380 \
     --initial-advertise-peer-urls http://172.18.0.71:2380 \
     --data-dir /var/lib/etcd/bgo-network-01.etcd
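For the "delete old member, add new member" step itself, a minimal etcdctl sketch, run on a healthy cluster member (the member ID is illustrative):

# find the ID of the dead member, remove it, then add the new one
etcdctl member list
etcdctl member remove 6e3bd23ae5f1eae0          # ID illustrative
etcdctl member add bgo-network-02 http://172.18.0.72:2380
# etcdctl prints ETCD_NAME, ETCD_INITIAL_CLUSTER and ETCD_INITIAL_CLUSTER_STATE;
# export these on the new node before running the bootstrap command below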
After bootstrapping, stop the etcd process running in the foreground, enable puppet and initiate a puppet run. This will start the etcd service, and you're good to go.

uib-ha
In location uib we are running 6 nodes (2 test and 4 prod) with haproxy and corosync/pacemaker. These nodes run load balancing/HA for UiB-only services, but can also be used for UH-IaaS services. Access to the nodes is only available through bgo-login-01.
Domains
The service is set up with two different domains:
• uibproxy.ha.uib.no (use this for UiB-only services)
• pubproxy.ha.uib.no (use this for public services)
• sslproxy.ha.uib.no (use this for public services with ssl termination)
• gridproxy.ha.uib.no (used for the NetApp grid uib service)
gridproxy runs on dedicated hardware (uib-ha-grid-01 and uib-ha-grid-02).
Test domains:
• uibtestproxy.ha.uib.no
• pubtestproxy.ha.uib.no
• ssltestproxy.ha.uib.no
Each domain uses 2 public IPs with round-robin A-records.
New service
To set up a new service using uib-ha you need to follow two steps (see the sketch below):
1. Add your domain to the domain list hash in himlar/hieradata/uib/ha.yaml
2. Create a CNAME record pointing to one of the two A-record domains above
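A sketch of step 2, as a zone-file record for a hypothetical service name pointing at the public proxy domain:

; hypothetical service CNAME, pointing at one of the round-robin proxy domains
myservice.example.org.  IN  CNAME  pubproxy.ha.uib.no.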
Deployment
The cluster runs himlar, but without any admin node. Deploy with ansible:

sudo ansible-playbook -e "myhosts=uib-ha-test" lib/deploy_himlar.yaml
Run puppet:

cd /opt/himlar
provision/puppetrun.sh
Management
Some tips and tricks to manage the cluster (each cluster consists of two nodes). Check status:

pcs status
Set a node in standby (for patching or rebooting):
Set node back to online:
pcs node unstandby
Hardware maintenance
Dell PowerEdge servers
Upgrading firmware
Normal procedure:
1. Run Dell System Update:
dsu
2. Press a + ENTER to select all available updates
3. Press c + ENTER to commit and start the upgrade
4. When it finishes, reboot the server:
reboot
On older servers (tested on 11th gen), like those we have in test, upgrading firmware on storage components may fail. If this happens:
1. Fix libs:
cd /opt/dell/srvadmin/lib64/
ln -fs libstorelibir-3.so libstorelibir.so.5
2. Run dsu again:
dsu
3. Press a + ENTER to select all available updates
4. Press c + ENTER to commit and start the upgrade
5. Return libs to default:
cd /opt/dell/srvadmin/lib64/
ln -fs libstorelibir.so.5.07-0 libstorelibir.so.5
6. Verify that libs are returned to normal:
ls -l /opt/dell/srvadmin/lib64/libstorelibir.so.5
7. Reboot:
reboot
Fix corrupt bios after firmware update
1. Get the service tag for the machine with the corrupt BIOS
2. Download the new BIOS from https://www.dell.com/support/home/no-no
3. Store the EXE file on a machine with oob-net access (e.g. login)
4. Make sure idracadm7 is installed and run as root:
/opt/dell/srvadmin/bin/idracadm7 -r <iDRAC address> -f <EXE file>
Setting a proper name on the iDRAC:

racadm config -g cfgLanNetworking -o cfgDNSRacName $(hostname -s)
Hardware inventory
An easy way to get the inventory of a Dell server is to run the monitoring plugin in debug mode, from the monitor server:
/usr/lib64/nagios/plugins/check_openmanage -H test02-controller-01 -d

iDRAC reset
If we have problems with the iDRAC, e.g. the redfish endpoint does not respond, we can reset the iDRAC from the host OS:

yum install srvadmin-idrac.x86_64
/opt/dell/srvadmin/bin/idracadm7 racreset
Report API
Last updated: 2020-06-17
We are running a REST API on https://report.nrec.no with internally developed endpoints for a few utilities we need. It is set up with python-flask and Swagger (now OpenAPI). The DB backend runs on the db-global node with master-master replication between BGO and OSL. A quick HTML overview of the report design in Archimate. Original archimate file
Endpoints
See documentation of all endpoints here.
Instance
Used to collect instance information (POST) using report-utils, and to read instance owner and information (GET).
Status
Used to store NREC status messages. These messages are shown at https://status.nrec.no and are posted by notification scripts in himlarcli.
Security
The API also implements a simple OAuth2 Bearer token authentication. The tokens are manually created and deleted (revoked), and are stored hashed with bcrypt in the DB backend.
Scopes
• read = used to access the GET instance API
• admin = used to access the POST status API
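A request against the API would then look roughly like this; the token is a placeholder and the endpoint path is illustrative (see the endpoint documentation for the real paths):

# read-scoped token, querying instance information (path illustrative)
curl -H 'Authorization: Bearer <token>' https://report.nrec.no/api/instance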
Report UTILS
Last updated: 2021-06-18
The clients (as set up on NREC GOLD images) have a systemd service set which downloads and runs a script. The script collects data about the instance and reports back to the Report API. The data collected includes kernel versions, uptime and so on.
Implementation - client (instance)
The GOLD images get a wrapper script (/usr/local/sbin/report_wrapper.sh) installed together with a systemd service (report.service) and timer (service.timer). The timer starts the service at intervals, which in turn executes the wrapper script. The wrapper's only job is to download the latest version of the report.sh script and execute it. The report script is the collector, which gathers all the data and delivers it to the API.
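A minimal sketch of such a service/timer pair; only the unit names are from the setup above, and the unit contents and interval are illustrative:

# report.service (contents illustrative)
[Unit]
Description=NREC report client

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/report_wrapper.sh

# service.timer (contents illustrative)
[Unit]
Description=Run the NREC report client at intervals

[Timer]
OnCalendar=hourly
Unit=report.service
Persistent=true

[Install]
WantedBy=timers.target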
report script distribution
As mentioned above, the wrapper script (started by systemd) downloads the actual report script from a web service on the report nodes (mainly bgo-report-01). This service was previously set up by a separate git module (norcams/report-utils). This is now OBSOLETE, and the service is set up by Puppet and configured through hieradata.
puppet code
The web serving area, containing all the scripts and links, is set up by the Puppet code in profile/manifest/applications/report.pp. It mainly creates directories, scripts and links as configured in the variable report_utils (see below). The content of the files is fetched from files contained in the Puppet module, and ordered as configured in the variable.
Script code
The scripts are joined fragments, which are all stored inside the module's files area (profile/files/application/report). The fragments to select are set up in the report_utils variable, including any subdirectories from this point on. The ordering is relevant!
Configuration
Most of the setup is configured through the variable report_utils, mainly using Hieradata: profile::application::report::report_utils in the report.yaml role file.
Structure