GECEM Work Package 4 – Secure Remote Execution Report

Chris I Dalton
Trusted Systems Lab / HP Labs
October 2004

DESCRIPTION

WP4 will enable an investigation into the properties necessary for the secure remote execution of applications on grid-type infrastructures. Specifically, WP4 will consider the security requirements of protecting application code (algorithms), as well as application data and results, from the remote parties hosting the application computations. We will also consider protection from arbitrary third parties.

DELIVERABLES

A report describing the properties necessary for the secure remote execution of applications, with the specific security requirements of WP4 in mind.

The report will also look at whether those properties can be achieved with current technology. If they cannot, the report will explore how the security requirements of WP4 might be relaxed so that the properties can be satisfied. We will also look at the implications of any proposed relaxation of the security requirements.

1 Introduction

The GECEM (Grid-enabled computational electromagnetics) project is part of the UK e-science program and is aimed at exploring the use of the Grid for advanced collaborative simulation-and-visualization in aerospace and defence design. The project collaborators are BAE Systems, the Singapore Institute of High Performance Computing (iHPC), Cardiff University, the University of Wales, Swansea and Hewlett-Packard.

This report focuses on the enhanced security requirements of the GECEM Grid environment, specifically the mechanisms necessary for enabling the secure remote execution of highly proprietary applications. This is seen as an important Grid enabler, ensuring the protection of commercial organizations' sensitive intellectual property while still allowing them to take advantage of the Grid.

We start by describing the GECEM demonstration architecture, showing how the need for secure remote execution fits into it. We then give a brief overview of the current state of Grid security, specifically the security framework of Globus GT2/GT3. Globus is the most popular Grid middleware toolkit and is the middleware being used as the base for the GECEM architecture implementation.

We move on to describe in more detail the specific requirements for secure remote execution. We map those requirements onto functionality provided by the Globus security framework and highlight the areas in which the framework is insufficient in meeting the needs of secure remote execution. The main body of the report then explores possible architectures and security mechanisms that would meet the needs of secure remote execution. We show that some of these requirements are easier to achieve than others and some are currently not possible to satisfy effectively without further research work.

Finally, we take an alternative look at the issues around secure remote execution from a trust perspective. In the absence of sufficiently strong security mechanisms, we discuss to what extent more pragmatic trust mechanisms may be used to satisfy our secure remote execution needs. In particular, we look at the concept of a virtual organization (VO) and at possible technological contributions that could strengthen that concept sufficiently to meet our needs.

2 GECEM Architecture

In this section we describe the demonstration architecture of the GECEM project by way of an example workflow. We show where the requirement for secure remote execution fits into that architecture.

The envisaged GECEM workflow starts with engineers at BAE Systems generating a geometry file for the particular electromagnetic problem they are working on. A meshing service located at the University of Wales, Swansea is used to generate a mesh from that geometry file. A solver service is then invoked, which passes both the mesh data and a solver executable application to a remote site (iHPC Singapore). The solver application is then run on a machine at iHPC Singapore using the mesh data. Output files are then archived. Figure 1 shows the main trust boundaries and data flows based on the GECEM demonstration architecture.

[Diagram not reproduced. Recoverable labels: BAE Systems user client and geometry generator; portal service; geometry data service; mesher service and mesher data service (Swansea); solver service and solver data service (Cardiff); archive service; simulation service (Singapore). Calls: start(geometryURI) returns solverURI; mesh(geometryURI) returns meshURI; solve(meshURI) returns solverURI. Legend: BAE data flows; input/output control messages; Swansea data (code) flows.]

Figure 1 - GECEM Trust Boundaries and Data Flows

The need for a secure remote execution capability relates to the step of migrating the solver application executable code from Swansea to a machine in iHPC Singapore where it is actually run. The application itself contains valuable intellectual property belonging to Swansea and it is vital that it can be afforded sufficient protection as it is transferred and executed outside of the Swansea domain of trust.
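To make the point concrete, the workflow can be modelled as a list of transfers between trust domains, and the secure remote execution requirement then falls out as "code executed outside its owner's domain". The Python sketch below is a hypothetical model, not part of the GECEM implementation: the Flow type, the function names and the ownership and route of the mesh data are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One transfer in the workflow (hypothetical model)."""
    item: str      # what is transferred
    owner: str     # trust domain that owns the item
    src: str       # trust domain sending it
    dst: str       # trust domain receiving it
    is_code: bool  # True if the item is executed at the destination


def crossing_flows(flows):
    """Flows that leave the sender's trust domain (candidates for GSI protection)."""
    return [f for f in flows if f.src != f.dst]


def needs_secure_remote_execution(flows):
    """Items of code that will execute outside their owner's trust domain."""
    return [f.item for f in flows if f.is_code and f.dst != f.owner]


# Simplified flows based on the workflow description; the mesh data's owner
# and route are assumptions made for illustration.
GECEM_FLOWS = [
    Flow("geometry file", "BAE Systems", "BAE Systems", "Swansea", is_code=False),
    Flow("mesh data", "BAE Systems", "Swansea", "Singapore", is_code=False),
    Flow("solver executable", "Swansea", "Swansea", "Singapore", is_code=True),
]

if __name__ == "__main__":
    print(needs_secure_remote_execution(GECEM_FLOWS))  # ['solver executable']
```

Every flow here crosses a trust boundary and so benefits from GSI-style channel protection, but only the solver executable raises the harder problem of running code outside its owner's domain.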

3 Globus Grid Security Overview

In this section we look briefly at the Globus GT2/GT3 security framework [1] and the functionality it offers in the context of the GECEM architecture. GT2/GT3 is the Grid middleware toolkit used by the GECEM project.

The heart of the GT2/GT3 security framework is GSI (Grid Security Infrastructure). The functionality provided by GSI broadly spans the security areas of authentication, authorization and confidential communication. GSI allows security relationships to be set up that cross multiple organisational boundaries. It also allows for the delegation of security credentials, so that Grid computations comprising a number of different Grid resources can be carried out while requiring only a single "sign-on" by the initiating Grid user. GSI uses cryptography, notably X.509 certificate-based public-key cryptography, to achieve its aims.

In the context of the GECEM architecture shown in figure 1, GSI can be employed to solve a number of security problems. It can be used to ensure that communications between the various trust domains remain confidential. The mutual-authentication capability of GSI can be used to ensure that, for example, the geometry data from BAE Systems is actually being passed to the mesher service in Swansea and not some impostor trying to steal the sensitive BAE Systems data. The authorization and delegation features of GSI can be used to ensure, for example, that Swansea has the right (on behalf of an engineer in BAE Systems) to run their simulation code on a machine in iHPC Singapore.
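Delegation in GSI works by a user (or a service acting for them) issuing short-lived proxy certificates signed with their own credential, so that a chain leads back to a certificate issued by a trusted CA. The Python sketch below checks only the structural rules of such a chain: issuer/subject linkage, proxy naming and nested lifetimes. It is a simplified illustration, not the Globus API. The Cert type and all names are hypothetical, the "/CN=proxy" naming rule is modelled on legacy GT2 proxies, and signature verification is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Minimal stand-in for an X.509 certificate (hypothetical type)."""
    subject: str
    issuer: str
    not_before: int  # start of validity, e.g. seconds since the epoch
    not_after: int   # end of validity


def valid_delegation_chain(chain, trusted_cas, now):
    """Structurally check a delegation chain.

    `chain` is ordered from the user's end-entity certificate to the most
    recently delegated proxy. Signatures are NOT verified here; a real
    implementation must check each one against the issuer's public key.
    """
    if not chain or chain[0].issuer not in trusted_cas:
        return False
    for parent, proxy in zip(chain, chain[1:]):
        # Each proxy must be issued by the credential before it...
        if proxy.issuer != parent.subject:
            return False
        # ...have a subject derived from its issuer (legacy GT2-style naming)...
        if proxy.subject != parent.subject + "/CN=proxy":
            return False
        # ...and must not outlive the credential that signed it.
        if proxy.not_before < parent.not_before or proxy.not_after > parent.not_after:
            return False
    # Every credential in the chain must be valid right now.
    return all(c.not_before <= now <= c.not_after for c in chain)


if __name__ == "__main__":
    ca = "/O=Grid/CN=Example CA"  # hypothetical CA name
    user = Cert("/O=Grid/OU=BAE/CN=Engineer", ca, 0, 10_000)
    proxy = Cert(user.subject + "/CN=proxy", user.subject, 100, 200)
    proxy2 = Cert(proxy.subject + "/CN=proxy", proxy.subject, 110, 150)  # e.g. passed to Swansea
    print(valid_delegation_chain([user, proxy, proxy2], {ca}, now=120))  # True
    print(valid_delegation_chain([user, proxy, proxy2], {ca}, now=180))  # False: proxy2 expired
```

The short, nested lifetimes limit how long a stolen proxy remains useful. This is the design trade-off that makes single sign-on acceptable.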

The GT2/GT3 security framework built around GSI is clearly a useful base for easing some of the main GECEM security concerns, but as we shall see in the next section it is not sufficient in itself for meeting all the needs of secure remote execution.

4 Secure Remote Execution

In this section we analyse the GECEM security requirements surrounding the outsourcing of computation to remote computing infrastructures. We consider the needs of both the guest party (the owner of the computations) and the host party (the party that will run the computations on behalf of the guest).

Having outlined both the guest and host party security requirements, we discuss some possible architectures that go some way towards addressing them. Using the services offered by the Globus GT2/GT3 security framework as a base, we suggest what additional components are required to provide a reasonably secure solution.

We assume that the guest party and the host party infrastructures are independent and are separately owned and managed. We also assume that the host party does not provide dedicated resources to the guest party, i.e. that the guest party computations share the infrastructure used by the host party for its own needs. Finally, to keep the discussion concrete, we assume that Linux is used as the base platform by the host party¹.

4.1 Security requirements for secure remote computation

For analysis purposes, the security requirements of remote computation can be broken down into two areas:

• The guest party requires that the remote computation be carried out in a secure manner. By that we mean that the remote hosting party should not be able to inspect the application code (algorithms), the application data or the results produced by the application while it is running on platforms under the control of the host party.

• The host party’s own computations, data manipulation, communications, etc. must be protected from the guest party when the guest party is running code on the host party’s computing infrastructure. It must be possible to contain the behaviour and resource access of the guest party within the host party infrastructure. For example, it should not be possible for the guest party to inject a virus into the host party’s infrastructure, or to access sensitive or private host party data.

Some of these requirements are easier to achieve than others and some are currently not possible to satisfy effectively without further research work.

¹ The same arguments apply to other OSes such as HP-UX, Sun Solaris and Windows 2000.

Possible Architectures

Any overall solution architecture that allows secure remote computation must encompass at least the following four aspects:

• The remote platforms (i.e. physical computer and operating system) used to actually carry out the computations must have the right properties to satisfy the given guest and host party security requirements.

• There must be some way for the guest party to verify that the host party platform(s) have the required properties.

• The host party should be able to verify that the guest party is a legitimate user of its resources.

• The system as a whole should be protected from third party attacks.

The following subsections look individually at how these four aspects of a solution architecture for secure remote execution might be realised.

4.2 Remote Platform properties

The security properties of the remote platform where the computations will run are obviously a key part of any solution. The basic security properties of a platform are determined largely by the operating system that runs on it. In our case we are considering platforms running the Linux operating system. Multi-user operating systems like Linux and Windows 2000 use the hardware protection features of the CPU to provide sharing and protection of hardware resources such as memory and disk.

[Diagram not reproduced: IA-32 protection rings 0 to 3, with the kernel in ring 0 and applications in ring 3.]

Figure 2 - CPU protection mechanisms

CPUs such as the Intel IA-32, Itanium or Sun SPARC can run in either privileged or non-privileged mode. The CPU only allows direct access to the hardware resources of the machine when it is in privileged mode.

Normal programs and applications only get to run in non-privileged (user) mode. The operating system code itself (the kernel) runs in privileged (kernel) mode and hence can access all the platform hardware resources. For a user program to get access to the hardware resources of the system it must use the kernel as an intermediate layer. This allows the kernel to act as a security reference monitor to mediate and control access to the platform’s resources.

Figure 2 shows the Intel IA-32 and Itanium ring-based implementation of hardware protection mechanisms. The OS kernel runs at ring 0. Ring 0 is privileged and has direct access to the system hardware. Applications run at ring 3: they have to go through the kernel via a system call to access the hardware. Rings 1 and 2 are not used by most OSes running on IA-32 or Itanium.

Operating system kernels layer software protection mechanisms on top of the hardware mechanisms to provide user level security features. Typically, these include user process separation and discretionary file access control mechanisms.
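On Unix-like systems the discretionary file access controls mentioned above are typically expressed as owner, group and other permission bits. The Python sketch below is a simplified model of that check (it ignores ACLs, capabilities and other details of real kernels); the function name and constants are our own. It is included mainly to show that the super user bypasses these controls entirely.

```python
READ, WRITE, EXECUTE = 4, 2, 1  # permission bits within one rwx triple


def dac_allows(uid, gids, file_uid, file_gid, mode, want):
    """Simplified Unix discretionary access check.

    mode: permission bits such as 0o640; want: bitmask of READ/WRITE/EXECUTE.
    """
    if uid == 0:
        # The super user bypasses discretionary checks. (Simplified: real
        # kernels still require some execute bit to be set for EXECUTE.)
        return True
    if uid == file_uid:
        bits = (mode >> 6) & 0o7  # owner class is chosen first...
    elif file_gid in gids:
        bits = (mode >> 3) & 0o7  # ...then group...
    else:
        bits = mode & 0o7         # ...then everyone else
    return (bits & want) == want
```

For example, a file with mode 0o640 is readable and writable by its owner, readable by its group and inaccessible to everyone else, yet fully accessible to root.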

Satisfying the remote platform properties

Standard, out-of-the-box Linux (Unix-based) systems do not give us the platform properties we desire for secure remote execution [2]. However, various groups (commercial and non-commercial) have produced security extensions for Linux that go some way towards satisfying our requirements. To see which extensions can help, we need to include in our analysis the type of access each of the parties will need to the remote platform used for the computations:

• The guest party requires only normal user access to the remote platform, though it may need to run privileged software on that system.

• The host party requires normal user access to the platform (assuming the platform is concurrently used by the members of the host party for the host party’s own purposes).

• The host party also requires administrative access to the platform since it is part of its local infrastructure.

The fact that the host party has administrative access to the platform carrying out the computations makes it much harder to satisfy the properties of the remote platform required by the guest party.

Satisfying the properties required by the Host party

The host properties can largely be achieved using the well-understood techniques and methods of containment [3]. Note that we are talking about containment mechanisms that are implemented in and enforced by the kernel but that apply to user mode processes and services. With such containment properties we can set strict bounds on exactly which resources (such as files, processes and network access) are available to any user mode software running on the platform, whether it is privileged or not. NSA SE-Linux and Amon Ott's RSBAC Linux are two open source extensions to Linux that can be configured to provide such a containment property. Commercial products such as HP-UX 11iv2 [4]² or the Grid MP workstation offer the ability to contain (or sandbox) Grid services running on a host platform out of the box.
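Containment systems of this kind express policy in terms of labels attached to processes (domains) and resources (types) rather than user identities, and the kernel applies that policy to every user mode process. The Python sketch below is a toy, default-deny model of the idea. It is not SE-Linux or RSBAC policy syntax, and the labels and policy table are hypothetical. It shows how a guest Grid service can be confined whatever user id it runs under.

```python
# Hypothetical policy: (subject domain, object type) -> permitted operations.
# Anything not listed is denied (default deny).
POLICY = {
    ("grid_guest_t", "guest_work_t"): frozenset({"read", "write", "create"}),
    ("grid_guest_t", "grid_port_t"): frozenset({"connect"}),
    ("host_user_t", "host_home_t"): frozenset({"read", "write", "create"}),
}


def contained_access(domain, obj_type, op, policy=POLICY):
    """Mandatory, kernel-style check on a process's domain label.

    The process's user id plays no part, so a guest process running as
    root is confined exactly as an unprivileged one would be.
    """
    return op in policy.get((domain, obj_type), frozenset())
```

Here a process in grid_guest_t may write its own work files but cannot read host_home_t files. A real system would also control how processes enter a domain, which this toy omits.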

Satisfying the properties required by the Guest party

These are more difficult to satisfy, since the guest party has no control over the administration (super user) rights of the remote host party platform. The techniques of containment can again be used to keep normal users separate on the same platform, i.e. to stop normal host party users from spying on or interfering with the guest party computations. However, an administrator on the system can potentially subvert the containment controls. If the guest party considers this a risk, then a more complex platform setup will be required.

The administrative user on a system can typically reconfigure the platform operating system at will. This can include adding arbitrary kernel modules (code) to the system. The administrator can also often gain access to the sensitive kernel resources of the system via privileged interfaces³.

These facilities can allow the administrator of a system to spy on and interfere with any process (application) running on the system. The basic problem is that there is no containment at the kernel level: code running in kernel mode can access all resources on the system without constraint. Since administrators can typically get any code to run in kernel mode, it is difficult to establish any firm platform security properties when the administrator of that platform is not trusted.

² HP-UX 11iv2 has enhanced security containment functionality based on [3].
³ Such as /dev/kmem on Linux.

Solutions for protecting remote processing from administrator-level attack are still the subject of research, but there are some potential solutions on the horizon. Probably the most interesting area is platform virtualization. Figure 3 shows a virtualization strategy for the IA-32 or Itanium processor architecture⁴. A new software component, the VMM (Virtual Machine Monitor), runs at ring 0. The operating system kernel, such as Linux, is moved to run at ring 1 instead of ring 0. User space code continues to run at ring 3. Since the operating system is now running at ring 1, it no longer has direct access to the machine hardware.

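[Diagram not reproduced: the VMM runs in ring 0, the operating system kernel in ring 1 and applications in ring 3; ring 2 is unused.]

Figure 3 - Platform virtualization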

Virtualization allows the same compute hardware to run multiple operating systems simultaneously [5]. The VMM provides each operating system with an abstraction of the real machine hardware called a Virtual Machine (VM). The operating systems effectively run on top of a VM. Using virtualization technology a single compute platform can provide several VMs. The VMM is used to co-ordinate and share access to the real hardware between the various VMs running on top of it. Each operating system running on top of a VM is kept separate from other OSes running on other VMs on the same system (figure 4). The VMM acts as an extra control point where we can restrict and control what an operating system can do with the hardware resources of the system such as memory. One implication of this is that the administrator of one OS on the system does not have control of any other OS running on the same system.

[Diagram not reproduced: three sets of applications running on Linux, Linux and Windows respectively, all on top of a single Virtual Machine Monitor (VMM) on the hardware.]

Figure 4 - Running multiple OSes on a single machine
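The extra control point that the VMM provides can be illustrated with memory. In the toy Python model below, which is an illustrative assumption rather than how any particular VMM is implemented, the VMM owns all physical frames and a guest kernel can map only frames allocated to its own VM, whatever privileges it holds inside that VM. The class and method names are hypothetical.

```python
class VMM:
    """Toy virtual machine monitor that partitions physical memory frames."""

    def __init__(self, total_frames):
        self.free = set(range(total_frames))
        self.owner = {}  # physical frame number -> VM name

    def create_vm(self, name, frames):
        """Allocate `frames` free physical frames exclusively to a new VM."""
        if frames > len(self.free):
            raise MemoryError("not enough free frames")
        for frame in sorted(self.free)[:frames]:
            self.free.remove(frame)
            self.owner[frame] = name

    def frames_of(self, name):
        return {f for f, vm in self.owner.items() if vm == name}

    def map_request(self, vm, frame):
        """Called when a guest kernel asks to map a physical frame.

        In this model the request always passes through the VMM, so even a
        guest administrator cannot reach another VM's frames.
        """
        return self.owner.get(frame) == vm
```

With VMM(8), create_vm("grid", 4) and create_vm("host", 4), the grid VM may map frames 0 to 3 only. An administrator of the host VM therefore has no route to the grid VM's memory unless the VMM itself permits it.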

Academic and commercial teams are already looking at the applicability of virtualization to Grid computation [8]. Virtualization offers an alternative way to satisfy the host party requirements and some of the guest party requirements mentioned earlier. With virtualization, guest access to remote host compute resources can be contained at the VM level. For example, a grid service can be run in its own dedicated operating system on top of a VM. Other users can be given their own VM and operating system on the same machine, safely separated from the grid service.

⁴ The new virtualization hardware assists from Intel (VTi) [6] and AMD (Pacifica) [7] provide an additional privileged operating mode that avoids the need to adjust the ring levels of virtualized OSes.

While offering useful guest containment properties, current virtualization systems do not fully address the problem of untrusted administrator access to the machine. Current virtualization systems assume that the owner of the platform, i.e. the host party, also has complete administrative control over the VMM layer. Additionally, current architectures do little to counter physical attacks on the system. Since the remote platform physically resides within an environment controlled by the host party, this is a significant risk.

A potential solution lies in the ability to treat the VMM as a trusted third party that acts on behalf of both the guest and host entities and has a certain set of mandatory properties that cannot be violated by either side. The upcoming Intel LaGrande chipset [9] for IA-32 and Itanium (and its AMD equivalent [7]) greatly enhances the virtualization and hardware physical protection facilities (such as encrypted bus traffic) available for a virtual machine monitor to make use of. HP Labs and Microsoft, amongst others, are actively researching this area.

4.3 Remote platform property verification

This may be as simple as the guest using the Globus GT2/GT3 mechanisms to authenticate that the remote platform belongs to the host party, and then trusting that it has the right properties. A stronger approach is to mandate that the remote platforms contain a Trusted Platform Module (TPM) as specified by the Trusted Computing Group (TCG) [10]. This would allow the guest party to query and rigorously determine the remote platform properties before submitting computations to the host party infrastructure. Computing platforms containing a TPM are already available from IBM and HP, amongst others.
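The core of TPM-based verification is a measurement chain. Each software component is hashed before it runs, and the hash is folded into a Platform Configuration Register (PCR) with the TPM 1.2 extend operation, PCR_new = SHA-1(PCR_old || measurement). The Python sketch below replays a measurement log to recompute the expected PCR value and compares it with the value the remote platform reports. It assumes that the reported value has already been authenticated. In practice this is done with a TPM quote signed by an attestation identity key, which is omitted here. The component names and the known-good list are hypothetical.

```python
import hashlib

PCR_SIZE = 20  # TPM 1.2 PCRs hold a SHA-1 digest


def extend(pcr, measurement):
    """TPM 1.2 extend: new PCR = SHA-1(old PCR || measurement digest)."""
    return hashlib.sha1(pcr + measurement).digest()


def measure(component):
    """SHA-1 digest of a component's bytes, as a measuring agent would log it."""
    return hashlib.sha1(component).digest()


def replay(log):
    """Recompute the PCR value implied by an ordered measurement log."""
    pcr = bytes(PCR_SIZE)  # static PCRs start as all zeros after reset
    for m in log:
        pcr = extend(pcr, m)
    return pcr


def platform_acceptable(reported_pcr, log, known_good):
    """Accept only if the log reproduces the PCR and every entry is known-good."""
    return replay(log) == reported_pcr and all(m in known_good for m in log)
```

Because extend is order-dependent and one-way, a host cannot hide an extra component such as a malicious kernel module by editing the log: the replayed value would no longer match the reported PCR.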

4.4 Remote platform access control

The remote host party can authenticate the guest party, using well-understood cryptographic techniques, to make sure that it is authorized to run computations on the platform. Globus GT2/GT3 includes support for this.
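In GT2/GT3 the authorization step commonly maps the authenticated certificate subject (distinguished name) to a local account through a grid-mapfile. Each line of that file holds a quoted DN followed by one or more comma-separated local account names. The Python sketch below is a simplified parser and lookup written on that assumption. It is not the Globus implementation, and the DNs and account names are invented.

```python
import shlex


def parse_grid_mapfile(text):
    """Parse grid-mapfile text into {DN: [local accounts]} (simplified)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = shlex.split(line)         # handles the quoted DN
        if len(parts) < 2:
            continue                      # malformed entry: ignore it
        dn, accounts = parts[0], " ".join(parts[1:])
        mapping[dn] = [a.strip() for a in accounts.split(",") if a.strip()]
    return mapping


def authorize(mapping, authenticated_dn, requested_account=None):
    """Return the local account to use, or None if the DN is not authorized."""
    accounts = mapping.get(authenticated_dn)
    if not accounts:
        return None
    if requested_account is None:
        return accounts[0]                # default: first listed account
    return requested_account if requested_account in accounts else None
```

The important property is that authorization is decided by the host from a DN that GSI has already authenticated. A guest therefore cannot obtain an account, such as root, that the host never granted.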

4.5 Third Party attack prevention

This can largely be taken care of by a secure communications link between the guest party and the host party, together with a sufficiently resilient remote host platform that can prevent unauthorized access. Globus GT2/GT3 includes support for a secure communications link.
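GSI's secure channel is built on SSL/TLS with mutual authentication. As a rough stand-in, the Python sketch below shows the equivalent host-side configuration using the standard ssl module. The host refuses any client that cannot present a certificate chaining to a CA it trusts. This is not GSI, since it has no proxy certificate or delegation support, and the function name and file arguments are placeholders.

```python
import ssl


def host_tls_context(cert_file=None, key_file=None, ca_file=None):
    """Server-side TLS context that requires and verifies client certificates."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    ctx.verify_mode = ssl.CERT_REQUIRED           # mutual authentication: client must present a cert
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)  # the host's own credential
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CAs the host trusts for guest identities
    return ctx
```

Wrapping a listening socket with ctx.wrap_socket(sock, server_side=True) would then give a channel that is both confidential and mutually authenticated, which covers the third party threat on the link itself.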

5 Trust and Secure remote execution

As the last section noted, strictly satisfying the needs of secure remote execution is currently a research issue, though there are promising technologies on the horizon. In this section we look at an alternative, perhaps more pragmatic, solution to the problem of secure remote execution that is based on trust mechanisms rather than absolute security mechanisms.

Referring back to figure 1, the need for secure remote execution occurs because of the migration and running of executable code outside of the trust domain of the owner of the code, namely Swansea. An alternative approach to solving the problem is to consider extending the Swansea trust domain so that it encompasses the machine that will run the code. This reduces the problem of secure remote execution to one of deciding on what basis to extend trust to another organization.

The notion of extending trust boundaries across organizational boundaries is commonly referred to as forming a Virtual Organization. The grid concept itself can be viewed loosely as giving the ability to form a virtual organization [11]. Mechanisms already exist for extending trust across organizational boundaries [11],[12]. However, reasonable mechanisms for establishing a basis on which to extend trust to another organization are currently somewhat lacking. Likewise, mechanisms for the on-going verification that an extension of trust is justifiable are also lacking.

This is an important consideration for GECEM. While extending trust to an organization allows us to side-step the problems associated with secure remote execution, blindly trusting other organizations could lead to the leakage of the valuable intellectual property contained in the executable applications we wish to protect.

The concepts of risk management seem very appropriate for aiding the decision to extend a trust boundary across organizations. There are at least two areas where technology may help here. First, the push towards the ability to create virtual overlay networks [13][14] offers an infrastructure-level way (as opposed to a middleware-level way such as Globus GT2/GT3) of flexibly limiting the exposure of an organization's IT infrastructure to other operational partners. Secondly, model-based assurance and verification reporting frameworks that work at the level of the business risks an organization worries about can be used to carry out a 'what-if' analysis to establish the consequences of sharing part of a computing infrastructure with other organizations [15].
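A much simplified form of such a 'what-if' question is reachability over a model of the infrastructure. If a partner is attached to one network segment, which assets become reachable, and what changes if a new link is added? The Python sketch below is a hypothetical illustration of that idea only. The topology, segment names and asset labels are invented, and it models network reachability rather than business risk.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search over an undirected adjacency map."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def exposed_assets(links, assets, partner_entry):
    """Assets located on segments reachable from the partner's entry point."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    hit = reachable(graph, partner_entry)
    return {asset for asset, segment in assets.items() if segment in hit}


# Hypothetical infrastructure model: partner overlay isolated from corporate segments.
LINKS = [("partner_overlay", "dmz"), ("dmz", "compute"), ("corp", "source_repo")]
ASSETS = {"compute nodes": "compute", "payroll": "corp", "solver source": "source_repo"}
```

With this model only the compute nodes are exposed to the partner. Adding a single link between the compute and corp segments exposes payroll and the solver source as well. This is the kind of consequence that keeping partners on a separate overlay network is meant to prevent.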

6 Conclusion

Achieving truly secure remote execution, where both data and application executables can be hidden from remote parties during computation, is currently not completely achievable. The approach of applying trust rather than security mechanisms to address the secure remote execution problem looks like it may provide an acceptable (in terms of business risk) alternative solution.

7 References

[1] V. Welch, F. Siebenlist, I. Foster, J. Bresnahan, K. Czajkowski, J. Gawor, C. Kesselman, S. Meder, L. Pearlman, S. Tuecke. 2003. Security for Grid Services. Twelfth International Symposium on High Performance Distributed Computing (HPDC-12), IEEE Press.

[2] Dalton, C.I., Choo, T.H. and Norman, A.P. 2002. Design of Secure Unix, Elsevier Information Security Technical Report, Volume 7 Number 1, ISSN 1363-4127.

[3] Dalton, C.I. and Choo, T.H. 2001. An operating system approach to securing E-Services. Communications of the ACM, Volume 44 Issue 2, ISSN 0001-0782, February 2001.

[4] HP-UX 11iv2 Security Containment (http://h18000.www1.hp.com/products/quickspecs/12184_div/12184_div.HTML)

[5] http://www.vmware.com/

[6] http://www.intel.com/technology/computing/vptech/

[7] http://enterprise.amd.com/Enterprise/serverVirtualization.aspx

[8] http://www.cl.cam.ac.uk/Research/SRG/netos/xeno/

[9] http://www.intel.com/technology/security/

[10] http://www.trustedcomputinggroup.org

[11] I. Foster, C. Kesselman, S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001.

[12] Ivan Djordjevic, Theo Dimitrakos, Chris Philips. An Architecture for Dynamic Security Perimeters of Virtual Collaborative Networks. Proceedings of the 9th IEEE/IFIP Network Operations and Management Symposium (NOMS 2004), IEEE CS, April 2004.

[13] Xuxian Jiang, Dongyan Xu, "VIOLIN: Virtual Internetworking on OverLay INfrastructure", Department of Computer Sciences Technical Report CSD TR 03-027, Purdue University, July 2003 (to appear in LNCS Vol. 3xxx, Springer).

[14] Mahesh Kallahalla, Mustafa Uysal, Ram Swaminathan, David E. Lowell, Mike Wray, Tom Christian, Nigel Edwards, Chris I. Dalton, and Frederic Gittler. SoftUDC: A Software-Based Data Center for Utility Computing. IEEE Computer, November 2004.

[15] Baldwin, A, Beres, Y, Plaquin, D and Shiu, S. Trust Record: High-level Assurance and Compliance. 3rd International Conference on Trust Management (iTrust 2005). May 2005