Data Movement in the Grasshopper Operating System

Technical Report Number 501
December, 1995

Rex di Bona

ISBN 0 86758 996 5

Basser Department of Computer Science
University of Sydney NSW 2006

1 Introduction

Computer environments have changed dramatically in recent years, from the large centralised mainframe to the networked collections of workstation style machines. The power of workstations is increasing, but the common paradigms used to manage data on these workstations are still fundamentally the same as those employed by the earliest machines.

Persistent systems adopt a radically different approach. Unlike conventional systems which clearly distinguish between short-term computational storage and long-term file storage, persistent systems abstract over the longevity of data and provide a single abstraction of storage.

This thesis examines the issues involved with the movement of data in persistent systems. We are concerned with the differences in the requirements placed on the data movement layer by persistent and non-persistent operating systems. These differences are considered in relation to the Grasshopper operating system.

This thesis investigates the movement of data in the Grasshopper Operating System, and describes approaches to allow us to provide the functionality required by the operating system.

Grasshopper offers the user a seamlessly networked environment, offering persistence and data security unparalleled by conventional workstation environments. To support this environment the paradigms employed in the movement of data are radically different from those employed by conventional systems. There is no concept of files. Processes and address spaces become orthogonal. The network is used to connect machines, and is not an entity in itself.
We shall show that the adoption of a persistent environment allows us a fundamental advantage over conventional systems in the movement of data through the storage hierarchy: from disk to memory, and from memory on one node to memory on another node. We shall investigate the hurdles that arise through the use of persistence, and show how each hurdle may be overcome.

We answer the following questions: Do the methods used to move data on non-persistent systems translate to persistent systems, or are other methods more applicable? Are there features of persistent systems that, when exploited, make the task of data movement easier? Are there requirements of persistent systems that, when satisfied, complicate the task of data movement?

This thesis examines the two fundamental areas of data movement on modern computer systems: the movement of data on a single node, and the movement of data between nodes. On a single node, data moves between permanent storage, usually implemented as disk drives, and volatile storage, usually implemented as the RAM of the machine. Between multiple nodes, data moves as a series of packets over some form of network.

The fundamental differences between persistent and non-persistent systems are outlined. How these affect the choices available for the implementation of data movement is investigated. This thesis presents solutions to the problems of data movement in persistent systems that capitalise on the advantages of persistence, and also satisfy the additional requirements of persistent systems.

The major contribution of this thesis is the design of three new protocols for data movement, one for data movement between main memory and backing store, and two protocols that together provide efficient, reliable and causal movement of data between networked nodes.
The first protocol is implemented as a stackable module protocol which allows manipulation of pages between disk storage and main memory. The second is an efficient and reliable peer to peer network protocol that, combined with the third, a routing protocol, allows causal message delivery.

In Grasshopper, and most persistent systems, disks are considered as a repository for pages of data. These pages are identical in length to memory pages, and the only movement of data from disks is to memory and vice-versa. This constraint on the dimensions of data block sizes allows a more uniform and efficient protocol to be implemented.

Persistent systems place a strict requirement on network traffic - that it arrives in causal order. This requirement is not found in most non-persistent systems, and causes considerable complications to the routing protocol. Attempts to solve this problem usually involved using a conventional network structure and adding additional information to each message to enable causality violations to be detected and corrected. This thesis presents a novel design for a network that manipulates the network topology instead of the messages. Messages may only travel along certain constrained paths, which removes the requirement for additional causality information.

The thesis is organised as follows. Chapter 2 outlines how conventional systems perform computations, both from a process' and user's perspective. We contrast this in Chapter 3 with how persistent systems are organised, using the Napier persistent language and environment as an example. Chapter 4 introduces Grasshopper, concentrating on the basic abstractions provided by the operating system, and how they relate to those provided by both conventional and current persistent environments. Chapter 5 details how movement of data is performed on a single node.
It concentrates on disk to memory movement of data, and the structures allowed to manipulate the passage of data between disk and memory. Chapter 6 is concerned with the movement of data between two nodes in a Grasshopper network. It describes how data is described in a network independent manner. Chapter 7 describes the changes to the model to allow data to be located, once it has moved across the network. Chapter 8 is concerned with the causality considerations necessary to ensure that computation occurs in a sensible order on a distributed system. Finally, Chapter 9 discusses the implementation of the algorithms presented in Grasshopper, and evaluates their effectiveness.

2 The Computational Model of Conventional Systems

This chapter examines the computational model supported by conventional (non-persistent) systems. Our main focus is the models used for data movement, concentrating on disk storage and network movement of data. Since Unix [56] is the most popular workstation operating system, it is used as an example throughout this chapter. Where other systems have substantially different models these differences will be highlighted.

The Unix system supports different models at different levels. The model made available to users is different to that presented to processes in the system, most notably in the area of networking. We shall examine both of these views, as a comparison provides an important insight into the viability of persistent systems in the design space of computer systems.

2.1 The Process View of Unix

A process in a Unix system exists in a transient virtual address space. The only permanent storage paradigm in Unix is the file system. Processes usually communicate through either the file system or a network system. The Unix Signal system can be used as a primitive communication mechanism, and, as we shall see later, newer versions of Unix allow memory to be shared between processes.
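
The transient nature of a process's address space can be illustrated with a minimal sketch. It is written in Python purely for illustration, not taken from the report; the child script, the variable `counter` and the file name `result.txt` are all hypothetical. A child process computes a value in its own memory, and the value survives the child's exit only because it was explicitly written to the file system, where the parent reads it back.

```python
import os
import subprocess
import sys
import tempfile

# Source of a hypothetical child process. Its variable `counter` lives only
# in the child's own address space and disappears when the child exits.
CHILD_SOURCE = """
import sys
counter = 41
counter += 1
with open(sys.argv[1], "w") as handle:
    handle.write(str(counter))
"""


def run_child_and_read_result():
    """Run the child, then recover its result through the file system."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "result.txt")  # hypothetical name
        # The child's memory is private to it; nothing it computes is
        # visible to the parent unless it is written somewhere shared.
        subprocess.run([sys.executable, "-c", CHILD_SOURCE, path], check=True)
        # The file is the only trace of the computation that outlives
        # the child process.
        with open(path) as handle:
            return int(handle.read())


result = run_child_and_read_result()
```

The parent never sees the child's `counter` directly; the conversion to and from a textual file is the kind of explicit step that a persistent system aims to make unnecessary.
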
In the Unix model the memory, files, and network each have separate namespaces. Each is specified using different interfaces and different programming abstractions. Memory is accessed using standard processor operations. Files are accessed using the file name as a key and file-based read and write system calls. The network is accessed using a network address and port number pair and network-based read and write system calls.

Memory for a process in a Unix system is a purely ephemeral structure. A process is created with a new address space into which is placed the text and data on which the process is to operate. In addition, a stack is created within the address space and is used to maintain procedure call linkage information. A process functions by performing memory reads and writes, and performing computations on the data. Memory in a Unix process is tagged with permission attributes; a process can read or write various portions of the address space depending upon the attributes for that area of memory. Usually the text of a process is tagged as read only, and the data and stack areas are tagged as read/write.

If a process wants to interact with permanent storage the process has to perform file reads and writes. Entities represented in the file system have types, including files, directories, and devices. Different system calls are used to access each type of entity in a file system.

The disks on a Unix system are used to hold the file system. The namespace of the file system is different to that of memory. In memory, the namespace is the addressing range of the processor, but in the file system the namespace becomes a hierarchical collection of textual names. The file system is a kernel mediated object that associates portions of the disk with textual names. All accesses to files are through a kernel based interface. The namespace that a disk presents to a process is a collection of these files.
If a different view of the disk is desired, the kernel of the operating system has to be changed.

Networking was added to Unix in the Berkeley [32] versions of the operating system. In the Berkeley system the network was viewed as a collection of byte streams by a process. Each of these streams could be connected to other machines, and data passed along these streams. The namespace via which a process accessed the network was different from that of both memory and files. The network namespace was the union of the namespaces of the various protocols the kernel implements.
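
The three namespaces discussed in this section can be placed side by side in a short sketch. It is an illustration in Python rather than anything from the report: the payload, the file name `example.dat`, the helper `recv_exactly` and the use of a loopback TCP connection are all assumptions. The same bytes are stored and retrieved three times, and each time a different kind of name and a different interface is needed.

```python
import os
import socket
import tempfile

payload = b"grasshopper"

# 1. Memory: named by a position in the address range (here, an index into
#    a buffer) and accessed with ordinary load and store operations.
memory = bytearray(len(payload))
memory[:] = payload
from_memory = bytes(memory)

# 2. File system: named by a hierarchical textual path and accessed through
#    kernel-mediated open, read and write system calls.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example.dat")  # hypothetical file name
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.write(fd, payload)
    os.close(fd)
    fd = os.open(path, os.O_RDONLY)
    from_file = os.read(fd, len(payload))
    os.close(fd)


def recv_exactly(sock, count):
    """Read exactly `count` bytes from a byte stream, or fail."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the stream early")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


# 3. Network: named by an (address, port) pair and accessed as a byte
#    stream through socket send and receive calls.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a port
    listener.listen(1)
    address = listener.getsockname()
    with socket.create_connection(address) as client:
        server, _ = listener.accept()
        with server:
            client.sendall(payload)
            from_network = recv_exactly(server, len(payload))
```

All three copies hold identical data, yet none of the names (a buffer index, a path, an address and port pair) can be used in place of another. That separation is what the process view of Unix imposes on the programmer.
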