The Walnut Kernel

A Password-Capability Based Operating System

Thesis submitted for examination

for the Degree of

Doctor of Philosophy

Department of Computer Science

Monash University

January

Maurice David Castro

BSc(Hons), Monash University


Abstract

This thesis describes both a capability based kernel, the Walnut Kernel, and hardware designed to support that kernel. The kernel provides an environment in which programs written and operated by mutually antagonistic users can coexist securely. A Password-Capability mechanism is used to provide access control to objects. The hardware is designed to support cost-efficient expansion of the number of processors within a multiprocessor.

The Walnut Kernel was designed to be portable to a wide range of microprocessors and memory architectures. The kernel avoids using processor-specific features where there are more widely available mechanisms. Paged memory management was adopted as the basic access control mechanism because of the large number of processors which support it. This resulted in a page-sized protection granularity. The architecture of the kernel was designed to scale from uniprocessors to large multiprocessors. To accommodate this design requirement, device drivers on uniprocessor systems use a shared memory page to communicate with the kernel. This is equivalent to special-purpose processors sharing memory with a master processor on a multiprocessor. The efficiency of multiprocessor implementations was increased by decreasing interprocessor communication. Page tables and other system data are periodically expired and new tables constructed. This practice of timing out data ensures that local data is kept up to date and only a minimal amount of local state is retained. Furthermore, this practice eliminates the need to inform other processors of changes in local information.

Two implementations of the Walnut Kernel are currently in service. One version operates in an emulated environment under a host operating system. The other version operates on i based IBM PCs. The performance of the latter version compares well with contemporary operating systems.

The kernel differs from earlier password-capability based systems in that it introduces operators for the selective removal of rights from a capability and mechanisms which restrict the use of a capability to a specific process. Message-passing and subprocess mechanisms have been introduced to enhance the handling of asynchronous events. The changes were motivated by the requirements of programmers using the system.

Application programs have been written to demonstrate the features of the kernel. Included among these user level programs are managers for floppy diskette drives and for the screen and keyboard. The programs and the programming techniques used are described.

The proposed hardware has eliminated centralised switching devices in favor of distributing the processor interconnection hardware across the nodes of the multiprocessor. Each processor node has its own clock, which removes the physical constraints associated with a centralised clock.

The kernel and the proposed hardware are both part of the Secure RISC Architecture project of the Department of Computer Science, Monash University.


Acknowledgments

I would like to thank Professor Chris Wallace for supervising the work described in this thesis. His assistance, suggestions and guidance throughout the project are gratefully acknowledged.

Many of the staff and postgraduate students of the Department of Computer Science have contributed to work related to the Walnut Kernel and the proposed hardware. In particular I would like to acknowledge Mr Glen Pringle for programming work conducted on both the Walnut Kernel and user level programs, Mr Carlo Kopp for his work on user level libraries, Dr Ronald Pose for work relating to the proposed hardware and a constant stream of suggestions for kernel features, and Mr Gerhard Fries and Mr David Duke for their technical assistance in areas relating to hardware.

The financial assistance provided by an Australian Postgraduate Research Award has been appreciated.

The Secure RISC Architecture project was supported by a grant from the Australian Research Council.


Declaration

This thesis contains no material which has been accepted for the award of any other degree or diploma in any university or other institution.

To the best of my knowledge, this thesis contains no material previously published or written by another person, except where due reference is made in the text of the thesis.

Where the work in this thesis is based upon joint research, this thesis discloses the relative contributions of the respective authors.

Maurice D Castro

Department of Computer Science

Monash University

Clayton Victoria

Australia

January

Contents

Introduction

Overview

Capabilities

Persistent Systems

Threads

A Password-Capability System

The Kernel

The Hardware

Survey

Conventional Operating Systems

Current Operating Systems

Amoeba

Mach

Plan 9

QNX

Angel

Chorus

Capability Based Operating Systems

Monads

KeyKOS

Grasshopper

Opal

Mungi

Observations and Trends

The Walnut Kernel

The User Perspective

Volumes, Objects and Capabilities

Process Address Space

Processes and Subprocesses

Messages and Mailboxes

Money

Kernel Calls

Exceptions

Controlling Process Scheduling

Subprocess Zero

Design of the Walnut Kernel

Design Principles

Avoid features available only on small classes of processors

Avoid stalls in the kernel while waiting on external events

Minimize retained kernel state variables

Ensure the kernel is scalable

Allow for a variety of shared memory architectures

Passive Elements

Disk Structures

Memory Structures

Processes

Kernel Data Structures

Active Elements

Object Memory Management

Capability Management

Process Memory Management

Message Management

Process Scheduler

Persistent Elements

Private Page Tables Small Windows

System Architecture

System-Call Interface

Subprocess Zero

Device Drivers

Kernel Scheduler

Discussion

Design Issues

Pages versus Segments

Multiple Processors

Messages

Processes and Subprocesses

Execute-Only Code

Hardware

System Initialization

Money

Process Scheduler

Implementation

The Standalone Implementation

i Implementation

Loading Programs into the Walnut Kernel

Performance

Test Environment

Software

Hardware

Timing

Walnut Kernel

Program

Results

Program

Results

Program

Results

UNIX

Program

Results

Program

Results

Observations

Walnut Kernel Behavior

Messages IPC

Address Space Management File Management

Object Management File Management

Process Management

Conclusion

User Level Programs

Structures

Program Structures

Data Structures

Legacy Code

Shared Libraries

Initproc

Glui

Shell

Wyrm

Security

Objects

Restrict

Serial Numbers

Non-Random Passwords

SRMULTILOAD Right

Protected Freeze and Thaw

Proposed Hardware

Design Goals

Architecture

Node Design

Functional Description

Operational Description

Design Features

Arbitration

Continuing Future Work

Software

Kernel

User Code

Hardware

Conclusion

A User Level Programmer's Guide

A Overview

A Objects

A Capabilities

A View

A User Rights

A System Rights

A Deriving Capabilities

A Process Structure

A Process Address Space

A Parameter Page

A The Wall

A Process Structure Conventions

A The Process Object

A The Process

A Process Creation

A Making Processes

A Initial Process State

A Subprocess Zero

A Freeze

A Thaw

A Wakeup

A Cooee

A Protected Freeze

A Protected Thaw

A Subprocesses

A Anatomy of a Subprocess

A Operations on Subprocesses

A Scheduling

A Messages and Mailboxes

A Sending Messages

A Receiving Messages

A Mailboxes

A Exceptions

A Types of Exception

A Trap Handling Subprocesses

A The Trap Message

A System calls

A Procedure

A Available System Calls

B Formal Description of Restrict

C Hardware Description

Introduction

Design Goals

Design Decisions

Satisfying the Design Criteria

Design Criteria

Consequences

Multiprocessor Node

Functional Description

Operational Description

Design Features

The Arbiter

Conclusion

D Glossary

List of Figures

A Capability in the Password-Capability System

Block Diagram of the Password-Capability System's Hardware

Logical Address to IAS Translation

IAS Address to Physical Address Translation

Components of an Amoeba Capability

Logical Representation of an Amoeba Directory

Basic Amoeba Directory Hierarchy

A Mach Task

The Initial Name Space of a Process

QNX Multipart Messages

The Chorus Nucleus

A Chorus Capability

Mapping Monads Segments to Pages

Addressing Memory Under Monads

Mapping containers under Grasshopper

Invocation under Grasshopper

Capability Trees Under Mungi

An object

Structure of a Walnut Kernel Capability

Derivation

Process Address Space

Map of the Process Object

A Tree of Capabilities

Structure of Parameter Block

Pages Object Header Blocks

The Header Data Structure

The Captabent Data Structure

The VolTabEnt Data Structure

The Fizent Fizentt Data Structure

The Prochd Data Structure

The Tlcent Data Structure

The Subprocent Data Structure

The Messent Data Structure

Layout of the Process Object

The Scratch Data Structure

The usage of data structures by kernel functions

Disk Layout

Windows and Objects

Components of the Walnut Kernel

Organization of the Walnut Kernel

Proposed Scheduling Scheme

Timer and Interrupt Hardware in the IBM PC/AT

Comparison of External Send to Write Operations

Comparison of Transfer Time Between Two Processes

Comparison of File Operation Times to Object Operation Times

Comparison of File Creation Time to Object Creation Time

Comparison of File Deletion Time to Object Destruction Time

Comparison of Fork to Make Process Time

Pseudocode for a Message Loop

A Circular Buffer

Implementation of Local Storage for Shared Library Code

Find Capability Index for Executing Code

Keyboard and Screen I/O

Comparison of Walnut Kernel and Password-Capability System Objects

Components of the Walnut Kernel Serial Number

Bus Structures

Topology of Processor Interconnections

Block Diagram of Multiprocessor Node

Partitioning of Addresses

Contents of Look-Up Tables

A A Password Capability

A Process Address Space. This diagram describes the major features of the address space seen by a process operating on a system with kilobyte pages. The message area and the parameter block are collectively known as the parameter page.

A Parameter Block Declaration

A System Rights Constants

A Defined Kernel Call Constants

A Process Object. This diagram describes the major features of the process object.

A Subprocess Zero Functions and Arguments

A Process Status

A Structure of the Failure Message

B A Subtree of a Capability Tree

B A Subtree of a Capability Tree (Enhanced Notation)

B A Tree with the Heap Property for C

Block Diagram of Multiprocessor Node

Partitioning of Addresses

Contents of Look-Up Tables

List of Tables

Summary Table for Reviewed Operating Systems

System Rights

A Error Identifier Values

Chapter 1

Introduction

This thesis describes a capability based operating system developed in the Department of Computer Science at Monash University. The operating system is commonly known as the Walnut Kernel.

In addition to being an independent project to develop a portable capability based operating system, the Walnut Kernel forms the software basis of a major project within the Department to develop a scalable multiprocessor system. The Secure RISC Architecture project builds on work done towards the development of the Monash Multiprocessor, also known as the Password-Capability System [APW, And, Pos, APW, AW], which resulted in a shared memory multiprocessor system with a novel capability based operating system. The Secure RISC Architecture and the Walnut Kernel were inspired by the concepts behind the Monash Multiprocessor; however, due to limitations imposed on the original design, the two new projects started with the successful concepts imparted from their predecessors.

Overview

The Password-Capability System, an ancestor of the current project, is discussed in chapter 2. The kernel and the purpose-built hardware used by that system to support the use of capabilities are discussed.

The survey chapter is divided into three types of operating system: conventional, current and capability based. Conventional operating systems are typified by UNIX, employ monolithic kernels and are file based. The current operating system section describes six recent operating systems. Although these systems may use capabilities for some parts of their operation, they do not use capabilities as their only access control mechanism. The operating systems discussed in this section include representatives of distributed operating systems and microkernel based operating systems. The capability based operating system section discusses five capability based operating systems. The majority of these are based on segregated architectures; however, two password-capability based systems are discussed.

The Walnut Kernel is introduced in chapter 4.

Chapter 4 describes the Walnut Kernel from both the application programmer and the user perspectives. The topics covered include a description of the roles of volumes, objects and capabilities; the layout of a process's address space; interprocess communication; and controlling process scheduling. A detailed list of kernel calls and their parameters is contained in Appendix A.

The design of the kernel is discussed in chapter 5. This chapter identifies the design principles that motivated design decisions, describes the partitioning of the kernel into functional components, and identifies key design decisions and their impacts. The architecture of the system is discussed in detail, covering both functional decomposition and data structures. The subprocess mechanism, an innovation not present in the Password-Capability System, is described in that chapter and the motivation for its inclusion identified.

The Walnut Kernel currently exists in two forms. One version of the kernel is hosted by a conventional operating system, while the second interacts directly with hardware. Chapter 6 discusses the two versions and mechanisms for loading programs into a system without a host operating system.

A series of measurements was taken on equivalent hardware platforms for the Walnut Kernel and a BSD based UNIX. These measurements are used to compare the speed of operations common to both operating systems. Chapter 7 provides a guide to the performance of the Walnut Kernel compared to a mature operating system. It is expected that the relative performance of the Walnut Kernel will improve after optimisations are investigated and implemented. The control, a BSD based UNIX, was optimized by its implementors.

1

UNIX is a trademark of X/Open

The techniques employed in writing programs for the environment provided by the kernel are examined in chapter 8. The implementation of shared libraries and access to legacy code from UNIX systems are also discussed. Four examples of user level programs are presented to complete the picture of the environment.

A number of features that affect the security of the system have been introduced into the design of the Walnut Kernel. Examples of departures from the Password-Capability System introduced into the Walnut Kernel include the ability to derive capabilities with known passwords and the restrict system call. Chapter 9 describes the features which have the potential to affect the security of the system and provides an analysis of the effect of those changes.

The Secure RISC Architecture project involves elements of hardware and software. Chapter 10 describes the hardware mechanisms devised by Dr Ronald Pose and the author. This work forms the basis of a design for a scalable multiprocessor with a distributed switch. Use of deep FIFO buffers to enhance throughput and to allow each board in the system to have its own clock generator enhances the redundancy and reliability of the design.

Chapter 11 describes work being performed by others based on the concepts outlined in this document. It covers continuing work in the area of software running under the Walnut Kernel, enhancements to the kernel, and the development of hardware based on the concepts outlined for the proposed hardware.

In the conclusion chapter the Walnut Kernel and the hardware proposed to support it are discussed in the context of the Password-Capability System (chapter 2) and the existing systems described in chapter 3. Arguments are summarised and conclusions are drawn in relation to the success of the work as a whole.

Appendix A is a copy of The Walnut Kernel User Level Programmer's Guide by Maurice Castro [Cas]. This document provides a detailed description of the environment provided by the kernel to programmers.

2

This system call allows rights to be removed from a capability after the capability has been created.


A formal description of the restrict operation is given in appendix B. The formal description supplements the informal description in chapter 9.

Appendix C is a copy of The Monash Secure RISC Multiprocessor: Multiple Processors Without a Global Clock by Maurice Castro and Ronald Pose [CP]. The paper outlines the concepts proposed for the hardware to be used in the Secure RISC Architecture project.

A glossary of terms used in this work is found in appendix D.

The remainder of this chapter briefly introduces the concept of capabilities and the implementations available, introduces the concept of a persistent system, and clarifies the definition of threads.

Capabilities

Capabilities provide a uniform mechanism for controlling access to, and the protection of, resources [DVH]. The extension of the capability mechanism to allow the viewing of a capability as a form of address [Fab] allows all system resources to be modeled as memory objects.

There are several major types of capability based systems. These systems are identified by the mechanisms they employ to prevent forgery of a capability. The architectures are:

Tagged Architecture employs tag bits on memory locations to identify a capability as a special object within a collection of data. The hardware prevents unprivileged code from altering the capability.

Segregated Architecture separates capabilities from ordinary data. Capabilities are placed in a page or segment that cannot be modified by unprivileged code. These groups of capabilities are often known as capability lists or C-lists.

Encrypted Capability Architecture calculates a form of checksum, known as a signature, for the object identifier and rights conveyed by the capability [GL]. The capability and the signature are encrypted with a secret key known only to the kernel. This is given to the user process as an encrypted capability. When an encrypted capability is presented, it is decrypted by the kernel and the signature is recalculated. The capability is valid if the recalculated signature matches the signature sent in the encrypted capability. The test allows the system to detect attempts to forge capabilities.

Password-Capability Architecture employs a sparse address space. The password component of the capability makes the number of valid capabilities small relative to the total number of possible capabilities. By ensuring that the capabilities are randomly spread throughout the address space, a large number of attempts at guessing a capability is required to ensure that a valid capability is found. The password-capability mechanism is statistically secure.
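The notion of statistical security can be made concrete with a small calculation. The sketch below is illustrative only: the function names and the password widths used are assumptions, not values taken from the Walnut Kernel. It computes the chance that an attacker who tries a number of distinct, uniformly random passwords hits at least one valid capability for an object, together with the simpler upper bound attempts × valid / space that is convenient for large password spaces.

```python
from fractions import Fraction


def guess_success_probability(valid: int, password_bits: int, attempts: int) -> Fraction:
    """Exact probability that at least one of `attempts` distinct random
    guesses matches one of `valid` passwords out of 2**password_bits.
    Loops once per attempt, so it is intended for small examples only."""
    space = 2 ** password_bits
    if not 0 <= valid <= space:
        raise ValueError("valid must lie between 0 and the size of the space")
    attempts = min(attempts, space)
    miss = Fraction(1)
    for i in range(attempts):
        # Chance that guess i also misses, given that all earlier guesses missed.
        miss *= Fraction(max(space - valid - i, 0), space - i)
    return 1 - miss


def guess_upper_bound(valid: int, password_bits: int, attempts: int) -> Fraction:
    """Union bound on the same probability; cheap for large spaces."""
    return min(Fraction(1), Fraction(attempts * valid, 2 ** password_bits))


# Hypothetical figures: 1000 valid capabilities, a 64-bit password, 10**6 guesses.
print(float(guess_upper_bound(1000, 64, 10**6)))
```

The bound shrinks by half for every additional password bit, which is why a sparse enough space makes exhaustive guessing impractical even though no capability is hidden from user code.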

The Walnut Kernel implements a password-capability architecture. The architecture has several major advantages over the alternatives.

The major disadvantage of the tagged architecture is that it requires specialised hardware with additional memory bits which cannot be used for the storage of general data. This makes the architecture undesirable for both economic and portability reasons: the additional specialised memory adds to the cost of the system, and the need for specialised hardware restricts the choice of system.

Segregated architectures can be implemented on conventional hardware but require every operation on a capability to be mediated by the kernel. Typically programs are required to use handles to refer to capabilities, and to call the kernel with the handle as a parameter for all operations relating to a capability. All operations on capabilities are therefore subject to the overhead of a switch between user and kernel address space.

The security of an encrypted key system is vested in the security of the encryption algorithm and the key used by the system. Discovery of both of these items would allow the user to generate capabilities, avoiding the controls imposed by the kernel and thereby compromising the system's security.
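The dependence on a single kernel secret can be illustrated with a short sketch. It is not the scheme of [GL]: rather than encrypting a capability together with its checksum, it attaches a keyed signature (HMAC-SHA256 from the Python standard library) to the object identifier and rights, which is enough to show both the forgery check and the consequence of a leaked key. The names `KERNEL_KEY`, `issue` and `validate`, and the field widths, are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical kernel secret; in a real system it would never leave the kernel.
KERNEL_KEY = secrets.token_bytes(32)
BODY_LEN = 12   # 8 bytes of object identifier + 4 bytes of rights (assumed widths)
TAG_LEN = 32    # size of an HMAC-SHA256 digest


def issue(object_id: int, rights: int, key: bytes = KERNEL_KEY) -> bytes:
    """Build a capability: identifier and rights followed by a keyed signature."""
    body = object_id.to_bytes(8, "big") + rights.to_bytes(4, "big")
    return body + hmac.new(key, body, hashlib.sha256).digest()


def validate(cap: bytes, key: bytes = KERNEL_KEY):
    """Return (object_id, rights) if the signature matches, otherwise None."""
    if len(cap) != BODY_LEN + TAG_LEN:
        return None
    body, tag = cap[:BODY_LEN], cap[BODY_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return int.from_bytes(body[:8], "big"), int.from_bytes(body[8:], "big")


cap = issue(42, 0b101)
tampered = bytearray(cap)
tampered[11] ^= 0x02                  # try to switch on an extra right
print(validate(cap))                  # accepted: signature matches
print(validate(bytes(tampered)))      # rejected: signature no longer matches
```

Anyone who learns the key can call `issue` with arbitrary rights and the result will validate, which is the total compromise the paragraph above describes.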

Password-capabilities allow the use of conventional hardware without incurring the overheads present in segregated architectures, and with the flexibility of storing capabilities as ordinary data. The scheme is immune from the total compromise of security to which encrypted key systems are subject, as there is no algorithm used to determine the passwords of capabilities. Complete compromise of the system would require each object's master capability to be guessed.

Persistent Systems

Conventional computer systems normally have two distinct views of data: short term data or long term data. Short term data is typically stored in RAM or virtual memory and exists only for the period of time that a program is active. Long term data is often stored in the filesystem of a conventional machine. Data stored in a filesystem is independent from the program which created it and exists until it is explicitly destroyed. A persistent system eliminates the distinction between long term data and short term data. All data persists until it is explicitly destroyed. Furthermore, the data is manipulated with the tools used to manipulate short term data on conventional systems. These tools tend to be more flexible than the limited range of file operations usually available.

Persistent systems are designed to support persistent programming, which offers a number of advantages over conventional programming [ABC, CAC]. Persistent programming allows the elimination of the large component of conventional programs which is solely concerned with transforming data from filesystem representation into the volatile representation that is manipulated by the program, and back again. Elimination of this code saves both time and space; furthermore, it eliminates the need to perform the transformation, which may distract the programmer from the primary task of the program.

Persistent systems are more attractive than conventional systems in that they provide a uniform environment for all data. The uniformity of the model simplifies the programming environment, thereby easing understanding.


Threads

The thread of control, or thread, is a path of execution through an address space. The majority of traditional operating systems permit only a single thread of control in the address space of a process. However, having multiple threads sharing an address space can in some cases be advantageous [Tan]. If the threads of a process share the virtual address space of the process, a thread of a process has no protection from the actions of the other threads of the process.

Threads are typically used in applications with I/O-bound components. The writer of the application can place the I/O-bound operations in a separate thread which uses blocking I/O operations. The I/O-bound threads progress until they block; then other threads of the process make use of the remainder of the timeslice. Thus the process can make more efficient use of the allocated time. Without threads, a single blocked action would halt the progress of the application, necessitating the surrendering of the remainder of the timeslice.

On uniprocessor systems threads are scheduled in a time sharing manner within a process's timeslice. Each thread of a process is allocated time on the processor. Ideally, on a multiprocessor system the threads of a process would execute concurrently; however, a number of systems claim to support threads but support only the timeslice semantics supported on uniprocessors.

Threads may be implemented at either the user [BS] or the kernel levels. User level packages typically make use of kernel upcalls or a yield call built into the package to allow switches between threads. On systems which support upcalls, an upcall is made when the kernel detects an event which would cause a process to block. The upcall invokes a management routine which either selects another thread of the process to run or surrenders the remainder of the timeslice. The yield call is used to switch between cooperating threads by indicating that the current thread is ready to allow another thread to proceed. Kernel level thread packages make use of primitives that are implemented in the kernel.
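The yield style of user level threading can be sketched in a few lines. The example below is a generic illustration and is not drawn from any of the packages cited: Python generators stand in for threads, each `yield` statement plays the role of the package's yield call, and `run` is a hypothetical round-robin scheduler living entirely in user code.

```python
from collections import deque


def worker(name, steps):
    """A cooperating thread: performs one step of work, then yields."""
    for i in range(steps):
        # The yield marks the point where this thread lets another proceed.
        yield f"{name}{i}"


def run(threads):
    """Round-robin scheduler: resume each ready thread until its next yield."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        try:
            trace.append(next(thread))   # run the thread up to its next yield
            ready.append(thread)         # still runnable: back of the queue
        except StopIteration:
            pass                         # thread has finished; drop it
    return trace


print(run([worker("a", 2), worker("b", 3)]))
```

Because switches happen only at yield points, a thread that blocks inside the kernel without yielding stalls every other thread in the package, which is the situation upcalls are intended to address.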

3

Also referred to as a lightweight process [Mic]; however, this term promotes confusion. The term lightweight process was originally used to distinguish UNIX style processes from Multics style processes.


The Walnut Kernel does not provide support for the concurrent execution of multiple threads of execution within a process on multiprocessors. A mechanism with timesliced semantics is provided to let user processes handle asynchronous events.

The absence of concurrently executing threads is not regarded as a disadvantage. Walnut Kernel processes can share address spaces in a more flexible and controlled manner than concurrently executing threads, and hence they provide similar functionality to concurrently executing threads but with the advantage of greater control of access to memory.

The design of the Walnut Kernel recognised that the need for threads is driven by two factors: the high cost of creating a process and the difficulty of sharing address space between processes. The first motivation for the use of threads is dealt with in two ways by the design of the Walnut Kernel. The design of the kernel endeavors to keep the cost of generating a new process to a minimum; also, as processes are persistent, it is practical to maintain processes as servers which perform tasks on demand, avoiding the cost of generating a new process each time a function is required. The second motivation for threads is addressed by observing that the basic model of the system (capabilities allowing access to objects) is based on the interprocess sharing of code and data through memory objects. The shared memory model contrasts with file based systems, which share information through files and pipes, necessitating the use of file protocols. File protocols perform well for sharing data between processes when data structures which map well onto streams are used. However, more complex data structures such as trees and graphs are generally less well handled.

Chapter 2

A Password-Capability System

This chapter describes the Password-Capability System developed in the Department of Computer Science, Monash University [APW, And, Pos, APW, AW], which represents the historical background of the current project.

The Monash Multiprocessor Project had both hardware and software components. The hardware developed for the project consisted of a collection of processor and memory boards on a shared high speed bus. The processor boards provided a set of capability registers to implement the address transformations required. To the best of our knowledge this system [APW] introduced the concepts of password-capabilities and money based garbage collection. Subsequently a number of systems adopted the password-capability mechanism (see chapter 3).

The Kernel

User processes operated in a uniform virtual memory. The address space was divided into volumes, each of which contained a number of objects. Volumes were permanently associated with pieces of hardware. The majority of volumes were associated with storage media such as disk packs. Some volumes were associated with a particular multiprocessor. Typically the latter type of volume was used to access physical devices attached to that multiprocessor. When an object was created it was permanently associated with a single volume and allocated a serial number on that volume. Serial numbers were required to identify uniquely a given object on a volume for all time and were never reused. Serial numbers were allocated consecutively on a volume. The volume and serial numbers of objects were defined as bit quantities.

Volume | Serial | Password | Password

Figure: A Capability in the Password-Capability System

All objects in the system were composed of contiguous sections of the virtual memory. Processes gained access to the contents of an object by mapping in part of the memory object into the address space of the process. After mapping bytes X to Y of an object into addresses A to B of a process's address space, a reference to address A will reference byte X of the object (or a word of bytes starting with this byte). Mapping in is not copying. If a process writes to address A, byte X is altered, and the byte will be seen to be altered by any other process which has the byte mapped in.
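The difference between mapping and copying can be demonstrated with a small sketch. It is only an analogy: a Python `bytearray` stands in for the object, a `memoryview` slice stands in for a mapped window, and the helper name `map_in` is hypothetical. Two windows onto overlapping parts of the same object observe each other's writes, whereas a copy does not.

```python
def map_in(obj: bytearray, x: int, y: int) -> memoryview:
    """Return a writable window onto bytes x..y (inclusive) of obj, without copying."""
    return memoryview(obj)[x:y + 1]


obj = bytearray(b"0123456789")   # stands in for a memory object
window_p = map_in(obj, 2, 5)     # "process P" maps bytes 2..5
window_q = map_in(obj, 4, 8)     # "process Q" maps bytes 4..8
snapshot = bytes(obj[2:6])       # a copy of bytes 2..5, for contrast

window_p[2] = ord("X")           # P writes to its address 2, which is object byte 4

print(bytes(obj))                # the object itself has changed
print(chr(window_q[0]))          # Q sees the change at its own address 0
print(snapshot)                  # the copy is unaffected
```

The offset arithmetic in the write (window address 2 corresponds to object byte 4) mirrors the correspondence between address A and byte X described above.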

The Monash Multiprocessor Project introduced a mechanism for providing non-segregated capabilities employing probable security. This mechanism differed from the alternatives of tagged architectures and segregated architectures, which both distinguished capabilities from other forms of data. Capabilities had a fixed length in bits and four components (see the figure A Capability in the Password-Capability System). The volume and serial number formed the name of the object. The two password fields uniquely identified a capability for the named object. Passwords were allocated purely randomly; there was no encoding of access rights in the password. The security of the mechanism was derived from the small number of valid passwords in comparison to the total number of passwords. The password-capability acted as an identifier for a set of access rights to an object. Many capabilities could convey the same access rights to an object, but with differing passwords. A table of valid capabilities and their associated rights was stored on the volume containing the object to which the capabilities referred. This table was known as the catalog and was accessible only to the kernel. Capabilities were revoked by deleting the entry for the set of rights associated with the capability.
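A minimal sketch of such a catalog follows. It is an illustration under stated assumptions rather than the data structure used by the Password-Capability System: the class and method names are hypothetical, the two password fields are collapsed into one integer, rights are modeled as a bit mask, and the 64-bit password width is arbitrary.

```python
import secrets
from typing import NamedTuple, Optional


class Capability(NamedTuple):
    volume: int
    serial: int
    password: int   # stands in for both password fields


class Catalog:
    """Per-volume table of valid capabilities, private to the kernel."""

    def __init__(self, volume: int, password_bits: int = 64):
        self.volume = volume
        self.password_bits = password_bits   # assumed width, for illustration
        self._entries = {}                   # (serial, password) -> rights

    def grant(self, serial: int, rights: int) -> Capability:
        """Create a capability with a purely random password."""
        while True:
            password = secrets.randbits(self.password_bits)
            if (serial, password) not in self._entries:
                break
        self._entries[(serial, password)] = rights
        return Capability(self.volume, serial, password)

    def lookup(self, cap: Capability) -> Optional[int]:
        """Return the rights conveyed by cap, or None if it is not valid."""
        if cap.volume != self.volume:
            return None
        return self._entries.get((cap.serial, cap.password))

    def revoke(self, cap: Capability) -> bool:
        """Revoke cap by deleting its catalog entry."""
        if cap.volume != self.volume:
            return False
        return self._entries.pop((cap.serial, cap.password), None) is not None


catalog = Catalog(volume=1)
first = catalog.grant(serial=7, rights=0b11)
second = catalog.grant(serial=7, rights=0b11)   # same rights, different password
catalog.revoke(first)
print(catalog.lookup(first), catalog.lookup(second))
```

Note that the rights live only in the catalog: the capability held by a user is just a name plus a random password, so revoking it never requires finding the copies that users hold.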

The password-capability scheme had a number of advantages over the alternatives of tagged architectures and segregated architectures. It was not subject to the major disadvantages of tagged architectures. The first disadvantage was economic, as tagged architectures must supply physical memory for tag bits, which contribute to the cost of the memory but are unavailable as general purpose memory. The second problem was that the processor must support tags. Special instructions must be used to access capabilities safely within a tagged architecture, requiring hardware support and hence limiting the choice of processor. In addition, as password-capabilities were stored as ordinary data, the mechanism was not subject to the limitations of segregated systems. Segregated systems required the kernel to provide mechanisms to move, copy and communicate capabilities, preventing capabilities being treated as ordinary data, which made it difficult to pass capabilities to processes or users outside the computer system.

When an object was created, the kernel returned a single capability to the creator. This capability was known as the master capability. It described the complete object and held all possible access rights to the object. Other capabilities with restricted rights could be derived from the master capability or from its children. The capabilities for an object were logically organised in a tree structure. The master capability was the root, with derived capabilities being the other nodes of the tree. Each internal node of the tree was the root of a subtree whose capabilities had rights equal to or weaker than those of the subtree's root. Deletion of any capability resulted in the deletion of its descendants. Deletion of the master capability resulted in the destruction of the object, as no further access to the object was possible.
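The tree organisation described above can be illustrated with a minimal sketch. It is not the kernel's catalog format: the class names, the 64-bit password width and the use of Python sets for rights are all assumptions made for illustration, and the special case of the suicide right is ignored.

```python
import secrets


class Capability:
    """One node of an object's capability tree (hypothetical model)."""

    def __init__(self, rights, parent=None):
        # Passwords are purely random and encode no rights.
        # The 64-bit width is an assumption of this sketch.
        self.password = secrets.randbits(64)
        self.rights = frozenset(rights)
        self.parent = parent
        self.children = []


class ObjectCatalog:
    """Catalog of valid capabilities for a single object (hypothetical)."""

    def __init__(self, all_rights):
        self.master = Capability(all_rights)
        self.valid = {self.master.password: self.master}

    def derive(self, parent_password, rights):
        """Create a child capability with rights no stronger than its parent's."""
        parent = self.valid[parent_password]
        rights = frozenset(rights)
        if not rights <= parent.rights:
            raise PermissionError("derived rights exceed those of the parent")
        child = Capability(rights, parent)
        parent.children.append(child)
        self.valid[child.password] = child
        return child.password

    def delete(self, password):
        """Delete a capability together with all of its descendants."""
        cap = self.valid[password]
        stack = [cap]
        while stack:
            node = stack.pop()
            stack.extend(node.children)
            self.valid.pop(node.password, None)
        if cap.parent is not None:
            cap.parent.children.remove(cap)

    def object_exists(self):
        # The object is destroyed once the master capability is deleted.
        return self.master.password in self.valid
```

Revocation in this model is simply the removal of catalog entries: a holder may still know the password, but the kernel no longer recognises it.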

Any process that held a capability was capable of using it to access the object referred to by the capability, reducing the need to create many capabilities with equivalent rights.

The Password-Capability System did not support the concepts of ownership or dependency between objects. Once an object was brought into existence, any process knowing a capability could make use of it. Capabilities were copied, passed between processes and stored by processes as ordinary data. In addition, capabilities could be held outside the system by a peripheral or a user and still refer to a valid object, making it impossible to determine if there were any entities which knew a valid capability for an object. These properties prevented the use of existing mechanisms for garbage collection, which depended on ownership, reference counts or reference tracing. Two mechanisms were used to implement garbage collection. An object was destroyed when no valid capabilities for that object could exist, that is, when the master capability for the object was deleted. The second mechanism relied on a simple charging scheme. Each object had associated with it a store of money from which it periodically paid rent. Bankrupt objects were deemed to be garbage and were destroyed.

1 The system allowed capabilities to be stored using any representation. Thus an encrypted capability was as valid as a plain text capability.

The money mechanism was generalized to form an economy within the Password-Capability System. Both ordinary objects and processes had quantities of money associated with them. Money was managed directly by the kernel and was distinct from normal data items. Money could be transferred between objects and was consumed when services were provided by the kernel. Each capability had a monetary value associated with it. The money associated with the master capability represented the total store of money held by the object. Derived capabilities were associated with a value that represented the amount of money that could be withdrawn through that capability. A withdrawal or deposit caused the values held by each of the ancestors of the capability to be updated by the amount of the withdrawal or deposit. A withdrawal would occur only if all the values held by the capability and its ancestors would be nonnegative after the operation was performed. The money mechanism allowed processes to charge for services to cover the costs of performing work, and allowed for the possibility of charging for the use of a program, unlike the conventional scheme of paying a license fee for access to a program which may or may not be used.
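The withdrawal rule can be made concrete with a small sketch. The class and method names below are hypothetical; the sketch only assumes what the text states, namely that a withdrawal or deposit is applied to the capability and to every ancestor, and that a withdrawal is refused if any of those values would become negative.

```python
class MoneyCap:
    """Monetary value attached to one capability (hypothetical model)."""

    def __init__(self, value, parent=None):
        self.value = value    # for the master: the object's total store
        self.parent = parent  # None for the master capability

    def chain(self):
        """Yield this capability followed by each of its ancestors."""
        node = self
        while node is not None:
            yield node
            node = node.parent

    def withdraw(self, amount):
        """Apply the withdrawal only if every value stays nonnegative."""
        nodes = list(self.chain())
        if any(node.value - amount < 0 for node in nodes):
            return False
        for node in nodes:
            node.value -= amount
        return True

    def deposit(self, amount):
        """Add the amount to this capability and to each ancestor."""
        for node in self.chain():
            node.value += amount
```

Checking the whole chain before changing any value keeps the operation all-or-nothing, so a refused withdrawal leaves every balance untouched.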

Processes had no internal parallelism and hence could run on at most one processor at a time. However, processes could be moved between processors. This allowed the Password-Capability System to schedule processes on the first suitable processor that became available.

Lockwords were used in the Password-Capability System to prevent non-trusted code from distributing information shared by that code. The confinement mechanism identified two classes of capabilities: alter capabilities, which could be used to convey information to third parties, and nonalter capabilities, which were strictly read-only. The top bit of the password field was set to indicate that a capability was an alter capability. The remaining bits of the password were chosen randomly.

Each process had a -bit lockword associated with it. The lockword could not be read. Locking the process caused the lockword $L'$ to be set to the exclusive-or of the original lockword $L$ and the argument $V$ of the operation which sets the lockword:

$$L' = L \oplus V$$

Passwords were not encrypted in processes with a zero lockword. To execute an untrusted package, another process $P'$ was created with a nonzero lockword $L'$. This lockword was generated from the lockword $L$ of the creating process $P$ and an arbitrary value $V_L$ selected by the creating process:

$$L' = L \oplus V_L$$

The passwords of alter capabilities $C_{pass}$ in the new process $P'$ were encrypted using the process's lockword before being used by the kernel:

$$C'_{pass} = L' \oplus C_{pass}$$

Nonalter capabilities were not affected by the value of the lockword. Provided process $P$ was not locked, i.e. had a zero lockword, this mechanism allowed process $P$ to pass capabilities to process $P'$ by encrypting the capabilities passed using $V_L$. The mechanism, however, prevented $P'$ from passing on those capabilities, as the value of $L'$ was unknown to $P'$. Any capability created by a process was encrypted using the process's lockword. Process $P'$ could create objects for its own use or for the use of any process which knew $L'$. Program code and data were made available without the risk of data leakage by using nonalter capabilities to map in read-only code and data. Groups of locked cooperating processes could be created by using the same lockword for each of the processes when they were created.
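The exclusive-or arithmetic behind the lockword scheme can be followed in a short sketch. The 32-bit width, the function names and the example values are assumptions for illustration only; the sketch models just the three operations described: locking, the kernel's transformation of a presented alter password, and the wrapping a parent applies before handing a capability to a locked child.

```python
BITS = 32                 # assumed lockword and password width
MASK = (1 << BITS) - 1


def lock(lockword, v):
    """New lockword after a lock operation with argument v."""
    return (lockword ^ v) & MASK


def kernel_view(presented_password, lockword, alter=True):
    """Password the kernel validates for a capability presented by a process.

    Alter capabilities are combined with the process's lockword;
    nonalter capabilities are used unchanged.
    """
    return (presented_password ^ lockword) & MASK if alter else presented_password


def wrap_for_child(password, v_l):
    """Encrypt a password with the value chosen when the child was locked."""
    return (password ^ v_l) & MASK


# Unlocked parent (lockword 0) creates a locked child using v_l.
parent_lockword = 0
v_l = 0x5A5A1234
child_lockword = lock(parent_lockword, v_l)

real_password = 0x0BADF00D
handed_to_child = wrap_for_child(real_password, v_l)

# The child can use the capability it was handed ...
child_ok = kernel_view(handed_to_child, child_lockword) == real_password
# ... but the same bits are useless to an unlocked third party.
leak_ok = kernel_view(handed_to_child, 0) == real_password
```

Because the child cannot read its own lockword, it cannot undo the wrapping to recover `real_password`, which is the confinement property the text describes.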

The Password-Capability System allowed capabilities to confer the following rights:

Money Rights: Three separate rights were supported to allow access to information on money. The drawing right determined the maximum amount of money that could be withdrawn through the capability. This value was reduced by withdrawals and increased by deposits. Possession of a balance right was necessary to determine the amount of money that could be withdrawn via a capability. The result returned was the minimum of the drawing right of the capability and its ancestors' drawing rights. A deposit right allowed money to be added to the drawing right. A separate right was required to provide control over a potential covert channel which functioned by varying the drawing right.

Window Rights: These rights allowed the visible region of the capability to be specified. A window consisted of a set of consecutive words within an object, starting on a word boundary (offset) and extending an integral number of words (size). Derived capabilities were required to have windows which were ranges within the window of the capability from which they were derived. Associated with each window were the access rights read, write and execute.

Process Rights: Processes had four rights in addition to the rights held by other objects. A message right allowed the holder to send a -word message of arbitrary content to a process. If the process was waiting on a message, the process was awakened. The suspend right allowed the holder to suspend and resume the process. The internal state of the process could be determined by holders of a capability with a status right. Holders of a capability with a condition right could initialize a suspended process. This allowed the partial modification of the state of a process.

Suicide Right: Possession of a suicide right allowed a capability to be used to destroy itself. This allowed the distribution of the same capability to antagonistic processes but prevented one party depriving the others of the use of the capability. Suicide right was unique in that capabilities with suicide right could be derived from capabilities without suicide right.
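The window rule lends itself to a brief sketch. The `Window` type and `derive_window` function are hypothetical, and the sketch assumes that offsets are measured in words from the start of the object rather than from the start of the parent window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    offset: int        # first word of the window within the object
    size: int          # length of the window in words
    rights: frozenset  # subset of {"read", "write", "execute"}


def derive_window(parent, offset, size, rights):
    """Return a window that lies inside `parent` and has no extra rights."""
    rights = frozenset(rights)
    inside = (size > 0
              and offset >= parent.offset
              and offset + size <= parent.offset + parent.size)
    if not inside:
        raise ValueError("derived window must lie within the parent window")
    if not rights <= parent.rights:
        raise PermissionError("derived window cannot gain access rights")
    return Window(offset, size, rights)
```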

The Hardware

The kernel of the Password-Capability System relied heavily on many of the features of the hardware designed to support the system. Most notable of these features was the presence of segment registers which mapped capabilities onto a paged virtual memory. The presence of specialised hardware resulted in significant simplifications in the design of the operating system.

The prototype hardware was based on the NS processors operating at MHz. The NS was used to provide floating point support. Memory management was provided through the custom hardware described in this section. The bus connecting the processor boards operated at MHz.

Figure illustrates the overall structure of the hardware. The system employed a number of processor boards which communicated with a number of shared memory boards over a single high speed bus. Each processor board supported a set of capability registers, also known as window registers, which were used to transform the addresses generated by the processor into addresses in the Intermediate Address Space (IAS). The IAS addresses were then used to access data via the bus. Each memory board performed a check to determine if the address required was present in the pages stored on the board, and allowed access if the page was present.

The hardware on each processor board provided logical address to IAS address translation (see figure). The logical address of the processor was composed of three parts: a window register identifier, an offset into the window and a byte offset into a word. The top bits of the logical address were used to identify the appropriate window register from those supported. The offset and access mode were checked against the window size and window rights respectively. If the access was legal, the sum of the IAS Base Address and the offset into the window was calculated and used as the IAS Address of the access.

The hardware supported checking of read, write and execute access rights and limit checking to word granularity.
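The translation step can be sketched in software. The field widths below (5 window bits, 25 offset bits, 2 byte bits) are placeholders chosen for this sketch, as are the register layout, the treatment of the IAS base as a word-aligned byte address and the exception types; only the order of the checks follows the description above.

```python
from dataclasses import dataclass

# Assumed field widths of a logical address: | window | word offset | byte |
WINDOW_BITS, OFFSET_BITS, BYTE_BITS = 5, 25, 2


@dataclass
class WindowRegister:
    ias_base: int        # IAS address of the first word of the window
    size_words: int      # window size, checked to word granularity
    rights: frozenset    # subset of {"read", "write", "execute"}
    valid: bool = True


def translate(registers, logical, mode):
    """Translate a logical address to an IAS address, or raise on a fault."""
    byte = logical & ((1 << BYTE_BITS) - 1)
    word = (logical >> BYTE_BITS) & ((1 << OFFSET_BITS) - 1)
    window = logical >> (BYTE_BITS + OFFSET_BITS)

    reg = registers.get(window)
    if reg is None or not reg.valid:
        raise LookupError("window register not loaded")
    if word >= reg.size_words:
        raise IndexError("offset beyond window size")
    if mode not in reg.rights:
        raise PermissionError("access mode not permitted by window rights")
    # Legal access: IAS address is the base plus the offset into the window.
    return reg.ias_base + (word << BYTE_BITS) + byte
```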

Figure: Block Diagram of the Password-Capability System's Hardware (processor: CPU, window registers and adder combining window base and offset into an IAS address; bus; memory module: address recognition, physical address translation unit and physical memory)

Figure: Logical Address to IAS Translation (the logical address fields window, offset and byte select a window register holding rights, window size and IAS base address; an adder produces the IAS address, made up of IAS page number and offset)

Figure: IAS Address to Physical Address Translation (part of the IAS page number indexes a hash table whose entries hold a valid bit, a physical page number and the high order IAS address bits; a comparison of the high order bits gates generation of the physical address)

On receiving an IAS address, a memory board checked its hash table to determine if the page required was stored on it (see figure). The check was performed by using the middle bits of the IAS address as an index into the hash table. If the bits stored in the hash table matched the high order bits from the IAS address and the page was marked valid, then the page was stored on the memory board. If a match occurred, then the required physical address was generated by concatenating the physical page number and the offset into the IAS page. The physical address was used to gain access to the appropriate item in memory.
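The hash-table check performed by a memory board can be sketched as follows. The field widths, the class name and the tuple layout of a table entry are assumptions; the refusal to install a second page in an occupied slot reflects the restriction, noted in the chapter's footnote on hash tables, that two pages hashing to the same value cannot share a memory module.

```python
# Assumed split of an IAS address: | high bits (tag) | middle bits (index) | offset |
OFFSET_BITS, INDEX_BITS = 12, 10


class MemoryBoard:
    """One memory module and its page-recognition hash table (hypothetical)."""

    def __init__(self):
        # Each entry is None or a tuple (valid, tag, physical_page).
        self.table = [None] * (1 << INDEX_BITS)

    @staticmethod
    def _split(ias):
        offset = ias & ((1 << OFFSET_BITS) - 1)
        index = (ias >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = ias >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    def install(self, ias, physical_page):
        """Record that the IAS page containing `ias` lives in `physical_page`."""
        tag, index, _ = self._split(ias)
        entry = self.table[index]
        if entry is not None and entry[0]:
            raise ValueError("another page already hashes to this slot")
        self.table[index] = (True, tag, physical_page)

    def lookup(self, ias):
        """Return the physical address, or None if the page is not on this board."""
        tag, index, offset = self._split(ias)
        entry = self.table[index]
        if entry is None:
            return None
        valid, stored_tag, physical_page = entry
        if not valid or stored_tag != tag:
            return None
        # Concatenate the physical page number with the offset into the page.
        return (physical_page << OFFSET_BITS) | offset
```

A board that returns `None` simply does not respond, leaving the access to whichever other board holds the page.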

Multiple memory modules which recognized the addresses of IAS pages stored within them allowed the near arbitrary allocation of pages to memory modules. This property allowed the distribution of pages across memory modules to equalize the load across the modules.

As the physical memory of the machine was limited, objects could not be retained in the IAS indefinitely. The large size of the IAS allowed a simple management strategy to be employed. Objects were retained in the IAS until the space became exhausted. At that point all objects were flushed from the IAS, window registers were marked invalid and caches were flushed. Subsequent accesses brought objects back into the IAS.

The Password-Capability System's operating system was strongly supported by the system hardware, which provided direct support for capabilities through purpose built capability registers and a number of intelligent peripherals. The peripherals included shared memory with VAX and purpose built I/O controllers.

2 The use of a hash table instead of a content addressable memory required that two pages which hashed to the same value could not be stored on the same memory module.

Chapter

Survey

This chapter surveys a number of operating systems to provide a historical context for the work on the Walnut Kernel discussed in later chapters. The chapter comprises several sections. The first section (section ) provides an overview of a conventional system; UNIX is selected for its ubiquity among current commercial systems. The second section (section ) examines several current operating systems which display either a distributed or microkernel architecture. Microkernel architectures are significant in that they bring software engineering practices to the kernel level, while distributed architectures aim to provide scalable mechanisms for implementing operating systems on systems with more than one processor. A notable absence from this section is Microsoft's Windows NT operating system. Although the system is likely to be of great commercial importance, the information available [Cus] indicates that, from the perspective of innovation within the kernel, the operating system is similar to the other microkernel based architectures surveyed. The third section (section ) covers a number of capability based operating systems. The operating systems covered in earlier sections may make use of capabilities in a limited role; however, those in this section use capabilities for all access and naming functions. The final section (section ) draws conclusions about trends in the development of operating systems.

The survey uses Tanenbaum's [Tan] four major components of an operating system (process management, input/output, memory management and file system) to provide a basis for comparing the systems presented.

1 Windows NT is a trademark of Microsoft Corporation.

Conventional Operating Systems

The UNIX family of operating systems is based on the work of Ritchie and Thompson [RT]. This brief overview covers the gross features of UNIX and some of their implications. UNIX has been widely used in both commercial and research environments. The interest of the research community has prompted many detailed studies of the characteristics of the operating system. This survey identifies some key features of the operating system that are either typical of conventional operating systems or have had significant influence on the development of other operating systems. The features discussed are the style of processes provided, protection and the implementation of the kernel.

In direct contrast to Multics and VMS, where processes are reused because of the significant costs involved in their creation and destruction, the UNIX system employs inexpensive processes [JJc]. Low cost processes allow the application of more than one process to the task of solving a problem. A problem can therefore be decomposed into a number of smaller tasks, with the possibility of using utilities for some of those tasks. The use of small utilities confers the software engineering advantages of encouraging the reuse of code and providing encapsulation of code. Mechanisms which allow the easy reuse of code make it desirable to invest time in the demonstration of correctness and in optimisation. The high cost of Multics processes encourages the use of an in-process model of protection [Kee, Tan, Sal]. Under the in-process model, encapsulation occurs at the subroutine or module level. The low cost of UNIX processes and the absence of hardware support for more than two privilege levels favor the out-of-process protection mechanism adopted by UNIX.

Protection on UNIX systems is a combination of two mechanisms. The primary mechanism provides two classes of user: ordinary users, who have access to a limited selection of kernel operations, and super users, who have access to all functions provided by the kernel. The functions of the file system comprise the second mechanism. The UNIX file system associates with each file an owner, a group and a set of permission bits. The permission bits determine the level of access to the file that the owner, the group and all other users are allowed to have. In addition to this, executable files have the ability to inherit the powers of the owner (setuid) or group (setgid) of the file for the duration of the execution of the program. The presence of an omnipotent user (the super user) and the ability to inherit the powers of the owner of a process allow privileged functions to be made available to users in a controlled way.

2 The term lightweight process has not been employed to avoid confusion, as the term has been more recently acquired by SUN and other vendors to describe their thread implementations.
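The permission-bit check can be illustrated with a short sketch of the conventional owner/group/other test. The function name and argument layout are hypothetical, and the treatment of the super user is simplified: uid 0 is granted every request here, whereas real kernels apply further conditions to execute access.

```python
def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Decide whether a user may access a file.

    mode     -- permission bits, e.g. 0o640
    file_uid -- owner of the file
    file_gid -- group of the file
    uid      -- user making the request
    gids     -- set of groups the user belongs to
    want     -- string drawn from "rwx" naming the access requested
    """
    if uid == 0:
        return True  # super user (simplified)
    # Exactly one class of bits applies: owner, then group, then other.
    if uid == file_uid:
        shift = 6
    elif file_gid in gids:
        shift = 3
    else:
        shift = 0
    granted = (mode >> shift) & 0o7
    needed = 0
    for letter in want:
        needed |= {"r": 4, "w": 2, "x": 1}[letter]
    return granted & needed == needed
```

Note that the owner is judged by the owner bits alone, so a mode such as `0o044` denies the owner read access even though group and other users have it.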

There are two major families of UNIX kernels in current use: those derived from the commercial System V [GC] and those derived from the research operating system created by the CSRG at Berkeley [LMKQ, JJa]. Although there are major differences between the systems (for example, BSD places the kernel address space at the top of the process address space [JJb], in contrast to System V, which places it at the bottom of the address space), the systems are sufficiently similar at the conceptual level that no further distinction will be made between these versions of the operating system in this chapter.

The UNIX kernel is monolithic in that it is constructed as a single program existing in a single address space. The kernel contains code to support virtual memory, user requested system functions, devices and the file system. A number of consequences flow from this method of organising the kernel. The most significant of these are the absence of protection of the kernel from the actions of device drivers, and the requirement that the system be rebuilt and restarted to incorporate a new device driver or an alteration to an existing device driver.

The kernel manages access to the hardware through the device abstraction. A device appears as a special file within the file system. File operations and I/O control operations (ioctls) are applied to the special file and translated by the kernel into operations on the device.

The UNIX system introduced a number of significant features to mainstream operating systems while retaining the common monolithic implementation of the kernel. Development has continued. The most significant change has occurred with the introduction of threads by a number of vendors.

3 The various kernelized and emulated versions supported over microkernels are excluded from this discussion.

UNIX displays the conventional operating system characteristics of transitory processes which use the file system to provide persistent storage. The use of special files as interfaces to input/output devices provides greater uniformity within the system than in other conventional systems, which have separate mechanisms for dealing with devices. Finally, in a feature typical of the majority of conventional systems, most versions of UNIX currently provide only limited support for memory to be shared between processes.

Current Operating Systems

This section describes a number of recently developed operating systems which exhibit the features of either distributed operating systems or a microkernel architecture.

The advent of powerful low cost microprocessors has altered the economics of the provision of computer resources. In general, it is now more cost efficient to use a large number of processors than to use a single large processor to provide a given amount of computational power. The shift in the cost of provision of service has been the major motive for the development of distributed operating systems.

Distributed operating systems allow many users to work on a collection of processors linked by a high speed network. Tanenbaum [Tan] identifies the following advantages and disadvantages of distributed systems:

Advantages

Economics: Microprocessors offer a better price/performance ratio than mainframes.

Speed: Distributed systems are not subject to the fabrication and construction requirements of single processor systems. Distributed systems can be constructed with more total computing power than a mainframe constructed with similar technology.

Inherent distribution: Some applications involve spatially separated devices. This class of problems has a natural or inherent decomposition onto a distributed architecture.

Reliability: The loss of a single CPU or group of CPUs reduces the total performance of the system but should not prevent the system from operating. In a centralized system the loss of a single component typically results in the failure of the complete system.

Incremental growth: As distributed systems are composed of computational units on a high speed network, computing power can be added in these units or multiples of them, allowing the power of the system to be scaled up gradually.

Disadvantages

Software: The design and implementation of distributed programs is significantly different from that of sequential programs. As a result, little software exists for distributed systems.

Networking: Distributed systems rely on networks between processors to allow for the transfer of information and the coordination of operations between processors operating on a problem. These networks can saturate or suffer from other problems.

Security: It is necessary for a distributed system to promote easy sharing of and access to data among the processors of the system, to allow the processors to work cooperatively on that data. Easy access to data applies to secret information as well.

4 The Input Output Control (ioctl) mechanism, which allows operations which are not defined for files to be performed on devices, partly negates this advantage.

Traditional monolithic kernels provide a large number of system functions which are used directly by user programs. This contrasts with the microkernel approach of providing a small number of essential services from within the kernel and employing user-level servers to provide the bulk of the services expected by user processes. Microkernels make for greater flexibility in the services provided and in the implementation of those services than do monolithic kernels. Tanenbaum [Tan] identifies the four minimal services implemented within a microkernel as:

An interprocess communication mechanism.

Some memory management.

A limited amount of low-level process management and scheduling.

Low-level input/output.

Microkernels also offer advantages to the implementor of an operating system. Because of the limited functionality required from a microkernel, the complexity of the microkernel is considerably less than that of a monolithic kernel. In addition, microkernels allow software engineering techniques to be applied to operating systems. By dividing operating system services into a number of user-level servers, services can be encapsulated into logical units, reducing design and maintenance difficulties.

A number of operating systems introduced in this section make some use of capabilities. However, none of these operating systems is capability based, as they do not use capabilities as the sole mechanism for determining access to each object and process under the control of the system.

Amoeba

Amoeba [Tan, vRT, MvRT] is a microkernel based distributed operating system supporting multiple users in a transparent environment. It was initially designed by Andrew S. Tanenbaum, Frans Kaashoek, Sape J. Mullender and Robbert van Renesse in at Vrije University in Amsterdam.

This distributed operating system runs on a network of dissimilar workstations and servers. The network is intended to contain a large number of CPUs, with tens of megabytes of memory available to each CPU. Shared memory, if present, is exploited to improve message passing performance. The network has four classes of nodes: workstations, pool processors, specialised servers and wide area gateways. There is one workstation allocated to each interactive user, and it runs tasks which require fast interactive response. These tasks include the window manager and text editors. Pool processors consist of one or more CPUs which are allocated as required to a task. At the completion of the task the processor is returned to the pool for others to use. A number of specialised servers are present to perform functions which either need to run on a separate processor or need to be in continuous operation to provide greater efficiency. Typical specialised servers would include directory, file and block servers and database servers. Finally, wide area gateways link Amoeba sites into a seamless system.

5 Tanenbaum identifies the provision of low-level input/output as a necessary service of a microkernel. As there are a number of microkernels which delegate I/O operations to user level servers, it is clear that this statement requires modification. A more appropriate statement would be: Management of access to low-level I/O.

Each machine in an Amoeba system runs a functionally identical microkernel. The Amoeba microkernel manages processes and threads, supports low-level memory management, provides transparent communication between threads and handles I/O.

Processes define an address space which may be shared by a number of threads of execution. Each thread has its own register set, stack and program counter. To simplify the provision of blocking I/O, the threads are scheduled by the microkernel.

The microkernel provides memory management services which allocate and deallocate blocks of memory known as segments. No limitations are placed on how a process uses a segment, and segments may be mapped into and out of a process's address space as necessary. When a segment is mapped out of a process's address space, a capability for that segment is returned. This capability may be passed to other processes to allow them to load the segment into their address space. A segment remains in the system memory even when it is not mapped into a process's address space.

A client server model of distributed processing is implemented through the use of Remote Procedure Calls (RPCs). This mechanism is used to provide transparent communication between threads. The Amoeba RPC mechanism consists of three phases:

do_operation: send a message to the server and block this thread of execution until a reply has been received. This call is issued by a client requesting a service. The destination thread is identified by a -bit number known as a port.

get_request: announce that the current thread is willing to receive a message sent to a nominated port. This call is used by a server to advertise the server procedure attached to a specified port.

send_reply: send a reply back. This call is used by the server to reply to the client requesting the service. On receipt of this message the client thread is unblocked.
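The interplay of the three calls can be imitated inside a single Python program. This is only an analogy built on threads and queues, not Amoeba's interface: the argument lists, the per-port mailbox and the example server are all assumptions of the sketch.

```python
import queue
import threading

_ports = {}                      # port number -> mailbox of pending requests
_ports_lock = threading.Lock()


def _mailbox(port):
    """Return the request queue for a port, creating it on first use."""
    with _ports_lock:
        return _ports.setdefault(port, queue.Queue())


def do_operation(port, request):
    """Client side: send a request and block until the reply arrives."""
    reply_box = queue.Queue(maxsize=1)
    _mailbox(port).put((request, reply_box))
    return reply_box.get()


def get_request(port):
    """Server side: wait for the next request sent to the nominated port."""
    return _mailbox(port).get()


def send_reply(reply_box, reply):
    """Server side: answer the client, which unblocks its do_operation."""
    reply_box.put(reply)


def upper_case_server(port):
    """Example server loop: replies with the request converted to upper case."""
    while True:
        request, reply_box = get_request(port)
        send_reply(reply_box, request.upper())
```

A client would start the server in its own thread, for instance with `threading.Thread(target=upper_case_server, args=(0x1234,), daemon=True).start()`, and then call `do_operation(0x1234, "ping")`.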

Fields, in order: Server Port | Object Number | Rights | Check Field

Server Port     identifies the thread that manages the object
Object Number   uniquely identifies the object managed by the server
Rights          identifies the operations permitted to the holder of the capability
Check Field     protects the capability against forging or tampering

Figure: Components of an Amoeba Capability

Although Amoeba uses capabilities (figure) with cryptographic check fields to identify objects within the system, Tanenbaum et al. do not identify Amoeba as a capability based operating system. The use of capabilities within the Amoeba system is not uniform, in that all the fields of a capability are used and validated to load or access an object, but only the server port field of the capability is validated for RPC operations. The other fields are passed to the server. The server may interpret the other fields without restriction.

A sparse port name space (a -bit port number is required to access a port) partly protects servers from unwanted access attempts. Only clients of a server are informed of the server's port numbers, and hence only valid clients of a server should be able to access the server. If the port number is leaked, additional mechanisms are required to ensure that a server is not abused. This protection may be provided within the server, as it may reject messages which are not of the correct form.

To protect against intruders emulating servers, Amoeba uses a one-way function to encrypt port names. Clients of a service are given the port name P, which is derived from G, the secret port name known to the server, using the one-way function. Servers use their secret port name to advertise their service. G is transformed by either hardware or software (an F-box) into P, and the service is made available from a host to the network. Clients request a service using the publicly known port name P and are connected with the server. All port names advertised by servers are transformed by an F-box, and as G is not publicly known, intruders cannot emulate a server.
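The effect of the F-box can be shown with a toy model. SHA-256 is used purely as a stand-in for the unspecified one-way function, the 6-byte port width is an assumption, and the `Network` class with its `advertise` and `request` methods is hypothetical.

```python
import hashlib

PORT_BYTES = 6  # assumed port width for this sketch


def f_box(port: bytes) -> bytes:
    """One-way transformation applied to every advertised port name."""
    return hashlib.sha256(port).digest()[:PORT_BYTES]


class Network:
    """Toy network in which all advertisements pass through an F-box."""

    def __init__(self):
        self.listeners = {}

    def advertise(self, get_port: bytes, handler):
        # The server supplies its secret name G; the network sees only P = F(G).
        self.listeners[f_box(get_port)] = handler

    def request(self, put_port: bytes, message):
        # Clients address the public name P directly.
        return self.listeners[put_port](message)


net = Network()
G = b"secret-get-port"
P = f_box(G)

net.advertise(G, lambda message: "genuine server")
# An intruder who knows only P ends up listening on F(P), not on P.
net.advertise(P, lambda message: "intruder")

answer = net.request(P, "hello")
```

An intruder would need a value that maps to P under the one-way function, which is exactly the secret G.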

Other services provided on Amoeba systems are not supplied by the microkernel; instead they are made available by user level servers. The most critical of these services are the directory and file system services.

The basic file system service provided with an Amoeba system is known as the Bullet service. The Bullet server creates and manages immutable files. As the files cannot be altered after creation, the size of the file is known at the time of access. This feature is exploited to allow files to be stored contiguously on disk and in the main memory cache. As the files are stored in contiguous blocks, only a single read operation is required to recover the file from disk, and a client can read a file completely in a single RPC operation.

ASCII string    Group           Group           Group
Mail            Cap (all) a     Cap (ro) b      Null Cap c
Games           Cap (all)       Cap (rw) d      Cap (ro)
Exams           Cap (all)       Null Cap        Null Cap

a capability with all rights
b capability with only read right
c capability conferring no rights
d capability with read and write rights

Figure: Logical Representation of an Amoeba Directory

The directory server provides a mapping between ASCII strings and capabilities.
The directory server organizes the mappings into collections known as directories
(see figure). Directories can be shared with other users and grant access to the
capabilities contained within the directory in a controlled way. The directory is
structured as a two-dimensional table consisting of a collection of strings naming
the objects and, for each string, a number of capabilities which grant access to the
object. The capabilities will frequently allow different levels of access to different
groups. As a directory is an object within the Amoeba system, it is represented by a
capability that can be placed within a directory. Directories can therefore be used to
represent an arbitrary graph structure.
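As a rough illustration of the two-dimensional table described above, the following Python sketch models a directory as rows of names, each carrying one capability column per group. The class names, the rights encoding, the group names and the lookup method are all assumptions made for this example; they are not Amoeba's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Capability:
    """Hypothetical stand-in for an Amoeba capability."""
    object_id: str
    rights: frozenset  # e.g. frozenset({"read", "write"})


@dataclass
class Directory:
    """A table: one row per name, one capability column per group."""
    groups: List[str]
    rows: Dict[str, List[Optional[Capability]]] = field(default_factory=dict)

    def enter(self, name, caps):
        if len(caps) != len(self.groups):
            raise ValueError("one capability (or None) is needed per group")
        self.rows[name] = list(caps)

    def lookup(self, name, group):
        """Return the capability a member of `group` sees for `name`, or None."""
        return self.rows[name][self.groups.index(group)]


ALL = frozenset({"read", "write", "delete"})  # assumed rights names
RO = frozenset({"read"})

d = Directory(groups=["owner", "staff", "others"])
d.enter("Mail", [Capability("mail", ALL), Capability("mail", RO), None])
d.enter("Exams", [Capability("exams", ALL), None, None])
```

Here `None` plays the role of the null capability, so a group with no access simply receives nothing usable from a lookup.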

[Figure: Basic Amoeba Directory Hierarchy. Each user's root directory holds the entries Environ, bin, dev, etc, home and public; the public tree contains cap, hosts, pool and usr.]

Each user is provided with a root directory (see figure) which contains entries
for the user's environment, binaries, I/O servers, administrative information, home
directory and the public directory. The public directory contains the root of the
public shared tree. The most significant entries in this area are the cap directory,
which contains capabilities for public servers; the hosts directory, which contains
capabilities for host-specific servers; and the pool directory, which contains
capabilities for pool processors.

In summary, the Amoeba system is a distributed operating system that provides
transparent access to all the facilities within the system without the user being
made aware of the location or nature of the resource. Specialised nodes can be
exploited without distorting the model, as they appear as servers within the system.
Capabilities and sparse address spaces are used within the system to provide a
measure of security; however, they are not exploited to their fullest extent.

Mach

The Mach microkernel [BGJ, RJO, Loe, Tan] forms the basis of a number of
operating systems in current use, including OSF/1, and of many research projects,
including [VSK], and has been suggested as the basis for a portable form of
OS/2. Mach's direct ancestors were designed to demonstrate that modular operating
systems based on message passing were feasible. The earliest versions of Mach were
monolithic and designed to be compatible with UNIX in order to exploit software
becoming available for UNIX systems. At that stage Mach provided support for
multiple processors, threads and interprocess communication, and although it
supported network operations, it was not envisaged as a distributed operating system.
With the advent of Mach 3.0, the code derived from Berkeley UNIX was removed
and Mach was transformed into a microkernel supporting distributed operations.

The features of the Mach 3.0 microkernel are examined below. For convenience,
Mach is used to denote Mach 3.0 in the following text.

Mach provides the minimal services required of a microkernel: task management,
memory management, interprocess communication and device support. In addition,
both multiprocessors and multicomputers are supported in the microkernel,
and system call redirection is provided.

Two abstractions are used for task management: tasks and threads. A task
corresponds to a collection of resources, including the task's address space and
access to communication facilities to both the kernel and the server. Threads are
paths of execution within a task. More than one thread may be active within a task
at a given time, and on a multiprocessor several threads may operate concurrently.
Both the control of tasks and the scheduling of threads can be mediated by
user-level programs. User-level programs can have complete control over the scheduling
of processes.

Memory management consists of mapping contiguous sections of Mach objects
into the address space of Mach tasks. The microkernel manages the main memory
as a cache of sections of Mach objects. A significant feature of Mach's memory
management is that a user-level page fault handler may be specified. User-level
code can control the fetching of pages from backing store. This feature has been
exploited in the implementation of persistent systems such as the Napier
environment [VSK].

Interprocess communication is provided through the use of ports. A port is a data
structure, accessible only to the microkernel, which consists of a fixed-length ordered
list of messages. The port data structure is used to implement all communication
under Mach. When a port is created, capabilities known as port names are created.
Associated with these capabilities are port rights. Three types of port rights exist:
receive right, send right and send-once right. Only one task may hold a receive right
for a port, although many tasks may hold a send right. Ports are only destroyed
when the receive right is destroyed. A task gains access to a port by loading the
port name into its port name space. The microkernel maintains a count of the
number of tasks that have a port loaded. The port abstraction is used to control
every element of the Mach system. Operations on tasks, threads and objects are
all performed by sending messages to the appropriate port.
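The rules for port rights given above can be sketched in a few lines of Python. This is a toy model only: the class and method names are invented for the example, tasks are represented by plain strings, the send-once right is omitted, and a real kernel would block a receiver on an empty queue rather than raise an error.

```python
from collections import deque


class Port:
    """Kernel-side message queue with a fixed capacity (illustrative only)."""

    def __init__(self, capacity=8):
        self.queue = deque()
        self.capacity = capacity
        self.receiver = None   # at most one task holds the receive right
        self.senders = set()   # any number of tasks may hold send rights
        self.alive = True


class Kernel:
    """Hypothetical kernel interface; none of these names come from Mach."""

    def create_port(self, task):
        port = Port()
        port.receiver = task   # the creator starts with the receive right
        return port

    def grant_send(self, port, task):
        port.senders.add(task)

    def send(self, port, task, message):
        if not port.alive or task not in port.senders:
            raise PermissionError("no send right for this port")
        if len(port.queue) >= port.capacity:
            raise BufferError("port queue is full")
        port.queue.append(message)

    def receive(self, port, task):
        if not port.alive or task != port.receiver:
            raise PermissionError("no receive right for this port")
        return port.queue.popleft()  # raises IndexError if the queue is empty

    def destroy_receive_right(self, port, task):
        if task == port.receiver:
            port.receiver = None
            port.alive = False  # the port dies with its receive right


kernel = Kernel()
p = kernel.create_port("server")
kernel.grant_send(p, "client")
kernel.send(p, "client", "hello")
```

Keeping the rights inside the kernel object, rather than handing them to tasks, mirrors the statement that the port structure is accessible only to the microkernel.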

Low-level device I/O is modeled using the port mechanism. Messages can be
sent to ports to transfer data and control devices. Mach is capable of supporting
both synchronous and asynchronous devices, as it separates read and write messages
from request and reply messages.

To assist in the provision of the binary emulation of operating system
environments, Mach incorporates system call redirection. When one of the redirectable
system calls or a redirectable exception occurs, user-mode code within the calling
task can be invoked to handle the call. A library of routines emulating the system
calls of the emulated operating system can be incorporated into a task. The system
calls are activated seamlessly. The redirection need only be set up once, as it is
preserved in child tasks after a fork operation.

Mach supports multiple processors through the use of processor sets. The
members of these sets form a pool of processors which can be used to schedule tasks and
threads assigned to the set. The use of processor sets provides a mechanism for the
control of the scheduling of threads within a multiprocessor or a multicomputer.

[Figure: A Mach Task. Shown on the user and kernel sides: the task with its address space and threads, the process port, bootstrap port, exception port and registered ports, a thread port for each thread, and other task properties.]

The figure depicts a Mach task. In addition to its address space, a Mach task
contains default values for the processor group and scheduler parameters to be used
by threads operating in its address space, and a collection of statistics relevant to the
history of the process. The process port is used to request services from the
microkernel. The bootstrap port is read by the first process in the system to determine
the names of kernel ports. The exception port is read to determine the nature of any
errors. A collection of registered ports provides access to standard system services.

Each Mach thread has a port that can be used to control it. As ports are
accessible to all threads within a task, these ports allow a thread to control itself
and other threads within a task. Using this facility it is possible to construct
thread packages managed by user processes.


Emulation libraries are the key component that allows Mach to support operating
system environments. Emulation libraries translate program requests into Mach
operations. Efficiency is improved by placing the emulation library into the same
address space as the program expecting the emulated services, because this reduces
the cost of communication between the emulation routines and the program. As code
within the emulation library shares the address space of the programs it supports,
it is unable to protect data that it accesses from the program. Thus the emulation
library mechanism cannot be used to protect sensitive data from programs.

The use of emulation libraries allows several personalities to exist on a Mach
system at one time, executing in parallel. In addition to using Mach functions to
support operating system functions, it is possible for the emulation library to use
the facilities of other personalities. This feature is most commonly used to allow
operating environments to use files hosted by other operating environments.

The Mach microkernel has demonstrated that microkernels can support multiple
operating system interfaces within a single system. Access to process scheduling
and memory management from user-level processes has made Mach a popular tool
for the implementation of experimental operating environments. Mach is not
capability based, as within it capabilities are used only in a limited way to protect
access to ports.

Plan 9

Plan 9 [PPTT] is a distributed multiuser operating system which bears a strong
resemblance to UNIX at the user interface level. Plan 9 draws on concepts found in
UNIX, generalized for the new environment, rather than attempting to implement
previously untried concepts. The operating system is aimed at a similar environment
to the Amoeba system (see section) in that it is composed of workstations
(known as terminals), pool processors (known as CPU servers) and file servers.
Resource sharing and reducing administration were priorities in the design of the
system.

By using a limited number of powerful abstractions it is possible to produce a
small kernel that performs the functions of larger kernels. The microkernel
philosophy of minimising functionality of the kernel and placing maximal functionality
into user processes differs from Plan 9's philosophy of minimalism and uniformity.
Unlike microkernels, Plan 9 incorporates elements of the file system into the kernel.
The file system serves as the major abstraction of the operating system.

Resource sharing is promoted by using identical kernels on each terminal and
CPU server. This allows users to choose whether to run programs on their terminal
or on a CPU server. The distribution of tasks varies with the bandwidth of the link
between the terminal and CPU servers. On slow links users tend to place programs
so as to reduce the communications cost. On faster links users tend to run programs
locally unless the program is a data-intensive or compute-intensive job.

All the resources of a Plan 9 system, other than program memory, are
represented as files within the file system. The strict tree structure imposed on the file
system (links are not supported) and the presence of all program-accessible resources
makes the file system into a uniform name space. Physical devices, abstractions
and software concepts are all representable by file systems. File systems can be
implemented within the kernel as a driver, as a user-level process or as a remote
server. Access to file systems outside the kernel is performed through the kernel's
mount driver. The mount driver converts operations into request messages which
are relayed to either the user program or the remote server which implements the
file system functions.

The uniform name space and the ability to implement file systems in either the
kernel or user code provide a uniform mechanism to access either kernel or user
functions. This, combined with the use of a uniform data structure for access to files
and devices (known as a channel) and a uniform set of primitives, results in a highly
extensible operating system with a simple interface paradigm. The I/O primitives
are:

Attach: Connect a channel to the root of a file system and notify the file system
of which user is being attached.

Clone: Duplicate a channel, creating a new channel that points to the same file or
file system as the original.

Walk: Perform a directory lookup on a file and set the directory pointer to the
next file or directory.

Stat: Get the file attributes of the current file.

Wstat: Alter the file attributes of the current file.

Open: Check permissions before opening a file to allow I/O to be performed on the
channel.

Read: Read from an open file.

Write: Write to an open file.

Close: Close an open file.

6 Complex abstractions are represented as directories containing files representing different
aspects of the abstraction. In the case of a process, the files present include ones for memory, control
and the text file.

7 An example of a software concept implemented as a file system would be Plan 9's representation
of environment variables as files in the kernel-resident file system.

As more than one file may be used to represent a device, it is possible to separate
the control operation from the data transfer. By dividing device information this
way, Plan 9 avoids the need for a call similar in nature to the ioctl found in UNIX.

[Figure: The Initial Name Space of a Process. The root directory contains bin (binaries for the local processor) and dev (devices for the local processor).]

Plan 9 creates a process group when a user logs into the system. This minimal
process group (see figure) contains a root directory, some binaries and some local
devices. The system calls mount and bind are used to manipulate the name space.
The mount call adds new external file systems and the bind call alters the
arrangement of the name space. Directories in the Plan 9 system have the special property
that one can be mounted behind another directory to yield the union of the two sets of files.
Each directory in the union is searched in turn to find a file with a matching name.
The first file found is returned. This feature replaces the UNIX search path concept
in Plan 9.
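The union lookup rule described above can be sketched as follows. This is a simplified model in Python, not Plan 9 code: directories are plain dictionaries, and the class name, the `bind_after` method and the path strings are assumptions introduced for the example.

```python
class UnionDirectory:
    """Ordered list of directories searched front to back (illustrative)."""

    def __init__(self, first):
        self.members = [first]

    def bind_after(self, directory):
        """Mount `directory` behind the existing members of the union."""
        self.members.append(directory)

    def lookup(self, name):
        for directory in self.members:
            if name in directory:
                return directory[name]  # the first match wins
        raise FileNotFoundError(name)


# Hypothetical contents: a local bin directory shadows a shared one.
local_bin = {"ls": "/local/bin/ls"}
shared_bin = {"ls": "/shared/bin/ls", "cc": "/shared/bin/cc"}

bin_dir = UnionDirectory(local_bin)
bin_dir.bind_after(shared_bin)
```

Because the members are searched in order, the order in which directories are bound plays the role that the order of entries in a search path plays under UNIX.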

The system is virtualized through control of the name space. The ability to
alter any element of the name space for a process allows processes and objects to
be moved from server to server within the system. Remote and local resources are
easily interchanged by altering a process's name space. The virtual machine offered
by Plan 9 is exploited when CPU servers are used as accelerators for processes. A
daemon on the CPU server answers a request for a process to operate on the server
by setting up a process group on the server. This allows resources local to the CPU
server to be used by the process to be accelerated without the process needing to
be aware that it is running on a CPU server. Apart from an increase in speed, the
process's environment appears identical to the process regardless of whether it is
operating on a CPU server or a terminal.

Local disk file systems have been avoided in the design of this operating system.
The designers argue that local file systems require significant knowledge to
administer on the part of the workstation user, which is frequently absent. In addition,
as relatively few workstations export their file systems, the use of centralized file
servers promotes file sharing and makes users independent of a specific terminal.
The need for local systems is removed by using a combination of caching and
high-speed links where necessary. By using high-speed links between CPU servers and
file servers, and large memory-based caches on the file servers, a file access rate of
a similar order of magnitude to the memory access rate has been achieved. Cache
coherence is maintained through the use of a 64-bit file identifier. Half of this value
is used to identify the file on a file server. The other half is a version number which
is incremented each time the file is modified. The file identifier is returned each time
a file is opened. If the version number component does not match, any currently
cached pages are replaced; otherwise the cached pages are used. Where high-speed
links are available, only pages of executable files are cached. On low-speed links a
local disk is used as a large write-through cache memory for both executable and
data pages. As this local disk acts only as a persistent cache of information held
on the file server, it requires only limited maintenance. When the code or the user
detects a problem with the disk, the disk is reformatted.
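The version check used for cache coherence can be illustrated with a short Python sketch. It is a model under stated assumptions rather than the Plan 9 implementation: whole files stand in for pages, and the class names, method names and the `fetches` counter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileId:
    path: int     # identifies the file on the file server
    version: int  # incremented each time the file is modified


class FileServer:
    def __init__(self):
        self.files = {}  # path -> (version, data)

    def write(self, path, data):
        version = self.files.get(path, (0, b""))[0] + 1
        self.files[path] = (version, data)

    def open(self, path):
        return FileId(path, self.files[path][0])

    def read(self, path):
        return self.files[path][1]


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # path -> (version, data)
        self.fetches = 0  # counts transfers from the server

    def read(self, path):
        fid = self.server.open(path)  # identifier is returned on every open
        cached = self.cache.get(path)
        if cached is None or cached[0] != fid.version:
            self.fetches += 1         # stale or missing: replace cached copy
            self.cache[path] = (fid.version, self.server.read(path))
        return self.cache[path][1]


server = FileServer()
server.write(1, b"v1")
client = CachingClient(server)
```

The client never needs to be told about a modification; comparing the version half of the identifier at open time is enough to decide whether the cached copy can be reused.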

By using the file system paradigm to generate a uniform name space in which
processes operate, Plan 9 provides a distributed system which uses a single
conceptual model allowing access to the full functionality of the system.

QNX

The QNX microkernel Hil Var originated in It is currently in use in

a large number of real-time and distributed applications. The small code size of
the kernel and the ability to build systems without a file system make QNX well
suited to embedded applications.

The microkernel supports system calls. These form the interface to the
functions of the QNX kernel: interprocess communication, process scheduling and
interrupt dispatching. The kernel may optionally contain a network manager which
supports low-level network communication. All system services are accessed through
messages passed by the microkernel.

Message passing is implemented using a set of three blocking functions. The
Send call sends a message to a target process and blocks the current process until
the target process executes a Receive and a Reply call for that message. If there
are no pending messages, executing a Receive call causes a process to block until a
message is sent to it. The message is transmitted by copying between processes. No
queuing is employed by these primitives. Message queues are supported through the
use of IPC servers, which are implemented using the three low-level communication
primitives. Processes can specify that messages be delivered in priority order rather
than time order, and that the process executes at the priority of the highest-priority
blocked process waiting for service. The use of these facilities prevents lower-priority
servers preempting a higher-priority process by invoking the services of a process
with even higher priority.

[Figure: QNX Multipart Messages. An MX table with n entries locates each of the parts (up to Part n) that make up a message.]

Multipart messaging is supported by the low-level communication primitives.
An MX table (see figure) indicates where components of a message should be
fetched from in the sending process's address space, or delivered to in the receiving
process's address space. Direct support of multipart messages reduces overheads
encountered in other systems where messages must consist of contiguous memory
locations. Systems which do not support multipart messages typically use memory
copies to gather elements in the transmitter and scatter the elements in the receiver.
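The role of an MX table can be shown with a small Python sketch. The table layout used here, a list of (offset, length) pairs over a byte array, is an assumption for illustration and not the QNX data structure. The sketch also passes through an intermediate byte string for clarity, whereas the point of kernel support is that parts can be copied directly from sender to receiver.

```python
def gather(memory, mx_table):
    """Collect the message parts named by (offset, length) entries."""
    return b"".join(bytes(memory[off:off + length]) for off, length in mx_table)


def scatter(memory, mx_table, message):
    """Deliver consecutive pieces of `message` to (offset, length) entries."""
    pos = 0
    for off, length in mx_table:
        chunk = message[pos:pos + length]
        memory[off:off + len(chunk)] = chunk
        pos += len(chunk)
    return pos  # number of bytes delivered


# Hypothetical address spaces: a header and a body held in separate places.
sender = bytearray(b"HEAD....BODYBODY....")
receiver = bytearray(16)

message = gather(sender, [(0, 4), (8, 8)])
delivered = scatter(receiver, [(0, 4), (6, 8)], message)
```

The sender and the receiver each supply their own table, so the two sides may lay the same message out differently in memory.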

QNX supports the following scheduling policies:

• Preemptive

• Prioritized context switching with round robin

• First in, first out

• Adaptive scheduling

8 These policies are implemented in accordance with the draft POSIX realtime standard.

Sufficiently privileged user processes can connect an interrupt handler to an
interrupt vector within the kernel. The handler, which runs within the process's
address space, is called when an interrupt occurs. It has access to all the facilities
of the user process. On completing its response to an interrupt, the handler has
the choice of either returning to the kernel or waking the process which provides its
address space. The ability to selectively start a process allows significant events to
be noted immediately and buffering to be used to improve efficiency. The
microkernel conceals the presence of nested and shared interrupts, and any
hardware-dependent details, from the user-level interrupt handler. This feature provides a
uniform interface, thus simplifying the task of the programmer.

When present, the network manager is directly connected into the kernel,
providing efficient access to low-level communications between QNX kernels. Transfers
between local processes and remote processes are accomplished by issuing a message.
The microkernel identifies the message and uses its private interface to the network
manager to queue the message for transmission. After the dispatched message is
received by the appropriate node, the network manager on that node passes the
message to the local microkernel.

Multiprocessor support is highly transparent, as all services are available in
response to messages and message passing is handled by a network manager that is
closely bound to the kernel. This simplifies the construction of distributed systems.

The only mandatory resource manager is the process manager, Proc. It supports
process creation, accounting, environment inheritance, memory management and
first-level pathname management. Because a file system is optional within a QNX
system, and as there are no other compulsory resource managers, Proc initially
owns the entire name space. Proc may delegate part of the name space to other
managers. Name space delegation is conventionally implemented by having Proc
maintain a prefix tree of delegated pathnames and ensuring that library routines
using pathnames send a message to Proc, which directs the routine to the manager
corresponding to the entry with the longest matching prefix. File systems and
network elements can easily be incorporated into this structure by delegating part
of the name space to their managers. These managers are responsible for parsing
the non-matching part of a pathname and providing functions corresponding to
requests directed to them by library routines.
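Longest-prefix delegation of the kind described above can be sketched in Python. For brevity the sketch scans a dictionary rather than walking a prefix tree, and the class name, method names and the manager names in the usage lines are illustrative assumptions, not the QNX interface.

```python
class PrefixTable:
    """Maps delegated pathname prefixes to manager names (illustrative)."""

    def __init__(self, default_manager="proc"):
        self.prefixes = {"/": default_manager}  # the root owner holds everything

    def delegate(self, prefix, manager):
        self.prefixes[prefix.rstrip("/") or "/"] = manager

    def resolve(self, pathname):
        """Return (manager, unmatched remainder) for the longest matching prefix."""
        best = "/"
        for prefix in self.prefixes:
            # A prefix matches only on a whole path component boundary.
            on_boundary = (pathname == prefix
                           or pathname.startswith(prefix.rstrip("/") + "/"))
            if on_boundary and len(prefix) > len(best):
                best = prefix
        return self.prefixes[best], pathname[len(best):].lstrip("/")


table = PrefixTable()
table.delegate("/dev", "Dev")        # hypothetical device manager
table.delegate("/dev/hd0", "Fsys")   # hypothetical file system manager
```

The remainder returned by `resolve` corresponds to the non-matching part of the pathname, which the chosen manager is left to parse.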

The QNX system exhibits the flexibility typical of microkernel-based operating
systems. This flexibility is enhanced further by the use of a manager for network
interfaces and by shifting name space management into a user-level process. The use
of message passing as the basis of all system operations provides inbuilt network
transparency. This has resulted in a small operating system suited for use in
distributed applications.

Angel

The Angel microkernel [MSWK, MWO] employs a single 64-bit address space
which is shared completely by the kernel and all processes executing on the system.
The Angel kernel was influenced by perceived weaknesses and limitations in Meshix
[OSW], a conventional microkernel. The use of message passing within Meshix
was identified as a major performance limitation. Lightweight RPC (LRPC) mechanisms
were considered; however, the implementation of these mechanisms would reduce
the strength of interprocess protection, hence reducing the security of the system.
Angel replaces RPC mechanisms with a shared address space model and implements
secure LRPC mechanisms using shared memory.

The microkernel supports two major services: persistent virtual memory and
virtual processor management.

The designers of Angel exploit the address space provided by a 64-bit processor
to unify the functions of memory and the file system to produce a persistent store
accessed through addresses. Memory objects are fixed in the address space. The
designers term this a Single Address Space Architecture (SASA), as each process
finds objects at the same place within its address space. Improved data sharing
is a major advantage of this style of address usage. Data always resides at the same
address, allowing data structures to use addresses directly and hence avoiding the
need to encode data to remove pointers. The single unified interface reduces the
complexity of programs. Only addresses are required to identify and use objects.
The common interface is valuable on Distributed Shared Memory (DSM) machines,
as it eliminates the need for additional mechanisms to use resources located across
the network. Shared memory allows LRPC mechanisms to be implemented through
the use of shared objects, which allows fast communication between cooperating
processes at the security level provided to control access to memory.


The kernel supports the concept of virtual processors. A virtual processor behaves
similarly to a UNIX process, except that whenever it performs an action that would
block, it is upcalled. The virtual processor can decide whether it can continue with
some other task rather than surrender its timeslice to another virtual processor.
The kernel employs a control structure known as a virtual processor to represent
the virtual processor. This data structure incorporates the state of a process and a
list of upcalls.

An upcall is made by the kernel when an event which would block a virtual
processor occurs. An upcall is implemented as a small data structure which conveys the
type of the event causing the upcall and two further pieces of information specific
to the upcall type. The upcall is delivered to the virtual processor if there is
sufficient space in the upcall list; otherwise the upcall is ignored. The virtual processor
structure specifies how an upcall is dealt with. The options for handling received
upcalls are to discard the upcall, queue the upcall for later attention, or invoke a
handler associated with the upcall type.
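The three options for handling an upcall can be sketched as follows. This Python model is illustrative only: the names `Action`, `VirtualProcessor` and `deliver`, the string event types and the list size are assumptions, and the real structure is a kernel data structure rather than a Python object.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    DISCARD = 1
    QUEUE = 2
    INVOKE = 3


class VirtualProcessor:
    def __init__(self, list_size=4):
        self.list_size = list_size
        self.pending = deque()  # upcalls queued for later attention
        self.policy = {}        # upcall type -> Action
        self.handlers = {}      # upcall type -> callable(arg1, arg2)
        self.log = []

    def deliver(self, kind, arg1, arg2):
        """Called by the 'kernel'; returns False if the upcall was ignored."""
        if len(self.pending) >= self.list_size:
            return False        # no room in the upcall list: ignored
        action = self.policy.get(kind, Action.DISCARD)
        if action is Action.QUEUE:
            self.pending.append((kind, arg1, arg2))
        elif action is Action.INVOKE:
            self.handlers[kind](arg1, arg2)
        return True


vp = VirtualProcessor(list_size=1)
vp.policy["alarm"] = Action.INVOKE
vp.handlers["alarm"] = lambda a, b: vp.log.append(("alarm", a, b))
vp.policy["block"] = Action.QUEUE
```

Each upcall carries its type plus two type-specific values, matching the small fixed-size record described in the text.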

Threads are supported by user-level code. The kernel has no explicit support for
threads. The upcall mechanism is used to notify the user-level thread package of
events including time-based alarms, blocking events and a change in state of locks.

Angel does not support the concept of user identifiers. Instead it determines the
right of a process to access an object based on what a process already has access to.
Access to objects within the Angel address space is controlled through the use of
Access Control Descriptors (ACDs). They describe the other objects which must be
accessible to a process before this object may be used. A biscuit, a unique identifier
for an ACD, is presented to the object manager to gain access to an object. The
object manager confirms that the process has access to any objects listed in the
ACD for the object the process wishes to load. If the requirements are met, the
object manager allows access to the object.
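The access rule can be expressed compactly as a subset test. The following Python sketch assumes that an ACD is just a set of prerequisite object names and that a biscuit is an integer key; both representations, and all the names used, are inventions for the example rather than Angel's actual structures.

```python
class ObjectManager:
    """Grants access to an object only if its prerequisites are already held."""

    def __init__(self):
        self.acds = {}  # biscuit -> (object name, required object names)

    def register(self, biscuit, obj, required):
        self.acds[biscuit] = (obj, frozenset(required))

    def load(self, accessible, biscuit):
        """`accessible` is the set of objects the process can already access."""
        obj, required = self.acds[biscuit]
        if required <= accessible:
            accessible.add(obj)  # the protection domain grows
            return True
        return False


manager = ObjectManager()
manager.register(biscuit=101, obj="payroll", required={"staff_db"})
manager.register(biscuit=102, obj="staff_db", required=set())

domain = set()  # a new process starts with nothing loaded
```

In this model a biscuit that is refused at first may succeed later, once the process has gained access to the objects its ACD lists.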

Objects are composed of one or more pages of memory. Objects must be distinct;
that is, objects must not overlap and an object may not enclose another object.
Objects are created with a fixed length. It is necessary to copy the contents of an
object to a larger object to effectively enlarge its size. Angel currently allocates
backing store to an object when a page is first modified. This allows sparse objects
to be implemented with a minimum of backing store. However, it risks having a
write to memory fail for lack of backing store.

The Angel microkernel is a persistent system using a large flat address space to
hold all objects. The fixed location of objects in the address space eliminates the
need to make data structures independent of memory location, improving access
speed and potentially simplifying the structures. The access mechanism of
determining the right of a process to access an object based on the ability to access other
objects allows the protection domain of a process (the set of objects it can access)
to vary automatically.

Chorus

The Chorus distributed operating system [RAA, ARG, Gie], developed by
Chorus systèmes, uses a message-passing-based microkernel to support native
programs and a UNIX-based subsystem. Real-time application support is incorporated
into the nucleus.

Each site (a collection of one or more closely coupled processors) within a Chorus
system has a nucleus. The Chorus nucleus (see figure) consists of four major parts:

• The Supervisor is machine dependent and is responsible for dealing with
events generated by the hardware. The Supervisor incorporates the dispatch
of interrupts, traps and exceptions.

• The Realtime Executive allocates processors and provides fine-grained
synchronization and preemptive priority-based scheduling.

• The Virtual Memory Manager controls virtual memory hardware and local
memory resources.

• The IPC Manager provides asynchronous message exchange and RPC
functions.

9 Chorus is a trademark of Chorus systèmes.


[Figure: The Chorus Nucleus. Components shown: IPC Manager, Realtime Executive, VM Manager, Supervisor and Hardware, each marked as portable or machine dependent.]

The four components of the nucleus are independent. The managers are distributed,
with the users of the services being unaware of the separation of the components.
Access to the functions provided by the managers, and between managers, is by the
standard Chorus IPC mechanism.

The nucleus uses a number of abstractions to represent objects it manages and
operations on those objects. Global naming is achieved through the UI (unique
identifier) abstraction. The unit of resource allocation is the actor. The address
space of an actor consists of a number of regions. The thread is the unit of sequential
execution. Communication is conducted through messages directed to a port or port
group.

A subsystem is composed of a number of servers operating cooperatively to
provide a coherent operating system interface. Three abstractions are managed by both
the nucleus and the subsystem servers. The segment is used to encapsulate data, a
capability provides access control, and a protection identifier is used for
authentication.

UIs are used to identify actors, ports and port groups. These identities are
unique in both space and time, global, and independent of location. The Chorus
nucleus provides a service that allows the location of an entity with a UI.

A Chorus actor encapsulates an address space divided into regions, a set of
ports and a set of threads. The regions are coupled with either local or remote
segments. Threads are tied to a single actor and share the resources of that actor
with the other threads of the actor.

Actors are tied to a site, as are their threads. Only physical memory local to a
site is used by an actor. The actors on a site have distinct user address spaces and
they share a single system address space. The shared address space is local to the
site. Three types of actors exist:

User Actors are neither trusted nor privileged.

System Actors are trusted but not privileged. They may perform sensitive
nucleus operations but cannot execute privileged instructions.

Supervisor Actors are both trusted and privileged. They may execute privileged
instructions as well as sensitive nucleus operations.

A port may be attached to only one actor at a time. However, ports can migrate
from actor to actor, allowing easy reconfiguration of services. Any thread within an
actor may use a port held by that actor. Any thread that knows the name of a port
may send a message to that port. Port groups are used to provide multicast and
functional addressing. Ports may be added to and removed from port groups at
any time. A number of addressing modes are provided to support port groups. It is
possible to broadcast to all the ports in a group, to send to any one port in a group,
to send to any one port on a given site, and to send to any one port on the same
site as the sender.
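The four addressing modes listed above can be modelled in Python as four selection functions over a group's membership. The class, its method names, the port and site labels, and the use of a random choice for "any one port" are all assumptions made for this sketch; the text does not say how Chorus chooses among candidates.

```python
import random


class PortGroup:
    """A set of (port name, site) members with four addressing modes."""

    def __init__(self):
        self.members = set()

    def add(self, port, site):
        self.members.add((port, site))

    def remove(self, port, site):
        self.members.discard((port, site))

    def broadcast(self):
        """Every port in the group."""
        return sorted(port for port, _ in self.members)

    def any_one(self, rng=random):
        """Any one port in the group."""
        return rng.choice(sorted(self.members))[0]

    def any_on_site(self, site, rng=random):
        """Any one port in the group located on the given site."""
        local = sorted(p for p, s in self.members if s == site)
        if not local:
            raise LookupError("no member port on site " + site)
        return rng.choice(local)

    def any_on_sender_site(self, sender_site, rng=random):
        """Any one port on the same site as the sender."""
        return self.any_on_site(sender_site, rng)


group = PortGroup()
group.add("p1", "siteA")
group.add("p2", "siteA")
group.add("p3", "siteB")
```

Since membership can change at any time, a server can join or leave the group without its clients changing the name they send to, which is what makes functional addressing useful.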

A message consists of a contiguous string of bytes copied from the sender's
address space to the receiver's address space. The copy is optimised. If possible, a page
descriptor is moved between the sender and the receiver; failing that, a copy-on-write
technique is employed.

Chorus supports both asynchronous IPC and synchronous RPC mechanisms.


[Figure: A Chorus Capability. Two fixed-width fields: the UI of the server, followed by the key to the resource within the server.]

A capability (see figure) comprises the UI of the server which manages the
object and a key which uniquely identifies the object. Capabilities are used as global
names for objects which are not directly implemented by the Chorus nucleus.

Each message is stamped with the protection identifier of the source actor/port
combination. This allows the receiver of the message to verify the identity of the
sender.

Deferred copy techniques and local caching are employed by the Chorus
microkernel to improve performance. These techniques have a significant impact on
operations which copy large amounts of data between actors, and on IPC and I/O
operations which move small amounts of data between segments.

The model of distributed computing employed by Chorus ties processes to clusters of tightly coupled processors and employs a position-independent addressing mechanism to allow the delivery sites of messages to migrate. This mechanism avoids the difficulties of shifting processes while providing a way of distributing the work load and allowing services to migrate. The performance of message passing is improved through the use of caching and deferred copying.

Capability Based Operating Systems

A capability based operating system uses capabilities to control access to objects and services. The capability is used as the primary mechanism for referring to an object. Possession of a capability implies the right to access an object or service.

Five capability based operating systems are covered here: Monads, KeyKOS, Grasshopper, Opal and Mungi. In each of these systems capabilities are the fundamental access control mechanism. However, these systems display widely varying properties and exploit capabilities in differing ways to provide the facilities of the operating system.

10 KeyKOS is a trademark of Key Logic Inc.

Monads

The Monads project RAa Geh Kee ran from to at Monash University. This project attempted to provide an architecture suitable for the development of large and complex software systems. Special purpose hardware was designed and built to support the Monads operating system.

The designers of the Monads systems applied the principle of information hiding to the decomposition of complex systems into modules. The modules communicate only through the defined module interface, and the internal data structures of a module are inaccessible to other modules. The module interface consists of a set of exported functions which may be called from other modules.

Under Monads, capabilities are viewed as protected pointers (pointers which cannot be modified by the process) to objects which exist in a large uniform address space. Two classes of objects are supported: modules and segments. The segment is the base unit of storage and cannot be shared between modules. A module is a collection of segments which store all the module's information. Module capabilities are used to refer to other modules. Segment capabilities are used to address within a module.

Every memory access within the Monads architecture makes either explicit or implicit use of a capability. Thus all memory accesses are checked at the instruction level to ensure that appropriate rights are held to perform the operation.

A uniform paged virtual memory is used to hold all the data contained in a Monads system. The virtual memory is accessed through the use of large addresses which identify single bytes. Segments are defined by a start address and a limit. They need not be page aligned (see the accompanying figure).

Each segment is intended to contain a single logical entity. Segments have exclusive use of the virtual address space they occupy. They are created in a previously unused portion of the address space, and when they are destroyed that portion of the address space becomes unusable.

11 MonadsPC supported bit addresses RAb

12 An array and a procedure are examples of logical entities.

Figure: Mapping Monads Segments to Pages (labels: Page Boundary; Address space; Base; Limit; Segment; Segment List; Virtual Memory)

Each process is associated with a number of base registers which point to lists of capabilities known as segment lists (see the accompanying figure). Segments in these lists are available for use by the process. A process cannot directly modify either a base register or a segment list. Memory can only be accessed by specifying a base register, segment number and offset. A limited number of capability registers are provided to increase efficiency. This allows memory addresses to be specified by a capability register and an offset.
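The addressing scheme described above can be sketched as a small translation function. This is only an illustration: the names `SegmentCapability`, `AddressingError` and `resolve` are hypothetical, and the limit is assumed here to be a length in bytes rather than an end address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentCapability:
    start: int  # byte address of the segment in the uniform virtual memory
    limit: int  # assumed to be the segment length in bytes


class AddressingError(Exception):
    """Raised when an access falls outside what the capabilities allow."""


def resolve(base_registers, base, segment_number, offset):
    """Translate (base register, segment number, offset) to a byte address.

    base_registers is a list of segment lists; each segment list is a list
    of SegmentCapability entries that the process cannot modify directly.
    """
    if not 0 <= base < len(base_registers):
        raise AddressingError("no such base register")
    segment_list = base_registers[base]
    if not 0 <= segment_number < len(segment_list):
        raise AddressingError("no such segment")
    capability = segment_list[segment_number]
    if not 0 <= offset < capability.limit:
        raise AddressingError("offset outside segment limit")
    return capability.start + offset
```

Because the segment start is a byte address, the resulting address need not be page aligned, which matches the description of segments given earlier.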

The inter-module call mechanism performs several tasks. The call instruction:

loads the base registers with values appropriate for the called module;

creates a local data segment list and associated segments on the stack;

loads a base register with the segment list of passed parameters;

transfers control to the called module.

This limits a module to its own data and any data passed to it. The return call removes local segments and restores the previous module's environment before returning control to the calling module.

Figure: Addressing Memory Under Monads (labels: Segment List; Base; Segment; base, segment number, offset)

Four types of program data are identified in the Monads system: code-related data, local data, permanent data and retained data. Code-related data is created at compile time and typically consists of constants embedded in the code. Local data is used for temporary storage within a procedure. It is created on entry to a procedure and destroyed on exit from a procedure. Permanent data is data associated with a module. Several different versions of this data may exist, each being associated with a different instance of the module type. The permanent data persists until the module is destroyed. Retained data is associated with a particular invocation of an instance of a module. The retained data persists between calls to a module and is destroyed when a program terminates.

Each module is required to have two standard procedures: CREATE and OPEN. These procedures are used to support permanent data and retained data. The CREATE call is used to generate a new instance of a module. Creating a module is performed by making a data segment list and associated data segments. This allocates space for a permanent data entity which can be shared by a number of processes. The OPEN call is made on the first invocation of a module by a program. The call consists of two components: the first creates the retained data segment list and segments, and the second removes the retained data segments. The second component of the OPEN call is made when the program terminates.

13 Permanent data would be stored in files in a conventional operating system.

Monads employs an in-process architecture. The operating system does not run as a separate process to user processes; instead, operating system functions are used by calling privileged procedures. Monads supports only user level processes; all operating system functions are accomplished through calls on privileged procedures. There are no auxiliary operating-system processes.

Privileged procedures are protected by the same means as normal procedures. The data of a procedure is inaccessible while the procedure is not running. While a procedure is executing, its data is available to the code of the procedure, but the data must cease to be available when the procedure is no longer executing. The data of a procedure includes information held on the stack. The Monads hardware supports this mode of operation. This level of protection is sufficient to protect operating system functions.

The Monads project produced a number of systems, all using specially constructed or modified hardware to implement an environment well suited to supporting small modules with strict isolation between the modules except through the parameters of a set of exported functions. Capabilities provide both unique names and a mechanism to implement access restrictions. A C-list based architecture is used to protect capabilities from user process manipulation.

KeyKOS

Development of the KeyKOS system BFF Har began in and has been in use in production systems since. KeyKOS is a capability based operating system with checkpointing built into the kernel. The microkernel was developed to support a secure environment with data sharing, high reliability and accurate pricing. An abstract machine interface is presented to each application program. The abstract machine interface may be exploited to implement either a KeyKOS service or an operating system emulation. EDX, RPS, VM, a subset of MVS, and UNIX have been implemented.

The original application of KeyKOS was to support British Telecom's Tymnet service. Isolating mutually antagonistic users, supporting highly accurate accounting and providing hour uninterrupted service were the three major priority requirements placed on the operating system.

The architecture of the microkernel is based on three concepts: a stateless kernel, a single-level store and capabilities.

All the microkernel's state is derived from information held in persistent state. The information cached in the microkernel may be in a different format to the persistent information to increase efficiency. The private information of the microkernel may be discarded at any time, as it can be reconstructed as necessary from information held in the persistent data elements known as nodes and pages. The elimination of critical state in the kernel eliminates the need to checkpoint the kernel and avoids the need for dynamic allocation of kernel storage.

All the data of a KeyKOS system is stored in a persistent virtual memory system. Only the microkernel is aware of which pages are present in main memory. The paging system is tied to the administration of checkpoints, and periodic system-wide checkpoints are used to guarantee the persistence of data. Both processes and data are persistent. Service interruptions appear to application processes only as unexplained jumps in the real time clock.

Capabilities are central to KeyKOS's operation in that they are used to control access to objects and the sending of messages. No other mechanisms are present to complicate the implementation. Objects in the system are exclusively referred to by their capabilities, and possession of a capability implies the right to use the capability and to pass the capability to a third party. The right to create a capability is privileged; however, the right to duplicate a capability is available to all applications. To prevent forgery, capabilities are segregated so that only the microkernel has access to the capability.

14 The designers of KeyKOS identify their kernel as a nanokernel architecture but offer no explanation as to how it differs from a conventional microkernel architecture.

15 The authors of KeyKOS use the term key instead of capability for brevity.

Encapsulation of objects is enforced by the kernel through the use of message passing and capabilities. Mutually suspicious users are protected from each other by the capability mechanism. For a user to gain access to a service or object, it is necessary for a holder of a service or object to disclose the required capability.

Multiple capabilities may refer to a single process. A bit field in a capability is used to identify the class of capability used to send a message to the process. When a process hands out a capability it can set this field to a known value. This field allows the process to partition its clients into either service classes or privilege levels.

Six fundamental objects are supported by the KeyKOS kernel:

Devices: Device drivers are typically split into two components. Low level hardware drivers are implemented in privileged code and perform the tasks of message encapsulation and hardware register manipulation. The high level driver is typically implemented as a KeyKOS process.

Pages: The simplest object in a KeyKOS system is a page. The size of a page is machine dependent. At initialization the number of pages that a system can manage is fixed.

A page responds to read and write messages. If a page is mapped into a process's address space then loads and stores on locations within the page are equivalent to the operation of read and write messages. If a page is not present in memory when a message is sent to it then it is brought into memory before performing the operation on the page.

Each page is known by at least one page key, and the page has at least one persistent location on disk known as its home location.

Nodes: KeyKOS segregates capabilities from direct scrutiny by user processes. It stores capabilities in nodes. A node key is a capability that gives access to a node. It is used to add and remove capabilities from a node.

Segments: Address spaces are defined through the use of segments, each of which represents a collection of pages or other segments. Segments are sparse and are not required to be contiguous. They are implemented as a tree of nodes with pages as the leaves of the tree.

Meters: A meter key entitles the holder to the amount of CPU time held in the meter corresponding to the capability. There is no guarantee that the CPU time will be allocated in a contiguous unit.

Domains: A domain consists of general key slots, a number of special key slots and all the non-privileged state of the hardware available to a KeyKOS process. When a slot in a domain is loaded with a capability, the process executing within the domain is deemed to hold the key. The special slots for a domain include an address slot, which holds the capability for the segment acting as the address space for the domain, and a slot for a meter key, which provides execution time for the process executing within the domain.

16 Except where performance would be inadequate.

17 In all current implementations of KeyKOS pages are kilobytes in size.
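The description of segments as a sparse tree of nodes with pages at the leaves can be made concrete with a small lookup sketch. The names (`Page`, `Node`, `lookup`) are hypothetical, the page size of 4096 bytes is an assumption chosen only for illustration, and the way an address is split across tree levels is also assumed rather than taken from KeyKOS. The sixteen-slot node size follows the footnote on nodes.

```python
from dataclasses import dataclass, field

SLOTS = 16        # per the footnote on nodes: sixteen slots per node
PAGE_SIZE = 4096  # assumption for illustration; page size is machine dependent


@dataclass
class Page:
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))


@dataclass
class Node:
    slots: list = field(default_factory=lambda: [None] * SLOTS)


def lookup(root, depth, address):
    """Walk a segment tree with `depth` levels of nodes.

    Returns (page, offset) for a mapped address, or None where the sparse
    segment has a hole or the address lies beyond the tree's range.
    """
    page_number, offset = divmod(address, PAGE_SIZE)
    if address < 0 or page_number >= SLOTS ** depth:
        return None
    current = root
    for level in reversed(range(depth)):
        index = (page_number // SLOTS ** level) % SLOTS
        current = current.slots[index]
        if current is None:
            return None  # sparse segment: nothing mapped here
    return current, offset
```

A load or store on a mapped address would then act on `page.data[offset]`, which mirrors the equivalence between memory accesses and read and write messages described for pages.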

Under KeyKOS a message consists of a parameter word, a string of up to bytes and four capabilities. Only capabilities held by the sender can be sent. There are three mechanisms available for sending messages: call, fork and return. The fork mechanism sends a message to the recipient and does not wait for a reply. The call mechanism generates a resume key for the sender and dispatches the message to the recipient. The sender is suspended and refuses messages until it receives a message sent using the resume key. The return mechanism sends a message, leaving the sender able to receive messages.

Messages are not buffered. If the recipient of a message is unable to handle a message immediately then the sender of the message is deferred until the receiver is ready.

18 In all current implementations of KeyKOS nodes have sixteen slots.

On receipt of a message, the receiver may choose which parts (parameter word, string or keys) of the message it accepts, discarding the remainder of the message.

Periodically a KeyKOS system checkpoints. At this stage all processes and I/O activities are stopped, any dirty pages are transferred to the current checkpoint area on disk, the alternate checkpoint area is made current and the processes are permitted to resume. Pages are then migrated from the first checkpoint area back to their home locations. By ensuring that a second checkpoint does not occur until the first checkpoint is handled, the system remains in a non-corrupt state.
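The alternation between two checkpoint areas can be sketched as follows. This is a simplified model under stated assumptions: `CheckpointStore` and its methods are hypothetical names, dictionaries stand in for disk areas, and stopping and resuming processes is represented only by comments.

```python
class CheckpointStore:
    def __init__(self):
        self.home = {}         # page id -> contents at the page's home location
        self.areas = [{}, {}]  # the two checkpoint areas on disk
        self.current = 0       # index of the current checkpoint area
        self.migrating = None  # area whose pages are not yet back at home

    def checkpoint(self, dirty_pages):
        """Write dirty pages to the current area, then switch areas."""
        if self.migrating is not None:
            # A second checkpoint must wait until the first has been handled.
            raise RuntimeError("previous checkpoint not yet migrated")
        # Processes and I/O are assumed to be stopped at this point.
        self.areas[self.current] = dict(dirty_pages)
        self.migrating = self.current
        self.current = 1 - self.current  # the alternate area becomes current
        # Processes are assumed to resume here.

    def migrate(self):
        """Copy checkpointed pages back to their home locations."""
        if self.migrating is None:
            return
        area = self.areas[self.migrating]
        self.home.update(area)
        area.clear()
        self.migrating = None
```

The guard in `checkpoint` is the point of the sketch: at any moment at least one complete, consistent copy of every page exists either in a checkpoint area or at its home location.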

Exceptions are managed through the use of keepers, which are associated with domains, segments and meters. A message is sent to the appropriate domain keeper if an exception occurs. The keeper may either terminate the program, supply an answer and allow execution to continue, or restart the instruction. Segment keepers are invoked if an invalid operation or a protection violation is performed on a segment. A meter keeper is invoked when the associated meter runs out. This can be used to implement complex thread and process scheduling mechanisms.

The use of capabilities for access protection, and checkpointing for ensuring system consistency, allows KeyKOS to provide an environment where applications are secure from both external failures and internal attacks.

Grasshopper

The Grasshopper operating system DdBF b DdBF a LDdB DLR is an experimental operating system that seeks to provide a scalable and efficient persistent environment using conventional workstation hardware. The operating system is being developed by researchers in the computer science departments at the University of Sydney and the University of Adelaide.

Unlike Monads (see the section on Monads), Grasshopper is designed to run on conventional hardware, using the existing page translation hardware to support access control and memory mapping. A direct consequence is that access control and memory structuring occur in multiples of pages.

The designers of Grasshopper identify two principles which define orthogonal persistence. The requirements are that objects exist for as long as they are required, and that objects are manipulated in the same manner regardless of their longevity. To provide an environment which supports orthogonal persistence, three fundamental abstractions are employed by the operating system. Containers are the abstract representation of storage, loci represent actions within the system, and capabilities are used to represent a right of access to an object.

Figure: Mapping containers under Grasshopper

Containers provide all access to storage within the system. They are persistent and may be of arbitrary size. The contents of containers are derived from the following sources: sections of other containers and information supplied by a manager. Containers may map in segments of other containers (see the accompanying figure) to construct a directed acyclic graph of dependencies on other containers. The elimination of cycles ensures that there exists a container that is responsible for the provision of data. When a reference is made to a location within a container, resulting in a page fault, the kernel determines which container is responsible for the delivery of the information and calls the appropriate manager to make the information available.
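The way a fault address can be traced through mapped regions to the responsible container can be sketched briefly. The names (`Mapping`, `Container`, `responsible`) and the representation of a mapping as a start, length and source offset are assumptions made for illustration, not the Grasshopper kernel's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    start: int            # first address covered in the mapping container
    length: int           # number of addresses covered
    source: "Container"   # container whose contents are mapped in
    source_start: int     # where the mapped region begins in the source


@dataclass
class Container:
    name: str
    mappings: list = field(default_factory=list)


def responsible(container, address):
    """Follow mappings until a container that supplies the data itself.

    Terminates because mappings form a directed acyclic graph.
    Returns (container, address within that container).
    """
    while True:
        for m in container.mappings:
            if m.start <= address < m.start + m.length:
                address = m.source_start + (address - m.start)
                container = m.source
                break
        else:
            return container, address
```

In this model the kernel would next call the manager of the returned container to make the page available.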

Managers provide data when it is not resident in memory. They are normal user level programs which reside and execute within their own containers. Managers provide pages of data which are stored in a container, respond to access faults and handle data removed from memory by the kernel. It is the responsibility of the manager to maintain the coherence and integrity of the data of the containers that they manage.

Manipulative managers store data in a different form on permanent store to the form that is presented to a locus in a container. Three examples of manipulative managers are:

Swizzling managers may be used to support a container that is larger than the address space of the host hardware. When an address fault occurs in a container managed by a swizzling manager, a page of data is provided which contains a set of addresses. When one of these addresses is dereferenced, the swizzling manager maps in the correct data.

Encrypting managers encrypt the data placed on the permanent store.

Compressing managers compress the data placed on the permanent store, resulting in storage savings.

The presence of manipulative managers clearly distinguishes the container concept from the concept of an address space.

Loci are the active elements which manipulate the contents of containers. In principle a locus always executes within a single container, known as its host container. The virtual addresses generated by the locus are used to access the contents of the host container. By using the mapping mechanism, the locus can make use of the contents of other containers. Any number of loci are permitted to execute within a container, allowing the operating system to support multi-threaded programs. Loci are maintained as persistent entities by the Grasshopper kernel.

A locus can change its host container by invoking another container (see the accompanying figure). The locus enters the container at a location known as the invocation point, which is specified as an attribute of the container. The single entry point forces the locus to execute code that is under the control of the invoked container. This allows the invoked container to ensure its own security. A parameter block is available which allows a small amount of information to be carried between the containers by the locus. Invocation is a low cost operation, as the minimal parameter block is the only context transferred to the invoked container. Larger quantities of information may be conveyed through the use of an intermediate container.

Invocation is analogous to procedure calls in that loci may invoke other containers and issue a return which takes the locus back to the invoking container. The kernel maintains a call chain of invocations. As some loci may not need to return to the container from which they were invoked, a mechanism exists to inform the kernel that no return chain need be kept.

Figure: Invocation under Grasshopper (labels: Invoke; Invocation Point; Locus)
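The invoke and return behaviour, including the option of keeping no return chain, can be sketched as below. The names (`Container`, `Locus`, `invoke`, `ret`, `keep_return`) are hypothetical, and the program counter is modelled as a plain integer; this is a sketch of the idea rather than the Grasshopper interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str
    invocation_point: int  # single entry address, an attribute of the container


class Locus:
    def __init__(self, host):
        self.host = host
        self.pc = host.invocation_point
        self.params = ()
        self.call_chain = []  # maintained by the kernel on behalf of the locus

    def invoke(self, target, params=(), keep_return=True):
        """Enter `target` at its invocation point with a small parameter block."""
        if keep_return:
            self.call_chain.append((self.host, self.pc))
        self.host = target
        self.pc = target.invocation_point  # entry is forced to this address
        self.params = tuple(params)

    def ret(self):
        """Return to the most recent invoking container, if one was recorded."""
        if not self.call_chain:
            raise RuntimeError("no return chain was kept")
        self.host, self.pc = self.call_chain.pop()
```

Only the parameter block travels with the locus, which is consistent with the claim that invocation is a low cost operation.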

Locus private mappings allow loci to share their host containers while retaining private data which is visible only to a single locus. These mappings take precedence over host container mappings. Locus private mappings simplify the implementation of multi-threaded programs and provide a mechanism for a locus to keep information secret from other threads executing in the same container. This feature is typically used to implement stacks without the need to ensure that stacks are separated throughout the address space perceived by the loci in a container.

Grasshopper employs capabilities to provide unique naming and access control. Capabilities are stored in list structures segregated from containers and loci. Segregating the capabilities simplifies the task of garbage collection. A reference count is maintained on each object, and when all capabilities relating to the object are deleted the object is removed. Segregation of the capabilities allows the kernel to keep an accurate count of valid capabilities for an object.

Conventional hardware is used to support Grasshopper's protection system. The use of capabilities to refer to coarse grained objects, specifically containers and loci, allows conventional hardware to provide adequate security without an excessive overhead.

A locus has access to a number of sets of capabilities:

The set of capabilities contained in the locus private list of capabilities.

The set of capabilities contained in the list of capabilities associated with the host container.

A locus can move capabilities into and out of the lists, or perform an operation using a capability, through functions provided by the operating system. Specific capabilities are identified by nominating a capability list and a key.

Managers are capable of, and responsible for, the maintenance of a consistent recoverable state of their containers.

Grasshopper provides an environment where processes and objects are persistent. The abstractions provided by the Grasshopper system decouple the address space from the active agents within the system and provide a mechanism for transparently transforming the contents of an address space based on the actions of an agent.

Opal

Opal CLFL is a capability-based single address space operating system (SASOS). It is under development in the Department of Computer Science and Engineering, University of Washington, Seattle. Like Angel (see the section on Angel), this operating system exploits the appearance of bit address space architectures to provide a single address space in which objects are uniquely identified by their addresses. Password-capabilities are used to protect access to objects. Opal is currently implemented as a prototype based on the Mach microkernel.

The Opal system is based upon the principle that addresses have a unique interpretation for all applications. That interpretation is independent of the user of the address, hence a thread may name any data item. Protection (the control of access to an object) is managed through protection domains, which permit a thread to have access to a specific set of pages at a given instant.

Single address space operating systems have the advantage of ease of sharing data structures containing pointers between threads (processes). For private address space operating systems to share structures containing pointers, it is necessary to convert the data to an intermediate representation which is shared, or to coerce the processes using the shared data structure to load the data at a fixed known location in their address space. Poor compromises between protection, performance and integration, typical of conventional systems, are avoided by a SASOS. These benefits follow from twin factors: eliminating the need to transform data being shared between threads, and employing protection based on protection domains which give specific permission to a thread to access shared data.

Under Opal a segment is defined as a continuous extent of virtual pages. The virtual address of the segment is fixed at the time of allocation and remains unchanged during its existence. Segments are the base unit of storage and protection. Non-transitory data is held in segments marked persistent.

A protection domain consists of a set of access rights to segments at a given instant in time. A thread executes in a single protection domain and has the access rights conferred by that domain. Threads may execute simultaneously in a given protection domain and hence have access to the same collection of segments.

Password-capabilities are used to control access to resources such as protection domains and segments. Opal capabilities are bits in size and confer permission to operate on an object in specific ways.

Given a capability for a segment, a thread can attach that segment to its protection domain, making the segment represented by the capability available to all the threads of the protection domain. A segment can be made inaccessible by detaching it.

Calls are made across domains through portals. A portal consists of a fixed entry point into a domain, identified by a unique bit portalID. Any thread knowing a portalID can transfer control into the domain associated with the portal.

Opal implements password-capabilities as an extension of the inter-domain communication mechanism. A capability consists of a bit portalID, a bit object address and a bit random check field. The portalID is used to make a call on the server which manages the segment or domain represented by the capability. The check field performs the functions of identifying the set of rights conferred by the capability, preventing forgery and permitting revocation. The server either grants or denies access to the resource represented by the capability.
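The three roles of the check field (selecting rights, resisting forgery, allowing revocation) can be shown with a small server-side sketch. Everything named here (`Capability`, `SegmentServer`, `mint`, `permits`, `revoke`) is hypothetical, and the 64-bit width used for the random check value is an assumption, since the field widths are not restated here.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    portal_id: int  # names the portal of the managing server
    address: int    # object address within the single address space
    check: int      # random check field


class SegmentServer:
    def __init__(self, portal_id):
        self.portal_id = portal_id
        self._rights = {}  # (address, check) -> frozenset of rights

    def mint(self, address, rights):
        """Create a capability whose check value selects the given rights."""
        check = secrets.randbits(64)  # assumed width, for illustration only
        self._rights[(address, check)] = frozenset(rights)
        return Capability(self.portal_id, address, check)

    def permits(self, capability, right):
        """Grant only if the check value is one the server issued."""
        granted = self._rights.get((capability.address, capability.check))
        return granted is not None and right in granted

    def revoke(self, capability):
        """Forget the check value; all copies of the capability stop working."""
        self._rights.pop((capability.address, capability.check), None)
```

A guessed check value fails the table lookup, so forgery amounts to guessing a random number, and revocation needs no knowledge of who holds copies.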

In a password-capability system it is not possible to distinguish an instance of a capability from ordinary data. Thus these systems cannot determine when there are no references to an object, and hence cannot determine a time when it is safe to destroy the object and reclaim resources devoted to it. Opal uses a reference count associated with each object to determine when it is safe to destroy an object. The reference count is incremented each time a segment is attached and decremented when a segment is detached. The reference count keeps track of the number of protection domains which can make direct use of a segment, rather than the number of capabilities existent for a segment. In addition, a segment can be made persistent, causing it to remain after the last detach.
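A minimal sketch of attach and detach reference counting follows. The class names and the `destroyed` flag are hypothetical and stand in for real reclamation; the sketch counts attached protection domains, not outstanding capabilities, as the text describes.

```python
class Segment:
    def __init__(self, persistent=False):
        self.refcount = 0
        self.persistent = persistent
        self.destroyed = False  # stands in for reclaiming the segment


class ProtectionDomain:
    def __init__(self):
        self.attached = set()

    def attach(self, segment):
        """Make the segment usable by every thread in this domain."""
        if segment.destroyed:
            raise RuntimeError("segment has been reclaimed")
        if segment not in self.attached:
            self.attached.add(segment)
            segment.refcount += 1

    def detach(self, segment):
        """Drop access; reclaim on the last detach unless persistent."""
        if segment in self.attached:
            self.attached.remove(segment)
            segment.refcount -= 1
            if segment.refcount == 0 and not segment.persistent:
                segment.destroyed = True
```

Note that a holder of a capability who has not yet attached the segment is invisible to this count, so the segment can disappear before that holder uses it.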

19 An alternative interpretation is that the reference count indicates the number of entities that have registered an interest in a resource CLFL.

The basic reclamation mechanism is subject to errors. It is possible to arrange both the premature release of resources and the permanent commitment of resources. Opal implements reference objects and resource groups to overcome these errors. Reference objects allow private reference counts to be created for each group of entities who wish to access a shared object. These groups are typically composed of mutually trusting threads. The groups are suspicious of other groups which have access to the shared object. Only when all the reference counts are exhausted will the object be deleted. The reference counts ensure that the object persists until all suspicious groups have relinquished an interest in the object. Resource groups are used to release resources and reference counts held by an entity when the entity ceases to exist. The capability for a resource group is passed as a hidden argument in any call that creates a resource or increments a reference count. When a resource group is destroyed the references are released. A thread may alter its resource group at any time. This mechanism ensures that all resources can be reclaimed even if the normal reclamation mechanism fails.

The Opal system provides a single address space in which threads are able to efficiently share structures containing pointers. Access control is performed using password-capabilities, supplemented with resource groups to assist in garbage collection and reference objects to assist in sharing between mutually distrustful threads.

Mungi

Mungi HERV is a SASOS under development in the School of Computer Science and Engineering at the University of New South Wales. The goals of the system are similar to those of Opal (see the section on Opal) in that both support direct sharing of objects containing pointers through the use of a bit address persistent address space.

Mungi is designed to support a medium sized network of homogeneous machines. It provides a single virtual address space transparently distributed over the nodes. Password-capabilities are used for naming and access control of objects. Objects have a page sized granularity. No special hardware is required to support the operating system, as the existing page translation hardware provided by the processor is used.

Under Mungi objects are defined as contiguous sequences of pages. Objects are the base unit of allocation and protection. The operating system locates objects via the Object Table (OT). The OT's entries list an object's address, size, accounting information and passwords with corresponding access rights. The OT is distributed and is organised as a B tree. The OT is constructed from a number of ordinary objects. These objects are distributed across nodes by partitioning the address space according to node and placing the local node's OT bucket in the address range associated with the local node. By partially replicating the nodes of the tree with read-only copies, the number of accesses required between nodes can be reduced.

Pages may exist in one of six states on a given node: resident, on-disk, remote,
zero-on-use, unallocated and unknown. The node holding the master copy of an
allocated page is defined as the owner of the page. Additional read-only copies of the
page may exist. Pages can migrate, and hence their owner may change. A location
hint is stored for non-local pages. If no location hint is available, then the kernel
may infer one based on the location hints for surrounding pages. When referring to
a non-local page, the node contained in the location hint is contacted. If the page is
not at that location, either the node uses its own location hint to forward the message
or a broadcast message is sent to locate the node.

Attempting to perform a read operation on a page which is not present on
the node results in a read-only copy of the page being transferred to the node.
Attempting to perform a write operation on a page which is not present on the
node, or present as a read-only page, results in the page's contents and ownership
being transferred to the requesting node.
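
The read and write fault behaviour described above can be sketched as follows.
This is an illustration only and not Mungi code: the Page class and the
fault_read and fault_write helpers are hypothetical names, nodes are represented
by plain strings, and the messages exchanged between nodes are replaced by direct
calls. The fate of other read-only copies after a write is not described in the
text and is not modelled.

```python
class Page:
    """Hypothetical record of where the copies of one page live."""

    def __init__(self, owner, contents=b""):
        self.owner = owner            # node holding the master copy
        self.contents = contents
        self.readers = set()          # nodes holding read-only copies


def fault_read(page, node):
    """A read of a non-present page yields a read-only copy on the node."""
    if node != page.owner:
        page.readers.add(node)
    return page.contents


def fault_write(page, node, new_contents):
    """A write to a non-present or read-only page moves contents and ownership."""
    page.readers.discard(node)        # the requester now holds the master copy
    page.owner = node
    page.contents = new_contents
```

For example, a read fault on node "B" for a page owned by node "A" leaves "A" as
the owner and records "B" as a reader; a later write fault on "B" makes "B" the owner.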

The global address space is partitioned. A partition is mounted on a node, which
is made responsible for the creation and deletion of objects which appear within that
partition. This mechanism is employed to simplify the management of memory by
ensuring that knowledge of unallocated pages is only required on the creating node.
The creating node has no information about the location of pages from the segments
it manages, apart from that gained through the reference mechanism available to all
nodes.


To improve efficiency, Mungi supports the following classes of data objects:

- Transient and unshared
- Transient and shared
- Persistent

Each data class is allocated in a separate partition of the address space. By ensuring
that transient objects are allocated in a section of the OT that is local to the node,
and will remain so, the speed of creation and destruction of objects is enhanced.

Capabilities under Mungi consist of a bit address and a bit password.
Associated with each capability is an object and a set of rights permitted over the
object, drawn from the collection read, write, execute and destroy. Capabilities
can be derived and revoked.

Every kernel in the system has access to an object which contains the capability
tree (Ctree) (see figure). A Ctree is constructed from protection nodes (Pnodes).
A Pnode may contain a pointer to a list of capabilities (Clist) and/or a pointer to
a protection fault handler. Each user is assigned a pointer to a Pnode which is used
to define the process's regular protection domain (RPD). This pointer points to a
leaf Pnode. All capabilities found in Clists on the direct path between the leaf node
and the root are accessible to the user. Protection fault handlers can be supplied
by the user to provide an alternative search strategy. These handlers return either a
capability or a failure indication. If a capability is returned, the kernel makes
use of that capability; otherwise the search towards the root of the tree is continued.
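
The search from a leaf Pnode towards the root can be sketched as below. This is a
minimal sketch under stated assumptions rather than Mungi's implementation: the
Pnode class and find_capability function are hypothetical, a Clist is modelled as
a dictionary keyed by object address, and consulting the Clist before the handler
at each Pnode is an assumed ordering that the text does not specify.

```python
class Pnode:
    """Hypothetical protection node holding an optional Clist and handler."""

    def __init__(self, parent=None, clist=None, handler=None):
        self.parent = parent          # None for the root of the Ctree
        self.clist = clist or {}      # object address -> capability
        self.handler = handler        # optional protection fault handler


def find_capability(leaf, address):
    """Walk from the leaf Pnode to the root looking for a capability."""
    node = leaf
    while node is not None:
        if address in node.clist:
            return node.clist[address]
        if node.handler is not None:
            cap = node.handler(address)   # a capability, or None on failure
            if cap is not None:
                return cap
        node = node.parent                # continue towards the root
    return None                           # no capability on the path
```

A handler that returns None models the failure indication, after which the search
simply carries on at the parent Pnode.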

Protection is managed through the use of Active Protection Domains (APDs).
Each APD consists of a list of Clists and protection fault handlers. It is constructed
from the user's RPD when the user logs in. User processes can modify APDs by adding
or removing Clists or handlers. Processes can be created which operate in a limited
protection domain by locking the APD in which the process operates. This allows
untrusted code to be used in a controlled environment.

Figure: Capability Trees Under Mungi (legend: Clist, protection fault handler, Pnode)

Mungi also provides a facility to allow code to temporarily change protection
domains. This feature is analogous to the UNIX setuid facility. Protection Domain
Extension (PDX) occurs when a procedure in a PDX object is called. The kernel
verifies that the address of the procedure matches a valid entry point and proceeds
to push a Clist pointer associated with the PDX object onto the APD. This gives
the PDX procedure access to a set of capabilities unavailable to the procedures in
the original APD. On return from the PDX procedure, the Clist pointer is popped
off the APD.
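
The push and pop behaviour of a PDX call can be illustrated with a short sketch.
The names PdxObject and pdx_call are hypothetical, the APD is modelled as a plain
list of Clists, and entry points are modelled as a dictionary from address to
procedure; none of this is taken from the Mungi sources.

```python
class PdxObject:
    """Hypothetical PDX object: valid entry points plus an associated Clist."""

    def __init__(self, entry_points, clist):
        self.entry_points = entry_points  # entry address -> procedure
        self.clist = clist


def pdx_call(apd, pdx, address, *args):
    """Call a PDX procedure with its Clist temporarily pushed onto the APD."""
    if address not in pdx.entry_points:
        raise PermissionError("not a valid PDX entry point")
    apd.append(pdx.clist)                 # extend the protection domain
    try:
        return pdx.entry_points[address](apd, *args)
    finally:
        apd.pop()                         # restore the caller's domain
```

The try/finally form is a design choice of the sketch: it guarantees that the
Clist is removed again even if the called procedure fails.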

Mungi provides a sophisticated capability based protection system on a SASOS.
By combining the features of restricting an APD before invoking a procedure and
using PDX procedures, it is possible to construct environments with protection
domains tailored to minimise access between mutually non-trusting processes.

Observations and Trends

Operating system design is influenced by a large number of often conflicting factors.
Most prominent among these factors are the choice of abstractions made by the
system designer, the functionality of the target hardware, and user expectations.
Examination of these factors provides insight into trends in the design of operating
systems.

The following trends are identified:

- The trend towards supporting large numbers of platforms and the impact of
  this design decision on both hardware and software design.
- The emergence of small kernels and microkernels as the preferred design path
  over monolithic kernels.
- The use of capabilities to support other paradigms.
- An increase in support for distributed computing environments.
- An increase in research into persistent systems, contrasted with the low rate of
  acceptance of persistent environments in commercial computing.
- The provision of memory objects as directly manipulable and shareable objects
  between processes.

Operating     Addrs     Kernel    Paradigm [3]  Platform [4]
System        size [1]  Type [2]

UNIX                    M         F             x Alpha Rx Sparc VAX etc (y)
Amoeba                  u         cDOT          MicroVax II Sparc
Mach                    u         cDOT          IBM IBM RTPC NeXT Sun VAX etc (y)
Plan                    S         FD            SGI Power Series Nextstation
                                                MIPS Magnum Sun SLC AT&T Safari PC
QNX                     u         FD
Angel                   u         SDPOt         SunOS (z) Tadpole MK
Chorus                  u         cDOt          x COMPAQ R Sparc Inmos TT
Monads                  M         CPO           Custom Hardware
KeyKOS                  u         CPO           x x IBM
Grasshopper                       CPOT          Sun Alpha
Opal                              CSOT          R Alpha
Mungi                             CSDPOT

[1] Minimum address size required in native implementations.
[2] Kernel Type: u microkernel, S small kernel, M monolithic.
[3] Paradigm: S single address space, C capability based, c uses capabilities,
    F access control via file system, D distributed, P persistent, O Object based,
    T supports threads with concurrent execution on multiprocessor,
    t supports threads with single thread per process active.
[4] Partially supported processor families have the least capable suitable member listed.
(y) Many other systems.
(z) Hardware is emulated on SunOS based server.
Other markers used in the table: Hosted by a Mach server; Information not available.

Table: Summary Table for Reviewed Operating Systems

- The emergence of operating systems supporting the thread paradigm.

The majority of the operating systems reviewed in this chapter are supported
on a large number of platforms (see the summary table). Support of a wide range of
platforms requires the operating system to conceal the differences between platforms by
presenting the application developers with a virtual machine independent of the
implementation platform. The trend of providing uniform environments over dissimilar
hardware is not recent, although it has significant implications for both hardware
platforms and operating system software. The prime consequence is a tendency for
operating system designers to use only features found on the least capable of the
target processors. As the system needs to run acceptably on all target processors,
there is an additional requirement on the designer to make the system run well in the
absence of any processor specific features which might improve performance. This
has led to a self reinforcing trend in both hardware and software design, where the
software designer expects only a minimal set of features provided by the hardware
and the hardware designer provides only a minimal set of features. Hardware which
supports segments evidences this trend. Of the commercial processor types
listed, only the IBM and the IBM RTPC support segments. The remainder of the
architectures support only paging. A number of the operating systems in this survey
may benefit from the fine grain access control provided by segments; however, the
lack of widespread architectural support has led designers to use coarser grained
paging based schemes to provide access control, and typically to ignore segmentation
hardware even when present.

New operating systems tend to be microkernels or small kernels. There are
several reasons for this trend:

- Verification: users are now requiring verification of software in mission critical
  applications. Monolithic kernels are not well suited to verification as they
  tend to be large and complex. In addition, operating system components tend
  to have multiple points of interface. This increases the complexity of the
  analysis required to prove the system. Small kernels offer a limited number
  of services and tend to be less complex. This reduces the amount of critical
  code which needs to be verified at the core of the system, simplifying the
  task of proving that the basic functions of the operating system are correct.
  By delegating functions found in monolithic kernels to user level servers (for
  example, management of the filesystem) which have well defined interfaces,
  the task of verifying these functions is simplified.
- Cost: By reducing the size of the kernel code, the task of writing and maintaining
  the code is reduced.
- Flexibility: As only essential services are provided by system level code, it is
  possible for services to be written which suit a given work environment without
  encountering any loss in performance compared to a standard system service.
- Multiple Personalities: Microkernels allow the use of multiple personalities of
  operating systems on a single system. As the operating environment perceived
  by the user is provided by user level code, it is possible to run several different
  environments on a single system.

20 Forerunner of the PowerPC architecture.

The trend towards developing small kernels is likely to continue as it reduces
development costs and increases flexibility. The trend is not without a price, as there
is a potential loss of efficiency when servers communicate. Research is being
conducted into reducing the cost of communication in microkernel systems; however,
it is clear that if similar techniques are applied to monolithic kernels, any speed
advantage gained by microkernels would be negated. This has led to the
development of a class of operating systems which employ a microkernel architecture.
These operating systems are essentially monolithic kernels with a modular design
discipline. In principle, this simplifies the task of writing and verification to the
same level as a microkernel. In practice, unless a strongly typed language with
rigorously checked pointers is used, the absence of address space separation between
modules of the operating system allows unforeseen interactions to occur, negating
some of the advantage over a standard monolithic kernel.

The summary table indicates that the capability paradigm has been taken up by a
number of current operating systems. The use of capabilities is likely to continue as
they provide a mechanism which solves a number of problems. Capabilities offer the
following advantages:

- storage allocation and protocols are simplified as capabilities are of fixed size;
- verification of the sender of a capability is typically unnecessary, which
  simplifies the implementation of system security;
- a unique naming scheme simplifies the generation of names.

Operating systems supporting distributed computing environments are gaining
importance. This points to a perceived need to provide facilities which allow
processes to cooperate over multiple machines. The tendency towards multiple
processors is cost driven, as it is typically less expensive to purchase a number of less
capable machines rather than a single highly capable machine.

Persistence is a current research topic. The KeyKOS operating system is the only
commercial persistent system reviewed here. Other commercial persistent operating
systems exist, such as IBM's AS [ST]. However, there is relatively little
documentation relating to their hardware and operating systems publicly available.
It is also possible to build persistent systems over a Mach microkernel. Persistent
systems are highly appropriate for high reliability applications, but the uptake of
these systems into other environments has been slow.

Many of the operating systems in the summary table provide memory objects as a class
of items which can be directly manipulated and shared by user programs. This
trend is consistent with observations that applications on conventional file based
systems spend a significant amount of their time in converting file based structures
to memory based structures and vice versa. Among the file based operating systems,
UNIX has addressed this problem by providing a mechanism which allows files to
be mapped into a process's memory. This mechanism approximates the provision of
memory objects to user programs.

21 The mmap call was originally specified for BSD, although it was not implemented in the
shipped version of that release [LMKQ].

Many of the operating systems surveyed have implemented a mechanism which
allows more than one thread of execution (commonly known as threads) to exist
within the address space of a process. There are two major variants. The simpler
mechanism allows several threads to share the timeslices given to a process. This
mechanism does not allow more than one thread to execute at a time. It
interleaves the execution of the threads within the timeslice. The mechanism can be
implemented at user level, using upcalls to switch between threads when an event
occurs (Angel uses this method), or it can be managed by the kernel (Chorus uses
this method, as actors are tied to a processor and several threads may inhabit an
actor). The more complex variant allows threads to execute in parallel by using
different processors to execute the parallel threads. It should be noted that SASOSs
such as Mungi and Opal do not require explicit support for threads, as all processes
have full access to the address space.

The thread paradigm is becoming more popular in both its implementations, as
it provides mechanisms that simplify the task of application programmers who wish
to perform operations in parallel. The shared address space provided by threads
frequently compensates for difficulty in sharing data between processes. Combining
this with the lower cost of switching between threads of a single process than between
processes provides an incentive for programmers to use threads to produce programs
which perform actions seemingly simultaneously. The negative aspects of the use
of threads are that interactions between components of a single program cease to
be deterministic, and that the absence of the protection offered by using separate
processes allows threads to readily interfere with each other.

Chapter

The Walnut Kernel

The Walnut Kernel intends to achieve the benefits of the Password-Capability
System without requiring the special hardware used by the Password-Capability System.
That is, it supports the use of password-capabilities to provide protection and sharing
among multiple users, object persistence, and the free mixing of capabilities with
other user data, but using only the hardware likely to be available on most general
purpose computers.

The survey chapter identified a number of trends in the features of operating
systems. The Walnut Kernel exhibits several of these features. Few of the systems
surveyed required special hardware. It became clear that, to gain wide acceptance,
the kernel must be supported on a wide range of platforms and must rely only on
the lowest common denominator of adequate hardware.

The Walnut Kernel follows the trend towards small kernels or microkernels. The
Walnut Kernel and the small kernel and microkernel systems surveyed were
motivated by reducing the size of the task of implementing the kernel of the operating
system, by multiple personalities, and by easing the task of verification. Small kernels
can be successfully implemented by small groups with limited time and resources; the
Walnut Kernel was produced by one such group. The task of implementing the
personalities of the operating system can be left to others outside the kernel development
group, as personalities are user level programs which make calls on kernel functions.
The presence of multiple personalities running on a single machine is attractive, as
it allows users to start working in a familiar environment and migrate to the native
environment as they require the features provided by the kernel. Verification and
the requirement for strong security were common themes of the implementors of the
systems surveyed. As with the Password-Capability System, the security model of
the Walnut Kernel is simple and can easily be shown to be probabilistically secure,
providing the first step in verifying the security of the system.

Protection and security are major features of both the Walnut Kernel and the
Password-Capability System. The password-capability mechanism allows the
implementation of tight security and access control without imposing a hierarchy of access
privileges and without the concept of the ownership of objects. Implementors of
personalities on top of the Walnut Kernel have the freedom to construct arbitrary
protection mechanisms, including schemes based on ownership and hierarchies.
Furthermore, the flexibility and security provided by the kernel are gained at very low
cost. The kernel requires capabilities to be periodically revalidated. Revalidation
is accomplished by invalidating a capability and the pages of the object at regular
intervals. The capability is validated again when a page fault occurs when accessing
a page of the object. This adds only a small overhead to the cost of servicing page
faults when compared with other systems.
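
The revalidation cycle can be sketched as follows. This is an illustrative model
only: LoadedCapability, expire and page_fault are hypothetical names, and the set
called catalogue stands in for whatever record the kernel keeps of capabilities
that are still valid.

```python
class LoadedCapability:
    """Hypothetical per-process record of one loaded capability."""

    def __init__(self, cap):
        self.cap = cap
        self.valid = False
        self.mapped_pages = set()


def expire(entry):
    """Run at regular intervals: invalidate the capability and its pages."""
    entry.valid = False
    entry.mapped_pages.clear()


def page_fault(entry, page, catalogue):
    """Revalidate on the first fault after expiry, then map the page."""
    if not entry.valid:
        if entry.cap not in catalogue:    # capability no longer exists
            raise PermissionError("capability is no longer valid")
        entry.valid = True
    entry.mapped_pages.add(page)
```

In this model the validation check runs only on the first fault after each
expiry, so later faults on the same object pay no additional validation cost
until the next expiry.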

The Walnut Kernel is designed for use by most processors that support virtual
memory. A parallel project in the Department of Computer Science, Monash
University, was to develop hardware to support a general purpose multiprocessor which
can be scaled from a single processor to a massively parallel multiprocessor. Chapter
describes this hardware. The Walnut Kernel and other operating systems may
be supported by that hardware design.

Chapter

The User Persp ective

This chapter provides an overview of the features of the operating system visible to
a user process. Appendix A provides detailed information for writing user processes,
including a detailed description of the contents of the wall (see section), a map
of the process address space, and the arguments of kernel calls. Chapter provides
the rationale for the design features described in this chapter.

The Walnut Kernel draws on experience gained in the Password-Capability
System and hence has adopted many of the ideas found in that system, after adapting
them for use in the new environment.

This chapter initially outlines the environment that is presented to a process.
Subsequently, the operations available to a user process and interprocess
communication are discussed.

This chapter does not contain references to input/output, files or users, as these
concepts are not defined at the kernel interface level.

Volumes, Objects and Capabilities

Volumes represent the physical media on which the persistent storage of the Walnut
Kernel resides. The volume number is a unique identifier permanently associated
with each physical storage device used for persistent storage by the Walnut Kernel.
In addition, a special volume is used to represent the memory occupied by the kernel
and memory-mapped interfaces to hardware devices.

Figure: An object (labels: Origin of object, Offset, Maximum, Limit, Top of highest
referenced page; regions: Defined pages within object, Undefined pages within object,
Potential Undefined Pages, Avail Pages)

Under the Walnut Kernel, all the data available to user programs is stored in
objects. These objects are analogous to segments in a paged-segmented architecture.
They (see figure) consist of collections of pages defined by a maximum offset, a
limit (the maximum value to which the maximum offset can be increased within the
object), an amount of money, and a number of pages guaranteed to be available to
the object.

Objects have the following properties:

- Objects are permanently associated with a volume. Objects cannot span more
  than one volume, nor can they be moved between volumes.
- Pages are allocated when the first reference is made to them. To prevent data
  leakage, pages are blanked by writing zeros into all the locations before the
  page is made available to the user.
- If the number of guaranteed pages has been exceeded and there are unreserved
  pages on the volume, then additional pages are allocated to the object. If there
  are no unreserved pages available, then an exception will occur.
- Attempts to access beyond the limit of the object will result in an exception.
  The limit of an object can be increased by the resize kernel call.
- The main memory acts as a cache of objects.
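
The allocation rules in the list above can be sketched as a small model. It is a
sketch under stated assumptions, not the kernel's code: Volume, WalnutObject and
touch are hypothetical names, PAGE_SIZE is an assumed value, and the limit is
assumed to be a byte offset.

```python
PAGE_SIZE = 4096  # assumed page size, not taken from the text


class Volume:
    def __init__(self, unreserved_pages):
        self.unreserved_pages = unreserved_pages


class WalnutObject:
    def __init__(self, volume, limit, guaranteed_pages):
        self.volume = volume
        self.limit = limit                    # assumed to be in bytes
        self.guaranteed_pages = guaranteed_pages
        self.pages = {}                       # page number -> contents

    def touch(self, offset):
        """Return the page holding offset, allocating it on first reference."""
        if offset < 0 or offset >= self.limit:
            raise IndexError("access beyond the limit of the object")
        number = offset // PAGE_SIZE
        if number not in self.pages:
            if len(self.pages) >= self.guaranteed_pages:
                # Guarantee used up: fall back on the volume's unreserved pages.
                if self.volume.unreserved_pages == 0:
                    raise MemoryError("no unreserved pages on the volume")
                self.volume.unreserved_pages -= 1
            self.pages[number] = bytearray(PAGE_SIZE)   # zero filled
        return self.pages[number]
```

Touching an already allocated page returns it unchanged, which mirrors the rule
that pages are only allocated, and blanked, on their first reference.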

The Walnut Kernel uses Password-Capabilities (see figure) to provide naming
and access control to objects. The volume and serial parts of a valid capability
identify an object on a volume. Associated with each capability is a set of attributes
which includes a set of rights and a view. The password components are used to
identify the rights the holder of the capability is allowed to exercise over the object.

1 A view is an attribute of a capability. It defines the region of the object that can be addressed
by the possessor of the capability. Views are contiguous regions and are defined by an offset from
the base of the object and an extent. An important distinction from the Password-Capability
System is that the view entitles the user to address part of an object; it does not guarantee that
pages are contained in that region, nor that the pages are readable to the user. This constraint is
particularly relevant to process objects.

Figure: Structure of a Walnut Kernel Capability (fields, in order: Volume, Serial,
Password, Password)

The Walnut Kernel supports several sets of rights: system rights, user rights,
drawing rights and message rights. The system rights consist of a set of bits which
determine whether the holder is permitted by the system to perform an operation.
The system rights are listed in the table of system rights. The user rights consist of a
set of bits which are managed by the kernel. The kernel attaches no meaning to the user
rights bits. They are intended to be used by user processes to implement access
to services in a way that is analogous to the control system rights bits have over
access to kernel services. The drawing right of a capability is the amount of money
that can be withdrawn through the capability or its descendants (see section).
Message rights are only relevant to processes, as they determine which subprocess
or subprocesses a capability can be used to send a message to. There are three levels
of restriction: the capability can be used to send a message to any subprocess of
the process; the capability can be used to send a message to any subprocess of the
process except for subprocess zero (subprocess zero is discussed in section); or
the capability can be used to send a message only to a specific subprocess.
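
The three levels of message rights amount to a simple check on the target
subprocess number. The sketch below uses hypothetical constant names and a
hypothetical may_send function; the kernel's actual encoding of message rights is
not given in the text.

```python
# Hypothetical names for the three levels of restriction.
ANY_SUBPROCESS = 0
ANY_EXCEPT_ZERO = 1
SPECIFIC_SUBPROCESS = 2


def may_send(level, target, specific=None):
    """Decide whether a capability may send a message to subprocess target."""
    if level == ANY_SUBPROCESS:
        return True
    if level == ANY_EXCEPT_ZERO:
        return target != 0
    return target == specific      # SPECIFIC_SUBPROCESS
```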

When an object is created, it is allocated a master capability. If the master
capability is deleted, the object is destroyed. When the master capability is created,
it has system and user rights specified by the creator and provides a view which
covers the object. All other capabilities which refer to this object are derived from
the master capability or its descendants. All capabilities referring to a given object
have identical volume and serial fields. The password fields are selected randomly,
with the constraint that, for all capabilities for an object, the Password is unique.

The Walnut Kernel supports two mechanisms for generating new capabilities:
making new objects and deriving capabilities from existing capabilities. The process
of deriving capabilities is fundamental to the function and security of the Walnut
Kernel. Derived capabilities (see figure) are made by presenting the kernel
with a capability, a rights mask, an offset and a limit, and invoking the derive
operation. The system rights of the derived capability are the logical and of the
rights of the presented capability and the rights mask. The message rights may be
further restricted. The drawing right is arbitrary and may exceed that of the parent
capability. The view presented by the derived capability is the region of the object
that is covered by the intersection of the view of the presented capability and the
region covered by the offset and limit.

SRDERIVE      Allow capabilities to be derived from this capability.
SRSUICIDE     Allow this capability to destroy itself and its children.
SRDEPOSIT     Allow the holder of this capability to deposit money into it.
SRWITHDRAW    Allow the holder of this capability to withdraw money from it.
SRREAD        Allow the holder of this capability to read from the section
              of the object covered by this capability.
SRWRITE       Allow the holder of this capability to write to the section
              of the object covered by this capability.
SREXECUTE     Not used.
SRUSER        Allow user processes to use this capability.
SRPEEK        Allow the holder of this capability to perform a peek system
              call (see A) on the process represented by this capability.
SRMULTILOAD   Allow this capability to be loaded by any process.
              If this right is absent, then only processes with a serial number
              equivalent to the capability's password may load this capability.

Table: System Rights

Figure: Derivation (labels: Presented Cap, Parameters, Derived Capability; the Offset
and Limit parameters define the View Region applied to the View; the Rights Mask is
applied to the Rights)

Derived capabilities, at the time of derivation, have equal or lesser rights
than their parent capability. Suicide right is an exception, as this right may be added
to the children of capabilities which do not hold this right.

The rights of a parent capability may be reduced through the use of the restrict
kernel call after a child capability has been derived. The child capability is unaffected
by the restriction of the parent capability's rights.
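
The derivation and restriction rules can be summarised in a short sketch. It
models only system rights and views. The Capability class and the bit value given
to SRSUICIDE are assumptions made for illustration, and both the view and the
offset and limit parameters are assumed to be measured from the base of the object.

```python
from dataclasses import dataclass

SRSUICIDE = 0b0010  # assumed bit position, for illustration only


@dataclass
class Capability:
    rights: int       # system rights as a bit set
    view_start: int   # first offset covered by the view
    view_end: int     # one past the last offset covered by the view


def derive(parent, rights_mask, offset, limit, add_suicide=False):
    """Derive a child: rights are and-ed with the mask, views are intersected."""
    rights = parent.rights & rights_mask
    if add_suicide:
        rights |= SRSUICIDE               # the one right a child may gain
    start = max(parent.view_start, offset)
    end = max(start, min(parent.view_end, limit))   # empty view if disjoint
    return Capability(rights, start, end)


def restrict(cap, rights_mask):
    """Reduce the rights of a capability; existing children are unaffected."""
    cap.rights &= rights_mask
```

Because derive copies the rights into a new record, a later restrict on the parent
leaves the child's rights as they were at the time of derivation.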

Each object is assigned an object type when it is created. The creator may
choose any bit value for an object type, with the exception of a few system
reserved values and the most significant bit of the type. The most significant bit of
the object type is set for process objects and clear for other objects.

2 An object type is a bit value that can be accessed by holders of an object's capability.

Process objects are distinguished from other objects in the system, as part of
the contents are interpreted by the operating system. The process object's major
properties are:

- it defines an address space in which processes operate by a Table of Loaded
  Capabilities (TLC) which is contained in the process object;
- it contains the state of or more subprocesses.

Although the master capability of a process allows all the contents of the process
object to be addressed, parts of the process object are read-only and other parts
permit no attempts at access (see section).

Process Address Space

The address space seen by a running process is defined by the contents of the Table
of Loaded Capabilities (TLC). This table consists of a list of capabilities and their
mappings into windows. The mapping describes the offset from the start of the
object, the extent of the addressable region, and the access rights conveyed by the
capability. The address space of a Walnut Kernel process (see figure) consists of a
number of regions.

The kernel area is located at the low end of the process's address space (x
to xfffff) and is neither readable nor writable by processes. The wall, a single
page located at xc, is an exception as it can be read by processes.
The wall is a page mapped into all processes which contains public information.
This information includes the capabilities of utilities, resources, and the current time.
Processes known as wall managers can update the contents of the wall.

The small window area (x to xffffff) is located above the kernel
area. This area can be loaded with views of objects which start and end on arbitrary
page boundaries.

The process object is loaded at x and extends through to xffffff.
The figure titled Map of the Process Object shows its layout. The process header is
not accessible to processes. The address map contains information that allows
capability indices to be translated to and from locations in the process address space.
It is typically used to translate addresses to capability indices. The parameter page
consists of the parameter block and the message area. It is used to transfer information
when kernel calls are made. The remainder of the process object can be used for
the storage of data or code.

The large window area extends from location x to the top of the
system memory. This area can be loaded with views of objects which start on
offsets exactly divisible by x and either end on an offset exactly divisible
by x or end on the end of the object.

The Walnut Kernel allows up to views of objects to be loaded into a process's
address space.

Figure: Process Address Space (from low to high addresses: Kernel Area containing
The Wall, Small Window Area, Process Object comprising Process Header, Address Map,
Parameter Page and Remainder, Large Window Area; legend: Read Only, No Access,
Read Write)

Region              Start    End        Access

Process Header      x        xefff      No Access
Address Map         xf       xffff      Read Only
Parameter Page      x        xfff       Read Write
  Parameter Block   x        xb         Read Write
  Message Area      xc       xfff       Read Write
Remainder           x        xffffff    Read Write

Figure: Map of the Process Object

Processes and Subprocesses

A process defines an environment in which subprocesses operate. The environment
consists of an address space (specified in section) and a collection of resources.
When a process is created, the maximum number of subprocesses and the number
of mailboxes are set. The number of these process resources cannot be varied during
the existence of the process. Two subprocesses are created when a process is created:
subprocess zero (subprocess zero is discussed in section) and subprocess one.

Subprocess one executes user code. It or its descendants can create other
subprocesses, all of which execute code that has been mapped into the process address
space. Subprocesses have no protection from the actions of other subprocesses within
the same process.

Every subprocess has a priority in the range of to and a wakeup time.
Subprocess zero has a unique priority of , the highest possible. The wakeup time
is a clock time in seconds. A subprocess will not run until the system clock equals
or exceeds its wakeup time.

The scheduling of subprocesses is similar to the scheduling of processes on a single-processor time-sharing system. When a subprocess of a process is executing, no other subprocess of that process can be executing. The algorithm for determining which subprocess of a process to execute in the current timeslice is as follows:

1. If a subprocess is executing and there is a non-zero value in the reserve field of the parameter block, resume execution of that subprocess.

2. Execute the subprocess with the highest priority which is not waiting.

3. For subprocesses of equal priority, select the first subprocess encountered in the subprocess table.
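The selection rules above can be sketched in C as follows. This is a minimal illustration, not the kernel's code: the `Subproc` type, the `select_subprocess` function and its parameters are hypothetical, and the sketch assumes that a numerically larger priority value means a higher priority and that a subprocess is "waiting" while its wakeup time lies in the future.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subprocess table entry; field names are illustrative only. */
typedef struct {
    uint32_t priority;   /* assumption: larger value = higher priority */
    uint32_t wakeup;     /* wakeup time, clock time in seconds */
} Subproc;

/* Returns the index of the subprocess to run in this timeslice,
 * or -1 if no subprocess is runnable.
 * `current` is the subprocess that was executing (-1 if none) and
 * `reserve` is the value of the reserve field of the parameter block. */
int select_subprocess(const Subproc *table, size_t n,
                      int current, uint32_t reserve, uint32_t now)
{
    /* Rule 1: a non-zero reserve field pins the executing subprocess. */
    if (current >= 0 && (size_t)current < n && reserve != 0)
        return current;

    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (table[i].wakeup > now)   /* still waiting */
            continue;
        /* Rules 2 and 3: a strict comparison keeps the first
         * table entry when priorities are equal. */
        if (best < 0 || table[i].priority > table[best].priority)
            best = (int)i;
    }
    return best;
}
```

Scanning the table in order with a strict comparison is what gives the "first encountered" tie-break without any extra bookkeeping.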

Messages and Mailboxes

Each process has a number of mailboxes allocated when it is created. Mailboxes may be configured to reject all mail, or to accept messages addressed to a specified subprocess and/or messages starting with a specified string. The application programmer can guarantee the availability of suitable mailboxes for a given type of message, provided the application is not already holding pending messages of that type.

When a process is created, a mailbox is opened for subprocess zero. All the other mailboxes of the process are initially closed until the process explicitly opens them. This allows the process to perform its initialization before starting to handle messages.

Messages perform the following functions under the Walnut Kernel:

- Data Delivery: up to words of information can be transferred to the destination process in the body of a message.

- Money: messages can deliver money to a process. Attached to each message is a sum of money which is transferred to the destination process when the message is stored in a mailbox. Negative sums of money are not permitted.

- Scheduling Control: when a message is delivered, it wakes up a sleeping process and makes the message's target subprocess runnable by resetting its wakeup time.

Messages and subprocesses provide a mechanism for encapsulating events which are not synchronized with the operation of a process.

The presence of a message in a mailbox prevents a process from waiting.

There are two ways of directing messages to specific subprocesses:

1. Using a capability that is restricted to send to only one subprocess of a process ensures that any message sent with that capability will only be delivered to the subprocess named by the capability.

2. A subprocess must be specified as a parameter of a send operation when a capability which is not restricted to a single subprocess is employed.

The first mechanism is typically employed by server processes giving capabilities to processes which will be making use of the server's facilities. This mechanism ensures that messages cannot be delivered to other subprocesses of the server, and this limits the opportunity for both error and malicious attack. The second mechanism is typically employed by holders of a process's master capability.

Mailboxes may be set up to admit only specified types of messages. This selective acceptance of messages was introduced to ensure that important messages could be guaranteed delivery even if all other mailboxes were full. Mailboxes can be reserved according to the following criteria:

- Subprocess number: a mailbox can be reserved so that it will only accept messages addressed to a specified subprocess.

- Message prefix: a mailbox can be reserved so that it will only accept messages that begin with a specified string.

- Subprocess number and message prefix: only messages addressed to a specified subprocess and beginning with the specified prefix are accepted.

When a message is sent to a process, it is placed in the first mailbox found that will accept the message. There is no mechanism which can be used to control the order of filling mailboxes.

A subprocess can determine the order in which messages are retrieved. The receive operation allows the subprocess to specify a message prefix. The prefix is used to retrieve the first message starting with a matching string addressed to the subprocess.
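The mailbox reservation criteria and the first-fit placement rule can be illustrated with a short C sketch. Everything named here (`Mailbox`, `ANY_SUBPROC`, `mailbox_accepts`, `place_message`) is hypothetical, and the sketch assumes that each mailbox holds at most one pending message; the text does not state the capacity of a mailbox.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ANY_SUBPROC (-1)   /* hypothetical marker: no subprocess filter */

typedef struct {
    bool open;            /* a closed mailbox rejects all mail */
    bool full;            /* assumption: one pending message per mailbox */
    int subproc;          /* ANY_SUBPROC, or the reserved subprocess number */
    const char *prefix;   /* NULL, or the reserved message prefix */
} Mailbox;

/* True if the mailbox's reservation criteria admit this message. */
static bool mailbox_accepts(const Mailbox *mb, int subproc,
                            const char *body, size_t len)
{
    if (!mb->open || mb->full)
        return false;
    if (mb->subproc != ANY_SUBPROC && mb->subproc != subproc)
        return false;
    if (mb->prefix != NULL) {
        size_t plen = strlen(mb->prefix);
        if (len < plen || memcmp(body, mb->prefix, plen) != 0)
            return false;
    }
    return true;
}

/* Places the message in the first mailbox that will accept it.
 * Returns the mailbox index, or -1 if the message is undeliverable. */
int place_message(Mailbox *boxes, size_t n, int subproc,
                  const char *body, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (mailbox_accepts(&boxes[i], subproc, body, len)) {
            boxes[i].full = true;
            return (int)i;
        }
    }
    return -1;
}
```

Because placement is first-fit, an unreserved mailbox placed early in the table can absorb messages that a later, reserved mailbox would also have accepted; the sender has no control over this order.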


Money

The Password-Capability System introduced the concept of rental for garbage collection and extended the use of money throughout the system to provide a system-wide economy. The Walnut Kernel adopted the economic model used in the Password-Capability System. This section describes the manipulation of money in the Walnut Kernel.

Each object has a money word which stores the amount of money available to the object. The money word performs two tasks: it acts as a store of money accessible to the holders of capabilities with withdrawal rights, and it provides funds to pay for the rental of the disk space used by the object. Processes have an additional store of money known as the cash word. The money stored in the cash word is used to pay for kernel services and acts as storage for money transferred by a process to and from objects. The kernel performs all operations which manipulate the transfer or use of money.

Associated with each capability is a drawing right. The drawing right of the master capability is synonymous with the object's money word. The drawing right determines the amount of money that can be withdrawn through a capability. Capabilities with a drawing right of zero cannot be used to withdraw money from the object to which the capability refers.

Two operations are supported on drawing rights: deposit and withdrawal. These operations are applied to the drawing rights of all the ancestors of the capability which is named by the operation. Withdrawal is only permitted when the drawing rights of all the ancestors and the capability itself are greater than the amount to be withdrawn.

Figure is used to illustrate the effect of deposit and withdrawal operations. Monetary operations affect all the drawing rights on the path to the root of the capability tree for an object. If a deposit is made via capability C, then the drawing rights D and D and the money word M are increased by the amount of the deposit. To make a successful withdrawal from C, then D, D and M must be greater than the amount to be withdrawn. If this condition is met, then D, D and M will each be debited the amount to be withdrawn.
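The deposit and withdrawal rules can be sketched as a walk from a capability to the root of the capability tree. The `Cap` type and the function names below are hypothetical; only the `money` and `dad` field names echo the capability table entry described later in the thesis. The sketch follows the text literally in requiring every drawing right on the path to be strictly greater than the amount withdrawn, and it assumes that negative amounts are rejected.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_PARENT (-1)   /* hypothetical marker for the master capability */

typedef struct {
    int32_t money;   /* drawing right; for the master capability, the money word */
    int dad;         /* index of the parent capability, NO_PARENT at the root */
} Cap;

/* Credits `amount` to capability c and to every ancestor up to the root. */
bool cap_deposit(Cap *tab, int c, int32_t amount)
{
    if (amount < 0)                       /* assumption: no negative sums */
        return false;
    for (int i = c; i != NO_PARENT; i = tab[i].dad)
        tab[i].money += amount;
    return true;
}

/* Debits `amount` from capability c and every ancestor, but only if
 * all drawing rights on the path are greater than the amount. */
bool cap_withdraw(Cap *tab, int c, int32_t amount)
{
    if (amount < 0)
        return false;
    for (int i = c; i != NO_PARENT; i = tab[i].dad)
        if (tab[i].money <= amount)       /* check the whole path first */
            return false;
    for (int i = c; i != NO_PARENT; i = tab[i].dad)
        tab[i].money -= amount;
    return true;
}
```

Checking the whole path before debiting anything keeps the tree consistent when a withdrawal is refused: either every drawing right on the path changes or none does.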

Figure: A Tree of Capabilities
(Legend: C = Capability, D = Drawing Right, M = Money Word, m = Master Capability.)

Kernel Calls

The parameters used by a kernel call are drawn from the parameter page. The results of a kernel call are returned in the parameter page. The parameter block is a data structure located at the beginning of the parameter page. The fields of the parameter block are listed in figure .

The reserve field is used to ensure that values of the parameter block are not corrupted by other subprocesses of the process. When the reserve field is set to indicate the required kernel call, other subprocesses are prevented from being scheduled until the reserve field is set to zero.

To make a kernel call, the reserve field is set to the kernel call type, the required fields of the parameter block are set, data is placed in the message area if required, and a system call is issued. The process blocks until the kernel call is completed.

On return from a system call, the error field contains either zero or an error code. If error is zero, the kernel call completed successfully and information can be recovered from the fields of the parameter block and the message area as required. When all the required data has been extracted from the parameter page, it is necessary for the program to write a zero into the reserve field to allow other subprocesses of the process to be scheduled.

3 Section describes the implementation of the SystemCall interface and the rationale for disallowing concurrent system calls by multiple subprocesses.

error      returned error code
vol        volume number
serial     serial number
pass       password
pass       password
srights    system rights
urights    user rights
base       offset of capability from front of object
limit      max allowed offset from base
money      money to be transferred
type       object type
maxoff     max offset used/requested
maxsz      max size of defined content
maxcap     max capabilities now allowed
offset     offset into a capability window
subpn      subprocess number
cindex     index of capability in the table of loaded capabilities
clocktime  time in seconds
reserve    non-zero value reserves for subprocess and identifies kernel call

Figure: Structure of Parameter Block

Kernel calls always return. If the call requests an illegal operation, including references to undefined addresses, the call returns with an error value in the error field of the parameter block.
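The calling sequence described in this section (reserve, fill in fields, trap, read results, release) can be sketched in C. This is an illustration under stated assumptions: `ParamBlock` holds only three of the fields listed for the parameter block, `KC_WAIT` is a made-up call number, and `system_call` is a stand-in that simulates the kernel so that the sketch runs on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of the parameter block fields. */
typedef struct {
    uint32_t error;       /* zero on success, otherwise an error code */
    uint32_t clocktime;   /* time in seconds */
    uint32_t reserve;     /* non-zero: reserved, and identifies the call */
} ParamBlock;

enum { KC_WAIT = 1 };     /* hypothetical kernel call number */

/* Stand-in for the real trap: accepts KC_WAIT, rejects anything else. */
static void system_call(ParamBlock *pb)
{
    pb->error = (pb->reserve == KC_WAIT) ? 0u : 1u;
}

/* Performs one kernel call following the protocol in the text and
 * returns the error code. On success, *result receives clocktime. */
uint32_t kernel_call(ParamBlock *pb, uint32_t call_type,
                     uint32_t clocktime, uint32_t *result)
{
    pb->reserve = call_type;     /* 1. reserve the block for this subprocess */
    pb->clocktime = clocktime;   /* 2. set the required fields */
    system_call(pb);             /* 3. issue the system call */
    uint32_t err = pb->error;    /* 4. copy results out of the block */
    if (err == 0 && result != NULL)
        *result = pb->clocktime;
    pb->reserve = 0;             /* 5. release: others may be scheduled */
    return err;
}
```

The order matters: writing the reserve field first protects the later field writes, and clearing it last ensures no other subprocess overwrites the results before they have been copied out.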

Exceptions

Events which raise processor exceptions can be managed through the use of trap handling subprocesses. Each subprocess of a process may have a specific trap handler associated with it. A process can register any subprocess, other than subprocess zero or the subprocess itself, as a trap handler for a subprocess of the process. The trap handler is invoked when a processor exception is raised. Exceptions are grouped to allow for processor-independent trap handlers to be written.

Five types of exception can occur. A floating point fault is used for all arithmetic errors, including underflow and division by zero. Opcode faults are raised when illegal opcodes are detected. An address fault is raised when a memory access violation occurs. Debug faults are system dependent and are used to implement debuggers. Alignment faults are raised on non-aligned accesses when the processor detects this type of error.

If a trap handling subprocess has not been assigned, then the default action is to terminate the process.

When an exception is raised, the faulting subprocess is made unrunnable and a message is sent to the exception handling subprocess for the faulting subprocess. If the message is undeliverable, the process is terminated.
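The exception rules above reduce to a small decision procedure, sketched here in C. The enumerator names, `NO_HANDLER`, and both functions are hypothetical; the five groups are those named in the text.

```c
#include <stdbool.h>

/* The five exception groups named in the text. */
typedef enum {
    EXC_FLOATING_POINT, EXC_OPCODE, EXC_ADDRESS, EXC_DEBUG, EXC_ALIGNMENT
} ExcGroup;

typedef enum { ACT_NOTIFY_HANDLER, ACT_TERMINATE_PROCESS } ExcAction;

#define NO_HANDLER (-1)   /* hypothetical marker: no trap handler assigned */

/* A trap handler may be any subprocess except subprocess zero
 * or the subprocess being handled. */
bool handler_allowed(int subproc, int handler)
{
    return handler > 0 && handler != subproc;
}

/* Decides the outcome of an exception in a subprocess whose trap
 * handler is `handler`; `deliverable` says whether the notification
 * message can be stored in one of the process's mailboxes. */
ExcAction on_exception(ExcGroup group, int handler, bool deliverable)
{
    (void)group;   /* the group would be reported in the message body */
    if (handler == NO_HANDLER || !deliverable)
        return ACT_TERMINATE_PROCESS;
    return ACT_NOTIFY_HANDLER;
}
```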

Controlling Process Scheduling

The Walnut Kernel provides two mechanisms for controlling the scheduling of processes:

- Wait and Messages

- Freeze and Thaw

This section describes the conventional mechanism, which employs the wait system call and messages. The freeze and thaw mechanism is accessed through subprocess zero and is discussed in section .

The wait system call is used for several purposes:

- An argument of to the system call sets the running subprocess's wakeup time to forever and causes the scheduler to remove the process from the scheduling queue if there are no other runnable subprocesses of the process.

- When the argument is the system call surrenders the remainder of a process's timeslice.

- When any other argument value is used, the wakeup time of the subprocess is set to the value of the argument.

The wait call is used to put subprocesses to sleep when they do not have useful work to perform.

4 Forever is defined as 0xffffffff. It is the largest value that the clock can represent.

The presence of a message causes a subprocess's wakeup time to be set to now. This ensures that processes with waiting messages are scheduled.

The arrival of a message causes a subprocess's wakeup time to be set to zero and the process to be placed into the scheduling queue. It does not cause the receiving subprocess to preempt any other process.

A common construct in server processes is the message loop. This construct may be represented in pseudocode as:

    while true
    begin
        wait
        receive(msg)
        server_function(msg)
    end

The message loop waits until a message is present, after which it performs a task which acts on the message. If a message should arrive while the current message is being acted on, the wait operation will have no effect in the next cycle of the loop. The second message can be handled immediately after the first message has been handled.

The wait and message mechanisms can be used to implement synchronisation operations. In this application one subprocess waits and remains blocked until it receives a message which allows the process to continue. It is necessary to issue a receive call after the wait, as the presence of the message will prevent further wait operations having effect.

5 The kernel operations of receive and wait have been encapsulated in subroutines to simplify the code.
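The interaction between wait and pending messages can be modelled with a few lines of C. This is a toy model, not the kernel interface: `SubprocState` and the `sp_` functions are hypothetical, and the model sets the wakeup time to the current time whenever a message is present.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOREVER 0xffffffffu   /* largest value the clock can represent */

typedef struct {
    uint32_t wakeup;   /* wakeup time of the subprocess */
    int pending;       /* messages waiting for the subprocess */
} SubprocState;

/* wait: sleep until `until`, unless a message is already present,
 * in which case the wait has no effect. */
void sp_wait(SubprocState *sp, uint32_t until, uint32_t now)
{
    sp->wakeup = (sp->pending > 0) ? now : until;
}

/* Message arrival makes the target subprocess runnable. */
void sp_deliver(SubprocState *sp, uint32_t now)
{
    sp->pending++;
    sp->wakeup = now;
}

/* receive: consume one pending message, if there is one. */
bool sp_receive(SubprocState *sp)
{
    if (sp->pending == 0)
        return false;
    sp->pending--;
    return true;
}

bool sp_runnable(const SubprocState *sp, uint32_t now)
{
    return sp->wakeup <= now;
}
```

In this model a subprocess that waits without ever receiving would never sleep again once a message arrives, which is why the text requires a receive call after each wait.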


Subprocess Zero

Subprocess zero is part of the kernel. It performs tasks in response to messages which request services. These tasks typically change the state of the process (for example, one of the functions shifts the process from running to suspended) or report the state of the process.

A mailbox is opened for subprocess zero when a process is created. Subprocess zero's reserved mailbox is never closed while the process exists. This arrangement ensures that the creator of a process has control over the process at all times.

Subprocess zero currently supports the following functions:

- Freeze prevents a process from being scheduled. On receipt of a freeze message, subprocess zero sets the process state to frozen and causes the process to be removed from the scheduler queue.

- Thaw allows a process to be scheduled. When a process receives a message it is placed into the scheduler queue. If the process is frozen, the process is typically removed from the queue after the subprocess zero messages are parsed. On receipt of a thaw message, subprocess zero sets the process state to normal and process execution resumes.

- Wakeup sets the wakeup time of the specified subprocess to now. The wakeup message sets the wakeup time of the nominated subprocess to the current time. The wakeup message is used to start a process that has suspended activity and has closed mailboxes. It relies on the fact that the mailbox allocated to subprocess zero cannot be closed.

- Cooee requests the process to send a status message using a specified capability. The reply message sent by subprocess zero consists of a set of words which represent the Cooee reply identifier, the volume and serial number of the current process, and a process status.

- Protected Freeze prevents a process from being scheduled until all protected freezes on the process have been thawed. On receipt of a protected freeze message, subprocess zero sets the process state to frozen, XORs the magic word contained in the message with a key held in the process state, increments a count held in the process state, and causes the process to be removed from the scheduler queue. The XOR operation and the count prevent other parties from thawing the process unless they know the set of magic words used in the protected freeze operations applied to the process.

- Protected Thaw allows a process to be scheduled when all other protected freezes have been thawed. On receipt of a protected thaw message, subprocess zero XORs the magic word contained in the message with a key held in the process state and decrements a count held in the process state. If both the count and key held in the process state are zero, then the process is thawed. If the count is zero and the key is non-zero, then the process is terminated.
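The protected freeze and protected thaw bookkeeping can be sketched in C. The `Proc` type and function names are hypothetical, and the sketch assumes that a protected thaw arriving when no protected freeze is outstanding is ignored; the text does not say what happens in that case.

```c
#include <stdint.h>

typedef enum { PS_NORMAL, PS_FROZEN, PS_TERMINATED } ProcState;

typedef struct {
    ProcState state;
    uint32_t key;     /* running XOR of the outstanding magic words */
    uint32_t count;   /* number of outstanding protected freezes */
} Proc;

void protected_freeze(Proc *p, uint32_t magic)
{
    p->state = PS_FROZEN;
    p->key ^= magic;
    p->count++;
}

void protected_thaw(Proc *p, uint32_t magic)
{
    if (p->count == 0)   /* assumption: nothing outstanding, ignore */
        return;
    p->key ^= magic;
    p->count--;
    if (p->count == 0)
        /* All freezes answered: thaw only if every magic word matched. */
        p->state = (p->key == 0) ? PS_NORMAL : PS_TERMINATED;
}
```

Because XOR is its own inverse and is order independent, the thaws may arrive in any order; the key returns to zero only when the thaws collectively cancel the magic words used by the freezes.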


Chapter

Design of the Walnut Kernel

This chapter describes the guiding principles in the design of the Walnut Kernel, its overall architecture, the rationale of some key decisions, and its detailed structure and operation.

Design Principles

The following design principles guided the development of the Walnut Kernel:

- Avoid features available only on small classes of processors.

- Avoid stalls in the kernel while waiting on external events.

- Minimize retained kernel state variables.

- Ensure the kernel is scalable.

- Use static allocation of kernel memory.

- Allow for a variety of shared memory architectures.

The principles, the motivation for their inclusion as design principles, and their implications are discussed in the remainder of the section.

Avoid features available only on small classes of processors

The original Monash Multiprocessor Project used purpose-built hardware to assist with the management and use of capabilities. A consequence of this decision was that the Monash Multiprocessor Project was tied to a specific processor, memory architecture, bus architecture, and implementation of these. This design placed an upper bound on the performance of the system and, combined with the long lead times and expense of developing experimental hardware, prevented the system from advancing. With improvements in technology, the advantages conferred by the design of the hardware were outweighed by the advantages offered by the improved technology. The system fell into disuse when it became clear that equivalent performance could be gained using conventional technology.

By designing the Walnut Kernel to operate on a wide range of processors, and by placing minimal requirements on the type of memory management expected by the kernel to be available to the processor, it is hoped that the Walnut Kernel will be less prone to obsolescence caused by improvements in available hardware.

This principle influenced both the design and implementation levels. An example at the design level is the requirements on page table or translation-lookaside buffer entries: the Walnut Kernel expects the presence of valid and dirty bits but does not expect or use "use" bits. Although the majority of processors support both use and dirty bits, neglecting the presence of use bits caused no loss of performance or utility. An implementation lacking a dirty bit would have either a significantly increased number of page faults or a significantly increased number of writes to disk, decreasing system performance. The design opted to neglect the presence of use bits but require the presence of dirty bits.
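The value of the dirty bit is easiest to see at page eviction time. The following C sketch assumes a made-up page table entry layout (`PTE_VALID` and `PTE_DIRTY` are illustrative bit positions, not those of any particular processor) and counts disk writes rather than performing them.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID 0x1u   /* hypothetical bit positions */
#define PTE_DIRTY 0x2u

/* Only a valid page that has been modified needs writing to disk. */
bool must_write_back(uint32_t pte)
{
    return (pte & PTE_VALID) != 0 && (pte & PTE_DIRTY) != 0;
}

/* Evicts a page: counts a disk write if one is needed and returns
 * the entry with the valid and dirty bits cleared. */
uint32_t evict(uint32_t pte, unsigned *disk_writes)
{
    if (must_write_back(pte))
        (*disk_writes)++;
    return pte & ~(PTE_VALID | PTE_DIRTY);
}
```

Without a hardware dirty bit, a kernel must either write every evicted page back to disk or map pages read-only and take a fault on the first write, which are the two costs named in the paragraph above.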

A direct consequence of this design principle is that the Walnut Kernel does not take advantage of the segment registers available in the Intel . The segment registers on the Intel would permit objects and views with byte or word size granularities to be implemented. However, as the Intel i is one of the few families of current microprocessors supporting segment registers, and the trend in microprocessor design is away from the use of segmentation, it was decided to use the more generally available mechanism of paging.

1 The VAX architecture provides only a modify bit (dirty bit) Cor

2 Intel is a trademark of Intel Corporation.

Avoid stalls in the kernel while waiting on external events

The kernel and drivers are implemented with the policy:

    Upon encountering a state that would cause the kernel to stall or wait before being able to continue, the kernel will initiate a corrective action and then proceed with another task.

A consequence of this policy is that the kernel and drivers avoid tight busy-waiting loops on external events. This policy avoids the catastrophic consequences for system performance that result from uninterruptible loops waiting on delayed events. The policy is particularly applicable in a multiprocessor environment, where actions with other processors are not synchronized and the competition for access to a resource may take a significant amount of time. The policy allows the kernel to do other tasks if the current task is prevented from making immediate progress.

With suitable hardware assistance the Walnut Kernel can be configured to allow for long propagation times. A DMA-like operation could be initiated to lock and modify data. By surrendering control from the current task and scheduling another task, the kernel can continue performing local operations while the operation with a long propagation delay is completed by the hardware.
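One way to picture the no-stall policy is a lock that is tried once and never spun on. The sketch below uses the C11 `atomic_flag` type as a stand-in for whatever locking primitive the hardware provides; `Resource`, `TaskResult` and `try_update` are hypothetical names, and the thesis does not say that the kernel is written this way.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag busy;   /* set while some processor holds the resource */
    int value;
} Resource;

typedef enum { TASK_DONE, TASK_RETRY_LATER } TaskResult;

/* Attempts the update once. If the resource is held elsewhere, the
 * caller is told to go and do other work instead of busy-waiting. */
TaskResult try_update(Resource *r, int delta)
{
    if (atomic_flag_test_and_set(&r->busy))
        return TASK_RETRY_LATER;   /* would stall: abandon and move on */
    r->value += delta;
    atomic_flag_clear(&r->busy);
    return TASK_DONE;
}
```

A caller that receives `TASK_RETRY_LATER` would schedule another task and reattempt the update on a later pass, which matches the "initiate a corrective action and then proceed with another task" wording of the policy.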

Minimize retained kernel state variables

When any kernel activity is invoked for any reason, the kernel will attempt to effect some change in the system state. Whether or not it succeeds in completing this change, the system state is left in a consistent configuration independent of any local variables of kernel routines. Thus there is no need for retention of kernel state information and no need to support multiple threads of activity within the kernel.

The kernel is not regarded as a process with continuing state and threads of execution. Rather, it is regarded as a set of state transformation rules. A rule (kernel action), once invoked, either runs to completion or is abandoned with the transformation incomplete because of the need for a disk transfer or a resource conflict. An abandoned rule leaves the system in a state where the transformation can be later completed.

3 i is a trademark of Intel Corporation.

Ensure the kernel is scalable

The kernel was designed to achieve decentralized operation, with additional kernels able to run in parallel. The absence of centralized control allows for easy scaling in terms of the number of kernels running. Decentralized operation also encourages fault tolerance at the kernel level.

With suitable hardware and the correct-and-retry nature of the Walnut Kernel, it is possible to reduce the effects of long propagation delays for remote data accesses on the throughput of the system.

Allow for a variety of shared memory architectures

There are two broad categories of multiprocessor memory architecture: shared memory and discrete memory. In shared memory systems all the processors have access to the same information in memory at the same time. The hardware provides support for coherence. In discrete memory systems the kernel must provide mechanisms allowing access to pages of memory held by other processors. Typically the mechanisms involve copying pages to local memory and providing many-readers, single-writer access to the pages. The Walnut Kernel is designed to operate with both these architectures or their hybrids.

It is necessary to support both categories of memory architecture as it affects the scalability of the system. The general trend is towards using symmetric multiprocessors (shared memory) for small systems and towards discrete memory architectures for large multiprocessors. Adding processors to a symmetric multiprocessor is cost efficient while the memory bus is not saturated, as only the processor module needs to be duplicated. As the memory bus approaches saturation, the potential return of adding additional processors to this shared bus architecture decreases. For systems with a larger number of processors, the bandwidth of the bus and shared memory eventually ceases to be able to meet the demands of the processors. To avoid this problem it is typical to have separate buses and memory modules, with some communication network linking the buses and memory modules. Accordingly, a discrete memory structure is usual for larger multiprocessors.

Passive Elements

Disk Structures

Objects are the basic unit of the Walnut Kernel. An object is composed of a body and a dope. The body of an object consists of an array of bytes. Not all of the bytes of an object are necessarily defined. All the system information about an object is stored in the object's dope. The inclusion of all the system information as part of the object is a distinguishing feature of the Walnut Kernel.

An object resides completely on a single volume. Each volume has an identifier permanently associated with it, known as the volume number. Volumes are usually random-access, block-oriented storage devices. The most common devices used for volumes are disks.

An object is divided into pages. The size of a page is a multiple of the page size of the host processor. Data is transferred to volumes in units of blocks. A block occupies a logically contiguous section of a volume. Blocks and pages are defined to be of the same size.

The blocks of a volume are numbered from zero to the number of blocks on a volume minus one. The number of a block is known as its location on a volume. Both the body and the dope of an object are stored as sets of blocks on a volume. The sets are not required to be contiguous.

The body of an object has no system-imposed structure unless it contains a process.

Figure: Object Header Blocks (Pages)
(Contents of the header blocks, in order: Header; List of Disk Blocks for Header Pages; Capability Table; Capability Hash Table; List of Page Tables.)

The dope consists of two structural elements: the object header and a set of page tables. The object header consists of one or more blocks. Figure shows the contents of header blocks of an object when joined in order. The first-header-block starts with the structure Header (see figure ). A list of the locations of disk blocks containing the header blocks of the object is stored after the Header data structure. The capability table lists all the capabilities for the object. Empty slots in the capability table are linked together to form a free list. The capability-hash-table is an index into the capability table based on the first password of a capability. A list of the locations on disk of the page tables of the object is the final element of the header blocks.

The first-header-block contains sufficient information to retrieve the remaining header blocks and hence allow access to all the pages of an object. The placement of the list of header blocks ensures that the reference to the second header block, if required, occurs within the first block, guaranteeing access to the second and subsequent header blocks. As the header blocks contain references to the location of page tables, which contain references to the location of pages of an object, all the pages of an object can be located using the header blocks of an object. Both pages and page tables may be undefined for parts of an object which have not yet been accessed.

The structure Header (see figure ) begins with a magic number that identifies the object as the first-header-block of an object. The dopesz field contains the number of bytes of header information. The type field is a bit object type identifier. The top bit of the type field is set for process-objects and clear for data objects. The master capability of the object is stored in the vol, serial, pass, pass, base, limit, money, srights and urights fields. The information relating to the master capability and the linc field forms an entry in the capability table. The dopeblks field contains the number of header pages used by the object. The maxsz field contains the number of bytes guaranteed to be available to an object to store header blocks and data blocks. The maxoff field contains the largest addressed offset into an object. The maxpage field contains the highest addressed defined data block in the object. The size of the table available for holding capabilities is stored in maxcap. The numcap field contains the number of capabilities stored in the table. The hashmsk field is equal to the index of the top element in the capability-hash-table. The freeindx field contains the index of the first element of the free list for the capability table. The maxtabs field contains the maximum number of page tables required for maxoff. The totdef field contains the number of disk blocks allocated to the object. The maxdef field contains the number of pages needed for maxsz. The dlocoff, capoff, hashoff and taboff fields respectively contain the byte offset of the list of disk blocks of header pages, the capability table, the capability-hash-table and the list of disk locations of page tables. The dreftime and altime fields contain the last reference time and the last alter time of the object. The squeeze field is used to indicate that the page table of a small object has been squeezed into the header pages of an object to conserve space. The link, state and restabs fields are ignored when the object header is on disk.

    typedef struct Headerst {
        Uw magic;
        Sw dopesz;
        Sw link;
        Uw type;
        Uw state;
        Uw vol;
        Uw serial;
        Uw pass;
        Uw pass;
        Sw base;
        Sw limit;
        Sw money;
        Uw srights;
        Uw urights;
        Sw linc;
        Sw dopeblks;
        Sw dlocoff;
        Sw maxsz;
        Sw maxoff;
        Sw maxpage;
        Sw toppage;
        Sw maxcap;
        Sw numcap;
        Sw hashmsk;
        Sw freeindx;
        Sw capoff;
        Sw hashoff;
        Sw maxtabs;
        Sw restabs;
        Sw totdef;
        Sw maxdef;
        Sw taboff;
        Uw dreftime;
        Uw altime;
        Sq squeeze;
        Sq dum;
        Sq dum;
        Sq dum;
    } Header;

Figure: The Header Data Structure

The low order bits of the serial number of an object contain the block number of the first-header-block of the object. The high order bits of the serial number are randomly selected when an object is created.

The construction of the serial number of an object is a key feature of the design of the Walnut Kernel, as it eliminates the need for a catalog of objects on a volume. However, the kernel needs to be able to distinguish between an ordinary data block and the first-header-block of an object, to prevent users creating a block with the format of a first-header-block and using the fake header block to access the pages of other objects. A bitmap has been introduced to identify the contents of disk blocks.

The bitmap consists of a contiguous set of disk blocks. The bitmap provides a two bit summary of the usage of every disk block on a volume. Blocks are free, the first-header-block of an object, in use, or bad. Free blocks are available to be allocated by the kernel. Bad blocks are ignored by the kernel. The first-header-block of an object is distinguished from other allocated blocks that are currently in use.
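The way a serial number leads to an object, and the way the bitmap guards against forged headers, can be sketched in C. The two-bit encoding chosen for `BlockState`, the packing of four entries per byte, and the function names are assumptions made for illustration; the mask parameter is named after the dblkmsk field that the thesis describes in the volume table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the two-bit block summary. */
typedef enum {
    BLK_FREE = 0, BLK_HEADER = 1, BLK_INUSE = 2, BLK_BAD = 3
} BlockState;

/* Reads the two-bit entry for `block` (four entries packed per byte). */
static BlockState bitmap_get(const uint8_t *map, uint32_t block)
{
    return (BlockState)((map[block / 4] >> ((block % 4) * 2)) & 0x3u);
}

/* Writes the two-bit entry for `block`. */
static void bitmap_set(uint8_t *map, uint32_t block, BlockState s)
{
    unsigned shift = (block % 4) * 2;
    unsigned cleared = map[block / 4] & ~(0x3u << shift);
    map[block / 4] = (uint8_t)(cleared | ((unsigned)s << shift));
}

/* Extracts the block number from the low order bits of the serial
 * number and accepts it only if the bitmap marks that block as a
 * first-header-block. */
bool locate_object(const uint8_t *map, uint32_t nblocks, uint32_t dblkmsk,
                   uint32_t serial, uint32_t *block)
{
    uint32_t b = serial & dblkmsk;
    if (b >= nblocks || bitmap_get(map, b) != BLK_HEADER)
        return false;
    *block = b;
    return true;
}
```

A user who writes a header-shaped data block gains nothing in this scheme, because the bitmap still records that block as in use rather than as a first-header-block.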

The essential contents of a volume are a Disk-ID-block and a bitmap. The Disk-ID-block identifies the locations of key data structures on a volume and the logical and physical details of the volume.

The first word of the Disk-ID-block contains a bit pattern which identifies the block as a Disk-ID-block. If the signature is incorrect, the volume is assumed to be either corrupt or invalid. The remaining words contain the total number of blocks on the disk, the number of used blocks, the index of the first bitmap block, the number of available blocks on the disk, the DiskBlockMask, the name of the volume, the serial number of the initialization process (see section ), and the number of reserved blocks on the disk.

    typedef struct Captabentst {
        Uw pass;
        Uw pass;
        Sw base;
        Sw limit;
        Sw money;
        Uw srights;
        Uw urights;
        Sw link;
        Uw dad;
    } Captabent;

Figure: The Captabent Data Structure

The capability table is built from Captabent structures (see figure ). The pass and pass entries contain the passwords of the capability. The base field gives the offset of the capability window from the base of the object. The limit field contains the size of the capability. The money field contains the drawing right of the capability. The drawing right of the master capability is known as the money word and represents the money held by the object. The srights and urights fields contain the system and user rights information for the capability. The dad field contains the index of the parent of the capability. The link field points to the next element in the hash chain of capabilities.
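A lookup through the capability-hash-table and its hash chains might look like the following C sketch. The `CapEntry` type, the `pass1` and `pass2` field names, the `CAP_NONE` marker and the use of `pass1 & hashmsk` as the hash function are all assumptions; the text says only that the index is based on the first password, that hashmsk equals the index of the top element of the hash table, and that link points to the next element of the chain.

```c
#include <stdint.h>

#define CAP_NONE (-1)   /* hypothetical end-of-chain / empty-bucket marker */

/* Illustrative subset of a capability table entry. */
typedef struct {
    uint32_t pass1;   /* first password (name is an assumption) */
    uint32_t pass2;   /* second password (name is an assumption) */
    int32_t link;     /* next entry in the hash chain, or CAP_NONE */
} CapEntry;

/* Returns the index of the capability with the given passwords,
 * or CAP_NONE if the object has no such capability.
 * Assumes the hash table has hashmsk + 1 buckets, a power of two. */
int32_t cap_lookup(const CapEntry *tab, const int32_t *hash,
                   uint32_t hashmsk, uint32_t pass1, uint32_t pass2)
{
    for (int32_t i = hash[pass1 & hashmsk]; i != CAP_NONE; i = tab[i].link)
        if (tab[i].pass1 == pass1 && tab[i].pass2 == pass2)
            return i;
    return CAP_NONE;
}
```

Both passwords are compared even though only the first selects the bucket, so two capabilities whose first passwords collide remain distinct.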

Figure in chapter identified three parameters used to describe the space allocation of an object: the maximum offset, the limit, and the number of pages allocated to an object. The three parameters correspond to the maxoff, maxsz and limit fields in the Header of an object. The parameters have been selected because they allow efficient sizing of tables within the object header. The maxsz parameter determines the number of blocks reserved for the use of an object on a volume. Reserving disk blocks prevents the over-committing of resources on a volume. The maxoff field determines the number of page tables required at present. The limit field determines the number of page tables allowed for the object.

Memory Structures

The Volume Table lists the volumes that the kernel can access. It is constructed from an array of VolTabEnt structures. The VolTabEnt structure (see figure ) holds a pointer to the queue used by the kernel to communicate with the device driver which manages the volume. The entry contains all the physical information required to access the disk, including the device type (devtype), a pointer to a device properties structure containing information found out about a device at boot time (physchar), the number of reserved blocks (reservblocks), the locations of the Disk-ID-Blocks (idblk and idblk), the location of the disk's Bitmap (mapblock), the size of the disk (size), the mask used to extract block locations from serial numbers (dblkmsk and ndblkmsk), the number of available blocks (avail), and the number of used blocks (used). Pointers to copies of the Disk-ID-Block and the Bitmap which are stored in memory are held in the idblk and map fields.

    typedef struct VolTabEnt {
        Uw vol;
        struct diskqhead *queue;
        Uh status;
        Uh devtype;
        struct DevProp *physchar;
        Uw reservblocks;
        Discmapent *map;
        Uw idblk;
        Uw idblk;
        Uw idblk;
        Uw maxvolsize;
        Uw mapblock;
        Sw size;
        Sw dblkmsk;
        Sw ndblkmsk;
        Sw avail;
        Sw used;
        Uq lock;
        Uq pad, pad, pad;
    } Voltabent;

Figure: The VolTabEnt Data Structure

4 Media failures can result in reserved blocks being unavailable; however, it is not practical to guard against all hardware failures.

The Active Object Table (AOT) holds information about each object loaded into memory. The AOT contains memory images of object headers, the header blocks of an object (see figure). The majority of the fields of the object header are identical when the page is in memory or on disk. However, the link, state and restabs fields have meaning when the object header is loaded into the Active Object Table.

The AOT is managed using a heap discipline. Information relating to a particular object is found by using a hash table. The serial number of a capability is used as the key into the aothash table. The hash table contains offsets into the AOT, and Header data structures are linked together to form hash chains. The link field is used to point to the next element of the hash chain.
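The hash-chain lookup described above can be sketched in C. This is a minimal illustration under stated assumptions, not the kernel's code: the table size AOT_HASH_SIZE, the modulo hash in aot_hash, the AOT_NONE end-of-chain marker, and the cut-down Header holding only serial and link are all hypothetical, and the offsets into the AOT are modelled as array indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AOT_HASH_SIZE 8   /* assumed table size, for illustration only */
#define AOT_NONE (-1)     /* assumed end-of-chain marker */

/* Cut-down stand-in for the object Header: only what the lookup needs. */
typedef struct {
    uint32_t serial;  /* serial number identifying the object */
    int32_t  link;    /* index of the next Header on the same hash chain */
} Header;

/* Hypothetical hash function; the text does not give the real one. */
static unsigned aot_hash(uint32_t serial)
{
    return serial % AOT_HASH_SIZE;
}

/* Walk the chain selected by the serial number.
 * Returns the index of the matching Header in the AOT, or AOT_NONE. */
static int32_t aot_find(const Header *aot, const int32_t *aothash,
                        uint32_t serial)
{
    assert(aot != NULL && aothash != NULL);
    for (int32_t i = aothash[aot_hash(serial)]; i != AOT_NONE; i = aot[i].link) {
        if (aot[i].serial == serial)
            return i;
    }
    return AOT_NONE;
}
```

A miss (AOT_NONE) corresponds to the case in which the header must first be recovered from disk.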

The contents of the state field indicate the type of change the object is undergoing. An object can be accessed freely if it is in the normal (DOPENORMAL) state. An object is completely inaccessible while the header blocks are being brought into memory (DOPECOMING) or removed from memory (DOPEGOING), the capability list is being compacted (DOPECLEANCAP), the object is being resized (DOPERESIZE), or the capability list is being rehashed (DOPEREHASH). Object headers are marked DOPEDYING if the object is being destroyed or DOPEMAKING if the object is being made.

The entries in the aotlock table correspond to the entries in the aothash table. The aotlock table provides locks for each hash chain in the AOT. A hash chain is locked whenever an alteration is made to an object header, a page table or the physical memory table. The design of the kernel guarantees that the locks are held only for a short time. Centralising the locks in the aotlock table removes the need for locks in many of the other kernel data structures.

The Walnut Kernel uses demand paging with two levels of page tables. The top level page tables are associated with a process. Second level page tables are associated with objects. A page table entry may contain the location of a disk block on a volume, or a reference to a page in memory and its associated permissions. The kernel must be able to distinguish between entries containing disk locations and those containing memory locations. Two bits are required to identify entries that contain disk locations, as an entry may contain a memory location yet be invalid. Entries which refer to disk locations have a clear PTEPRESENT bit (clear valid bit) and a set PTEDISC bit.
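The two-bit test can be illustrated with a short C sketch. The flag names follow the text, but the bit positions chosen here, the PteKind enumeration and the pte_classify function are assumptions made for the example; the real encoding depends on the processor's page table format.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions are assumptions; the text names the flags, not their values. */
#define PTE_PRESENT 0x1u   /* valid bit: entry refers to a usable memory page */
#define PTE_DISC    0x2u   /* entry holds a disk block location */

typedef enum {
    PTE_IN_MEMORY,           /* present: memory location, usable by hardware */
    PTE_ON_DISC,             /* not present, disc bit set: disk location */
    PTE_MEMORY_NOT_PRESENT   /* memory location whose valid bit is clear */
} PteKind;

/* Classify a page table entry using the two flag bits. */
static PteKind pte_classify(uint32_t pte)
{
    assert((PTE_PRESENT & PTE_DISC) == 0);  /* the flags must be distinct bits */
    if (pte & PTE_PRESENT)
        return PTE_IN_MEMORY;
    if (pte & PTE_DISC)
        return PTE_ON_DISC;
    return PTE_MEMORY_NOT_PRESENT;
}
```

The third case is why a single bit is not enough: an entry with a clear valid bit may still name a page frame.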

The Physical Memory Table (PMT) holds information relating to the state of each page frame of the memory. The Fizent/Fizentt figure illustrates the two types of entries found in the Physical Memory Table. Both types of PMT entries identify the object from which the page was drawn by volume (vol) and serial number (serial). The ventry field is the index into the volume table for vol. The dblk field contains the location on disk for the page. The type field identifies how the page is used. Pages may be marked as kernel pages (FRAMEKERNEL), part of the AOT (FRAMEDOPEBUFFER), second level page tables (FRAMEPAGETABLE), data pages (FRAMENORMAL), top level page tables (FRAMEDIRECTORY), private page tables (FRAMEPPT) or uncached pages used for DMA buffers (FRAMEIOBUF). The state field is used to indicate progress in bringing in and removing pages from memory.

5
Second level page tables may also be associated with processes. When a second level page table is associated with a process it is known as a Private Page Table (PPT). Small windows are implemented using PPTs. For simplicity, the discussion of second level page tables associated with processes is deferred until small windows are covered (see section).

typedef struct Fizentst
Uw vol
Sw ventry
Uw serial
Sw pagenum
Sw dblk
Uw ref
Uw dumw
Uq type
Uq state
Uq dum
Uq dum
Fizent

typedef struct Fizenttst
Uw vol
Sw ventry
Uw serial
Sw tablenum
Uw dblk
Sw respages
Uw freftime
Uq type
Uq state
Uq dirty
Uq dum
Fizentt

Figure: The Fizent / Fizentt Data Structure

The Fizent data structure describes data pages. The ref field points to the page table word which corresponds to the physical page represented by the entry in the PMT.

The Fizentt data structure describes page tables. The number of memory resident pages for the page table is stored in respages. The last time a page from the table was referenced is held in freftime. Allocation or deallocation of a disk block in the body of an object results in an alteration to a page table. The dirty field is used to indicate if a page table has been altered and should be written to disk.

Processes

Processes are the animate elements of the Walnut Kernel. A process object is an object which contains the state of a process. The most significant bit of the type field of an object is set to indicate that an object is a process object. The data structures holding the state of the process are stored at known offsets. In general, the pages holding the process state are not readable by user processes.

The Prochd data structure (see figure) is stored in the first data block of the process object. The master field holds a copy of the master capability for the process. The nessp field holds the number of necessary pages for the process. All the necessary pages of a process must be loaded into memory before a process can be scheduled. The state field indicates the state of a process. A process may be runnable (PROCSTATENORMAL), performing a system call (PROCSTATEKERNEL), handling a page fault (PROCSTATERFAULT for read faults or PROCSTATEWFAULT for write faults), frozen (PROCSTATEFROZEN), performing housekeeping tasks after it has died (PROCSTATEPROBATE), dead (PROCSTATEDEAD) or protected frozen (PROCSTATEPFROZEN). In addition, if it is not currently in the scheduling queue it is

typedef struct Prochdst

Capl master

Uq nessp

Uq state

Uq action

Uq stage

Uq currsubp

Uq maxsubp

Uq numsubp

Uq maxlc

Uq numlc

Uq maxmess

Uq nummess

Uq messlock

Sw cash

Uw lockword

Uw lockword

Uw icekey

Sw icecount

Uw direcp

Uw dcleartime

Uw runtime

Uw wakeup

Uw type

Uw cause

Uq messtab

Uq subptab

Uq tlctab

Uw tlcfree

Uq dmaptab

Uq fizadd

Uw faultaddress

Capl heir

Scratch scr

Prochd

Figure: The Prochd Data Structure

deemed idle (PROCSTATEIDLE). For a process in PROCSTATEKERNEL, the action field contains a value indicating the system call currently being executed by the process. The stage field indicates the stage to which a system call action has been completed. The currsubp and numsubp fields respectively indicate the current subprocess and the number of defined subprocesses of a process. The numlc field holds the number of capabilities currently loaded. The number of empty mail boxes is stored in nummess. The messlock field acts as a semaphore for the process's mail boxes. The money used to pay for system calls and transfers of money to and from objects is stored in the cash field. The process's two lock words are held in lockword and lockword. The protected freeze and protected thaw operations use the icekey and icecount fields. The last time the process's top level page table was cleared is stored in dcleartime. The runtime field holds the time when the process was last run. A process does not run until after the wakeup time has passed. The type field is a duplicate of the object's type field. The cause field is used for diagnostics; it identifies the reason for entering the scheduler. The faultaddress field contains the logical address of the memory access which resulted in a page fault. The fizadd field is a pointer to this Prochd data structure using a physical address. The heir field contains the capability of an object to which a process's cash should be sent upon its demise. The scr field is a pointer, using physical addressing, to the Scratch data structure of the kernel executing the process.

The direcp, messtab, subptab, tlctab and dmaptab fields respectively point to a process's top level page table, message table, subprocess table, Table of Loaded Capabilities (TLC) and the map of its address space. The tlcfree field contains the index of the head of the free list of the process's TLC.

The sizes of the TLC, the message table and the subprocess table are specified when a process is created. These values cannot be varied during the life of a process. The maxlc field specifies the maximum number of loaded capabilities. The number of mail boxes (the size of the message table) in a process is specified by the maxmess field. The maxsubp field holds the maximum number of subprocesses.

Each process has a Table of Loaded Capabilities (TLC). The TLC lists all the capabilities loaded by the process, the rights held by those capabilities when the

typedef struct Tlcentst

Uw vol

Uw serial

Uw pass

Uw pass

Uw srights

Uw urights

Uw base

Uw size

Sw displ

Tlcent

Figure: The Tlcent Data Structure

capabilities are loaded, and the location of the loaded section of the capability in the process's address space. The index of an entry in the table is known as the capability index, or cindex, of a capability. The index can be used to identify a loaded capability. Empty entries in the table are formed into a doubly chained linked list. The Tlcent data structure is illustrated in the figure. The vol, serial, pass, pass, srights and urights fields hold the values present when the capability is loaded. The base and size fields hold the offset from the start of the visible part of the object and the size of the visible part. Both quantities are in characters. The displ field holds the displacement between the beginning of the window holding the capability and the beginning of the address space.

typedef struct Subprocentst

Uw wakeup

Sw pcnt

sysstate regset

coprocstate coproc

Uw trap

Uq state

Uq priority

Uq pad

Uq pad

Subprocent

Figure: The Subprocent Data Structure

The Subprocess Table holds the state of each of the subprocesses of a process. The state of the supervisor of a subprocess is stored in slot zero, and the state of the subprocess corresponding to the master capability of the process is stored in slot one. The subprocess table is constructed from Subprocent data structures (see figure). The subprocess is not scheduled to run until after the time in the wakeup field has passed. The pcnt field contains a pseudo program counter value used by driver processes. The state of the processors when the user process is preempted is stored in the fields regset and coproc. The number of the subprocess assigned to handle traps in the current subprocess is held in trap. The state field holds the state of the subprocess. Subprocesses may be nonexistent, alive (SUBPNORMAL) or dead (SUBPDEAD). Subprocesses are allocated scheduling priorities to help select between subprocesses when a process has been scheduled to run. The priority of a subprocess is held in priority. The larger the value of the priority field, the higher the priority of the subprocess. The value 0xff is reserved for the supervisor.
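As an illustration of how the wakeup, state and priority fields could interact, the following C sketch selects a subprocess to run. It is not taken from the kernel: the state encodings, the cut-down Subprocent, the function pick_subprocess, the treatment of a wakeup time equal to the current time as runnable, and the lowest-slot tie-break are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State encodings are assumed; the text gives only the names. */
enum { SUBP_NONEXISTENT, SUBP_NORMAL, SUBP_DEAD };

/* Cut-down Subprocent: only the fields used for selection. */
typedef struct {
    uint32_t wakeup;    /* earliest time at which the subprocess may run */
    uint8_t  state;     /* SUBP_NONEXISTENT, SUBP_NORMAL or SUBP_DEAD */
    uint8_t  priority;  /* larger value means higher priority */
} Subprocent;

/* Return the slot of the live subprocess with the highest priority whose
 * wakeup time has been reached, or -1 if none is eligible.
 * Ties go to the lowest slot number (an assumption). */
static int pick_subprocess(const Subprocent *tab, size_t n, uint32_t now)
{
    int best = -1;
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].state != SUBP_NORMAL)
            continue;                 /* nonexistent or dead */
        if (tab[i].wakeup > now)
            continue;                 /* wakeup time not yet reached */
        if (best < 0 || tab[i].priority > tab[best].priority)
            best = (int)i;
    }
    return best;
}
```

Because 0xff is reserved for the supervisor, a runnable supervisor in slot zero always wins under this rule.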

A process's address map is stored at xf in the logical address space, and the page containing it is marked read-only. The address map is a table of capability index values for memory locations. Figure in chapter contains sample code using the address map.

typedef struct Messentst {
    Uq chars;
    Uq subproc;
    Uq reserve;
    Uq matchlen;
    Sw money;
    Uw body[WORDSPERMESSBODY];
} Messent;

Figure: The Messent Data Structure

The message table is built from Messent data structures (see figure). Each mail box (Messent data structure) can hold a single message. Empty mail boxes have 0xff in the chars field; otherwise the chars field contains the length of the message. The subproc field holds the number of the subprocess to which the message should be delivered. The money field holds the amount of money sent with the message. Negative amounts of money cannot be sent by user processes. The body array holds the message sent.

A mail box may be reserved for the use of a particular subprocess by setting the reserve field to the subprocess number. The value x indicates that the mail box may be used for a message to any subprocess. A match string can be specified for a mail box. The match string is stored in the body of the message and the length of the string is stored in the matchlen field. Only messages which are prefixed by the match string can be stored in the mail box.
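The rules for placing a message in a mail box can be summarised by a small C predicate. This is a sketch under assumptions: the body size WORDS_PER_MESS_BODY, the marker ANY_SUBPROC for an unreserved mail box (the real value is not recoverable from the text), the use of fixed-width integer types for Uq, Sw and Uw, and the function mbox_accepts itself are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORDS_PER_MESS_BODY 16  /* assumed body size in words */
#define MBOX_EMPTY  0xff        /* chars value marking an empty mail box */
#define ANY_SUBPROC 0xff        /* hypothetical "any subprocess" marker */

typedef struct {
    uint8_t  chars;     /* message length, or MBOX_EMPTY */
    uint8_t  subproc;   /* destination subprocess of a stored message */
    uint8_t  reserve;   /* subprocess the box is reserved for */
    uint8_t  matchlen;  /* length of the match string held in body */
    int32_t  money;     /* money sent with the message */
    uint32_t body[WORDS_PER_MESS_BODY];
} Messent;

/* Return 1 if an empty mail box may receive the message, 0 otherwise. */
static int mbox_accepts(const Messent *box, uint8_t dest_subproc,
                        const void *msg, size_t len)
{
    assert(box != NULL && msg != NULL);
    if (box->chars != MBOX_EMPTY)
        return 0;                          /* already holds a message */
    if (box->reserve != ANY_SUBPROC && box->reserve != dest_subproc)
        return 0;                          /* reserved for someone else */
    if (len > sizeof box->body || len < box->matchlen)
        return 0;                          /* too long, or shorter than prefix */
    /* The message must begin with the match string stored in the body. */
    return memcmp(box->body, msg, box->matchlen) == 0;
}
```

A mail box with matchlen zero accepts any message that fits, since the empty string is a prefix of every message.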

Each process has a parameter page. The parameter page is made up of the parameter block (see figure in chapter) and the message area. The kernel reads fields from the parameter page when a system call is made, and returns values in fields of the Param data structure when the system call returns. The fields and usage of the parameter page are described in chapter.

Figure: Layout of the Process Object. The table has two columns, "Offset in Object" and "Address when Process Loaded", and lists the regions of the process object in order: Prochd, Message Table, Subprocess Table, Table of Loaded Capabilities, Address Map, Parameter Page.

The layout of the process object is illustrated in the figure. The Prochd data structure is concatenated with the message table, the table of loaded capabilities and the subprocess table to construct the first set of pages in the process object. The subsequent two pages of the object hold the address map and the parameter page. A significant feature of the process object is that the message table is entirely contained within the first page of the object, to avoid extra page faults when sending a message.

Kernel Data Structures

Scratch data structures hold the state of each instance of the kernel while the kernel is performing an operation (see figure). The error field is used to hold an error code. Negative error codes indicate that a function cannot be performed at this time; the kernel typically retries the operation at a later time. Positive error codes indicate that a function cannot be completed. Error codes greater than indicate that the kernel is in an inconsistent state. The numeric value of an error code indicates the routine in which the error occurred and the error that occurred. The rightmost decimal digits indicate the error and the more significant three digits indicate the routine.

Only some of the fields contain valid information. The set of valid fields is determined by the operation being carried out. The vol, serial, pass, pass, srights, urights, base, limit, money, type, maxoff, maxsz, maxcap, offset, subpn and cindex fields hold information about the process or object being operated on. The ventry field holds an index into the volume table. The head field holds a pointer into the AOT for an object header. The xhead field is a pointer to another object header. The capent field points to a capability entry in the header of an object in the AOT. The maximum offset into an object is held in objlim. The size of the object header is held in dopesz. If the hash chain for an object header in the AOT is locked, aotres holds a pointer to the semaphore. The fields pnum (page number), dirent (second level page table entry) and pte (private page table entry) are used for manipulating page table entries. Resolved physical addresses are stored in fizadd. The darg field contains a counter which is decremented while operations are carried out. It is set to a positive value whenever a kernel operation is started or restarted. When darg is zero or negative, the operation is abandoned until it is retried. This use of darg ensures that no kernel action lasts for more than a certain time (chosen to be less than a scheduling timeslice). The field prochd points to the process header of the process currently being handled by the kernel, and param points to the process's parameter page. The kerneltick field holds the value of the tick counter when the kernel function was started.

6
Each time a timer interrupt occurs the tick counter is advanced.

typedef struct Scratchst

Sw error

Uw vol

Uw serial

Uw pass

Uw pass

Uw srights

Uw urights

Sw base

Sw limit

Sw money

Uw type

Sw maxoff

Sw maxsz

Sw maxcap

Sw offset

Sw subpn

Sw cindex

Sw ventry

Header head

Header xhead

Captabent capent

Sw objlim

Sw dopesz

Uq aotres

Uw pnum

Uq fizadd

Uw dirent

Uw pte

Sw darg

Prochd prochd

Uw param

Uq kerneltick

Uq dum

Uq dum

Uq dum

Scratch

Figure: The Scratch Data Structure

Two arrays of integers are used in process scheduling. The arrays mixvol and mixserial hold the volume and serial numbers of processes to be scheduled by the kernel. Processes which are scheduled to wake up at time FOREVER are not present in the mix.

Active Elements

The operations of the active elements of the kernel may be collected into five groups. Object Memory Management is concerned with the moving of parts of objects into and out of main memory, and with ensuring the stability of data stored in an object. Capability Management consists of providing all the operations on capabilities and ensuring that the set of operations is consistent. Process Memory Management governs the loading of views into a process's address space. Message Management is responsible for delivering and receiving messages. Finally, the Process Scheduler controls the scheduling of processes and subprocesses. Figure shows the major relationships between these groups of functions and the data structures of the kernel.

Object Memory Management

Three elements of the kernel are connected with Object Memory Management: scavenge, aotscavenge and a collection of routines that provide access to parts of objects. The routines scavenge and aotscavenge each have their own Scratch data structures and are periodically scheduled. These routines are responsible for removing entries from memory and the active object table. A number of routines are used to load and access a page of an object. The refer routine is central to accessing pages.

The Walnut Kernel's Object Memory Management is founded on the following guarantees:

A page remains in memory as long as any page table entry (including private page tables) contains a valid reference to it.

A page table remains in memory as long as any process's first level page table contains a valid reference to it, and as long as any page to which it refers remains in memory.

An object's dope remains in memory, in the AOT, as long as any of the object's page tables remain in memory.

7
The end of time, or FOREVER, has the value 0xffffffff.

Figure: The usage of data structures by kernel functions. The diagram relates Object Memory Management, Capability Management, Process Memory Management, Message Management and the Process Scheduler to the Physical Memory Table (PMT), the Active Object Table (AOT), the Volume Table (VolTab), the Object Header, the Process Header and the Mix (scheduling queue).

The scavenge routine examines each entry in the PMT. The routine both copies dirty pages to disk and removes pages from memory. Scavenge stores the time at which the last pass through memory was completed, and the number of the page currently being examined, in its private Scratch data structure. The scavenge routine aims to examine every entry in the PMT in DIRECTORYDELAY seconds. When scavenge is scheduled it selects a target page frame number and attempts to examine each entry in the PMT between the last entry examined and the target entry. The target frame number is proportional to the number of seconds elapsed since a sweep of the table was completed, divided by the DIRECTORYDELAY. To avoid flooding the disk queue, scavenge will exit if there are more than SCAVWRITELIMIT disk write operations outstanding.
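The pacing rule can be written out as a short calculation. The C sketch below assumes a value for DIRECTORY_DELAY, takes the constant of proportionality to be the number of page frames, and clamps the result at the end of the table; the function name scavenge_target is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECTORY_DELAY 60u   /* seconds; assumed value for illustration */

/* Frame number that scavenge should have reached, given the time elapsed
 * since the last complete sweep:
 *     target = nframes * elapsed / DIRECTORY_DELAY
 * clamped to nframes once a full period has passed. */
static uint32_t scavenge_target(uint32_t nframes, uint32_t elapsed_secs)
{
    assert(DIRECTORY_DELAY > 0);
    if (elapsed_secs >= DIRECTORY_DELAY)
        return nframes;
    /* 64-bit intermediate avoids overflow for large frame counts. */
    return (uint32_t)(((uint64_t)nframes * elapsed_secs) / DIRECTORY_DELAY);
}
```

With 1200 frames and the assumed 60 second delay, a quarter of the period (15 seconds) gives a target of frame 300, so the sweep advances evenly rather than in bursts.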

When scheduled by the kernel scheduler, the scavenge routine checks the state of any outstanding disk write operations. The state of each enqueued disk write is stored in the scavnote hash table. There are four possible states for each entry: page accepted for write and selected for removal; page written and selected for removal; page accepted for write; and page written. For each completed write operation, scavenge acts as follows:

If the page contains a page table and the page is not scheduled for removal, it clears the PMT dirty field.

If the page does not contain a page table and the page is not scheduled for removal, it clears the dirty bit in the page table entry for the page.

If the page contains a page table and the page is scheduled for removal, the object header in the AOT is retrieved, the number of the disk block for the page table is written into the dope, restabs is decremented, and the memory page is released into the free list.

If the page does not contain a page table and the page is scheduled for removal, the number of the disk block for the page is written into the page table for the object, respages is decremented, and the memory page is released into the free list.

After processing the contents of scavnote, the routine attempts to schedule pages to be copied to disk and/or removed from memory. Pages are scheduled to be copied to disk if they are dirty. Only if less than a quarter of the memory is free are the data pages of an object scheduled for removal. An LRU (Least Recently Used) discipline is used to determine which pages should be removed. Scavenge clears the present bit in the second level page table entry on clean pages to ensure that a page fault occurs. A page fault is used to determine that a page has been accessed, eliminating the need for a use bit in the page tables provided by hardware. The bottom four bits of the state field of the Fizent data structure are used to hold a counter which indicates when a page was last accessed. The counter is incremented each time a page is examined by scavenge and cleared by any page fault on the page. The value of the threshold at which a page is determined to be old enough to be removed is determined by the demand for pages. The greater the demand for pages, the lower the threshold. Clean pages are removed by clearing the page table entry and adding the page to the free list. Entries are made in scavnote to indicate whether a page is being copied to disk, or copied and removed. Second level page tables are removed only when there are no pages pointed to by the table resident in memory. Top level page tables are discarded and their pages released after DIRECTORYDELAY seconds have elapsed.
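The software age counter can be sketched in C as three small helpers operating on the state byte. The helper names and the choice to saturate the counter at 15 instead of letting it wrap are assumptions; the text specifies only that the counter occupies the bottom four bits, is incremented by scavenge, and is cleared by a page fault.

```c
#include <assert.h>
#include <stdint.h>

#define AGE_MASK 0x0fu   /* bottom four bits of the state field */

/* Called when scavenge examines the page: age it by one step.
 * Saturating at 15 is an assumption made for this sketch. */
static uint8_t age_tick(uint8_t state)
{
    uint8_t age = state & AGE_MASK;
    if (age < AGE_MASK)
        age++;
    return (uint8_t)((state & ~AGE_MASK) | age);
}

/* Called on a page fault for the page: it has just been accessed. */
static uint8_t age_reset(uint8_t state)
{
    return (uint8_t)(state & ~AGE_MASK);
}

/* A page is a removal candidate once its age reaches the threshold;
 * higher demand for pages would lower the threshold. */
static int age_is_old(uint8_t state, uint8_t threshold)
{
    assert(threshold <= AGE_MASK);
    return (state & AGE_MASK) >= threshold;
}
```

The upper four bits of the state byte are left untouched, so the counter can share the field with the progress information it also carries.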

In addition to assisting in the removal of pages, scavenge will retry failed attempts at loading a block from disk.

The aotscavenge routine performs a similar task to scavenge; however, it acts on the Active Object Table. The routine examines each entry in the AOT by stepping through the aothash table and pursuing each hash chain. The routine exits if either the darg of aotscavenge's scratch is zero, or the fraction of the hash table's entries it has processed exceeds the time since the last sweep through the AOT divided by DIRECTORYDELAY. It will resume work from where it left off when next invoked. Each Header is examined, and the action of saving the object header to disk is initiated if the time since it was last referenced is greater than a threshold value. The threshold is guaranteed to be greater than DIRECTORYDELAY.

An attempt by a process to reference an object can fail at many points. The actions taken by the kernel depend on the point of failure. In general, the kernel takes corrective action when a failure occurs and allows the reference to be retried when the process is rescheduled. References to an object are typically generated by processes. The majority of references are handled by the page translation hardware using the page tables set up by the kernel. A page fault occurs when a page is either not present, as shown by absence of the page table entry valid bit, or an attempt is being made to write to a page which has been marked read-only. In either case the fixfault routine is invoked by the fault handler. The fault handler stores the fault address in the faultaddress of prochd.

On entry, fixfault examines the top level page table associated with the process and recovers the top level page table entry. If the top level page table entry is valid, the second level page table entry is recovered. If the fault has been caused by a not-present mark in the second level page table entry, then the state field of the Fizent is modified to indicate that the page has been accessed, the present bit in the page table entry is set, and fixfault returns to the process. If the fault was caused by an attempt to write to a page without write permission, an error is returned. Otherwise, the process's address map is used to find the capability index of the capability corresponding to the faultaddress. The entry in the process's Table of Loaded Capabilities is copied into scratch and the refer routine is invoked.

Refer checks that the type of access being performed is valid and within the loaded section of the capability. The vol and serial fields are used to access the Active Object Table. If there is no entry for the object in the bitmap corresponding to the volume, an error is returned indicating the object no longer exists. If there is no entry in the AOT, the recovery of the Header for the object from disk is started and fixfault returns. The capability table in the object header is checked to ensure that the capability loaded in the process's TLC is still valid. If the passwords do not match, an error is returned. If a page table for the object is already in memory, refer recovers the address of the page table from the list of page tables in the object header and returns both a top level page table entry and a second level page table entry for the required access to fixfault. If a page table is not in memory, refer starts an operation to retrieve the page table from disk and returns to fixfault. If refer was successful in constructing the page table words, fixfault uses the words to complete the page tables and returns to the user program.

Capability Management

Capability management consists of three elements: the derivation of capabilities, the restriction of capabilities, and the revocation of capabilities. The majority of these operations are conducted solely in the header of the object the capabilities relate to.

The derivation of capabilities is performed by the addcap routine. The vol, serial, pass and pass of the capability to be derived from are passed to addcap in the scratch data structure. The rights mask and the parameters for the coverage of the derived capability are passed in the fields srights, urights, base and limit. Before a derivation can be performed, the rights of the presented capability are checked to ensure that it has the SRDERIVE right. If no derive right is present, an error is returned. If the SRMULTILOAD right is absent, the password of any derived capability is coerced to the value of the password of the presented capability. With the exception of the suicide right and the message rights of processes, derivation consists of returning the logical AND of the rights of the nominated capability and the rights mask supplied by the caller. The memory area covered by the capability is determined by the intersection of the region covered by the presented capability and the area covered by adding the base to the start of the region covered by the presented capability, up to the extent provided by the limit. The SRSUICIDE right can be added to the children of any capability.
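The general case of derivation (rights masking and window intersection) can be sketched in C. The CapView structure and the derive_view function are hypothetical names introduced for this example, and the special handling of the suicide right, the message rights of processes, and password coercion is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified view of a capability: rights plus the window it covers. */
typedef struct {
    uint32_t rights;  /* combined rights bits */
    int32_t  base;    /* offset of the window from the start of the object */
    int32_t  limit;   /* size of the window */
} CapView;

/* Derive a child from `parent`. `base` is relative to the start of the
 * parent's window and `limit` is the requested extent. The child can
 * neither gain rights nor reach outside the parent's window. */
static CapView derive_view(CapView parent, uint32_t mask,
                           int32_t base, int32_t limit)
{
    CapView child;
    assert(base >= 0 && limit >= 0 && parent.limit >= 0);
    child.rights = parent.rights & mask;      /* logical AND of rights */
    if (base > parent.limit)
        base = parent.limit;                  /* start clipped to parent */
    if (limit > parent.limit - base)
        limit = parent.limit - base;          /* extent clipped to parent */
    child.base = parent.base + base;
    child.limit = limit;
    return child;
}
```

For a parent covering 8192 characters from offset 4096, a request for base 4096 and limit 8192 yields a child at offset 8192 with limit 4096: the request is trimmed to the part that intersects the parent.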

The system rights field for a process is treated differently from the system rights of a data object. The last eight bits of the srights field limit the subprocesses to which a message may be sent by using this capability. The bits may contain 0xff for all subprocesses, 0xfe for all subprocesses other than subprocess zero, or a subprocess number. Only more specific or equal subprocess destinations for messages can be derived. Furthermore, it is required that the capability base be zero for a derived capability which has the SRSEND right.

The revocation of a capability is a two phase process. The delcap routine deletes a capability and all the derivatives of the capability from the capability table in the object's header. References to a deleted capability loaded in the address space of a process remain valid until a page fault results from attempting to access the deleted capability. The top level page table of every process is guaranteed to be discarded and replaced with an empty page table every DIRECTORYDELAY seconds. The first access after the top level page table is replaced results in a page fault, and the deletion of the capability is noted. The Walnut Kernel guarantees that a capability is unavailable to all processes within DIRECTORYDELAY seconds of revocation.

Process Memory Management

Process Memory Management consists of two operations: mapping views into the address space of a process, and removing a view from the address space of a process. A process requests the loading of a view into its address space using the LOADCAP system call. When presented with the capability to be loaded, the Walnut Kernel attempts to locate the object header in the AOT. If the object is not represented in the AOT, recovery of the object header from disk is initiated. When the object header is present in the AOT, the appropriate entry in the capability table is recovered. If the recovered capability lacks the SRMULTILOAD right and the password of the capability is not equal to the serial number of the process, then an error is returned. Otherwise, the values representing the area covered by adding the base to the start of the region covered by the capability to be loaded, up to the extent provided by the limit, are placed in the base and limit fields of the TLC entry. The rights fields of the Tlcent data structure are filled in and the system call returns successfully. The capability's cindex is written into all address map entries covered by the view.
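The final step, recording the cindex in the address map, might look like the following C sketch. The granularity of the map is not stated in this section, so one byte-sized entry per page is assumed, as are the page size WK_PAGE_SIZE and the function name amap_fill. Passing a cindex of zero models clearing the entries again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WK_PAGE_SIZE 4096u   /* assumed page size */

/* Write `cindex` into every address map entry covered by the range
 * [start, start + size) of the process's address space. One map entry
 * per page is an assumption of this sketch. */
static void amap_fill(uint8_t *amap, size_t entries,
                      uint32_t start, uint32_t size, uint8_t cindex)
{
    assert(amap != NULL);
    if (size == 0)
        return;
    size_t first = start / WK_PAGE_SIZE;
    /* 64-bit arithmetic so start + size cannot wrap around. */
    size_t last = (size_t)(((uint64_t)start + size - 1) / WK_PAGE_SIZE);
    for (size_t i = first; i <= last && i < entries; i++)
        amap[i] = cindex;
}
```

A fault handler can then translate a faulting address into a capability index with a single table lookup.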

The removal of a view from a process's address space is trivial. The fields in the process's address map corresponding to the memory address range occupied by the mapping are zeroed, and the entry in the process's TLC is invalidated and linked into the list of free TLC entries.

Message Management

The message mechanism enables both the transfer of information between processes and control of the scheduling of processes. When a process is created, an array of Messent data structures is allocated within the process object. Only the kernel can access mail boxes directly.

A message is sent by the kernel copying a set of bytes from the source process's message area (the remainder of the page in a process's address space that contains the parameter block) to the destination process's mail box. For this to occur, it is necessary for the capability used to identify the destination to start at offset zero from the beginning of the process and to have the SRSEND right. The recipient process may be explicitly named using the external form of the send system call; alternatively, it can be implicitly named by sending a message to a capability loaded into the sending process's address space using the internal form of the send system call. The latter method is more efficient, as it eliminates the need for the kernel to verify the access rights of the destination process object. It reduces the send operation to a memory copy. The transmit routine implements both the internal and external forms of message sending.

The transmit routine transfers money from the sender's cash word to the receiver's
cash word. The transfer is made after the message has been placed in the mailbox.

Messages are always directed to a subprocess. When a message is sent to a
subprocess, the wakeup time of the process is set to the current time and the
process is placed in the mix. Whenever a mailbox contains a message, a subprocess
cannot change its wakeup time from the current time. Hence non-frozen processes
with pending messages are always scheduled to run.
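The ordering described above (copy the message, then move the money, then make the receiver runnable) can be sketched as follows. The types, the mailbox size and the function signature are hypothetical, a single mailbox per process stands in for the array of Messent structures, and the rights checks on the destination capability are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSG_BYTES 64   /* hypothetical mailbox capacity */

typedef struct {
    int     full;
    size_t  len;
    uint8_t data[MSG_BYTES];
} Mailbox;

typedef struct {
    int64_t  cash;     /* the process's cash word */
    uint64_t wakeup;   /* wakeup time */
    int      in_mix;   /* nonzero when the process is in the scheduling queue */
    Mailbox  box;      /* one mailbox only, for brevity */
} Proc;

/* Returns 0 on success and -1 if the message cannot be delivered. */
int transmit(Proc *src, Proc *dst, const void *msg, size_t len,
             int64_t money, uint64_t now)
{
    assert(src != 0 && dst != 0 && msg != 0);
    if (len > MSG_BYTES || dst->box.full)
        return -1;
    if (money < 0 || money > src->cash)
        return -1;

    /* 1. Place the message in the mailbox. */
    memcpy(dst->box.data, msg, len);
    dst->box.len  = len;
    dst->box.full = 1;

    /* 2. Only then transfer the money between the cash words. */
    src->cash -= money;
    dst->cash += money;

    /* 3. A pending message makes the receiver schedulable immediately. */
    dst->wakeup = now;
    dst->in_mix = 1;
    return 0;
}
```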

The receive routine is called by the receive system call. The routine can take a
match string as a parameter, which allows a subprocess to retrieve messages
beginning with the specified string. If no match string is provided, the message in
the first mailbox containing a message for the current subprocess will be retrieved.

When a process is created, all of its mailboxes are closed except for a single
mailbox allocated to subprocess zero. After the process has completed its
initialisation, the process can open its mailboxes, setting the fields which reserve
mailboxes for messages with specified prefixes and specific subprocesses. Messages
sent before the process has opened its mailboxes are not delivered. A system call
allows mailboxes to be closed.
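The retrieval rule for the receive routine can be illustrated with a short sketch. The Mailbox layout and the function name are assumptions, and messages are treated as NUL-terminated strings purely to keep the example small.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int  full;       /* nonzero when the mailbox holds a message */
    int  subproc;    /* subprocess the message is addressed to */
    char text[32];   /* message body, a C string in this sketch */
} Mailbox;

/* Returns the index of the first mailbox holding a message for `subproc`
   that begins with `match`, or -1 if there is none. A NULL match string
   accepts any message. */
int receive(const Mailbox *boxes, int nboxes, int subproc, const char *match)
{
    assert(boxes != NULL);
    for (int i = 0; i < nboxes; i++) {
        if (!boxes[i].full || boxes[i].subproc != subproc)
            continue;
        if (match == NULL ||
            strncmp(boxes[i].text, match, strlen(match)) == 0)
            return i;
    }
    return -1;
}
```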

Process Scheduler

At present a round robin scheduler is employed within the kernel to schedule all
processes with a specified wakeup time. Processes which have chosen to wait forever
are removed from the scheduling queue. The scheduling queue within the kernel is
known as the mix.

When the process scheduler macro schd selects an element from the mix, the
volume and serial number of a process are passed to the startproc routine using
the scratch data structure.

Startproc must ensure that the set of pages critical to the operation of a process
is loaded into memory before attempting to transfer control to the process. The
set of critical pages consists of two parts: the dope for the object holding the
process, and the set of necessary pages of the process. The necessary pages of an
object are the pages containing the Prochd, the Table of Loaded Capabilities, the
subprocess table, the message table and the parameter page. The AOT entry for the
process is checked. If the entry is not present, retrieval of the dope is initiated.
When the AOT entry is recovered, the top bit of the type is tested to ensure that
the object contains a process. If the object does not contain a process, an error
is returned.

A lock is placed on the message table by locking messlock. If the lock operation
fails, a message is currently being sent to this process and startproc exits with a
negative error. If the lock operation succeeds, startproc guarantees to clear the
lock before exiting.

The PROCSTATEMIX bit of the state field of the process header is set.

Prochd's wakeup field is examined next. If the process wakeup time is NEVER
(also known as FOREVER), startproc exits with a positive error, which informs
the scheduler to remove the process from the mix. If it is too early to wake up the
process, a negative error is returned.
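The sign convention used by startproc for the wakeup check can be sketched as follows. The encoding of NEVER, the Proc layout and both function names are assumptions; the round robin pass merely shows how a caller might react to each class of return value.

```c
#include <assert.h>
#include <stdint.h>

#define NEVER UINT64_MAX   /* hypothetical encoding of NEVER / FOREVER */

typedef struct {
    uint64_t wakeup;   /* wakeup time of the process */
    int      in_mix;   /* nonzero while the process is in the mix */
} Proc;

/* > 0: remove the process from the mix; < 0: too early to run; 0: may run. */
int check_wakeup(const Proc *p, uint64_t now)
{
    if (p->wakeup == NEVER)
        return 1;
    if (p->wakeup > now)
        return -1;
    return 0;
}

/* One round robin pass over the mix starting at `start`. Returns the index
   of the first runnable process, or -1 if none is runnable. */
int pick_next(Proc *mix, int n, int start, uint64_t now)
{
    assert(mix != 0 && n > 0);
    for (int k = 0; k < n; k++) {
        int i = (start + k) % n;
        if (!mix[i].in_mix)
            continue;
        int rc = check_wakeup(&mix[i], now);
        if (rc > 0)
            mix[i].in_mix = 0;   /* positive error: leave the mix */
        else if (rc == 0)
            return i;            /* runnable */
        /* negative error: stays in the mix, tried again on a later pass */
    }
    return -1;
}
```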

The time at which the process's directory (top level page table) was last cleared is
checked to ensure that the top level page table is still valid. If scavenge has had
an opportunity to remove the page, a new page is created. This has the side effect
of ensuring that all capabilities are revalidated when they are next used.

Once the top level page table has been established, references to the page table for
the necessary pages are inserted in the top level page table. If the necessary pages
are not present in memory, they are recovered from disk.

The process state is checked. If a process is dead (PROCSTATEDEAD),
an error is returned and it is removed from the mix. If a process is in probate
(PROCSTATEPROBATE), the process attempts to send a message to its heir
containing the process's cash and the process's name. Only a limited number of
attempts are made at delivering the message to the heir. If the heir is unable to
accept the message, the process changes to PROCSTATEDEAD and the cash is lost.

A check for new messages is performed and messages for nonexistent subprocesses
are erased. When a subprocess receives a message it is marked runnable.
Messages for subprocess zero are processed and the operations performed.

If the process is either frozen (PROCSTATEFROZEN) or subject to a protected
freeze (PROCSTATEPFROZEN), startproc exits, allowing other processes to be
scheduled.

At this point startproc chooses the subprocess to execute. If the reserve field
of the process's Param data structure (Parameter Block) is nonzero, the subprocess
that was executing at the end of the last timeslice is restarted. Otherwise, the
subprocess with the highest priority and with a wakeup time less than the current
time is selected. If no runnable subprocesses are found, startproc returns with a
negative error.
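The selection rule can be sketched as follows. The Subproc layout is hypothetical, and the sketch assumes that a numerically larger priority value means a higher priority, which the text does not state.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      exists;
    int      priority;   /* assumed: larger value means higher priority */
    uint64_t wakeup;
} Subproc;

/* Returns the index of the subprocess to run, or -1 (the negative error
   case) when no subprocess is runnable. `last` is the subprocess that was
   executing at the end of the previous timeslice. */
int choose_subproc(const Subproc *sp, int n, uint32_t reserve,
                   int last, uint64_t now)
{
    assert(sp != 0);

    /* A nonzero reserve field pins the previously running subprocess. */
    if (reserve != 0)
        return last;

    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!sp[i].exists || sp[i].wakeup >= now)
            continue;   /* wakeup time must be less than the current time */
        if (best < 0 || sp[i].priority > sp[best].priority)
            best = i;
    }
    return best;
}
```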

Having selected the subprocess to be run and ensured that the pages critical to
the operation of the process are in memory, startproc returns to the macro schd
routine.

If an error was returned by startproc, macro schd selects a new subprocess
from the mix and calls startproc. Only when a runnable process is selected is the
remainder of macro schd executed.

The remainder of macro schd is devoted to executing the selected process. The
actions of macro schd fall into three categories:

- If the process is currently handling a page fault (PROCSTATERFAULT
  or PROCSTATEWFAULT), the fixfault routine is called. If the fault is
  fixed, the process state is set to PROCSTATENORMAL, the subprocess
  state is restored and the process is executed. If the fault cannot be fixed
  immediately, the process is returned to the mix.

- If the process is currently performing a system call (PROCSTATEKERNEL),
  the kcact routine is called to handle the system call. If the system call has
  been completed, post kcact is invoked to clear any fields in Param of data
  which are not to be returned to the process, the subprocess state is restored
  and the process is executed. The process is returned to the mix if the system
  call cannot be completed.

- If the process is in PROCSTATENORMAL, the subprocess state is restored
  and the process is executed.
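The three categories amount to a dispatch on the process state, which can be sketched as follows. The enumerations and the Proc layout are hypothetical; the two flags stand in for the outcomes of the fault fixing and kcact routines, and returning to the normal state after a completed system call is an assumption of the sketch.

```c
#include <assert.h>

typedef enum { PS_NORMAL, PS_RFAULT, PS_WFAULT, PS_KERNEL } ProcState;
typedef enum { ACT_RUN, ACT_REQUEUE } Action;

typedef struct {
    ProcState state;
    int fault_fixable;   /* stand-in for the result of the fault routine */
    int syscall_done;    /* stand-in for the result of kcact */
} Proc;

/* Decide whether the selected process is executed or returned to the mix. */
Action dispatch(Proc *p)
{
    assert(p != 0);
    switch (p->state) {
    case PS_RFAULT:
    case PS_WFAULT:
        if (!p->fault_fixable)
            return ACT_REQUEUE;   /* fault cannot be fixed immediately */
        p->state = PS_NORMAL;
        return ACT_RUN;
    case PS_KERNEL:
        if (!p->syscall_done)
            return ACT_REQUEUE;   /* system call cannot be completed yet */
        /* The post-processing of Param would be performed here. */
        p->state = PS_NORMAL;     /* assumed state transition */
        return ACT_RUN;
    case PS_NORMAL:
        return ACT_RUN;
    }
    return ACT_RUN;
}
```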

Persistent Elements

The Disk Layout figure illustrates two typical layouts for a disk containing a
Walnut Kernel volume. The layout of the disk is fairly flexible, as the only elements
which have fixed locations are the DiskIDBlocks. The first DiskIDBlock is located
in the first block of a data volume, or at a location defined by a compile time
constant in the kernel for a bootable volume. The duplicate DiskIDBlock is always
located in the last block on the disk. The DiskIDBlocks and the Bitmap contain
information relating to the whole disk. The remainder of the disk is used to store
objects.

The reserved area is located at the front of the disk and is typically used to hold
the operating system and the boot block. The reserved area is ignored by the kernel.
Typically data volumes do not have a reserved area.

Figure: Disk Layout. (Bootable volume: reserved blocks holding the boot block and
operating system, Disk ID Block, Bitmap, Objects, last Disk ID Block. Data volume:
Disk ID Block, Bitmap, Objects, last Disk ID Block.)

The DiskIDBlocks are duplicated to assist in the recovery of information from a
corrupted volume. The location of the bitmap and the identification of the reserved
area simplify the recovery procedure. Furthermore, the duplicate DiskIDBlock
allows a volume with a damaged DiskIDBlock to be mounted.

Unlike the Password-Capability System and file based operating systems, the
Walnut Kernel does not possess a File Allocation Table (FAT) or its equivalent. A
FAT is a lookup table that is used as the first step of translating a name into a disk
location. Instead, the Walnut Kernel uses part of the object's name, the low order
bits of the object's serial number, to locate an object.
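A minimal sketch of locating a header by the low order bits of a serial number is given below. The number of bits used, the offset of the first object block and the function name are all assumptions; the thesis text here states only that the low order bits select the location.

```c
#include <assert.h>
#include <stdint.h>

/* Map a serial number to the disk block that must hold the object header.
   `index_bits` low order bits are used (hypothetical parameter), offset by
   the first block available for objects (also hypothetical). */
uint32_t header_block(uint64_t serial, unsigned index_bits,
                      uint32_t first_block)
{
    assert(index_bits >= 1 && index_bits <= 32);
    uint64_t mask = (UINT64_C(1) << index_bits) - 1;
    return first_block + (uint32_t)(serial & mask);
}
```

Two serial numbers that share their low order bits map to the same block, which is the lack of position independence that a FAT would otherwise remove.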

A file allocation table allows names to be independent of position. The absence
of a FAT structure initially appears to be a critical problem for the Walnut Kernel,
as it ties objects to particular locations on disks. The failure of a disk block
containing the header of an object would result in the loss of the object. In
practice, the presence of a FAT has little advantage for the operation of the Walnut
Kernel. Three issues are significant: reliability, reconstruction and security.

Reliability and reconstruction are related issues. Enhancing the reliability of a
system's data storage reduces the likelihood of needing to reconstruct lost
information. Both processes require redundancy in the data to infer the lost
components.

A file allocation table provides a centralised listing of all the items on a disk.
Such tables are frequently duplicated to several locations on a disk to prevent the
potentially catastrophic results of loss of the table: the loss of all translation
information, and hence the loss of access to all the objects on the disk. The entries
in a FAT either contain information relating to an object or point to a block of
information about an object.

Redundancy has been built into the Walnut Kernel's persistent representation.
The Walnut Kernel has a pair of DiskIDBlocks which are used to identify the
volume, and the header pages of an object are recognizable by signature words at
both the top and bottom of the page, allowing pages to be identified. The signatures
can be used in reconstructing damaged bitmaps.
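The use of signature words in reconstruction can be sketched as a scan over the disk. The signature values, the (deliberately tiny) page size and the function names are invented for the example, and the sketch marks only header pages, so it recovers part of a bitmap rather than all of it.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_WORDS 8              /* tiny page, for illustration only */
#define SIG_TOP    0x57414C4Eu    /* hypothetical signature values */
#define SIG_BOTTOM 0x4E55544Bu

/* A header page carries a signature word at both ends of the page. */
int is_header_page(const uint32_t *page)
{
    return page[0] == SIG_TOP && page[PAGE_WORDS - 1] == SIG_BOTTOM;
}

/* Scan `npages` pages stored back to back in `disk`, setting one bit in
   `bitmap` per header page found. The caller supplies a zeroed bitmap.
   Returns the number of header pages recognised. */
int rebuild_bitmap(const uint32_t *disk, int npages, uint8_t *bitmap)
{
    assert(disk != 0 && bitmap != 0);
    int found = 0;
    for (int i = 0; i < npages; i++) {
        if (is_header_page(disk + (long)i * PAGE_WORDS)) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            found++;
        }
    }
    return found;
}
```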

The presence of a FAT would not significantly enhance the reliability of the
system above the level provided by the current scheme, as:

- Loss of a header block results in the loss of all derived capabilities for an
  object.

- Although a FAT could contain a duplicate of all the information contained in
  the header of an object, the significant penalty in terms of both space and
  speed would be unacceptable.

Nor would reconstruction be significantly enhanced by the presence of a FAT, as:

- The data regarding the location of header pages is already duplicated in the
  bitmaps.

The use of the low order bits to identify the disk location of the header page of
an object has a minimal effect on the security of the system (see section ).

The absence of a FAT is significant in terms of restoring data from backup
media. Under the Walnut Kernel it is not possible to restore an object if the block
on which the header page is required to reside is damaged. The disk storage
mechanism lacks position independence. A system using a File Allocation Table would
not be constrained to restoring an object to the same physical location. However, the
majority of modern disks employ remapping techniques which translate accesses to
failed sectors to replacement sectors seamlessly. Thus the hardware appears to be
faultless on restore even if there has been a failure.

A more subtle problem is present. Under the Walnut Kernel it may not be
possible to back up a single object and restore that object at a later date. The
problem stems from the lack of position independence. If a block required for the
header page is already in use, then it is not possible to restore the object.

The backup and restoration of capability based systems is an open question.
Problems exist as to the meaning of partial backups and restores: the semantics of
the deletion of an object is changed, as the object may have been backed up. The
security of backed up objects is also subject to significant doubts. Solutions
employing cryptography to protect the contents and signatures to prevent tampering
have been put forward. These mechanisms are vulnerable to cryptographic attacks with
selected plain texts and hence require extremely strong algorithms to remain secure.
Furthermore, as they rely on an encryption key derived from secret information held
by the system, it is typical for backups of objects on the system to be generated
using keys related to keys used to back up other objects on the system. Relying on
related keys decreases the system security, as the compromise of the encryption of
one object can give clues to compromising other objects.

At this stage the Walnut Kernel does not address the problems of backup, apart
from the trivial case of sector by sector backup and restoration of the complete
system. Our view is that backing up an object creates a new object with a
new name, which should be distinguished from the original even if its body and
passwords are the same.

Small Windows: Private Page Tables

Second level page tables are typically associated with objects. However, to provide
small windows, a collection of second level page tables known as Private Page
Tables (PPT) is associated with each process. In the description of the system,
the second level page tables associated with processes were neglected for simplicity.
This section describes the operation of small windows and PPTs.

A compile time constant determines the number of top level page table entries
devoted to implementing small windows. These entries are separated from the page
table entries used for large windows by the entry used to hold the second level page
table for the process object (see the figure Windows and Objects).

Private Page Tables are subject to scavenging and require replacement if the
table may have been removed. The operation of PPTs is analogous to the operation
of top level page tables. When a reference is made to an object using a capability
loaded into a small window and the PPT is invalid, the capability is validated and
a pointer to the required page of the object is made. The only significant
differences between the usage of PPTs and top level page tables are that the PPTs
are a per process data structure, and that the dirty bits in the object's page tables
need to be set

when a write operation is performed on a small window. To ensure that the dirty
bits of the object's page tables are correctly updated, entries in PPTs created by
reads are not marked writable, even if the capability allows the object to be written
to. This causes the first write operation to page fault and allows the dirty bit in
the object's page table to be set. The PPT entry is then marked writable.

Figure: Windows and Objects. (Large window: process top level page table, object
page table, page of object. Small window: process top level page table, process
private page table, page of object.)
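The read-then-write behaviour of a PPT entry can be sketched as two fault handlers. The entry layouts and function names are hypothetical, and real page table entries would carry frame addresses and hardware defined bits that are omitted here.

```c
#include <assert.h>

typedef struct { int present; int writable; } PptEntry;  /* per process */
typedef struct { int dirty; } ObjPte;                     /* per object */

/* A read through a small window creates a read-only PPT entry, even when
   the capability would permit writing. */
void ppt_read_fault(PptEntry *e)
{
    e->present  = 1;
    e->writable = 0;
}

/* The first write therefore faults. The handler records the write in the
   object's page table and only then makes the PPT entry writable.
   Returns 0 on success and -1 if the capability does not allow writing. */
int ppt_write_fault(PptEntry *e, ObjPte *obj, int cap_allows_write)
{
    assert(e != 0 && obj != 0);
    if (!cap_allows_write)
        return -1;
    obj->dirty  = 1;
    e->present  = 1;
    e->writable = 1;
    return 0;
}
```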

System Architecture

The design of the Walnut Kernel is divided into common and hardware architecture
dependent components (see the figure Components of the Walnut Kernel). This division
in the design is reflected in the implementation, which simplifies the task of
porting the kernel to other processors and allows the identification of features
required to allow the implementation of the Walnut Kernel on other architectures.

The components of the Walnut Kernel common to all implementations include
the system call interface and memory and capability management. The low level
functions that support the common components of the system are architecture
dependent. The architecture dependent components include the low level device
drivers and the kernel interface to essential system hardware.

The functions of the modules of the kernel are defined as follows:

System-Call Interface module acts as the interface between user programs and the
    Walnut Kernel. The module performs the initial verification of arguments to
    system calls and ensures that only permitted information is returned to the
    user.

Subprocess Zero Interface interprets messages sent to subprocess zero of a
    process. This mechanism augments the system call interface.

High Level Functions have been described in an earlier section.

Kernel Scheduler module is driven by the timer interrupt. It calls the device
    driver routines which require regular execution, and the process scheduler
    to allow preemption of user processes.

Device Drivers are not logically part of the kernel. In some implementations
    the collection of interrupt driven modules that interface with the hardware
    execute within the kernel. In other implementations hardware may perform
    all the functions required. In both cases a user level process gains access
    to the device through a number of pages of physical memory accessed by the
    capability mechanism.

Figure: Components of the Walnut Kernel. (Architecture independent and architecture
dependent components: System-Call Interface, Subprocess Interface, High Level
Functions comprising Message Management, Capability Management, Object Management,
Process Memory Management, Object Memory Management and the Process Scheduler,
Kernel Scheduler, Device Drivers, Timer, Hardware Devices, Paging Support.)

Figure: Organization of the Walnut Kernel. (Physical memory: kernel memory and
mappable physical memory. Secondary storage reached through the disk queue. Each IO
device with its interrupt handler and shared control, buffer and state areas.)

This design applies to both single processor systems and multiprocessor systems.
The figure Organization of the Walnut Kernel presents a high level representation
of the organization of the kernel.

Interfaces to devices are placed into memory regions which can be mapped into
ordinary processes. The remainder of the physical memory accessible to the
processor is used for paging. The architecture exploits the sharing of memory to
allow the kernel and user processes to communicate with devices. In systems with
purpose built devices, the device can interrogate the shared memory area directly to
determine its actions. Devices which require a processor to provide close control of
the stages of their operation are supported using low level device drivers, which are
scheduled using the kernel scheduler and interrupts. The low level drivers transfer
data and operate on instructions found in the shared buffer area.

The disk device driver is unique in that it is the only device driver which has
direct access to the tables used by the kernel. The disk device notifies the kernel
that a page has been brought into memory by setting a bit in the physical memory
table indicating that the page is present.

The kernel is unaware of the operation of other classes of input/output devices.
All other devices are handled by user level device drivers, which interact with the
devices using the standard capability mechanism to map in the memory shared with
the physical device or the low level device driver.

System-Call Interface

The system call interface is the mechanism through which user processes explicitly
communicate with the kernel.

The System-Call Interface consists of both architecture dependent and architecture
independent elements. The system dependent component provides a transition
of privilege level to supervisor mode and starts kernel code at a fixed address. The
portable component communicates directly with the majority of the high level
modules of the kernel. The user process transfers information to the kernel by
filling in a parameter block that is accessible to both the kernel and the user
process, and using the system dependent mechanism for switching to supervisor mode.

The System-Call Interface performs some checking on calls to ensure that requests
are valid and legal before invoking the appropriate kernel routines to perform
the requested operation. After completing the required kernel operation, the
parameter block containing the values to be returned to the user process is post
processed to erase fields which contain information not to be passed back to the
user. This practice simplifies the task of showing that the kernel does not leak
information to user processes.

The programmer's view of the parameter block, user level system calls and message
operations are discussed in detail in appendix A.

The reserve field of the parameter block is used to select the type of system
call to be executed. Furthermore, setting the reserve field to a nonzero value
guarantees that only the current subprocess of a process is scheduled until the field
is set to zero. It is imperative that subprocesses set the reserve field to the
system call identifier before setting any of the other fields of the parameter block.
Failure to do so may result in the parameters of a system call being corrupted by the
operation of another subprocess of the process. At the completion of the system call,
a process must copy any values required from the parameter page before clearing the
reserve field.
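The ordering rules for the reserve field can be shown from the user side of the interface. The Param layout, the call identifier and both function names are hypothetical, and fake_trap merely stands in for the system dependent switch to supervisor mode so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t reserve;   /* system call identifier, 0 when idle */
    uint32_t arg[2];
    uint32_t result;
} Param;

typedef void (*TrapFn)(Param *);

/* Stand-in for the kernel: hypothetical call 1 adds its two arguments. */
void fake_trap(Param *p)
{
    p->result = (p->reserve == 1u) ? p->arg[0] + p->arg[1] : 0u;
}

uint32_t system_call(Param *p, uint32_t call_id,
                     uint32_t a0, uint32_t a1, TrapFn trap)
{
    assert(p != 0 && call_id != 0);

    p->reserve = call_id;       /* 1. claim the block before anything else */
    p->arg[0]  = a0;            /* 2. now the other fields are safe to fill */
    p->arg[1]  = a1;
    trap(p);                    /* 3. enter the kernel */
    uint32_t r = p->result;     /* 4. copy results out of the block */
    p->reserve = 0;             /* 5. only then release the block */
    return r;
}
```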

Other mechanisms were considered for transferring information between user
processes and the kernel. These included using registers to pass parameters, passing
a pointer to a parameter block located in the process's address space, using a block
of data located on the stack, and using multiple areas for both each subprocess and
each type of system call.

The use of registers for transferring data was rejected as it places constraints
on the choice of processor and platform. This is not compatible with the design
principles.

Using the stack to pass parameters was rejected because of the extra effort
required to check the validity of the parameter area on each system call to prevent
attempts to read memory outside the area permitted. Furthermore, using the stack
would have the potential to allow programmers to create errors which would be
difficult to trace. Many processors have stacks which grow towards zero, allowing
string operations which overflow the space allocated on the stack to corrupt the
parameters of subroutines which have not yet completed. Routines constructing the
parameters of a system call which use other subroutines are especially vulnerable, as
the effect of a system call varies widely with minor changes to the parameter block.

The use of a parameter block fixed in a process's address space was motivated
by the following considerations:

- Fixing the block in the address space eliminates the need to test whether the
  parameter block lies in an area of memory validly accessible to the process.

- Using fixed memory locations simplifies the task of ensuring that the kernel
  does not leak information and cannot be tricked into altering system state
  invalidly.

- Including the parameter page in the set of pages required to be present in
  memory for a process to execute ensures that the parameter page is always
  present while a process is running, so processes can make system calls without
  the risk of a page fault.

The current design supports passing a wide range of parameters to the kernel,
as the message area can be used to pass to the kernel any required data structure
slightly smaller than a page in size. The design offers great flexibility and
potential for extension.

We decided not to allow subprocesses of a process to execute in parallel. This
eliminated the requirement to provide multiple parameter blocks. Should it be
deemed desirable to allow subprocesses to operate in parallel, the kernel can be
easily modified to allow subprocesses to operate in parallel until a subprocess sets
the reserve field to a nonzero value. Other subprocesses would then not be scheduled
until the reserve field is cleared. Mutual exclusion on the reserve field prevents
simultaneous system calls from the same process, simplifying the kernel design. In
effect, system calls would become critical regions which only one subprocess could
enter at a time.

Subprocess Zero

A special subprocess known as subprocess zero was introduced in the Walnut Kernel.
Subprocess zero is an extension of the kernel that interprets specially formatted
messages sent to a process. It allows a process to control the execution of another
process by sending a message to the other process's subprocess zero. This mechanism
supports only a limited number of operations on a process, including suspending and
resuming the running of the process, starting a subprocess, and enquiring about a
process's status. The mechanism is implemented by parsing any pending subprocess
zero messages at the start of a process's timeslice and then performing the action
associated with the message. If the content of the message is not recognised, the
message is ignored.
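The parse-and-act step can be sketched as follows. The textual message format is invented for the example (the actual format of subprocess zero messages is not given in this section), as are the Proc fields and the function name.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int frozen;             /* nonzero while the process is suspended */
    int subprocs_started;   /* count of start requests honoured */
    int status_requests;    /* count of status enquiries received */
} Proc;

/* Interpret one pending subprocess zero message. Returns 1 if the message
   was recognised and acted upon, and 0 if it was ignored. The keywords are
   hypothetical. */
int sp0_handle(Proc *p, const char *msg)
{
    assert(p != NULL && msg != NULL);
    if (strcmp(msg, "SUSPEND") == 0)
        p->frozen = 1;
    else if (strcmp(msg, "RESUME") == 0)
        p->frozen = 0;
    else if (strcmp(msg, "START") == 0)
        p->subprocs_started++;
    else if (strcmp(msg, "STATUS") == 0)
        p->status_requests++;
    else
        return 0;   /* unrecognised content: the message is ignored */
    return 1;
}
```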

There are two alternative implementations of this mechanism: extending the
system call mechanism by adding calls to perform the tasks currently handled by
subprocess zero, or requiring all processes to provide an executive subprocess which
handles the messages and performs the actions currently delivered to subprocess
zero. The current mechanism was selected because it provides the conceptual
equivalent of the latter mechanism with the immutability of the former mechanism.
The design considers the functions performed by subprocess zero to be semantically
associated with a process, but requires that all the functions be present in each
process in a consistent form.

Device Drivers

The UNIX model of a device driver consists of two halves, both of which are built
into the UNIX kernel. The top-half of the driver interacts with user processes.
It runs in synchrony with the user process and may suspend itself using the sleep
system call. The bottom-half of the device driver runs asynchronously with respect
to user processes. It is typically interrupt driven.

The Walnut Kernel handles devices in a manner which differs significantly from
the UNIX model. Under the Walnut Kernel a user level process performs the tasks of
the top-half of the UNIX device driver. The user process, also known as the device
manager, communicates with either the hardware or a low level driver using a shared
buffer area. If interrupts or the system I/O address space need to be addressed to
operate a device, then a low level driver is required. The low level device driver is
compiled into the kernel and typically performs the task of handling interrupts and
moving data between buffer pages in the unity mapped region of the kernel and the
device. Low level device drivers use only physical addresses to access the device
and the data and control buffers. Although the low level driver is compiled into the
kernel, it is logically not part of the kernel, as it operates on only a defined area
of physical memory in response to interrupt events.

In a multiprocessor system the low level driver could be replaced with specialised
hardware writing directly into shared memory.

This method of implementation avoids a number of potential security holes:

- There is no requirement for a special class of user level processes that can
  access physical I/O. Only the kernel has direct access to hardware functions,
  eliminating a common avenue of attack.

- Careful coding of the low level device driver can ensure that the kernel does
  not leak information or alter data which does not belong to the device.

This method also provides a number of significant performance advantages, as it
allows the low level drivers to respond rapidly to interrupts. Two factors contribute
to the performance of this mechanism:

- The use of physical addresses and buffers statically allocated in memory. This
  avoids the overheads of paging and ensures the presence of buffers in memory.

- The direct use of the interrupt, which eliminates the need to invoke user
  routines upon an interrupt.

It should also be noted that the Walnut Kernel is at least as efficient as the UNIX
model in terms of the number of memory copies required to transfer data from the
device to the user program. Under UNIX, multiple copies are required to transfer
the data from the device to the user process. The Walnut Kernel typically uses
more than two copies but requires only two. The two copy scenario is achieved by
providing the capability of the memory buffer to the user level process that is going
to use the data. In this scenario data is copied from the device into the buffer and
then retrieved by the user process.

Kernel Scheduler

The Kernel Scheduler periodically invokes routines that have been registered with
it. These routines include the Process Scheduler and device drivers. It indirectly
activates the scavenger routine by calling the Process Scheduler. The Kernel
Scheduler is driven by a hardware interrupt and invokes each registered routine in
turn. The Kernel Scheduler provides the ticks which are used to preempt running
tasks.
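The registration and invocation pattern described above can be sketched as a small table of callbacks walked on every tick. All names and the table size are hypothetical, and in a real kernel ks_tick would be entered from the timer interrupt handler.

```c
#include <assert.h>

#define MAX_ROUTINES 8   /* hypothetical table size */

typedef void (*Routine)(void *ctx);

typedef struct {
    Routine       fn[MAX_ROUTINES];
    void         *ctx[MAX_ROUTINES];
    int           count;
    unsigned long ticks;
} KernelSched;

/* Register a routine for periodic invocation. Returns 0, or -1 if full. */
int ks_register(KernelSched *ks, Routine fn, void *ctx)
{
    assert(ks != 0 && fn != 0);
    if (ks->count >= MAX_ROUTINES)
        return -1;
    ks->fn[ks->count]  = fn;
    ks->ctx[ks->count] = ctx;
    ks->count++;
    return 0;
}

/* One tick: invoke each registered routine in turn, in registration order. */
void ks_tick(KernelSched *ks)
{
    ks->ticks++;
    for (int i = 0; i < ks->count; i++)
        ks->fn[i](ks->ctx[i]);
}

/* Example routine: counts how often it has been invoked. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```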

Footnote 8: device, bottom-half, top-half, user process.

Discussion

The architecture enhances the portability of the kernel. The kernel interacts with
only one class of input/output device: disk drives. The interface to this class of
devices consists of a disk queue. By standardising the queue interface, the kernel
can be made portable up to the disk queue interface. Any system dependent code
is placed on the hardware side of the disk queue.

The logical separation of devices from the kernel allows the easy introduction of special purpose processors on multiprocessor systems. This provides an opportunity to use a special purpose processor which handles disk transactions, freeing the general purpose processors to handle user code. Other special purpose I/O processors may also be directly mapped into the memory of a processor. This makes the kernel extremely flexible, as new peripherals can be made available without modification to the kernel by mapping the devices into the address space and using user level programs to control the new devices.

This architecture is slightly less efficient at handling interactive programs than conventional systems such as UNIX. When a process blocks on I/O in a conventional system, the process scheduler is notified that the process cannot be run and the process is typically removed from the short term scheduling queue until the I/O operation has completed. Under the Walnut Kernel this optimisation is not available to the processes which manage devices. Device drivers are separated from the kernel, and this prevents the scheduler from being notified that a process is blocked on I/O. The name of the device manager process could have been stored in the data accessible to the low level device driver, and messages sent to the process on device activity, but only at the cost of increasing the coupling of the device driver and the kernel and of additional system load resulting from messages sent by the kernel to the device manager. The absence of this optimisation results in a small loss of efficiency. Careful programming practices, such as surrendering the processor quickly on determining that I/O is not possible, reduce the impact of this architectural limitation. Efficient blocking I/O is provided for other processes through stream I/O libraries. These libraries put the reading process to sleep when the process is blocking on input. When more data is available the reading process is reawakened by a message from the writing process.


A significant feature of the kernel to the user was the decision to make the kernel occupy a region of the process address space. From a system design point of view, making the kernel permanently resident in memory was of greater importance. By making the kernel permanently resident in memory and a part of the user process address space, a number of significant problems were avoided and the task of implementation simplified. Development and debugging time was saved.

The elimination of page faults within the kernel simplifies the virtual memory system. In systems where paged kernels are employed it is possible for the kernel to be stalled while paging in a critical component. In addition, the problem of page faulting while handling a page fault has been eliminated, resulting in a reduced size of the kernel stack and reduced complexity in handling page faults. As page faults should not arise in supervisor mode, the leaking of kernel powers to user processes is prevented.

A unity mapping of the kernel memory region into the process address space allows the kernel to switch between virtual and physical addressing schemes easily. Unity mapping memory simplifies the design and implementation of low level device drivers.

A major advantage of placing the kernel at a fixed location in the physical address space is that it enables a logic analyzer to be used to trace the addresses of executing instructions. During the early phases of development the logic analyzer proved invaluable, as it provided a mechanism for determining the cause of failures at the instruction level.

Design Issues

Pages versus Segments

The overriding concerns of portability forced a much coarser granularity of protection onto the Walnut Kernel than was present in the Password-Capability System. Although it would be desirable to have the byte protection granularity found on the Password-Capability System available, this would require the use of segment registers, where the processors have these available. It is not possible to provide extremely fine levels of control using memory access instructions where only the paging mechanism is available. It would be possible to provide fine grain control on accesses to an object mediated by the kernel. This could be implemented as a system call which would recover a set of bytes specified by the caller. This mechanism suffers from significant overheads and was deemed too inefficient to be widely used.

The Walnut Kernel has a significant advantage over the Password-Capability System in that the Walnut Kernel supports approximately capabilities in a process address space at a time. This number could be easily increased should the number be considered insufficient. The older system supported only capabilities at a time. Although it would have been possible to rewrite the Password-Capability System's kernel to handle greater numbers of concurrently loaded capabilities, the modifications would be far greater than those required on the Walnut Kernel.

The division of windows into small and large types reflects a compromise between the space efficiency provided by large windows and the reduced granularity offered by small windows. Although the compromise adds complexity to the view of the system seen by the programmer, it allows programmers greater flexibility in dealing with, and control over, objects being manipulated by Walnut Kernel programs.

Multiple Pro cessors

A peer implementation was selected for the design of the multiprocessor system rather than a master-slave implementation. The use of a uniform architecture for both uniprocessor and multiprocessor systems eliminates optimisations available for the two classes of implementation. However, it significantly eases the task of porting the system and reduces development costs. In the case of multiprocessor development, it allows low cost compatible uniprocessor hardware to be used in the development phase before shifting to multiple processors for the production phase.

Supporting shared memory multiprocessors is relatively simple, in that the kernel can access information available to other processors by uttering addresses within the shared component of the system memory. The code and data used by the kernel can be shared in total. The only exception is that the scratch data structures need to be distinct for each instance of the kernel. Typically an instance of the kernel is associated with each processor in the system. If each processor possesses sufficient private storage, each instance of scratch may be held in the memory associated with each processor. In the case of fully shared memory multiprocessors it is necessary to distinguish the private storage of different instances of the kernel by using an indexing scheme based on processor number, to prevent the overwriting of the areas by other kernels.
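A minimal sketch of such an indexing scheme is given below, assuming a fixed scratch size per kernel instance and a single block of fully shared memory. The constants and the function name are illustrative assumptions, not values from the Walnut Kernel.

```python
SCRATCH_SIZE = 64      # bytes of scratch per kernel instance (assumed)
NUM_PROCESSORS = 4     # number of kernel instances (assumed)

# One block of fully shared memory holding every instance's scratch area.
shared_memory = bytearray(SCRATCH_SIZE * NUM_PROCESSORS)


def scratch_area(processor_number):
    """Return a writable view of the scratch area owned by one processor."""
    if not 0 <= processor_number < NUM_PROCESSORS:
        raise ValueError("unknown processor number")
    start = processor_number * SCRATCH_SIZE
    return memoryview(shared_memory)[start:start + SCRATCH_SIZE]


# Each kernel instance writes only within its own indexed region.
scratch_area(0)[0] = 0xAA
scratch_area(1)[0] = 0xBB
```

Because the offset is derived from the processor number alone, no locking is needed to keep the instances apart.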

As commercial SMP machines typically have uniform memory access speed, the organization of system tables has a limited effect on the operating efficiency of the kernel. In direct contrast to this, the performance of implementations on machines with a NUMA (Non-Uniform Memory Architecture) organization is sensitive to the arrangement and location of system tables. On such systems it is necessary to place frequently accessed data in memory local to the processor. On architectures with a large communication path diameter the organization of memory can become a critical factor in system performance.

To allow for efficient implementation of the kernel on multiprocessor systems with a NUMA organization, the Walnut Kernel has been designed to allow process information to be stored in memory local to the processor executing the process and to minimize access to centralised tables. This approach leads to distributing system tables across the system. By associating system tables with individual processes, the Walnut Kernel provides a mechanism for decentralising the majority of the system tables and ensuring that kernel information required for a process can be easily identified and hence moved to memory local to the executing processor. A few centralized data structures are still required. Key among these structures are a list of objects currently in use in the system (AOT) and a list of processes to be run (mix). The kernel was designed to minimise the number of accesses required to these structures.

Multiprocessors with partitioned data in address spaces which are private to a processor or group of processors, such as the SP AMM, can be supported by the Walnut Kernel. The SP is representative of a class of multiprocessors

9 Systems without a globally addressable memory space


which has recently become popular, known as a network of workstations. These systems communicate using message passing. The Walnut Kernel would implement distributed shared memory on this class of architecture by copying blocks of memory from address space to address space using the message mechanism provided by the hardware. Access to common tables would be performed by encapsulating the operations on the tables into messages and either broadcasting the message, where appropriate, or transmitting the message to the node managing the table. Sharing pages of user data could be performed by sharing read-only copies of a page. The first process to attempt to write to a page would invalidate all other copies of the page and then allow the process to write to the page. A short time later the process would return the page to a read-only state and allow other processes to take new copies of the page. This mechanism allows processes to provide the semblance of shared memory on a partitioned data machine. However, the mechanism is expensive in both communications costs and page fault handling. This indicates that although the Walnut Kernel can operate in that environment, it would incur significant overheads with programs using fine grained parallelism.
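The page sharing protocol outlined above can be sketched as a small simulation. The class and method names are hypothetical, message passing between nodes is modelled as direct method calls, and the point at which the writer returns the page to the read-only state is made explicit through a `release` call; none of these details come from the thesis.

```python
class SharedPage:
    """One page shared between nodes that have no common memory."""

    def __init__(self, data):
        self.master = bytes(data)
        self.copies = {}         # node -> read-only copy held by that node
        self.writer = None       # node currently allowed to write, if any
        self.invalidations = 0   # counts copies discarded on other nodes

    def read(self, node):
        if self.writer is not None and self.writer != node:
            raise RuntimeError("page is being written; retry later")
        if node not in self.copies:
            self.copies[node] = self.master  # stands in for a copy message
        return self.copies[node]

    def write(self, node, data):
        if self.writer is None:
            # The first writer invalidates every other copy of the page.
            for other in [n for n in self.copies if n != node]:
                del self.copies[other]
                self.invalidations += 1
            self.writer = node
        elif self.writer != node:
            raise RuntimeError("another node holds the page for writing")
        self.master = bytes(data)
        self.copies[node] = self.master

    def release(self, node):
        # Return the page to the read-only state so new copies may be taken.
        if self.writer == node:
            self.writer = None


page = SharedPage(b"old")
page.read("A")
page.read("B")
page.write("A", b"new")       # invalidates the copy held by B
try:
    page.read("B")
    blocked = False
except RuntimeError:
    blocked = True
page.release("A")
fresh = page.read("B")        # B takes a new copy after the release
```

Every write by a new writer costs one invalidation per outstanding copy plus a fresh copy per later reader, which is the communication expense the text warns about.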

The practice of timing out data structures used by the kernel, and the page mapping mechanism applied to a process, have significant advantages over the alternative mechanism of recording processes that use a given capability and updating affected processes upon revocation of a capability. Although the timing out approach places a constant overhead on process execution, this overhead is not directly dependent on the number or arrangement of the derivatives of an object's master capability, nor is it dependent on the number of processes sharing a capability. The overhead of updating lists of processes using a given capability is especially burdensome on multiprocessing systems. This is because it requires the list to be locked during updates caused by processes loading and unloading capabilities. Contention over locks can cause major performance bottlenecks. The Walnut Kernel avoids this problem by ensuring that the structures describing a capability are only locked for a short time when a capability is being modified. This ensures that processes on separate processors sharing resources are not inhibited significantly by the loading and unloading of capabilities.


The expiring of page tables too rapidly introduces a significant overhead to a process; however, reducing the rate causes an increase in the length of time the rights conferred by a revoked capability persist. A balance between these competing interests determines the rate at which page tables are replaced. Noting that the cost of rebuilding page tables for a working set of pages is proportional to the number of active pages within the tables makes a rapid rate of expiration attractive, as it reduces the amount of work required for the rebuilding of a table. A period of the order of a second is viable.
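The trade-off can be illustrated with a small simulation, given here as a sketch under stated assumptions: time is counted in integer ticks, the table lifetime is an arbitrary parameter, and the class names are hypothetical. A revoked capability continues to confer access until the table that cached it expires, after which the access is refused when the table is rebuilt.

```python
class Capability:
    def __init__(self):
        self.revoked = False


class Process:
    def __init__(self, lifetime_ticks):
        self.lifetime_ticks = lifetime_ticks
        self.built_at = 0
        self.page_table = {}   # page number -> capability that granted it
        self.rebuilds = 0

    def access(self, page, capability, now):
        # Expire the whole table once its lifetime has elapsed.
        if now - self.built_at >= self.lifetime_ticks:
            self.page_table.clear()
            self.built_at = now
            self.rebuilds += 1
        # A missing entry is rebuilt from the capability, which is checked
        # for revocation only at this point.
        if page not in self.page_table:
            if capability.revoked:
                raise PermissionError("capability has been revoked")
            self.page_table[page] = capability
        return True


cap = Capability()
proc = Process(lifetime_ticks=10)
proc.access(3, cap, now=1)
cap.revoked = True
stale_ok = proc.access(3, cap, now=5)   # cached entry still grants access
try:
    proc.access(3, cap, now=12)         # table has expired by now
    denied = False
except PermissionError:
    denied = True
```

No list of processes using the capability is consulted on revocation; the only cost is the periodic rebuild, which is independent of how widely the capability is shared.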

Messages

The message passing mechanism does not preserve the order of messages. In particular, when messages are passed between a single sender and a single receiver there is no guarantee that the messages will be received in the order sent. The property of preserving the order of messages is highly regarded by other system designers and is supported in systems such as the SP SHFG. The presence of the property reduces the level of nondeterminism present in parallel programs, enhancing the reproducibility of results.

Although the issues of preserving the order of messages were considered, the current design of the message passing mechanism makes no attempt to preserve chronological order. The decision was made on pragmatic grounds, as to properly support order of delivery it is necessary to have synchronized clocks for each processor to provide a precise representation of a universal time across the system. The requirements for precise timing would restrict the type of hardware that could be employed, in contradiction of the design principles. Furthermore, encouraging programmers to believe that the order of messages is strictly preserved could promote unsafe programming practices in a multiprocessor system. Programmers might attempt to exploit the property using a collection of processes, resulting in an increase in nondeterminism of the system rather than a decrease in nondeterministic behavior.

When a Walnut Kernel process sends a message, the process remains blocked until either the message is delivered or it is determined that the message cannot be delivered. Another potential implementation is to block the process only until the message is queued for delivery to the other process. The alternative implementation is attractive as it allows a process to dispatch messages quickly and continue processing; however, it also has significant disadvantages. Among the problems is the design issue of the correct sizing of the queues to contain undelivered messages, and the absence of a guarantee of delivery of a message. The current implementation does not suffer from the problems inherent to the alternative, and although an individual process is delayed, other processes are able to use processor cycles while delivery is taking place. In addition, the Walnut Kernel implementation places greater control of the message mechanism in the hands of the programmer, as the programmer is able to determine what action to take if a message is deemed to be undeliverable.

In systems with a delivery buffer shared by a number of processes it is possible to create a deadlock by preventing the delivery of messages from a process to a process expecting that message. This potential is not present in the Walnut Kernel, as each process has its own mail boxes. The ability to reserve mail boxes allows the programmer to guarantee the availability of a set of mail boxes for a given task. Control of the allocation of mail boxes gives the programmer greater control over message delivery, allowing the programmer to manage resources to minimise or avoid the risk of deadlock.

Pro cesses and Subpro cesses

The original Capability Based Kernel did not support subprocesses and provided no mechanism for asynchronous event notification. The Walnut Kernel introduced subprocesses as a mechanism to allow asynchronous events to be handled by a process.

Subprocesses consist of a thread of execution through the process's address space. The Walnut Kernel supports up to subprocesses within a process. Subprocesses are preemptively multitasked, with the exception that they share access to a common parameter block, and the subprocess claiming the parameter block excludes other subprocesses of the process from executing until the parameter block is released. Facilities for dealing with asynchronous events were added to processes by allowing subprocesses to receive messages and reserve mail boxes for their use. A message constitutes an event asynchronous to the process. Subprocesses are necessarily cooperative by nature, as they share a single address space and a critical common resource, the parameter block, and hence have no protection from each other. Processes may be adversarial in nature, as they are immune from the effects of other processes unless address space is shared or another process possesses capabilities to the process or objects used by a process.

Only a single subprocess of a process may be active at one time. The design decision was driven by two considerations: ensuring compatible behavior between uniprocessor versions and multiprocessor versions, and avoiding splitting process data structures over the local memories of more than one processor. By eliminating multitasking at the subprocess level, processes have compatible behavior on both single processor and multiprocessor systems. In NUMA machines significant performance losses could result from a processor executing a subprocess with non-local process data structures when increased load on the communication mechanisms occurs.

The restriction on concurrent execution to processes encourages the use of subprocesses for their original function of supporting asynchronous events. The ability of a process to control the level of the exposure to the actions of other processes, by explicitly sharing address space, makes processes superior to subprocesses for many cooperative tasks.

ExecuteOnly Co de

During the design process it was observed that it would be desirable for the kernel to support the execute-only right found in the Monash Multiprocessor Project. As the primary target machine Intel did not support execute right directly within page tables, the design option of introducing execute right by manipulating the page tables was explored. Several mechanisms for synthesising execute-only code on machines with execute right implied by read right were proposed. The most time efficient mechanism was:


For each execute-only object loaded into the process's address space a new set of top level page tables and private page tables is created. These page tables have read right set for the execute-only capability. All other entries in these page tables are set to invalid. The normal page tables used by the process have the invalid bit set for entries relating to execute-only objects.

Upon entry to an execute-only object a page fault occurs. The fault handler checks the Table of Loaded Capabilities and discovers that the fault location corresponds to an execute-only page. The fault handler substitutes the appropriate set of page tables for the normal set of page tables.

When an access is made outside the execute-only area a page fault occurs. If the access is made to a non-execute-only capability the normal page tables are used; otherwise the appropriate set of page tables for the new execute-only capability is loaded.
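The three steps above can be modelled with sets of valid page numbers standing in for page tables. This is an illustrative sketch only: the class, the shape of the Table of Loaded Capabilities and the object names are assumptions, and real page tables carry far more state than a set of page numbers.

```python
class Process:
    def __init__(self, tlc):
        # tlc: object name -> (set of page numbers, execute_only flag).
        # A stand-in for the Table of Loaded Capabilities.
        self.tlc = tlc
        self.normal = set()    # pages valid in the normal page tables
        self.private = {}      # one private table set per execute-only object
        for name, (pages, execute_only) in tlc.items():
            if execute_only:
                self.private[name] = set(pages)
            else:
                self.normal |= set(pages)
        self.active = self.normal
        self.faults = 0

    def access(self, page):
        if page in self.active:
            return                        # valid entry, no fault
        self.faults += 1                  # invalid entry, page fault
        for name, (pages, execute_only) in self.tlc.items():
            if page in pages:
                # The fault handler substitutes the matching table set.
                self.active = self.private[name] if execute_only else self.normal
                return
        raise MemoryError("no loaded capability covers this page")


proc = Process({"data": ({0, 1}, False), "secret": ({8, 9}, True)})
proc.access(0)   # ordinary access, no fault
proc.access(8)   # entry to the execute-only object, fault
proc.access(9)   # still inside the object, no fault
proc.access(1)   # data access from inside the object, fault
proc.access(8)   # return to the object, fault
```

The fault count grows with every crossing of the execute-only boundary, which is the cost discussed in the following paragraph.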

The mechanism is quite expensive, as it incurs the cost of a page fault for both entry to and exit from execute-only code and for every memory access outside the execute-only capability. In addition, the mechanism adds a significant number of overhead pages to a process using execute-only capabilities, as it requires the presence of a number of sets of page tables.

The overheads required to provide execute-only code across a wide range of platforms were considered too high. Two motivations exist for the need for execute-only code. The first is the desire to hire out code to a user with the certainty that they cannot retain the use of the code after the hire period has ended. Unfortunately the Walnut Kernel does not currently support a mechanism suitable for renting code, except by renting the services of a process which has access to that code. The second motivation is to hide a critical piece of information, typically a capability, within a code object. A new mechanism was introduced to provide an alternative solution to the problem of hiding critical data: the SRMULTILOAD right. SRMULTILOAD right allows capabilities to be derived which are only usable by a specific process, making dissemination of the capability unproductive.


A number of mechanisms have been considered for hiding the contents of an object. One mechanism requires a new class of objects known as protected objects. Protected objects are unreadable until a system call, enter, is made which causes a jump to a fixed location in the object. The object is then made readable. A further system call, leave, is used to make the protected object unreadable and return to the caller. The protected object mechanism is only a partial solution, as it does not provide adequate protection in a process with several subprocesses and it does not protect the calling subroutine from the code contained in the called subroutine. The mechanism can be enhanced by making all other protected objects contained in a process unreadable when an enter call is made. This prevents called routines from spying on the caller. Further enhancement is required to deal with multiple subprocesses. Pages must be made readable or unreadable depending on which subprocess is executing. This is undesirable, as it requires a process to maintain information about the accessibility of the object represented by each entry in the TLC for each subprocess. At present no acceptable mechanism has been found.

The preferred approach to making code widely available but unreadable is to encapsulate it in a Walnut Kernel process. Users wishing to employ the code can then request its execution by sending the process a message including capabilities for the data on which the code is to operate. Clearly this approach is inefficient for code bodies with very short execution times, but these are unlikely to be candidates for hiring.

Hardware

The minimum requirements for supporting the Walnut Kernel on a system are:

Support for paged virtual memory. This may be implemented using either page tables or translation lookaside buffers. If page tables are supported then a two or more level page table is required, and page table entries must be of regular size and format, having at least valid and dirty bits.

A timer interrupt. A regular source of interrupts is required to allow preemptive multitasking to function.

Integers of at least bits in size must be available.

Hardware support for atomic access to memory locations.

Access to a time source of at least second accuracy.

A privilege mechanism and a method for making system calls.

These requirements are met by a wide range of currently available microprocessors from both the RISC and CISC design streams.

The Walnut Kernel places minimal requirements on the contents of page tables. It requires only the presence of a dirty bit and, of course, a valid bit. The use of the Walnut Kernel's own data structures, combined with the minimal requirements on a page table entry, enhances the kernel's portability. However, this comes at the cost of duplicating part of the data contained in the page tables. On systems with only translation lookaside buffers, such as the R MIP, no duplication occurs.

System Initialization

The system initialization process is unique within the Walnut Kernel as, although it is persistent as are the other processes, it is restarted each time the kernel is started. The capability mechanism, enhanced with the restrict operation, allows user processes to manipulate the system memory with high security. The initialization process is run as an ordinary user process. The only special feature of the initialization process is that no other process can execute before the initialization process allows the process scheduler to start. It uses system calls to derive relatively safe capabilities for user level device drivers from a capability for the kernel address space. The initialization mechanism allows the existing derivation code to be employed, minimizing the amount of specialised code for use only during initialization found within the kernel.

Currently the initialization process performs the following tasks:

read the capability for the system memory from the wall (a page mapped into the address space of all processes)

derive capabilities required by device drivers

restrict the capability for the system memory

remove the capability for the system memory from the wall

start the process scheduler

send messages to device drivers containing the required capabilities

The scheduler is enabled to schedule processes other than the initialization process by writing into an address in the kernel area of the system memory. By ensuring that other processes are not scheduled before the initialization process has restricted and removed the publicly readable version of the system capability, leakage of information critical to the function of the system to malicious processes is prevented. The current system uses a constant for the value of the capability for the physical object. However, by having the initialization process read the value of the capability from the wall, a randomly selected set of passwords could be employed without changes to the initialization process.

Money

This section addresses the issues of rent collection and payment for kernel services. The mechanisms discussed have yet to be implemented in the experimental version of the kernel.

The rent collector periodically scans every process on the mounted volumes and deducts rental proportional to the time since rent was last collected from the object. The rent collection mechanism allows for variations in the rate of operation of the rent collector and for consistent charging in proportion to resource use for offline volumes. Rent is collected from an object's store of money. Bankrupt objects are deleted.
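A sketch of such a collector follows. It assumes integer time units, a charge proportional to object size as a stand-in for "resource use", and that an object is bankrupt once its money falls below zero; the thesis does not fix these details, and all names are hypothetical.

```python
class StoredObject:
    def __init__(self, name, money, size, last_collected=0):
        self.name = name
        self.money = money
        self.size = size                    # assumed measure of resource use
        self.last_collected = last_collected


def collect_rent(objects, now, rate):
    """Charge each object for the time since it last paid; drop bankrupts."""
    survivors = []
    for obj in objects:
        elapsed = now - obj.last_collected
        obj.money -= rate * obj.size * elapsed
        obj.last_collected = now
        if obj.money >= 0:
            survivors.append(obj)           # bankrupt objects are deleted
    return survivors


objects = [StoredObject("a", money=100, size=2), StoredObject("b", money=10, size=5)]
objects = collect_rent(objects, now=3, rate=1)    # a pays 6, b would owe 15
objects = collect_rent(objects, now=10, rate=1)   # a pays 14 for 7 more units
```

Because the charge depends on elapsed time rather than on the number of scans, an irregular collector, or a volume that was offline between scans, is charged the same total.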

The system call interface incorporates a charge which is levied against the cash of the process making the call. Although it would be desirable to levy the process for the amount of work required to complete the system call, this would introduce uncertainty for the caller and complicate the collection process.


Pro cess Scheduler

The Walnut Kernel's existing round-robin process scheduler is suitable for testing purposes in an experimental implementation but is inadequate for a production system. In a mature system a more sophisticated scheduler would be employed. This scheduler would use a round robin policy for a set of processes with current wakeup times (that is, processes scheduled to wake up at or before the current time). A set of lists of processes with access times in the future would be maintained. The lists would contain processes scheduled for wakeup within a time period. Periodically each list would be examined and elements would be migrated to appropriate lists or the current queue. Figure illustrates the proposed scheme.

[Figure: stage queues 1 to n, each labelled with a wakeup time range and a scan rate in seconds, feeding the Round Robin Scheduler, which leads to EXEC]

Figure: Proposed Scheduling Scheme
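The staged lists and the migration between them can be sketched as below. The stage horizons, the class name and the method names are assumptions made for illustration; the thesis states only that each list covers a time period and is examined periodically.

```python
from collections import deque


class StagedScheduler:
    def __init__(self, horizons):
        # horizons: ascending upper bounds on the delay handled by each stage.
        self.horizons = list(horizons)
        # One list per horizon, plus a final list for anything further away.
        self.stages = [[] for _ in self.horizons] + [[]]
        self.mix = deque()     # processes runnable now, served round robin

    def add(self, process, wakeup, now):
        delay = wakeup - now
        if delay <= 0:
            self.mix.append(process)
            return
        for index, horizon in enumerate(self.horizons):
            if delay <= horizon:
                self.stages[index].append((wakeup, process))
                return
        self.stages[-1].append((wakeup, process))

    def scan(self, index, now):
        # Re-file every entry of one stage; entries move to a nearer stage
        # or into the mix as their wakeup time approaches.
        pending, self.stages[index] = self.stages[index], []
        for wakeup, process in pending:
            self.add(process, wakeup, now)

    def next_process(self):
        process = self.mix.popleft()
        self.mix.append(process)
        return process


sched = StagedScheduler([10, 100])
sched.add("a", wakeup=0, now=0)      # runnable immediately
sched.add("b", wakeup=5, now=0)      # nearest stage
sched.add("c", wakeup=50, now=0)     # middle stage
sched.add("d", wakeup=500, now=0)    # furthest stage
sched.scan(1, now=45)                # c migrates to the nearest stage
sched.scan(0, now=50)                # b and c migrate into the mix
```

Distant stages can be scanned rarely, so the cost of tracking a sleeping process falls as its wakeup time recedes.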

It is intended that this scheme will be implemented in two parts. The first will consist of the existing round robin scheduler within the kernel, but enhanced so that the interface to the queue will be accessible to user processes holding an appropriate capability. The second part will be one or more user level processes which will maintain the lists of processes and move processes into the mix.

The proposed mechanism allows process scheduling to be conducted at the user level. User level scheduling offers the possibility of exploring alternative near term and long term scheduling schemes and simplifies the kernel. Short term scheduling is retained in the kernel for the purposes of efficiency, to eliminate the need to communicate with a user process on each process swap.

The current implementation of the process scheduler is clearly inadequate for a production system. The highly restrictive limits it places on the number of processes are inappropriate for a persistent based system. However, the implementation allowed the system to be assembled and tested quickly.

The proposed scheduler addresses the deficiencies of the current implementation.


Chapter

Implementation

At present there are two implementations of the Walnut Kernel. The first of these runs under both UNIX and MS-DOS. This version simulates the hardware supporting the kernel. The second version runs on IBM PCs using Intel and i processors based on an ISA bus architecture. This chapter describes the two implementations and the mechanisms used to load user programs onto a native system.

The Standalone Implementation

The Standalone version of the Walnut Kernel emulates aspects of the underlying hardware and executes above a host operating system. Its name has been the subject of debate and is sometimes considered a misnomer. It was named the Standalone version early in the development to imply that it did not require the support code used in a native version. The name is somewhat misleading, as the Standalone version is dependent on the host operating system for persistent storage, memory and I/O. This version does not support the execution of compiled programs. Instead, a special type of process known as a drive process is executed.

Drive processes are interpreters for a simple language that allows the contents of the parameter block to be specified and system calls to be invoked. The language has constructs that allow for iteration and alternation, providing a language sufficiently powerful for testing all the higher level functions of the kernel.

1 MS-DOS is a trademark of Microsoft Corporation


This version of the kernel shares all the architecture independent code of the kernel, with the exception of the process scheduler, which has been modified to support the operation of drive processes, and the system call mechanism. The scheduler in the standalone version is a loop which selects the next process to run, sets up the parameter block and invokes the drive process interpreter. When the drive process returns, the program counter position in the drive process program is stored and the loop is repeated. The drive process does not access the system call mechanism; instead it directly calls the routines that the system call mechanism would use.
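The loop can be sketched as follows. The two-instruction drive language, the `sys_add` routine and the dictionary layout of a process are all invented for illustration; the actual drive process language is not reproduced in this section. The sketch shows only the structure: select a process, run the interpreter against its parameter block, store the program counter, repeat.

```python
def sys_add(params):
    # Hypothetical kernel routine, called directly rather than by a trap.
    return params["a"] + params["b"]


KERNEL_ROUTINES = {"add": sys_add}


def run_slice(process):
    """Interpret one instruction and return the new program counter."""
    operation, argument = process["program"][process["pc"]]
    if operation == "set":
        process["params"].update(argument)    # fill in the parameter block
    elif operation == "call":
        routine = KERNEL_ROUTINES[argument]
        process["results"].append(routine(process["params"]))
    return process["pc"] + 1


def standalone_scheduler(processes):
    def runnable(process):
        return process["pc"] < len(process["program"])

    while any(runnable(p) for p in processes):
        for process in processes:
            if runnable(process):
                # Store the program counter when the interpreter returns.
                process["pc"] = run_slice(process)


def make_process(program):
    return {"program": program, "pc": 0, "params": {}, "results": []}


first = make_process([("set", {"a": 1, "b": 2}), ("call", "add")])
second = make_process([("set", {"a": 10, "b": 20}), ("call", "add"),
                       ("set", {"a": 0}), ("call", "add")])
standalone_scheduler([first, second])
```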

The system memory controlled by the Walnut Kernel is simulated by allocating a large block of memory when the kernel is started. This is then divided up by the support routines before the scheduler is invoked. Accesses to memory by the drive process are simulated by having the drive process call a subroutine which accesses the page tables constructed by the kernel in the simulated memory. These routines emulate the activities of accesses which result in page faults as well as ordinary accesses.

A single volume is simulated by the supporting software. It is represented as a memory array. This volume can be loaded from or stored to disk. When the kernel starts the user is prompted for a file from which to read the disk image, and has the option of specifying a file name for storing the image when the system is shut down. The disk images are accurate representations of the data stored in disk blocks on a native system. These images are used as one of the mechanisms for loading programs into a native system (see section ).

The Standalone system was invaluable in the development of the Walnut Kernel, as it permitted the development of the kernel in an environment where debugging facilities were available. Activities of the kernel could be logged to files and repeatable test scripts could be run. Early in the development of the i version several significant difficulties occurred: triple faults could occur which resulted in a reboot, timing problems resulted in subtle, difficult to repeat errors, and there was no permanent store available for logging. The Standalone version provided both a mechanism to build disk images used to initialize the native system and a testing ground that allowed the high level routines to be exercised in a friendly environment before attempting to use them in a native system.

The Standalone version of the kernel has been used in the following environments:

based PC running DOS

based PC running OS

based PC running FreeBSD

R based workstation running

R based workstation running Irix

i Implementation

The current implementation of the Walnut Kernel runs on Intel and i based personal
computers. The kernel makes direct use of the system hardware, requiring only the boot
sector loader and initialization code contained in the PC BIOS.

Use of an existing operating system as a host allows the implementor to use the work
of the designers of the host operating system. However, the price is a loss of
flexibility and the additional overhead of accessing the host operating system. For
these reasons the kernel was made to rely only on the system hardware for support. The
option of using MS-DOS as a host was rejected. In addition, only the loading and
initialization services provided by the BIOS were used, as the standard PC BIOS
routines are written to operate in the processor's real mode ( bit mode) only.

The i supports a number of architectural features which may be exploited at the
operating system level [Int, CG]. These features include:

Segments: Variable length segments are provided as the primary mechanism for memory
protection. This mechanism enables checking of segment type (readable, writable and
executable), segment limits and segment privilege levels.

Paging: Two level page tables with user/supervisor and read/write bits are supported.
Translation lookaside buffers are loaded automatically by the processor using the
contents of the page tables.


Multiple Privilege Levels: Two distinct privilege mechanisms are present: a level
segment based mechanism and a level page based mechanism.

Tasks: A task data structure is required by the processor. This data structure
contains the values of registers for a task at each of the privilege levels supported
by the processor.

I/O space: Access to the I/O memory space of the processor can be controlled on a
segment based privilege level basis or on a per task basis, using a map which allows
access to I/O addresses selectively.

The current system uses Intel's Flat Model of memory management. Under this model,
segments are present in a minimal capacity and cover the complete address space of the
processor. The paging mechanism is used to provide protection and a large virtual
address space. This decision conforms to the requirements of section , where only
features available on a wide class of processors are exploited by the Walnut Kernel. A
two level privilege scheme is the most general mechanism for providing the necessary
protection for the kernel. This mechanism was adopted in the interests of portability.

The task mechanism and its associated I/O address space access management would allow
the Walnut Kernel to have user level processes with the ability to operate as low level
device drivers. Instead, a scheme was adopted using a single task with multiple user
processes and a single kernel shared by the user processes. The current scheme has the
advantage of greater portability, offering similar levels of support to those found on
RISC processors. It also avoids the need for the kernel to manage multiple task state
segments.

I/O address space access was limited to the kernel to force low level drivers into the
kernel. User level low level drivers have the apparent advantages of being modifiable
during the operation of the system and having the same facilities available as user
level programs. However, user level drivers can complicate the support of user
processes, as they place new and significant demands on the kernel. The ability to
allow selective access to hardware facilities is required for user level drivers. In
systems with memory mapped I/O this is clearly compatible with the concept of
capabilities. Supporting systems with a separate I/O address space requires additional
complexity to be added to the capability model. This is because there is no longer a
single uniform resource, consisting of memory, to be managed, but two distinct
resources: memory and I/O address space. The user level driver approach is further
complicated by the need to ensure that drivers have adequate access to physical memory
and execution time. Lack of either of these resources could result in a deterioration
of system performance. By placing device interrupt handlers into the kernel it is
possible to make the kernel responsible for the correct operation of the system. This
allows the kernel to ensure that accesses to hardware do not place the system into an
unstable state. Moving low level drivers into the kernel allows scheduling and resource
issues to be handled easily. As a result of these considerations, the Walnut Kernel
implements device interrupt handlers within the kernel.

The kernel does not use many of the special features of the Intel and i, to ensure a
simple migration path from the initial implementation to generic hardware. The i
implementation currently has low level device drivers for RS serial ports, Centronics
parallel ports, the keyboard, and color and monochrome text based displays. The
circular buffers used to access these devices are held in the first pages of memory.
Access to volumes (block oriented devices) is provided by a low level device driver
which supports ST and IDE based hard disks. A floppy disk driver is provided for
reading programs from DOS formatted disks.

Loading Programs into the Walnut Kernel

There are three mechanisms currently available for loading executable programs into the
Walnut Kernel:

  Using a serial port under the control of a drive process.

  Adding a program to the collection of objects written to the boot drive when the
  system is initialised.

  Using Shell (see section ) to read the program from a DOS formatted floppy disk.


The Walnut Kernel expects executable programs to consist of two parts. The first part
is the text component (the instructions to be executed) and the second part is the data
component (static data, global variables, etc.). Constants can be placed in either
part. Two objects are used so that the program data can be copied, and hence used by
each process without interference, while the program code can be shared. Programs are
currently compiled under gcc and linked with ld. A program reads the resulting a.out
file and produces a text file and a data file. These are then transferred to the Walnut
Kernel using one of the techniques described below.

The i version of the Walnut Kernel currently supports drive processes. These are used
as a debugging aid and for testing purposes. The two serial ports provided in the
current version are used to provide a console, operating on a glass tty, for the drive
processes and a serial I/O stream used for transferring data. A primitive communication
protocol is used on the serial connection to drive a program running under DOS. The
program reads disk files and transfers them through the serial connection. Although
this mechanism is still present in the kernel, it is no longer used. The simple
protocol was prone to failures caused by the loss of synchronization between the sender
and receiver; at the time, the inconvenience caused by this was not sufficient to
warrant the additional effort required to improve the protocol. Other mechanisms have
since been implemented which are faster and more reliable.

When a bootable system is being created, the initialization process and the data it
operates on must be transferred to the system. These objects must be present before the
system is started, to ensure that they are placed in standard locations and hence can
be found by both the kernel and the initialization process. To perform this task the
standalone version of the kernel is employed to load the objects in a prescribed order
and create a disk image file. The volume and serial numbers are known for each object
created by the standalone version because the size of each object is known, the order
of loading is specified, the algorithm for allocating disk blocks is known, and no
objects are removed. Furthermore, the series of instructions used by the standalone
version to load the objects from disk includes instructions to store the master
capabilities of the objects loaded in an object on the disk. This object is read by the
initialization process on startup and then deleted. Other user programs can be added by
having the standalone version load their code and data objects. The master capability
for the objects created can be displayed using a drive process. The final disk image
file produced by the standalone version is read by one of the install programs and the
blocks are copied to the install disk.

A facility to read files from DOS formatted floppy disks and make objects under the i
version of the kernel was introduced into the shell. This mechanism uses the floppy
disk manager to access the floppy disk at the sector level. Shell has functions which
allow it to read files from disk and read the disk's directory.


Chapter

Performance

This chapter details a number of performance measurements carried out on the Walnut
Kernel. The method of the experiments and the conditions under which they were
conducted are discussed. In addition, the performance of a conventional UNIX operating
system operating on similar hardware is provided to permit some comparison to be made
between the Walnut Kernel and a conventional operating system. The tests conducted
under UNIX approximated the functions of the Walnut Kernel. However, significant
differences in design between UNIX and the Walnut Kernel prevent a close comparison of
the performance of the two systems.

Test Environment

Software

The tests were conducted on a version of the Walnut Kernel which did not contain the
kernel debugger but retained diagnostic code. This version of the kernel had not been
optimised for performance.

The tests on UNIX were run on a FreeBSD VR operating system [Wal]. This version of UNIX
is a derivative of the University of California, Berkeley's Networking Release of BSD.

Note that the Walnut Kernel was compiled without optimisation, whereas the UNIX kernel
was highly optimised.

1

The FreeBSD CDROM is a trademark of Walnut Creek CDROM

Hardware

The test machines were as close to identical as possible, in that they used
motherboards, peripherals and interface cards sourced from the same manufacturer. The
critical parameters of the test machines were specified as follows:

Intel DX MHz

Mb RAM

Caviar HDD

Both the DX's internal cache and the external k cache were disabled for the tests. The
caches were disabled to make the initial conditions of all experiments as similar as
possible. In addition, because of the wide variety of external caches available for the
i and the difficulty in determining the type of cache in use from system
specifications, disabling the external cache would tend to improve the repeatability of
results when the experiments are performed on similarly specified hardware.

There is no apparent reason to suggest that the memory reference patterns of the Walnut
Kernel would be less favorable for caching than those of UNIX.

Timing

The test programs on the Walnut Kernel had access to a bit counter that is driven by a
MHz frequency source. This counter was derived from channel of the system's
Programmable Interval Timer [SC, Mac, Tri, Nor]. High resolution timing is achieved by
polling the Programmable Interval Timer's counters [Rod]. The resolution of the timer
is nS.

The Programmable Interval Timer supports channels. Although several modes of operation
are supported by each of the timer's channels, only the square wave generator mode is
relevant to the implementation of the high resolution timer. When configured as a
square wave generator, a channel provides a bit counter which is driven by the MHz
frequency source. The bit counter is loaded with the reload value and the counter
proceeds to count downwards by two. Each time the counter reaches zero, the output bit
of the channel is inverted and the counter is reloaded with the reload value. With an
even reload value, the output bit of a channel in this mode generates a square wave
with a duty cycle.

Figure: Timer and Interrupt Hardware in the IBM PC/AT (diagram not reproduced: the
Programmable Interval Timer, with a reload register and counter per channel driven by
the MHz source, the DRAM refresh circuit, and the cascaded Programmable Interrupt
Controllers whose IRQ inputs drive the processor's interrupt pin).

In the IBM PC/AT architecture (see figure ), channel is used to provide timer
interrupts, channel is used to provide memory refresh cycles, and channel is used as a
tone generator for driving the system speaker. The output bit of the timer interrupt
channel drives interrupt request line IRQ, which is connected to the interrupt
controller. The interrupt controller generates a timer interrupt on each positive
transition of IRQ.

As the bit counter provided by the timer is decremented by twos, the least significant
bit of the timer's counter provides no useful information. A bit counter is synthesised
by extracting the useful bits from the timer's counter and using the timer's output bit
to provide the most significant bit of the synthesised counter value.

The process scheduler is entered whenever a timer interrupt occurs or a process
surrenders its timeslice. The process scheduler determines the number of counts that
have elapsed between the current access to the timer and the last access, and adds this
to a bit counter that is visible to programs. This guarantees that whenever a process's
code is entered, the current value of the bit counter is accessible.
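The counter synthesis and the accumulation step can be illustrated with a short C sketch. The widths used here are assumptions, since the figures are not legible in this copy of the text: a 16 bit hardware counter with a reload value equivalent to 65536, an output bit that is high during the first half of each period, and a 64 bit program-visible counter. The names `synth_count`, `elapsed_counts` and `clock_update` are hypothetical.

```c
#include <stdint.h>

/* Build the synthesised down-counter: discard the least significant bit of
 * the raw counter (it carries no information when counting by two) and use
 * the channel's output bit as the most significant bit. */
static uint16_t synth_count(uint16_t raw, int out_bit)
{
    return (uint16_t)(((out_bit ? 1u : 0u) << 15) | (raw >> 1));
}

/* Counts elapsed between two readings of a down-counter.  The subtraction
 * is taken modulo 2^16, so it remains correct across one wrap-around. */
static uint16_t elapsed_counts(uint16_t previous, uint16_t current)
{
    return (uint16_t)(previous - current);
}

/* State kept by the (hypothetical) scheduler hook. */
struct clock_state {
    uint16_t last;   /* synthesised counter value at the previous access */
    uint64_t total;  /* running counter made visible to programs          */
};

/* Called on every scheduler entry with a fresh reading of the timer. */
static void clock_update(struct clock_state *c, uint16_t raw, int out_bit)
{
    uint16_t now = synth_count(raw, out_bit);
    c->total += elapsed_counts(c->last, now);
    c->last = now;
}
```

The scheme only works if the scheduler is entered at least once per period of the synthesised counter; the timer interrupt itself would provide that guarantee.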

The microsecond timing provided by FreeBSD is derived from the channel timer of the
Programmable Interval Timer using a method similar to that described above. The
gettimeofday library function is used to access the time.

Walnut Kernel

Three programs were written to test the performance of the Walnut Kernel. The programs
contained several tests. Each test consisted of repetitions of a set of system calls.
The results of the tests were written into an object created by the test program.
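The result tables in this chapter report a high, low, average and median value for each measurement. A minimal sketch of how a set of repeated timings might be reduced to those four figures is shown below; the structure `summary` and the function `summarise` are hypothetical names and are not taken from the test programs.

```c
#include <stddef.h>
#include <stdlib.h>

/* The four figures reported for each measurement, in microseconds. */
struct summary {
    unsigned long high;
    unsigned long low;
    double        average;
    double        median;
};

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Summarise n timings.  The array is sorted in place; n must be >= 1. */
static struct summary summarise(unsigned long *samples, size_t n)
{
    struct summary s;
    double sum = 0.0;

    qsort(samples, n, sizeof *samples, cmp_ulong);
    for (size_t i = 0; i < n; i++)
        sum += (double)samples[i];

    s.low     = samples[0];
    s.high    = samples[n - 1];
    s.average = sum / (double)n;
    /* For an even number of samples take the mean of the middle pair. */
    s.median  = (n % 2) ? (double)samples[n / 2]
                        : ((double)samples[n / 2 - 1]
                           + (double)samples[n / 2]) / 2.0;
    return s;
}
```

Reporting the median alongside the average is useful here because a few slow repetitions (the "High" column) can pull the average well away from the typical case.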

A second delay was built into the test program to allow the drive-type process to be
used to freeze any other processes present on the system and to allow time for the
system to settle after performing this task.

2

Timer interrupts are known as ticks

3

Drive processes are processes which execute a simple command interpreter built into the
kernel. Drive processes are intended only to be used in the development phase of the
Walnut Kernel and will be removed from other versions of the kernel.


Program

This program tested the performance of communication related system calls.
Specifically, it tested the speed of the following system calls:

external send

send

receive

load capability

unload capability

Test and Test

These tests measured both the time taken to execute an external send system call and
the total time taken to complete the transfer, from the beginning of the first attempt
until the end of the successful attempt. In both tests no money was transferred between
the sender and the receiver. Test had a message of zero length and Test had a message
of bytes length.

Test and Test

These tests measured both the time taken to execute a send system call and the total
time taken to complete the transfer, from the beginning of the first attempt until the
end of the successful attempt. The target process was loaded into a large window of the
test process and identified by offset. In both tests no money was transferred between
the sender and the receiver. Test had a message of zero length and Test had a message
of bytes length.

Test and Test

These tests measured both the time taken to execute an external send system call and
the total time taken to complete the transfer, from the beginning of the first attempt
until the end of the successful attempt. A wait of one second was introduced between
each repetition of the test to ensure that the target process had an empty mailbox
available to receive the message. In both tests no money was transferred between the
sender and the receiver. Test had a message of zero length and Test had a message of
bytes length.

Test and Test

These tests measured both the time taken to execute a send system call and the total
time taken to complete the transfer, from the beginning of the first attempt until the
end of the successful attempt. The target process was loaded into a large window of the
test process and identified by offset. A wait of one second was introduced between each
repetition of the test to ensure that the target process had an empty mailbox available
to receive the message. In both tests no money was transferred between the sender and
the receiver. Test had a message of zero length and Test had a message of bytes length.

Test and Test

These tests measured both the total time taken to transfer a message and the time taken
to execute a receive system call. The transfer time is calculated from the beginning of
the first attempt to perform a send to the time at which the contents of the message
are made accessible by a successful receive call. A message of bytes length and with no
money was sent. A wait of one second was introduced between each repetition of the test
to ensure that the target process had an empty mailbox available to receive the
message.

Test used an external send system call to perform the transfer. Test used a send system
call to a process loaded into a large window.

Test and Test

These tests measured both the time taken to load and the time taken to unload an object
from a window of the test process's address space. Test used a large window and Test
used a small window.


Results

The following table summarises the raw results for each experiment. The measurements
are in microseconds.

Messages

High Low Average Median

Exp byte transfer

K EXTSEND

Total time

Exp byte transfer

K EXTSEND

Total time

Exp byte transfer

K SEND

Total time

Exp byte transfer

K SEND

Total time

Exp byte transfer

K EXTSEND

Total time

Exp byte transfer

K EXTSEND

Total time

Exp byte transfer

K SEND

Total time

Exp byte transfer

K SEND

Total time

Exp byte transfer

Ext Trans

K RECV

Exp byte transfer

Send Trans

K RECV


Address Space Management

High Low Average Median

Exp

Load Large

Unload Large

Exp

Load Small

Unload Small

Program

This program tested the speed of system calls which manipulate objects. The following
system calls were tested:

Make Ob ject

Bank

Destroy Capability

Test to

These tests measured the time required to execute a make object system call, to perform
a deposit and a withdrawal on that object using the bank system call, and to destroy
the object using a delete capability system call with the master capability of the
object. There was a second delay between repetitions of each test.

Test generated objects of a single page in size. Test made page objects, Test made page
objects, and Test was applied to page objects.

Results

The following table presents the raw results for each experiment. The measurements are
in microseconds.


Object Management

High Low Average Median

Exp ( page object)
  K MAKEOBJ
  Deposit
  Withdraw
  K DEL

Exp ( page object)
  K MAKEOBJ
  Deposit
  Withdraw
  K DEL

Exp ( page object)
  K MAKEOBJ
  Deposit
  Withdraw
  K DEL

Exp ( page object)
  K MAKEOBJ
  Deposit
  Withdraw
  K DEL

Program

This program tested the performance of process related system calls. Specifically, it
tested the speed of the following system calls:

make process

bank

unload capability

delete capability

Test

This test measured the times required to execute a make process system call, to perform
a withdrawal on that process's object using the bank system call, to unload the new
process from the test process's address space, and to destroy the process using a
delete capability system call with the master capability of the process. There was a
second delay between repetitions of this test.

Results

The raw results for each experiment are in the following table. The measurements are in
microseconds.

Process Management

High Low Average Median

Exp

K MAKEPROC

Withdraw

K UNLOADCAP

K DEL

UNIX

Two programs were written to perform measurements under UNIX. The first program
contained several separate tests. The second program contained a single test routine.
Each test consisted of repetitions of a set of library calls. The results of the tests
were written into a number of files created by the test program.

The programs were run on a FreeBSD system which had just been rebooted and had only a
single interactive session operating on it. This session was used to run the two test
programs.

Program

This program evaluated the times taken to perform interprocess communication tasks,
file system tasks and memory allocation tasks.

Test and Test

These tests used two processes connected by a pipe to test the time required to send
and receive messages. After creating a pipe, the test process created a child process
through the use of a fork system call. The child sent messages to the parent process
through the pipe. The times spent in the write and read library functions were
recorded. Both the write and read calls were operated in nonblocking mode. Test had a
message of zero length and Test had a message of bytes length.
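The measurement technique used in the pipe tests can be sketched as below: the call of interest is bracketed by two gettimeofday readings and the difference is converted to microseconds. This is a simplified illustration rather than the test program itself; the helper names `elapsed_us`, `time_write` and `set_nonblocking` are hypothetical, and the sketch times a write within a single process instead of forking a child.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Difference between two gettimeofday readings, in microseconds. */
static long elapsed_us(const struct timeval *a, const struct timeval *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000000L
         + (long)(b->tv_usec - a->tv_usec);
}

/* Put a descriptor into nonblocking mode; returns -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Time a single write() of len bytes.  Returns 0 and stores the elapsed
 * time in *us on success, or -1 if the write itself failed. */
static int time_write(int fd, const void *buf, size_t len, long *us)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    ssize_t n = write(fd, buf, len);
    gettimeofday(&end, NULL);

    if (n < 0)
        return -1;
    *us = elapsed_us(&start, &end);
    return 0;
}
```

Note that each reading of the clock is itself a call into the system, so the measured interval includes part of the cost of gettimeofday; this is the intrusive style of measurement discussed later in the chapter.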

Test and Test

These tests used two processes connected by a pipe to test the times required to send
and receive messages. After creating a pipe, the test process created a child process
through the use of a fork system call. The child sent messages to the parent process
through the pipe. The times spent in the write and read library functions were
recorded. A one second delay between attempts to transmit a message was introduced.
Both the write and read calls were operated in nonblocking mode. Test had a message of
zero length and Test had a message of bytes length.

Test

This test used two processes connected by a pipe to test the time required to send and
receive messages. After creating a pipe, the test process created a child process
through the use of a fork system call. The child sent messages to the parent process
through the pipe. The time taken from entering the write library function to exiting
the read library function was recorded. A one second delay between attempts to transmit
a message was applied. Both the write and read calls were operated in blocking mode.

Test and Test

Test measured the time required to create a zero length file using fopen and the time
required to destroy the file using unlink.

Test measured the time required to open an existing file using fopen and the time
required to close the file using fclose.


Tests to

Tests to measured the time required by malloc to allocate space and free to release the
allocated space. The tests allocated and pages of space respectively.

Tests to measured the time required by calloc to allocate space and free to release the
allocated space. The tests allocated and pages of space respectively.

Results

The following tables show the raw results for each experiment. The measurements are in
microseconds.

Inter-Process Communication

High Low Average Median

Exp bytes

read

write

Exp bytes

read

write

Exp bytes

read

write

Exp bytes

read

write

Exp bytes

transfer

write

File Management

High Low Average Median

Exp (Create File)
  fopen
  unlink

Exp (Open Existing File)
  fopen
  fclose


Memory Management

High Low Average Median

Exp ( page)
  malloc
  free

Exp ( pages)
  malloc
  free

Exp ( pages)
  malloc
  free

Exp ( pages)
  malloc
  free

Exp ( page)
  calloc
  free

Exp ( page)
  calloc
  free

Exp ( page)
  calloc
  free

Exp ( page)
  calloc
  free

Program

This program evaluated the time taken to create a new process.

Test

This test used a fork library call to generate a child process which performed an exit.
The time required by the parent process to perform the fork was measured.
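A minimal sketch of this measurement is given below, assuming a POSIX system. The function name `time_fork` is hypothetical, and the child calls `_exit` rather than `exit` so that it does not flush any buffers inherited from the parent; the original test program may have differed in these details.

```c
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Measure the time the parent spends in fork().  Returns 0 and stores the
 * elapsed time in microseconds in *us, or -1 if the fork failed. */
static int time_fork(long *us)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                 /* child: terminate immediately */
    gettimeofday(&end, NULL);     /* parent only from here on     */

    if (pid < 0)
        return -1;

    *us = (long)(end.tv_sec - start.tv_sec) * 1000000L
        + (long)(end.tv_usec - start.tv_usec);

    waitpid(pid, NULL, 0);        /* reap the child outside the timed region */
    return 0;
}
```

Only the parent's view of the call is timed; whatever work the system defers until the child first runs is not captured by this figure.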

Results

The following table lists a summary of the raw results for each experiment. The
measurements are in microseconds.

Process Management

High Low Average Median

Exp

fork

Observations

Making direct comparisons between the Walnut Kernel and FreeBSD is difficult, as the
two architectures support different paradigms and operations. The Walnut Kernel's
memory object paradigm is not reproduced in UNIX. Operations on objects have some of
the characteristics of operations on files (persistence) and other characteristics best
modeled by UNIX's memory management libraries. A major difference in operation between
the two systems was the method of acquiring the current time. Under FreeBSD a system
call was required to get the current time, whereas under the Walnut Kernel the current
time was available as a byproduct of any system call. The use of intrusive measurement
on UNIX and nonintrusive measurement on the Walnut Kernel made the duplication of some
of the behaviors of the test programs impractical. As a result, only broad comparisons
can be made between the systems.

Walnut Kernel Behavior

Every seconds a process is required to rebuild its private and top level page tables,
and the process also invalidates all its windows. When the process next accesses a
memory location, the capabilities for that window are revalidated. Introducing a delay
between tests reduces the number of operations per second and so increases the effect
of these overheads per operation performed.
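The expiry behaviour can be pictured with the following sketch. Everything concrete in it is an assumption made for illustration: the period `EXPIRY_SECONDS`, the number of windows, and the names `process_state`, `expire_if_due` and `touch_window` are hypothetical, and the real kernel rebuilds page tables rather than just clearing flags.

```c
#include <stdint.h>

#define EXPIRY_SECONDS 30u  /* hypothetical period; not legible in the text */
#define MAX_WINDOWS    8    /* hypothetical number of windows               */

struct window {
    int valid;              /* non-zero once the capability has been checked */
};

struct process_state {
    uint32_t      built_at;               /* time the tables were last built */
    struct window windows[MAX_WINDOWS];
    unsigned      rebuilds;               /* how often the tables expired    */
    unsigned      revalidations;          /* how often a window was checked  */
};

/* If the expiry period has passed, discard the local state: every window
 * is invalidated and the tables are considered rebuilt at time now. */
static void expire_if_due(struct process_state *p, uint32_t now)
{
    if (now - p->built_at < EXPIRY_SECONDS)
        return;
    for (int i = 0; i < MAX_WINDOWS; i++)
        p->windows[i].valid = 0;
    p->built_at = now;
    p->rebuilds++;
}

/* An access through window w: revalidate lazily if the window is stale. */
static void touch_window(struct process_state *p, int w, uint32_t now)
{
    expire_if_due(p, now);
    if (!p->windows[w].valid) {
        p->windows[w].valid = 1;   /* stand-in for the capability check */
        p->revalidations++;
    }
}
```

With this model the amortisation effect is easy to see: a test that touches a window once per second pays one revalidation per `EXPIRY_SECONDS` operations, whereas a test running thousands of operations per second spreads the same cost over far more operations.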

Messages IPC

An unexpected feature of the measurements of the Walnut Kernel's message delivery
system was that K EXTSEND performed better than K SEND. This raises the possibility of
improving the message transfer performance of the system by optimising K SEND. The
transfer of data to a process which is already loaded into the sender's address space
should require fewer checks on the accessibility and validity of the target process. In
addition, the transfer of data should be simplified, as the target page is loaded into
the process's address space.

Figure: Comparison of External Send to Write Operations (bar chart "Write External Send
Performance": High, Average, Median and Low times in uS for the experiments labelled U
Exp and WK Exp).

Figure compares the K SEND and write times for the two systems. The message transfer
performance of the two systems is similar. The graph indicates that the performance of
the external send operation is typically faster than that of the write operation, and
that the worst cases encountered for the Walnut Kernel were significantly better than
those encountered by FreeBSD.

Figure compares the time taken to complete a transfer of data between two processes.
The Walnut Kernel performs significantly better than FreeBSD in this test. The worst
case performance of IPC using pipes under UNIX is significantly worse than for the
equivalent operation under the Walnut Kernel.

Address Space Management / File Management

The typical performance of the fopen and load capability operations appears to be
similar, as shown in figure . However, the worst case performance of the Walnut Kernel
load capability (large) operation is approximately twice the cost of the worst case of
fopen.


Figure: Comparison of Transfer Time Between Two Processes (bar chart "Transfer
Performance": High, Average, Median and Low times in uS for the experiments labelled U
Exp and WK Exp).

Although the typical time taken to perform an unload operation is about larger than
that of fclose, the worst case performance of the Walnut Kernel is better than that of
UNIX.

Object Management / File Management

The creation of objects under the Walnut Kernel is approximately equivalent to the
creation of files under UNIX, as objects and files represent the units of persistent
storage found in the two systems. Figure compares the performance of the two systems in
this area.

Although the time required by the Walnut Kernel to create an object is dependent on the
size of the object and the initial number of capabilities required, object creation
time seems to be significantly faster than file creation time under FreeBSD.

The destruction of objects is approximately equivalent to the deletion of files under
UNIX, as both destroy items in the persistent store. In figure the Walnut Kernel is
shown to perform better than FreeBSD, in terms of returning more quickly. However, it
should be noted that although this prevents new Walnut Kernel processes loading the
object immediately, processes which currently have the object loaded will retain access
for seconds.

Figure: Comparison of File Operation Times to Object Operation Times (two bar charts,
"Fopen Load Performance" and "Fclose Unload Performance": High, Average, Median and Low
times in uS for the experiments labelled U Exp and WK Exp).

Figure: Comparison of File Creation Time to Object Creation Time (bar chart "File
Creation Object Creation Performance": High, Average, Median and Low times in uS for
the experiments labelled U Exp and WK Exp).

Process Management

The fork operation and the make process operation are compared for purposes of
completeness. The comparison yields little useful information, as the result of a fork
is to produce a copy of an existing process, whereas the make process operation
produces a new process. The closest approximation of the Walnut Kernel's make process
offered by UNIX is a fork followed by an exec. Unfortunately, as the exec system call
does not return, it is not possible to measure the speed of that system call using user
level programs. Figure compares the two operations.

Conclusion

Where operations were comparable, the Walnut Kernel performed as well as or better than
the competing FreeBSD system. The Walnut Kernel's poorest performance when compared to
UNIX was in the generation of new processes. However, this comparison is of limited
significance, as the Walnut Kernel generates a new process with differing contents (the
equivalent of a fork followed by an exec under UNIX), and because UNIX returns as soon
as it is clear that there are adequate resources for the new process to be created,
unlike the Walnut Kernel, which returns as soon as the new process is ready to execute.

Figure: Comparison of File Deletion Time to Object Destruction Time (bar chart "File
Deletion Object Destruction Performance": High, Average, Median and Low times in uS for
the experiments labelled U Exp and WK Exp).

Figure: Comparison of Fork to Make Process Time (bar chart "Make Process Fork
Performance": High, Average, Median and Low times in uS for the experiments labelled WK
Exp and U Exp).

Chapter

User Level Programs

This chapter describes programming techniques used for application programs, and
applications that have been written to operate under the Walnut Kernel. These
applications have allowed programmers to explore the possibilities offered by a
capability-based operating system and provided feedback to the operating system
designers. This feedback has resulted in changes to the design of the kernel.

Four programs are describ ed

Initpro c the initialization pro cess

Glui a screen multiplexor

Shell a user shell

Wyrm an arcade style game

Initpro c is resp onsible for deriving capabilities used by pro cesses which manage

access to devices Shell and Glui form a user level interface which allows access to

the functions of the kernel and to ob jects within the system The game Wyrm is an

example of a highly interactive application which demands fast resp onse times from

the system and is IO b ound

Program structures and data structures used in the applications are described
in the Structures section. The Shared Libraries section describes enhancements to the
Walnut Kernel's process structure to assist the implementation of shared libraries.
The remaining sections discuss the application programs.

Footnote 1: The majority of the material contained in this chapter is reproduced from CPW.

Structures

This section describes a number of common structures found in programs operating
under the Walnut Kernel. It addresses both the organization of programs and data
structures.

Program Structures

Walnut Kernel programs are similar to programs implemented under GUIs in that
both types of program respond to external events. GUIs provide two constructs for
handling events:

Message Loops are loops containing a call to a function that accepts an
event from a queue of events and then calls a function to handle the event.

Callback Functions are registered with the user interface and are invoked with a
set of parameters when an event occurs. Callback functions are used to handle
asynchronous events.

The Walnut Kernel supports constructs which perform similar tasks but are
implemented differently.

    while true
    begin
        wait
        receive(msg)
        function(msg, server)
    end

Figure: Pseudocode for a Message Loop

The Walnut Kernel typically handles messages by using a message loop (see the
figure "Pseudocode for a Message Loop"). This simple construct places the process or
subprocess into a sleep state until a message arrives, receives the message, handles the
message, and returns the process to a sleeping state. As a process cannot sleep when
there are messages waiting for it, the message loop can handle multiple messages without
the need to test for the presence of a message before going to sleep.
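The loop can be sketched in Python. This is a minimal model rather than the kernel's
interface: `queue.Queue.get` stands in for the wait and receive steps (it blocks only when
nothing is queued, mirroring the rule that a process cannot sleep while messages are
waiting), and the names `message_loop` and `run_demo` and the `None` sentinel are
assumptions made for the illustration.

```python
import queue


def message_loop(inbox, handler, server):
    """Sleep until a message arrives, handle it, then go back to sleep."""
    while True:
        msg = inbox.get()        # wait + receive: blocks only if the queue is empty
        if msg is None:          # sentinel that ends the sketch (not in the source)
            return
        handler(msg, server)


def run_demo():
    inbox = queue.Queue()
    for text in ("open", "read", "close"):
        inbox.put(text)          # three messages are already waiting
    inbox.put(None)
    handled = []
    # The handler just records each message in the "server" state.
    message_loop(inbox, lambda msg, server: server.append(msg), handled)
    return handled
```

Because the blocking call returns immediately while messages are queued, the three
pending messages are handled back to back without any explicit test for their presence.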

Asynchronous events, which would be handled with a callback function under a
GUI, are implemented through the use of subprocesses. A subprocess is a thread of
control within a process to which a message can be specifically addressed. When
a message arrives for a subprocess, the subprocess is made executable. Typically a
message loop is used to receive and handle the message before putting the subprocess
back to sleep.

Data Structures

Persistence, sharing and relocation shape the types of data structures in common
use under the Walnut Kernel.

File-oriented operations are typically performed on a stream of data, converting
the contents of an input stream to an output stream. Persistent data structures
do not require conversion to and from a secondary storage format, eliminating the
stream orientation imposed by the file mechanism. In addition, programmers are
able to perform random access operations on input and output data structures
without the overheads that would be present on a stream-oriented system. The
absence of these constraints provides a new degree of freedom in the design of data
structures.

A hash table is an example of a data structure that benefits from a persistent
implementation. On a persistent system the hash table is stored in a directly usable
form. This can be contrasted with a file-oriented system, which has the choice of
extracting the data from the table and storing it in a linear form, or storing the
table as a block of memory dumped to disk. The former requires either a complex
transform on the data to recover it in the correct order for storage, or an additional
data structure that keeps track of the order in which data should be stored. The
latter approach requires the table to be read at the beginning of the program and
written at the end of the program, introducing a significant IO overhead.

The easy sharing of data requires programmers to be aware of synchronization,
access control and locking issues. Currently, systems programmers work in an
environment where sharing considerations are important. Application programmers need
to become aware of the issues and techniques for managing shared data. Provision
must be made in shared data structures for collective access to the data structure.
This may include choosing data structures that allow simultaneous access (circular
buffers) or employing locking.

Programmers have a choice of loading an object at a fixed address or allowing
the object to be loaded at an arbitrary address. If an object is always located
at a fixed address, pointers may be used within the object to refer to other parts
of the data structure. This arrangement has the advantage of speeding references
within an object. However, it causes a loss of flexibility and may restrict the sharing
of objects, because a program can conveniently load only one object at a time that
occupies a given set of addresses in the address space. Relocatable
objects use index values to refer to parts of the object. This requires an addition
operation before a dereference operation can be performed, resulting in a potential
loss of performance.
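The trade-off can be made concrete with a small Python model of a relocatable object.
The node layout, the `-1` end marker and all function names are assumptions chosen for
the sketch; the point is only that links stored as offsets stay valid wherever the object
is loaded, at the cost of one addition per dereference.

```python
import struct

# Each node is (value, offset of the next node within the object); -1 ends the list.
NODE = struct.Struct("<ii")


def build_object(values):
    """Pack a linked list into one object whose links are offsets, not addresses."""
    obj = bytearray(NODE.size * len(values))
    for i, value in enumerate(values):
        next_off = (i + 1) * NODE.size if i + 1 < len(values) else -1
        NODE.pack_into(obj, i * NODE.size, value, next_off)
    return bytes(obj)


def walk(memory, base):
    """Traverse the list loaded at `base`; each step adds the base before dereferencing."""
    out, off = [], 0
    while off != -1:
        value, off = NODE.unpack_from(memory, base + off)
        out.append(value)
    return out


def load_twice(values, base_a, base_b):
    """Load the same object at two non-overlapping addresses and walk both copies."""
    obj = build_object(values)
    memory = bytearray(max(base_a, base_b) + len(obj))
    memory[base_a:base_a + len(obj)] = obj
    memory[base_b:base_b + len(obj)] = obj
    return walk(memory, base_a), walk(memory, base_b)
```

Had the nodes held absolute addresses instead, only the copy loaded at the address used
when the object was built would be traversable.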

Figure: A Circular Buffer
(three panels, Partly Full, Empty and Full, each marking the positions of the Read and
Write pointers)

Circular buffers (see the figure "A Circular Buffer") are used to transfer
stream-oriented information between processes. Two implementations are used:

- A minimal implementation is used by character mode devices, such as serial
  ports and the keyboard, to communicate with their manager processes.
- An optimized version is used for interprocess communication.

Neither circular buffer implementation requires locking, yet both ensure that
data is correctly transferred from the sender to the receiver. They operate by giving
the sender read/write access to the write pointer and read-only access to the read
pointer. The receiver has read/write access to the read pointer and read-only access
to the write pointer. This eliminates contention over updating the pointers. The
data structure operates safely even if information relating to the position of the
other pointer is old. The data structure has a minor inefficiency in that there is
always a single wasted slot when the data structure is full.
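A minimal Python model of such a buffer follows. It is a sketch under stated assumptions,
not the Walnut Kernel code: the class and method names are invented, and ordinary
attributes stand in for pointers held in shared pages. Each pointer has exactly one
writer, and a stale view of the other side's pointer only makes the buffer look fuller (to
the sender) or emptier (to the receiver) than it is, which is why no lock is needed.

```python
class CircularBuffer:
    """Single-sender, single-receiver ring buffer with one slot always left unused."""

    def __init__(self, slots):
        self.data = [None] * slots
        self.read = 0     # written only by the receiver
        self.write = 0    # written only by the sender

    def send(self, item):
        nxt = (self.write + 1) % len(self.data)
        if nxt == self.read:          # full: the last free slot is never filled
            return False
        self.data[self.write] = item
        self.write = nxt              # publish only after the data is in place
        return True

    def receive(self):
        if self.read == self.write:   # empty
            return None
        item = self.data[self.read]
        self.read = (self.read + 1) % len(self.data)
        return item
```

The wasted slot is what lets `read == write` mean "empty" unambiguously; a buffer created
with four slots therefore holds at most three items.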

In the more efficient implementation both the sender and receiver have a private
pointer known as the tripwire. The tripwire is set to point to either the value of
the other pointer or the top of the buffer. Before sending or receiving an element
from the circular buffer, the value of the pointer is compared against the tripwire to
determine if there is a risk of overfilling the buffer or crossing the end of the buffer.
This mechanism saves the cost of a comparison on most accesses to the buffer by
converting the separate tests for overfilling and wrapping from top to bottom of the
buffer into a single test. If the comparison indicates that either of the boundary
conditions has been reached, further tests are carried out to determine which of the
two conditions caused the problem, and the tripwire is set to a new position.
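One way to realise the tripwire is sketched below in Python. The details are assumptions
rather than the thesis implementation: here the tripwire marks the last slot that can be
handled without extra checks (either the slot that must stay empty, the other side's
pointer, or the top slot of the buffer), and `slow_paths` exists only to make the saving
visible. The fast path performs a single comparison; the full or empty test and the
wrap-around are both deferred to the slow path.

```python
class TripwireBuffer:
    """Single-sender, single-receiver ring with one private tripwire per side."""

    def __init__(self, slots):
        self.size = slots
        self.data = [None] * slots
        self.read = 0                 # shared; written only by the receiver
        self.write = 0                # shared; written only by the sender
        self.send_trip = slots - 1    # private to the sender
        self.recv_trip = 0            # private to the receiver
        self.slow_paths = 0           # instrumentation for the sketch only

    def send(self, item):
        w = self.write
        if w != self.send_trip:       # fast path: the only comparison
            self.data[w] = item
            self.write = w + 1
            return True
        self.slow_paths += 1
        limit = (self.read - 1) % self.size   # slot that must stay empty
        if w == limit:
            return False              # genuinely full; the tripwire stays put
        self.data[w] = item           # tripped at the top slot or on a stale limit
        w = (w + 1) % self.size
        self.write = w
        self.send_trip = limit if limit >= w else self.size - 1
        return True

    def receive(self):
        r = self.read
        if r != self.recv_trip:       # fast path
            item = self.data[r]
            self.read = r + 1
            return item
        self.slow_paths += 1
        w = self.write
        if r == w:
            return None               # genuinely empty
        item = self.data[r]           # tripped at the top slot or on a stale pointer
        r = (r + 1) % self.size
        self.read = r
        self.recv_trip = w if w >= r else self.size - 1
        return item
```

A tripwire computed from an old value of the other pointer is always conservative, so the
safety argument for the plain buffer carries over unchanged.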

Legacy Code

A library has been constructed that emulates many of the functions found in the C
stdio library. This library has two roles:

- It allows the reuse of a large quantity of existing C code, reducing development
  effort.
- It provides an environment that is familiar to a large range of programmers,
  allowing them to use existing skills while learning about the features the Walnut
  Kernel environment offers them.

Under the emulation library, files and streams are implemented using the same
circular buffer code. Files gain no advantage from being implemented using circular
buffers; however, there is no performance penalty either. By choosing to implement
the two mechanisms in the same way, code volume is reduced and code maintenance
is simplified.

Shared Libraries

One of the major objectives of the Walnut Kernel was to encourage sharing of code
and data. The Data Structures section discussed mechanisms which allow the sharing
of data. This section describes existing mechanisms for the sharing of code.

The executable code may be either relocatable or non-relocatable. Relocatable
code uses relative addressing to reference other parts of the library module.
Non-relocatable code is linked using absolute addresses for all references. Relocatable
code simplifies the implementation of shared libraries. In the absence of relocatable
code it is necessary either to force modules always to be loaded at the same address
or to create multiple versions of a module at differing addresses.

Several classes of data may be required by shared libraries: embedded data,
shared data, and data local to an instance of the code. Embedded data consists of
literal constants compiled into the code. Shared data is accessible to all invocations
of a library and may be seen and modified by each instance. A variant on shared
data is constant data, shared by all instances but typically not modified. Data local
to an instance of the code is private to that instance and is typically not accessible
to other instances of the code.

The current implementation of the Walnut Kernel supports relocatable code.
The password-capability model is well suited to supporting the majority of the
classes of data. Embedded constants can be protected from alteration by not
providing the write system-right on capabilities given to users. Sharing of data is
achieved by loading capabilities. Providing mechanisms which support data local to
an instance is less straightforward.

Footnote 2: This taxonomy of memory types is based on work conducted for the Monads project Geh.

Footnote 3: For the purposes of this discussion, an instance of the code is created by loading a
piece of library code into the address space of a process. If the code is multiply loaded into a
process, there are said to be multiple instances of the code.

A number of potential approaches to supporting shared libraries are described
elsewhere in this thesis. The remainder of this section addresses current mechanisms
and the modifications to the process structure required to assist their implementation.

Figure: Implementation of Local Storage for Shared Library Code
(diagram labels: Process Address Space, with its top at 0xffffffff, holding Library fn
Code and Data; Find Capability Index; Private Data Pointer Table, Stored in Process
Object; Process Object)

To support data local to a process, a module must be able to create an object,
load the object into the address space of the process, and locate the object again so
that the module can read and write its local data. There are two varieties of
solution. The first variety involves structuring the address space of the process,
either so that modules are always loaded in the same place or so that there is
a fixed relation between the location of the object used to store local data and the
code object. Both these mechanisms place restrictions on the layout of memory.
The second variety of solution is to store the address of the object containing local
data in a location known to the module. This appears to be a catch-22 situation, as
the module requires a private location to be able to store the address of its local data
object. This paradox can be avoided by providing a table stored by convention in
the process object. The module has access to the address its code is loaded at and
uses the address to index into the table. This provides a small private store readily
accessible to a shared library module. The table is typically used to store pointers
to a local data object. The mechanism is illustrated in the figure "Implementation of
Local Storage for Shared Library Code".

A table indexed by the addresses of locations where objects can be loaded is
relatively large. An enhancement which makes more efficient use of space is to use
the Capability Index for the window in which the code is running. Capability indices
cover a limited range, keeping the table down to a manageable size. Although it was
possible to use the K CAPID system call to find the index value for an address, it was
considered too inefficient to use a system call for a potentially frequent operation.
To assist in the rapid translation of addresses to Capability Indices, the process
structure was altered to make the Address Map readable. This allowed a short
segment of assembly code (see the figure "Find Capability Index for Executing Code")
to be used to rapidly translate addresses to index values.
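The lookup can be modelled in a few lines of Python. Every constant below is an
assumption made for the sketch (the boundary between small and large windows, the two
window sizes, the 256-entry table and the one-byte map entries), as are the class and
method names; only the overall shape follows the text: shift the code address down to a
window number, read the Capability Index from the readable Address Map, and use that
index to select an entry in the Private Data Pointer Table.

```python
BOUNDARY = 0x80000000        # assumed split between small and large windows
SMALL_SHIFT = 20             # assumed small-window size: 1 MiB
LARGE_SHIFT = 26             # assumed large-window size: 64 MiB
SMALL_SLOTS = BOUNDARY >> SMALL_SHIFT
LARGE_SLOTS = (0x100000000 - BOUNDARY) >> LARGE_SHIFT
TABLE_SIZE = 256             # assumed: one byte per Address Map entry


def window_slot(addr):
    """Address Map slot for the window containing addr."""
    if addr >= BOUNDARY:
        return SMALL_SLOTS + ((addr - BOUNDARY) >> LARGE_SHIFT)
    return addr >> SMALL_SHIFT


class Process:
    def __init__(self):
        # Address Map: window slot -> Capability Index, readable by user-level code.
        self.address_map = bytearray(SMALL_SLOTS + LARGE_SLOTS)
        # Private Data Pointer Table: Capability Index -> module-private data.
        self.private_data = [None] * TABLE_SIZE

    def load(self, addr, cap_index):
        """Model of loading a capability: record its index for the window at addr."""
        self.address_map[window_slot(addr)] = cap_index

    def cap_index(self, code_addr):
        """What the short assembly routine computes: no system call is involved."""
        return self.address_map[window_slot(code_addr)]

    def private_slot(self, code_addr):
        return self.private_data[self.cap_index(code_addr)]
```

A library module would call the equivalent of `private_slot` with its own return address,
so two modules loaded under different capabilities reach different private entries without
either knowing where the other is loaded.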

Initproc

When a Walnut Kernel is booted it generates an object known as the system object.
This object contains all the memory pages occupied by kernel code, kernel data, and
device driver interfaces and buffers. The initialization process derives from the
system object the capabilities used by the processes which manage devices. The restrict
operation is then applied to the master capability. This operation removes rights
associated with a capability without affecting the rights of the children of the
capability. This eliminates a potential security hole associated with the existence of a
capability allowing unfettered access to the kernel and device interfaces. After
deriving the set of less powerful capabilities, Initproc notifies the scheduler that it is safe
to schedule other processes and sends messages to all manager processes containing
the capabilities they require to access the devices they manage. Initproc completes
its operation by entering a message loop and waiting for a message indicating that
the system is to be reconfigured.

    codecapindex globl
    PROCHDADDRESS x
    MAPADDRESS xf
    text
    align
    addr callingaddressmagic
    if addr PROCHDADDRESS
    addr xfff uqp MAPADDRESS
    else
    uqp MAPADDRESS addr xfff
    return uqp
    codecapindex
    movl esp eax
    pushl ecx
    cmpl PROCHDADDRESS eax
    jge bigwin
    smallwin
    movl ecx
    sarl cl eax
    andl xfff eax
    jmp both
    bigwin
    movl ecx
    sarl cl eax
    andl xff eax
    both
    movb MAPADDRESSeax eax
    andl xff eax
    popl ecx
    ret

Figure: Find Capability Index for Executing Code

Initproc illustrates a number of features of programming under the Walnut Kernel;
however, the process is unique among Walnut Kernel processes in that it is
restarted from a fixed address each time the kernel is booted. Apart from always
starting Initproc from a fixed address, the kernel provides no special functions to
support this code. Thus all of Initproc's code operates at the user level, requiring no
special kernel support or privileges. All other processes resume their operations from
the point at which they were stopped when the system was shut down. Furthermore,
as the system object is stored in volatile storage, the system object does not retain
information about the capabilities applying to it over a reboot. The initialization
process is responsible for remaking the capabilities used by the manager processes
before allowing other processes to be scheduled.

The kernel scheduler monitors a word in the Wall. When the word becomes non-zero,
the kernel scheduler allows the scheduling of any runnable process. Initproc
derives a capability for the Wall from the system object. This capability is sent to
the Wall manager and used by Initproc to notify the scheduler.

In addition to the easy sharing of data demonstrated by the above application,
persistence is also exploited in Initproc. The derivative capabilities generated from
the system object are stored in an array. When Initproc is restarted following a
shutdown it examines this array and generates derivative capabilities with the same
name and rights as those found in the array before restarting the scheduler. This
simplifies the design of the manager processes, as the capabilities given to managers
by Initproc appear to persist over the reboot. Holders of derivatives of capabilities
distributed by Initproc will find that those capabilities no longer work.
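The restart behaviour can be sketched as a small Python model. The names `SystemObject`,
`derive` and `initproc_boot`, and the representation of a capability as a (name, rights)
pair, are assumptions made for illustration; the real kernel interface is not shown in
this chapter.

```python
class SystemObject:
    """Volatile: a fresh instance exists after every boot, with nothing derived."""

    def __init__(self):
        self.derived = {}

    def derive(self, name, rights):
        self.derived[name] = rights


def initproc_boot(system_object, saved, wanted):
    """Remake the manager capabilities; `saved` is Initproc's persistent array."""
    if not saved:                       # first boot: record what is handed out
        saved.extend(wanted)
    for name, rights in saved:          # every boot: remake them on the new object
        system_object.derive(name, rights)
    return system_object.derived
```

Because the array lives in Initproc's persistent data, the second boot reproduces the
same names and rights on a brand-new system object, which is why managers see their
capabilities as surviving the reboot.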

Glui

Glui is the manager process for the screen and the keyboard. It provides several
stream mode interfaces to the keyboard and screen. A series of keystrokes is used
to switch between sessions. In addition, Glui supports a mechanism for giving direct
access to the screen memory to a number of processes. Like Initproc, Glui functions
using system calls available to all processes.

When the Walnut Kernel is booted, Initproc sends each manager process a message
with a capability for the resources it manages. On receipt of the messages
containing capabilities for the keyboard, the screen, the VT emulator built into
the kernel, and the Wall, Glui creates virtual screens and derives a capability
which allows messages to be sent to Glui. This capability is then placed on the
Wall.

Figure: Keyboard and Screen IO
(block diagram with columns Hardware, Device Drivers, Glui and Apps; elements include
Keyboard, Keyboard Scanner, Keyboard Buffer, Input Buffer, Output Buffer, Screen Images,
VT Emulator, Inbuilt VT Emulator, and the memory-mapped Screen Buffer)

The screen and keyboard IO architecture of the Walnut Kernel is illustrated in
the figure "Keyboard and Screen IO". To provide terminal multiplexing facilities Glui
intercepts all keyboard input, scanning for control sequences. If no control sequences
are found, the keyboard input is placed in the input buffer for the application currently
being displayed on the screen. The output buffer of the current application is polled
periodically. If new information is found in the buffer it is passed to the VT emulator
code built into Glui. This emulator writes its output directly to the memory-mapped
screen buffer. To move the cursor and sound the bell, Glui passes control codes to
the VT emulator built into the device drivers.

To change the display to another application, Glui stops accepting input from
the current client program. The current contents of the screen are copied to a buffer
associated with the current application. This buffer is located within Glui and is not
made accessible to other programs. The buffer corresponding to the new application
is copied to the screen, and the output buffer of the new application is read to update
the screen. Keyboard output is directed to the input buffer of the new application.

Glui supports two types of output services:

- a VT emulator
- the hardware screen buffer

When a process requires IO through the VT emulator and keyboard it sends
a message to Glui using the capability on the Wall. If there is a virtual screen
available, Glui sends a message back which contains the capabilities for a keyboard
buffer and a screen buffer. These buffers use the circular buffer protocol discussed
in the Data Structures section. The process requesting the screen may send data
containing VT screen control sequences via the output stream. Input is received via
the input stream.

When direct access to the hardware screen buffer is requested, the process must
supply a capability for itself that allows the process to be frozen. If this capability
is not provided, or does not allow Glui to send the freeze message, the request will
be rejected. If a suitable valid capability is supplied, a capability without the
SRMULTILOAD right and with a password equivalent to the requesting process's serial
number is returned to the requesting process. When loaded by the requesting
process this capability allows direct access to the screen buffer; however, this
capability cannot be loaded by any other process.

Protected freeze and thaw are used on the processes granted direct access to
the memory-mapped screen buffer. This prevents other processes from thawing a
process with a usable capability for writing to the screen. Glui is thus able to ensure
that only one process writes to the screen at a time, preventing corruption of the
screen's contents.

Both the SRMULTILOAD right and the protected versions of freeze and thaw
were introduced to enable Glui to allow controlled direct access to the hardware
screen buffer. Other solutions were considered, including locking processes AW
and schemes for the rapid revocation of capabilities.

Under the Walnut Kernel a process is locked when it is created with a lockword
of fixed bit width. This lockword is XORed with each alter capability before the
capability is used by the kernel. The process can only use capabilities which have been
XORed with the lockword and then passed to the process. This prevents a locked
process from communicating with other processes without the assistance of a party
who knows the lockword value. This mechanism was considered, but locking severely
curtailed the ability of the client program to communicate.
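The effect of the lockword can be shown with a short Python sketch. The 64-bit width,
the function names and the set of valid capability values are assumptions made for the
illustration, and the model ignores the distinction between alter and non-alter
capabilities.

```python
LOCK_BITS = 64                      # assumed width, for the sketch only
MASK = (1 << LOCK_BITS) - 1


def kernel_view(presented, lockword):
    """Capability value the kernel actually uses when a locked process presents one."""
    return (presented ^ lockword) & MASK


def wrap_for_locked_process(capability, lockword):
    """Done by a party that knows the lockword, before passing the capability on."""
    return (capability ^ lockword) & MASK


def can_use(presented, lockword, valid_capabilities):
    return kernel_view(presented, lockword) in valid_capabilities
```

A locked process that obtains a raw capability value cannot use it, because the kernel
XORs it into a different, almost certainly invalid value; only a value prepared with the
same lockword decodes back to the original. A lockword of zero corresponds to an unlocked
process.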

Although a number of rapid revocation schemes were considered, the generalization
of these schemes to a multiprocessor environment either resulted in a mechanism
that was insufficiently responsive or required an unacceptably high overhead to support
a relatively infrequent operation.

Shell

Shell is a command interpreter. It provides mechanisms for managing objects,
organizing files generated through the stdio emulation code, and launching programs.
Shell has detailed information relating to the structure of a process which follows
the conventions adopted for the Walnut Kernel.

When Shell is first started it sends a message to Glui requesting a terminal
emulator output buffer and a keyboard input buffer. On receipt of these capabilities
it presents the user with a prompt and awaits further instructions. Users can run
processes in two modes:

- Yielding the screen to the new process. The input buffer and output buffer
  used by Shell are given to the new process for its use until the new process
  terminates.
- Creating a new screen for the new process. The shell requests a new set of
  buffers from Glui, which are given to the new process.

The two modes differ in several respects. When the new process is to inherit the
screen from Shell, the buffers are made available to the new process and the shell
goes into a loop which polls the new process's status. Shell ignores the contents
of the buffers and does not take any command input until it detects that the new
process has ceased to function. Shell then resumes using the buffers and accepting
commands. When the screen is not inherited from Shell, a set of buffers is requested
from Glui and made available to the new process. Shell continues to interpret
the input from the keyboard and remains active on the screen that it is currently
connected to.

Footnote 4: A non-alter capability does not possess write rights and cannot be used to transfer
information to another process. Alter capabilities can be used to transfer information.

Two mechanisms were introduced into the Walnut Kernel to allow processes to
determine the state of another process:

Cooee Messages are sent to a process and result in a Cooee reply message being
sent to a capability specified in the Cooee message. The Cooee reply message is
automatically generated by the kernel and contains a field indicating whether
the process is running, frozen, sleeping or dead.

Peek System Call returns a value which indicates whether the process is running,
frozen, sleeping or dead.

The Cooee message was introduced first; however, polling processes to determine
their state proved to be useful and popular, so the more efficient peek mechanism was
provided. The peek mechanism has the advantage of a significantly lower overhead,
as it requires only a single system call and avoids the message passing mechanism.

A process object conforming to the Walnut Kernel conventions contains:

Startup Code Area (optional): This area may contain a small amount of code
used in starting a process.

File Descriptor Table (mandatory): This area contains the file descriptors for use
by the process. Note: the first three elements of the File Descriptor Table are
mandatory, to allow for standard output, standard input and standard error.
The entries in this table are used by the Unix emulation library.

Private Data Pointer Table (mandatory): This area contains pointers to private
data. The table is indexed by the capability index of the executing code and
is used to locate data used by the executing code.

Default Heap (optional): The default location for the creation of the heap.

Default Stack (optional): The default location for the creation of the stack.

To start a process, the shell takes a code object and a data object for the program
to run in the new process. The data object is duplicated. A new process is made
by invoking the kernel. The capabilities for the code object and the duplicate data
object are passed to the kernel as autoload capabilities, the stack pointer and
program counter are set, and the wakeup time for the new process is set to forever.
After the new process is created, Shell modifies the pages of the object loaded into
the shell's address space. Shell writes into the file descriptor table the capabilities for
the new process's standard input, output and error, and any other file descriptors
that are required. The command line arguments and a capability for the object
containing the process's environment strings are written into the heap space. A
message is sent to the process to wake the process up.

The method used to create processes allows multiple copies of a program to be
run simultaneously. The scheme is economical of both disk space and memory space
as it shares a single image of the code. The data is duplicated to prevent multiple
copies of a program interfering with each other.

Wyrm

Wyrm is an arcade style game inspired by the games nibbles Cor and worm Toy.
Apart from its frivolous value, Wyrm has been used to test the responsiveness of
the interface and a number of IO mechanisms.

The current version of Wyrm makes use of the Unix emulation stream IO code
to communicate via standard IO with Glui, which draws the parts of the game on the
screen. This version is highly responsive and shows that the two layers of software
provided by the existing IO structure are sufficiently quick for highly interactive
applications.

Footnote 5: Autoload capabilities are automatically loaded into the address space of a process
when the process is created.

Footnote 6: A wyrm is a mythical creature of great power. The game was sarcastically named wyrm
because of its lack of speed. After tracing a number of implementation problems in the stream IO
code, Wyrm now proudly lives up to its name.

Early in the development of the Unix emulation libraries a misplaced flush had
caused us to believe that the system performance was inadequate. At that time a
version of Wyrm which made direct use of the screen was written in an attempt to
determine where the bottleneck lay. This resulted in changes in the design of the
kernel and Glui to correctly support the sharing of memory-mapped buffers.

Chapter

Security

The Walnut Kernel differs from the Password-Capability System in a number of
respects. This chapter outlines the effect of these differences on the security of the
Walnut Kernel. The Password-Capability System required that objects be continuous
and that there be sufficient storage to contain the object when the object was
created. The Walnut Kernel has paged objects which may have unallocated pages
within the object. Other significant differences include the introduction of the
restrict operation by the Walnut Kernel, serial numbers having a physical meaning,
capabilities with non-random passwords, the SRMULTILOAD right, and the
protected freeze and thaw operations.

Objects

The Walnut Kernel uses three parameters to describe the storage and address space
requirements of an object. The Password-Capability System used a single parameter
to describe these features of an object. The separation of the storage requirements
of an object from the size of the address space used by the object was aimed at
overcoming the limitations imposed by having a page-sized protection granularity. It
also provided an opportunity to introduce a new degree of freedom into the Walnut
Kernel not found in the Password-Capability System. The accompanying figure compares
the structures of the two systems' objects.

The decoupling of the address space requirements from the storage space
requirements increases the flexibility of the object mechanism. In the Walnut Kernel
several uses have been proposed that make use of this flexibility. These applications
include the construction of single-object processes and the storage of source code
and its derived object code in the same object. Compiler designers tend to use a
large address space and populate it with widely separated code and data segments.
Compilers using this approach on the Password-Capability System would have
committed a large amount of system disk resources. In contrast, the Walnut Kernel
requires fewer resources.

Figure: Comparison of Walnut Kernel and Password-Capability System Objects
(Walnut Kernel object: Origin, Offset, Size, Limit, Max and Top of Highest Referenced
Page, with defined pages, undefined pages and potential undefined pages within the object;
Size is the guaranteed available disk space. Password-Capability System object: Origin and
Size, enclosing the contents of the object.)

The second suggested application places source code into the same object as the
compiled output. This mechanism ensures that source code is available to maintenance
programmers. The wide separation of the starting address of the source code from
the compiled object code eliminates the need to relocate code on code growth. This
simplifies the task of locating the start of the sections of an object and allows the
object to be easily partitioned into source and code segments using the capability
mechanism.

The mechanism of assigning the pages of an object when the page was first
accessed introduced a secret channel. The channel resulted from the changing of
the state of an object when a read operation was carried out on memory accessed
through a read-only capability. To transfer information through this channel the
following preconditions must be fulfilled:

- All the pages of the volume must be allocated to objects on the volume.
- There must be a known number of unassigned pages in the object through
  which the information is to be conveyed, and the location of the unassigned
  pages must be known.

The transmitter of the message sends it by reading from previously unassigned pages.
This results in the pages being assigned. After the assignment has taken place, the
receiver of the message determines the number of pages left by reading from a
collection of unassigned pages until a fault results from attempting to allocate a
page. The address-fault exception handler of the receiving process is invoked, and the
number of pages the receiver has managed to access corresponds to the message.

Footnote 1: The collection of pages used by the receiver must be disjoint from the set used by
the transmitter.
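The channel can be demonstrated with a small Python model. It is a sketch under stated
assumptions: `PagedObject`, `AddressFault`, `transmit` and `receive` are invented names,
the volume is taken to be full so that the object's spare storage is the only source of new
pages, and freshly assigned pages simply read as zero.

```python
class AddressFault(Exception):
    """Raised when a page cannot be assigned because no storage remains."""


class PagedObject:
    def __init__(self, storage_pages):
        self.spare = storage_pages    # pages that may still be assigned
        self.assigned = set()

    def read(self, page):
        """Even a read through a read-only capability assigns the page on first touch."""
        if page not in self.assigned:
            if self.spare == 0:
                raise AddressFault(page)
            self.spare -= 1
            self.assigned.add(page)
        return 0


def transmit(obj, pages, symbol):
    """Send `symbol` by touching that many previously unassigned pages."""
    for page in list(pages)[:symbol]:
        obj.read(page)


def receive(obj, pages, spare_before):
    """Touch disjoint unassigned pages until a fault; the shortfall is the symbol."""
    touched = 0
    try:
        for page in pages:
            obj.read(page)
            touched += 1
    except AddressFault:
        pass
    return spare_before - touched
```

With ten spare pages known in advance, a transmitter that touches three pages leaves
seven for the receiver, who recovers the value three from the point at which the fault
occurs. The transfer consumes the spare pages, so each use of the channel needs a fresh
supply.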

The existence of a new secret channel has been demonstrated. However, the
transmission mechanism can be easily frustrated by freeing space on the volume,
by restricting information relating to the assignment of pages in shared objects,
or by ensuring that read-only capabilities are assigned only to sections of objects which
are completely assigned.

Restrict

The security analysis of the Password-Capability System was based on a model
which required child capabilities to be no more powerful than their parents. The
introduction of the restrict operation allows parent capabilities to be made less
powerful than their children. The Walnut Kernel modified the requirement on child
capabilities to apply only at the instant of creation.

The initial motivation for restrict was to provide a mechanism which allowed
the kernel memory object to be made available safely to the initialization process.
When the Walnut Kernel is started, an object is created which covers the kernel
program and data areas. The initialization process derives smaller views from the
kernel memory object. These views contain buffers used to communicate with hardware
devices, and kernel variables which may be used to send information to processes
or to allow possessors of the capability to perform restricted functions. Possession
of the master capability for the kernel memory object gives the holder complete
control of the system. Early versions of the Walnut Kernel used a capability with a
well-known value for the kernel memory object. The restrict call was introduced to
allow the initialization process to remove all the rights of the kernel object's master
capability after the smaller views were derived.

2

This mechanism has been proposed as a method of allowing the kernel to be notified that the
system should be shut down.

3

The current system passes the value of a randomly selected master capability to the
initialization process. The restrict call is used at the completion of derivation as an
additional security measure.


The restrict operation reduces the rights of a capability by performing a bitwise
AND of the supplied rights mask with the rights bitmap of the capability. The restrict
operation will only operate on capabilities which have the suicide right.
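A minimal sketch of this rule follows. The bit positions and names are hypothetical, since the text does not give the layout of the rights bitmap; only the bitwise AND and the suicide-right precondition come from the description above.

```python
# Hypothetical bit positions; the actual layout is not given in the text.
RIGHT_READ = 0x1
RIGHT_WRITE = 0x2
RIGHT_SUICIDE = 0x4


def restrict(rights, mask):
    """Return the reduced rights bitmap, or raise if restrict is not permitted."""
    if not rights & RIGHT_SUICIDE:
        raise PermissionError("restrict requires the suicide right")
    # A bitwise AND can clear bits but can never set a bit that was clear.
    return rights & mask
```

Because the result is always a subset of the original bits, the operation cannot amplify rights, which is the property the following argument relies on.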

To show that the Walnut Kernel retains the security properties of the
Password-Capability System, it is necessary to show that the restrict operation
introduces no behaviors that cannot be achieved under the Password-Capability System.
The restrict operation places a system-enforced limitation on the usage of a capability.
The limitation is equivalent to a program on the Password-Capability System voluntarily
forgoing the use of some of the rights conveyed by a capability. Thus the Walnut
Kernel is no more subject to rights amplification than the Password-Capability
System.

Appendix B contains a formal description of the effects of the restrict operation.

Serial Numbers

The Walnut Kernel uses the less significant bits of the serial number to locate the
header page of the object. The remaining bits of the serial number are randomly
allocated (see figure). When an object is referenced, the block number of the
header page is extracted by performing a bitwise AND of the DiskBlockMask and
the presented serial number. The header page is accessed, and the presented serial
number is compared with the serial number stored in the header page. If they match,
the operation which made reference to the object is allowed to continue. If there is
no match, an error is returned indicating that the capability is invalid. This differs
from the mechanism in the Password-Capability System, where serial numbers of
objects were allocated randomly and a table was used to convert the serial number
to a reference to the physical representation of an object.
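The lookup can be sketched as follows. The mask width, the exception name and the dictionary standing in for the disk are assumptions made for illustration; only the mask-then-compare sequence is taken from the text.

```python
DISK_BLOCK_MASK = 0x0000FFFF  # assumed width; the real mask depends on the volume


class InvalidCapability(Exception):
    pass


def locate_header(serial, header_serials):
    """Return the header block for `serial`, or raise InvalidCapability.

    header_serials stands in for the disk: it maps a block number to the
    serial number stored in the header page at that block, if any.
    """
    block = serial & DISK_BLOCK_MASK      # low bits give the header block
    stored = header_serials.get(block)    # "access" the header page
    if stored != serial:                  # the random high bits must match too
        raise InvalidCapability(hex(serial))
    return block
```

No translation table is consulted: the serial number itself names the block, and the stored copy in the header page validates it.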

This change has the potential to reduce the security of the system by reducing
the size of the name space that an individual must search to find a valid capability.
In practice the mechanism has a minimal effect on system security. It can be shown
that the impact of this change is equivalent, in the worst case, to reducing the serial
number's length by a bit. The proof follows.

The random bits of the serial number are unpredictable; hence those bits do not
reduce the size of the space which must be searched.

[Figure: Components of the Walnut Kernel Serial Number. B = Block Number of Header
Page, M = Disk Block Mask, R = Random Number, S = Serial Number.]

The user has limited knowledge of the state of the physical device. The user can
only determine the header blocks which have been allocated to capabilities the user
already knows. This is the same level of knowledge as found in the
Password-Capability System. Furthermore, as the system operates, objects are created,
destroyed, enlarged and shrunk. This results in a continually changing set of candidate
header blocks. Selection from the changing random pool results in a random series of
potential header blocks.

Unless the disk holds exactly a power of two blocks, the most significant bit of
the block number will be biased towards zero. The worst case would occur when
there is only a single block number in which the top bit is set.
This would result in reducing the search space by a bit.

Assuming the worst case, the serial number is bits in length. Although the
serial number of an object was never regarded as a secret, a item search space
for serial numbers is sufficiently large to make random probing for capabilities not
viable.

4

The top bit of the serial number is used to mark alter capabilities.

Non-Random Passwords

The Walnut Kernel introduced a mechanism for creating capabilities with passwords
specified by the process performing the derive operation. This mechanism was
introduced to allow the initialization process to create capabilities derived from the
kernel memory object with known passwords. The kernel memory object has the
unique feature of retaining no state over a system reboot. All other objects reside on
a physical medium, allowing the state of the object to remain in existence until the
object is destroyed. Accordingly, all the capabilities derived from the kernel memory
object are forgotten whenever the system reboots. Without the ability to generate
capabilities with known passwords, all processes using capabilities derived from the
kernel memory object would, after a reboot, have to handle the loss of access to
the view provided by that capability. The initialization process regenerates the
capabilities for the kernel memory object after each reboot, before allowing other
processes to restart. Regenerating the capabilities simplifies the processes which
interface with the kernel memory object.

The ability to create capabilities with non-random passwords is essential to allow
capabilities without the SRMULTILOAD right to perform a useful role.

The creation of capabilities with known passwords has a minimal effect on the
security of the system. As the random password mechanism remains, the user is
still able to create capabilities which have a full range of password values, providing
the same level of security as in the Password-Capability System. In addition, the
non-random mechanism can be used with no impact on system security in the
following two roles:

A capability can be made available to a wide range of users by creating a
capability with a known name. Using a known name is equivalent to widely
advertising a capability with a random password.

Deleted capabilities can be replaced by a possessor of a sufficiently powerful
capability. Accordingly, those who knew the original name of the capability
can make use of the new capability created with the same passwords, entailing
no further spread of information.

SRMULTILOAD Right

The SRMULTILOAD right was introduced to allow capabilities to be created which
could only be used by a specified process. The screen manager uses this mechanism.

The screen manager acts as a screen multiplexer supporting several text screens.
The user can switch between screens by using a hot key sequence.

To speed up access to the screen and allow greater control over the screen, the
manager can grant direct access to the screen's video buffer to a process. However,
the screen manager needs to be able to ensure that only one process can exercise
write access to the screen buffer at a time. When the hot key sequence is pressed, the
screen manager uses a protected freeze to prevent the process with a capability to
the screen buffer from executing, and switches control of the screen to a new process.
In this example the screen buffer capabilities given out by the screen manager do
not have the SRMULTILOAD right, so only the designated process can use them. Other
processes cannot use the capability, so if the capability is given to a third party, the
screen's contents cannot be altered through the use of that capability.

Capabilities without the SRMULTILOAD right can only be used by processes whose
serial number is equal to the value of the password of the capability.
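The check implied by this rule might look like the sketch below. The bit position of the right and the function name are hypothetical; the comparison of the process serial number with the capability's password is the only part taken from the text.

```python
SRMULTILOAD = 0x8  # hypothetical bit position in the rights bitmap


def may_use(cap_rights, cap_password, process_serial):
    """Decide whether a process may use a capability it presents."""
    if cap_rights & SRMULTILOAD:
        # With the right set, the capability is not tied to one process.
        return True
    # Without it, only the process whose serial number equals the
    # capability's password may use the capability.
    return process_serial == cap_password
```

This also shows why non-random passwords are needed here: the password must be chosen to equal the serial number of the designated process.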

This mechanism allows only a decrease in the number of potential users of a
capability, thus allowing tighter control over the use of capabilities. As it restricts
the potential domain of use of a capability, it does not decrease the security of the
system.

Protected Freeze and Thaw

The protected freeze and thaw mechanism is an enhancement of the freeze and
thaw mechanism that allows a process to ensure that a process under its control
cannot awake unexpectedly through the action of another process.

The screen manager example (section), used for the SRMULTILOAD right,
also employed a protected freeze and protected thaw. The motivation for the
introduction of this new mechanism was that the original freeze and thaw mechanism
could not prevent a third-party process from waking up a process holding a capability
for a screen buffer and hence altering the contents of the screen.

The protected freeze and thaw mechanism uses a magic number to
prevent a process from being thawed unless all the processes which used a protected
freeze on the process have thawed it. The magic numbers are XORed together,
and a process is only runnable when the number of protected freezes equals the
number of protected thaws and the result of the XOR operations is zero.
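The runnable condition can be illustrated with a small state model. The class and method names are hypothetical, and the sketch makes no claim about how the kernel treats a thaw presented with the wrong magic number; it only records the counts and the running XOR described above.

```python
class ProcessState:
    """Tracks protected freezes and thaws applied to one process."""

    def __init__(self):
        self.freezes = 0
        self.thaws = 0
        self.magic = 0  # running XOR of all magic numbers presented

    def protected_freeze(self, magic):
        self.freezes += 1
        self.magic ^= magic

    def protected_thaw(self, magic):
        self.thaws += 1
        self.magic ^= magic

    def runnable(self):
        # Both conditions from the text: matching counts and a zero XOR.
        return self.freezes == self.thaws and self.magic == 0
```

Since x XOR x is zero, each freeze is cancelled only by a thaw carrying the same magic number, so a third party that does not know the number cannot restore the zero state.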

This mechanism provides an enhancement of an existing mechanism. The basic
properties of the mechanism do not affect the security model of the Walnut Kernel.
The mechanism has enhanced the security of the system by allowing a process to
exert control over the execution of another process without the risk of a third process
overriding the inhibition of execution.


Chapter

Proposed Hardware

This chapter describes a proposal for a novel hardware environment on which the
Walnut Kernel may operate. The proposal presented here was originally jointly
developed by Dr Ronald Pose of the Department of Computer Science at Monash
University and myself in . A paper CP outlining the design of a node of
the multiprocessor and a mechanism for avoiding the problems associated with a
global system clock was presented to the Seventeenth Annual Computer Science
Conference. This paper is reproduced in appendix C. Further work based on this
proposal has since been undertaken by Dr Pose and a number of his postgraduate
and honors students PFR FPR.

Design Goals

The Secure RISC Multiprocessor Project undertook to design a scalable
general-purpose multiprocessor. The architecture was required to support a wide range
of sizes, varying through single-processor workstations, multiprocessor workstations,
medium-size multiprocessors and large clustered multiprocessors.

In addition, the system was required to be built of modular components; use
a passive backplane for interconnecting processors; provide high performance by
minimizing the potential for performance-limiting bottlenecks in the bus structure;
be sufficiently flexible to support a variety of algorithms rather than being tuned
to a specific class of algorithms; and provide mechanisms to support fault tolerance.


A key requirement was to allow the number of processors to be increased
gradually as demand for processor power grew and budgets allowed. Large
commercial multiprocessors tend to scale incrementally up to the size of the
interconnection network. Scaling beyond this size typically requires replacing the
interconnection network with a larger network. This can make upgrading prohibitively
expensive.

To achieve the goal of gradual scaling, we proposed an architecture where the
switching network was distributed evenly across the nodes of the system. We also
determined that the speed of processors and memory must be allowed to vary from
node to node. This decision allowed older, slower nodes to be retained in a useful
role in a system which had acquired more recent, faster processor and memory units,
while still exploiting the new units to their maximum potential.

Architecture

The design goals eliminated a number of popular multiprocessor organizations.
Neither hypercube nor tree architectures were suitable, as they lacked adequate path
redundancy for fault tolerance and they were unable to support a wide variety
of algorithms well. Dr Pose put forward a series of bus topologies with varying degrees
of interconnection between buses. These designs were mesh-based, providing
redundant paths and allowing for the possibility of minimising bottlenecks by routing
around network hot spots. We converted the proposed bus interconnection schemes
into a regular implementable form. The resulting way and way interconnection
designs are illustrated in figure. Furthermore, we observed that the patterns
could be converted into cylindrical and spherical forms by folding the mesh in half
and connecting buses at the edges of the mesh.

Figure provides the conventional representation of the processor
interconnection topology. The diagram shows the direct connections between processors.
The way interconnection pattern allows a processor to directly communicate with
other processors, and the way interconnection pattern allows a processor to
communicate with other processors.

1

A structure of repeated subunits.

[Figure: Bus Structures. Panels: way, way. Legend: Multiprocessor Node Board,
Bus Segment, Bus Joiner.]

Conventional clock distribution mechanisms were unlikely to be sufficiently
scalable to be used on an architecture with widely variable topologies and bus lengths.
To meet the demands posed by this architecture, we proposed a system built from
a single type of node. Each node had its own clock, which it exported to the other
nodes on its bus segment when communicating.

Node Design

The Multiprocessor Node board (Figure) supports a combination of processor or
memory modules on the MP bus (Memory Processor Bus). These modules are to
be constructed on daughter boards which are plugged into the Multiprocessor Node
board. A single Multiprocessor Node board behaves as a classic SMP machine. Using
the external ports, it is possible to connect to an external network of multiprocessor
node boards using a passive backplane.

Each Multiprocessor Node Board has a local clock pulse generator. This is used
to provide clock signals to the processor and memory daughter boards, the control
logic and the arbiters. This clock is also gated out through the ports to clock the
external bus when the port becomes a bus master.

The requirement for a global clock is eliminated by using the FIFOs to decouple
the local clocks from the clock found on the external bus.

Functional Description

The Multiprocessor Node Board consists of major functional blocks connected by
a state machine, the control logic. The blocks are:

MP Bus

Bus Switching Unit

Port Interface Units

2

A technique known as Salphasic Clock Distribution Chi can be used to distribute a
synchronous clock signal with the required accuracy. However, a synchronous design does not
allow processors to operate at arbitrary speeds, reducing debugging flexibility and negatively
affecting the cost of system expansion.

[Figure: Topology of Processor Interconnections. Panels: way, way. Legend: Processor
Node, Connection, Off Diagram Connection.]

[Figure: Block Diagram of Multiprocessor Node.]

MP Bus

The Memory-Processor bus (MP bus) is a data, address and control signal bus.
The data and address paths are bits wide, with the data and address signals
multiplexed onto the bus. This bus runs using a split bus protocol provided by the
processors MIP on the daughter boards plugged into the Multiprocessor Node
Board.

The processor units and memory units on the MP bus form a classical
shared-memory SMP machine.

Bus Switching Unit

To help provide off-board communications, the switching unit provides operational
states:

Port A connect Port B

MP Bus connect Port A

MP Bus connect Port B

No Connection

Port Interface Units

Each port has a port interface unit which performs the functions of transmitting
data onto a bus and receiving data from the bus.

To receive data, this unit recognizes relevant information on the bus and accepts
it into the input FIFO; otherwise bus traffic is ignored.

To transmit data, the port interface unit arbitrates for the bus and then outputs
data from the output FIFO.

Operational Description

All addresses in the system are partitioned into regions. The most significant
bits of the address determine which Multiprocessor Node Board is to be accessed,
and the least significant bits determine the address of the memory location on the
Multiprocessor Node Board (see Figure). Two Multiprocessor Node board
numbers are reserved: node board number zero always refers to memory local to
the node board, and the maximum node board number refers to hardware control
memory local to the node board.

The Node Number is used to index into a routing look-up table held in static
RAM, which is decoded to determine where the memory location can be found.

There are three types of access available to the processor:

Local Memory: Memory is addressed directly over the MP bus.

Remote Memory: Discussed in Section.

Hardware Control Memory: The bus ports are isolated and the routing look-up
tables are modified by the processors.

In addition, a Memory to Memory DMA transfer facility is available to facilitate
page-sized transfers.
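The address partitioning and the three access types can be sketched as a decoding function. The 32-bit address and 8-bit node number used here are assumptions chosen for illustration, as the field widths are not recoverable from the text; the two reserved node numbers follow the description above.

```python
ADDRESS_BITS = 32                       # assumed address width
NODE_BITS = 8                           # assumed width of the node number field
LOCATION_BITS = ADDRESS_BITS - NODE_BITS
MAX_NODE = (1 << NODE_BITS) - 1         # reserved: hardware control memory


def decode(address):
    """Split an address into (access type, node number, memory location)."""
    node = address >> LOCATION_BITS                   # most significant bits
    location = address & ((1 << LOCATION_BITS) - 1)   # least significant bits
    if node == 0:
        kind = "local"              # memory local to this node board
    elif node == MAX_NODE:
        kind = "hardware-control"   # control memory local to this node board
    else:
        kind = "remote"             # routed via the look-up table
    return kind, node, location
```

For a remote access, the returned node number is the value that would index the routing look-up table.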

[Figure: Partitioning of Addresses. An address is divided into a Node Number (MSB)
and a Memory Location (LSB). Node Numbers select Local Memory, Hardware Control
or Off Board Memory.]

[Figure: Contents of Look-Up Tables. Tables: MP Bus Look-Up, Port A Look-Up,
Port B Look-Up. Bits shown: Read In, In A, Out A, Out B, In B, Read In.]

Routing

This section illustrates the operation of routing data between memory and processor
by following the path of a memory access.

Transfers between nodes employ a packet structure. A packet comprises a header,
a body containing the data, and a packet checksum. The header contains the source
and destination addresses, the packet size and the packet type. Packet types include
read, write, and an indication of whether the destination is a processor or memory.
Packets are constructed and interpreted by control logic in the multiprocessor node
board.

Local memory operations use the intrinsic addressing mechanism of the processor.

Memory accesses are routed through the network in a manner similar to a
packet-based store-and-forward network.

When a processor utters an off-board address, the high-order bits of the address
are used to index the MP Bus look-up table. The look-up table contains bits which
indicate which port should be used to attempt the access (Figure).

If the output FIFO on the required port is below the high water mark (the point
at which it is guaranteed that the largest permissible packet will fit in the FIFO) and
there is no traffic currently passing through the switch, then the switch is connected
to the appropriate port. A packet header is constructed and transferred to the FIFO.
Data is transferred to the output FIFO. A checksum is added to the FIFO. If the
conditions are not met, the processor should reattempt the operation later.
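The admission test on the output FIFO can be sketched as follows. The FIFO depth, the maximum packet size, the word-level representation and the additive checksum are all assumptions made for illustration; the hardware performs these steps in control logic, and the real checksum algorithm is not given.

```python
from collections import deque

FIFO_DEPTH = 64                        # words; assumed
MAX_PACKET = 16                        # largest permissible packet in words; assumed
HIGH_WATER = FIFO_DEPTH - MAX_PACKET   # at or below this, any packet fits


def try_send(fifo, switch_busy, header, data):
    """Queue header, data and checksum; return False if the caller must retry."""
    words = list(header) + list(data)
    if len(words) + 1 > MAX_PACKET:
        raise ValueError("packet exceeds the largest permissible size")
    if len(fifo) > HIGH_WATER or switch_busy:
        return False                        # conditions not met: reattempt later
    checksum = sum(words) & 0xFFFFFFFF      # placeholder, not the real algorithm
    fifo.extend(words + [checksum])         # header, then data, then checksum
    return True
```

A caller would hold one `deque` per port and repeat the call after a delay whenever it returns `False`.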

When there is data in the output FIFO and the bus to which the port connects
is idle, an attempt is made to arbitrate for the appropriate bus. The arbiter is
discussed in Section. When the port becomes the bus master, the packet is
broadcast onto the bus.

The high-order bits of the destination address of the packet on the bus are used
to index into the port look-up tables of all ports attached to the bus. If the port's
Read In bit is set and the input FIFO is below the high water mark, then the data
on the bus is read into the FIFO; otherwise the data is ignored.

The node number of the destination address in the header of the first packet in
the FIFO is used to index the MP Bus look-up table. If the In bit is set, the
switch allows the contents of the packet to be directed to the memory on the MP
Bus; otherwise the switch is set to permit the flow of data from the input FIFO to
the opposite output FIFO.

Design Features

The MP bus and switched external memory packet transfer allow better utilization
of processor memory resources. Both the external and MP buses may be loaded to
the level providing optimal utilization of the bus capacity.

The design introduces a memory hierarchy based on the number of hops between
nodes. This feature introduces a new degree of flexibility in the management of
both memory pages and processes. The throughput of a process is maximized by
relocation of the process and/or its data to minimize the memory access time. The
optimization of overall system performance is complicated by memory's being shared
by multiple processors. Peak performance is achieved by balancing processor load,
memory load and process average access time BFS.

By employing FIFOs on each port, the risk of being unable to accept data on
a port due to traffic on the MP bus to the other bus's port is reduced. However,
this design decision has the cost of adding latency to every transfer through a
multiprocessor node. The presence of FIFOs on each port is especially valuable where
large packet transfers are expected, as it effectively doubles the depth of the FIFOs
for flow-through traffic, hence reducing the risk that a packet will not be accepted
because a FIFO is above the high water mark.

Arbitration

The project has considered many arbiter designs. The designs were evaluated against
the following required properties:

Fairness: Each node must have a similar probability of becoming bus master,
as the starvation of a node would prevent the delivery of data.

Guaranteed Result: A bus master is selected every time an attempt is made.

Varying Asynchronous Clocks: Arbiters are synchronous with respect to their
local clock.

The mechanism proposed in the paper used a priority-based arbiter to resolve
bus master conflicts. Priorities were rotated after each arbitration to ensure
long-term fairness. The implementation employed a set of bus-request lines and a set
of acknowledge lines to provide a handshake. This ensured that the priority-based
phase was not undertaken until after all nodes on the bus had acknowledged that
they had recorded the request state of all other nodes. The recording of the request
state of other nodes was necessary to reduce the effects of potential metastabilities
in the components of the arbiter on the result. The implementation was deficient
in that it reduced the arbitration speed to a multiple of the speed of the clock on
the slowest board on a bus. A variant on this design was proposed which used
a clock which was independent of the node's master clock to supply the arbiter's
requirements. The independent arbitration clock could operate at a high speed and
could be run at a uniform rate across the boards, resulting in an improvement in the
speed of arbitration and a simplification of the arbitration circuitry.
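The priority-based phase with rotation can be sketched in software. This models only the selection step, after every node has recorded the same request state; the handshake and the metastability concerns are hardware matters outside the sketch. The particular rotation rule, in which the winner moves to the lowest priority, is an assumption, since the text says only that priorities were rotated.

```python
def arbitrate(requests, priority_order):
    """Select a bus master and rotate the priorities.

    requests: set of node ids recorded as requesting the bus.
    priority_order: list of node ids, highest priority first.
    Returns (winner, new_priority_order); winner is None if nobody requested.
    """
    for i, node in enumerate(priority_order):
        if node in requests:
            # Rotate so that the winner has the lowest priority next time.
            rotated = priority_order[i + 1:] + priority_order[:i + 1]
            return node, rotated
    return None, priority_order
```

With this rule a node that keeps requesting cannot win twice in a row while another node is also requesting, which gives the long-term fairness the text asks for.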

Dr Pose currently has students working on alternative solutions to be
implemented in VLSI. Work undertaken by Ted Kehl of the Department of Computer
Science and Engineering at the University of Washington, in the areas of self-tuning
of VLSI circuits and a mutual exclusion element which restricts the spread of
metastable voltages KB, offers the opportunity to produce a simple arbiter with the
required properties.


Chapter

Continuing Future Work

This chapter outlines likely future developments in the Monash Secure RISC
Multiprocessor project for the system software and the hardware. Section covers
software, and section covers hardware developments.

Software

The Walnut Kernel provides an environment which supports persistent shared
memory. This section outlines further work on the kernel itself and on a number of
projects which exploit its features. The section also details the advantages of using
a capability-based operating system for these tasks and problems likely to be
encountered by implementors of these projects.

Kernel

Development of rent and charging code has been postponed in the current
version, as their presence was not required during the initial development phase. It
is planned to implement these monetary functions and disk reconstruction code in
the near future.

Although the current kernel has functions which allow the manipulation of
money, it does not contain code to implement charging for kernel operations and
rent collection. Provision has been made for the implementation of these features.


Charging for kernel services is intended to be performed by debiting a process
a fixed amount of money per service when the service is requested. This allows the
budget required for the needs of a process to be easily calculated.

The rent collection mechanism is intended to run periodically. It will examine
each object mounted on a volume and debit the appropriate amount of rent.
Typically the rent collector will run when the system load is light. As the rent collector
may visit objects at irregular intervals, it is necessary to store a time stamp in each
object to determine when rent was last collected and the amount currently due.
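The role of the time stamp can be shown with a short sketch. The rent formula used here, proportional to elapsed time and to the number of pages held, is an assumption made for illustration; the text states only that a stored time stamp is needed to work out the amount due when visits are irregular.

```python
def collect_rent(balance, last_collected, now, pages, rate_per_page_per_tick):
    """Charge rent for the interval since the last visit.

    Returns (new_balance, new_timestamp). All parameters are hypothetical:
    times are in arbitrary ticks and the rate is money per page per tick.
    """
    elapsed = now - last_collected                   # visits may be irregular
    due = elapsed * pages * rate_per_page_per_tick   # assumed linear rent
    return balance - due, now                        # store `now` back in the object
```

Because the charge depends on the stored time stamp rather than on the collector's schedule, a delayed visit charges for the whole missed interval.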

At present the Walnut Kernel provides no mechanism for the recovery of
corrupted volumes. Provision has been made for reconstruction software to be
incorporated. It is expected that the reconstruction software would consult the bitmap
which describes the allocation of pages on the volume, and scan the volume for pages
containing magic numbers indicating that the page is a header page. By
comparing this information it should be possible to confirm that a volume has not been
corrupted when the volume is mounted and, failing that, restore at least part of the
lost data.
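One form the comparison could take is sketched below. The magic value, the in-memory representation of the volume and the choice of inconsistency to report are all hypothetical; the reconstruction software had not been written, so this only illustrates cross-checking the bitmap against a scan for header pages.

```python
HEADER_MAGIC = 0x57414C4E  # hypothetical magic number marking a header page


def unrecorded_headers(bitmap, first_words):
    """Find header pages that the allocation bitmap does not account for.

    bitmap[i] is True if page i is marked as allocated.
    first_words[i] is the first word of page i, where the magic is assumed to sit.
    """
    return [i for i, word in enumerate(first_words)
            if word == HEADER_MAGIC and not bitmap[i]]
```

An empty result is consistent with an undamaged volume in this model, while each reported page is a candidate object header from which data might be recovered.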

User Code

User Interface

The Walnut Kernel currently supports a text-based interface which provides screen
multiplexing: the interface provides a number of virtual screens which may be
connected to processes, and the user may select any virtual screen for viewing by
using a sequence of keystrokes. This interface was developed by Mr Glen Pringle.

Support for a graphical interface is planned. This will require modification of
the kernel by the addition of code to control the mode of the graphics card.

At present the text-based interface interacts with a shell which supports the
execution of programs and basic object management using a simple interactive
command language. Where traditional operating systems use a hierarchical
directory structure, this shell employs a set-based directory structure. Mr
Glen Pringle is continuing work on the shell.

1

High system load or unmounting a disk may result in the rent collection operation being delayed.


Unix Compatibility Library

To allow the exploitation of the significant quantities of existing C code available to
UNIX systems, a partial emulation of the C libraries which provide an interface to
UNIX has been constructed.

The input/output and memory allocation functions of the standard C library are
based on the functionality found in the UNIX operating system. As the Walnut Kernel
is based on a different paradigm, code which emulates the features of UNIX using
the functions provided by the Walnut Kernel has been written. This work has been
only partially completed.

Mr Carlo Kopp's work on a library that emulates the functionality provided by
UNIX for standard input/output is nearing completion.

Professor Chris Wallace, Mr Glen Pringle and I have developed a partial
emulation of the standard C memory allocation routines. Work is continuing in this
area.

Work is also continuing on the development of a replacement for standard C
libraries where existing libraries are unable to operate in the environment provided
by the Walnut Kernel.

Shared Libraries

The Walnut Kernel was designed to facilitate the sharing of memory objects. The
development of shared libraries under this environment presents a number of
challenges. It is necessary to provide mechanisms which allow the relocation of code
and the dynamic linking of code, and which support the storage of variables local to
the instance of the shared library being used. In addition, it is necessary to consider
policies for updating and possibly removing old libraries.

Preliminary work has been performed to demonstrate possible mechanisms for
implementing shared libraries. This work exploited the is support of
position-independent code.

A number of possible approaches to the problem of implementing shared libraries
have been put forward. These include:

Locating modules of the libraries at a fixed virtual address. This requires
programs which use modules from these libraries to load the module, and any pages
required to store data local to the instance of the module, at a fixed location in the
process's address space. This mechanism is simple to implement and allows the most
efficient addressing modes available on the processor to be used. It has significant
disadvantages in that it places restrictions on the organization of program memory
and may prevent modules from multiple libraries being used in the same program
due to clashes in address space usage.

Where the processor provides support for position-independent code, a table of
pointers located at a known place in the process's address space can be used to
access data items and subroutines which are not contained within the code module.
The table can be indexed with the capability index, allowing the shared routines to be
arbitrarily placed in the available address space. This mechanism offers flexibility in
the structuring of process address space and does not prevent modules from multiple
libraries being employed. The use of indirect access to data and code outside the
shared module reduces the efficiency of the code.

An alternative mechanism for use on systems supporting position-independent
code is to employ a buddy scheme. This method places a data object at a fixed
distance away from the code object. This object would contain all references and
local storage for the instance of the shared module. This scheme offers the majority
of the features of the table-based mechanism described above while surrendering
only a marginal degree of flexibility in the organization of the process address space.
The mechanism also allows faster addressing modes to be used. This is because code
can be generated in the buddy page that makes most efficient use of the available
modes while allowing the code to be customised for access to data in the current
instance of the shared library.
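The table-based mechanism described above can be illustrated with a minimal C sketch. It is not taken from the Walnut Kernel: the names link_table, link_bind and link_call, and the table size, are hypothetical. It only shows that a module which knows nothing but a slot index can reach a routine placed anywhere, at the price of one extra lookup per call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration; none of these names come from the Walnut Kernel. */
#define LINK_TABLE_SLOTS 16

typedef int (*shared_fn)(int);

/* One slot per loaded capability. A module never embeds absolute addresses;
   it only knows the slot index it was given when the routine was loaded. */
shared_fn link_table[LINK_TABLE_SLOTS];

/* Record where the routine for a given capability index ended up. */
void link_bind(size_t cap_index, shared_fn fn)
{
    assert(cap_index < LINK_TABLE_SLOTS);
    link_table[cap_index] = fn;
}

/* Every call outside the module pays for one extra table lookup. */
int link_call(size_t cap_index, int arg)
{
    assert(cap_index < LINK_TABLE_SLOTS && link_table[cap_index] != NULL);
    return link_table[cap_index](arg);
}

/* Stand-in for a routine living in a shared module. */
int twice(int x) { return 2 * x; }
```

A caller would bind a routine once, for example link_bind(3, twice), and then reach it with link_call(3, 21). The buddy scheme avoids the lookup by keeping the equivalent data at a fixed distance from the code.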

Professor Chris Wallace has students currently investigating the implementation
of shared libraries under the Walnut Kernel.

With suitable language support it is possible to dynamically replace a shared
library. Languages such as Erlang (2) provide mechanisms which allow the most recent
version of a function to be called in preference to the version existing at the time of
starting the program. More traditional languages such as C do not provide direct
support for the replacement of a module.

2 A functional language developed at Ericsson and Ellemtel Computer Science Laboratories
AVW

To support the dynamic replacement of modules without language support it
is necessary to check for the existence of a new module each time a function in the
module is called. This process is potentially expensive, and careful evaluation of the
costs and benefits of providing dynamic replacement of modules is required.
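The per-call check described above can be sketched in C as follows. This is an illustrative model only: struct module_slot, call_latest and the two version functions are hypothetical names, and the "check" is reduced to testing a pointer, whereas a real system would have to consult the kernel or the object store.

```c
#include <assert.h>

typedef int (*module_fn)(int);

/* Hypothetical registry entry for one replaceable function. */
struct module_slot {
    module_fn current;   /* version bound when the program started */
    module_fn pending;   /* newer version, or 0 if none has been installed */
    unsigned  checks;    /* number of per-call checks performed so far */
};

/* Call through the slot, switching to a newer version if one has appeared. */
int call_latest(struct module_slot *slot, int arg)
{
    assert(slot != 0 && slot->current != 0);
    slot->checks++;                 /* this cost is paid on every call */
    if (slot->pending != 0) {       /* a replacement has been installed */
        slot->current = slot->pending;
        slot->pending = 0;
    }
    return slot->current(arg);
}

/* Two stand-in versions of the same function. */
int version_one(int x) { return x + 1; }
int version_two(int x) { return x + 2; }
```

The checks counter makes the trade-off visible: it grows with every call whether or not a replacement ever arrives.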

Hardware

Chapter described the design of the proposed hardware and described work
currently being undertaken by Dr Ronald Pose of the Department of Computer
Science, Monash University, and his students. This work covers the implementation
of hardware based on the principles outlined in CP and VLSI-based distributed
arbitration circuitry.

Optimising performance in the environment presented to the operating system
by this hardware requires the operating system to dynamically relocate both
processes and data pages. To provide best performance it is necessary to balance the
competing goals of increasing parallelism by employing a greater number of
processors and decreasing the overhead of interprocess communication by limiting the
number of processors. The wide range of memory access speeds provided by this
design makes this optimisation complex. It is a significant research area in its own
right.

Recent research by Dunning and Ramakrishnan of Bowling Green State
University RD has shown that the assignment of tasks to a multiprocessor system
of moderate size is an NP-complete problem. It is expected that work on
assignment of tasks will rely on heuristics and concentrate on the avoidance of worst case
performance.

The original intention was to implement the Walnut Kernel on the proposed
hardware. However, to speed development a i based platform was selected. A
port of the Walnut Kernel to the proposed hardware is still intended.


Chapter

Conclusion

The Secure RISC Architecture project is the successor to the Password-Capability
System. The features of the kernel on the earlier system have been retained. The
Walnut Kernel, however, incorporates significant innovations and alterations to
enhance the applicability of the kernel to a wide range of commonly available hardware
and to meet the needs of programmers. The hardware component of the project is
scalable from a single processor system to a cluster of multiprocessors. The wide
scaling range is gained by abandoning centralised clocks and distributing
switching hardware. This chapter identifies the features of the Walnut Kernel and of the
proposed hardware and evaluates them in the context of other systems.

The design of the Walnut Kernel incorporates significant changes to the original
password-capability model to allow a more portable implementation. It
demonstrates that a useful system can be constructed using the password-capability model
which is easily transportable between hardware platforms. Furthermore, the Walnut
Kernel showed that portability has not been won at the cost of security.

Significant departures from the Password-Capability System include: moving from
a system based on capability registers (a segment register like mechanism) to using
paging for all access control; the introduction of the subprocess mechanism; the
ability to restrict the rights of a capability after it has been created; the
introduction of the SRMULTILOAD mechanism; and the ability to create capabilities
with non-random passwords.

The shift from capability registers to a page-based implementation has made the
granularity of protection coarser. While fine grained control has been lost, it is not
crippling. This change forces programmers to use capabilities which cover larger
regions of memory than those used in the Password-Capability System. If space is
at a premium, programmers can aggregate small data structures which have the
same security requirements into a single object. Offsetting the coarsening of the
protection domain is the ability to load and access a far greater number of objects
at one time. This reduces the number of load and unload operations required in
comparison to the Password-Capability System and allows the programmer greater
flexibility.

Difficulties found while writing application programs have motivated changes to
the functions provided by the kernel.

The message passing mechanism was required to handle messages of different
types in different ways and to guarantee the delivery of high priority messages. This
requirement was met by introducing a subprocess mechanism and functions which
allow the reservation of mail boxes based on target subprocess or message prefix.
The subprocess mechanism provides elegant handling of events asynchronous to a
process. The mechanism can also be used to implement a form of cooperative
multitasking within a process.

The initialisation process requires a capability for the physical memory where the
kernel resides. Access to this capability would allow a process unfettered access to
the system. Although the passwords for this capability are randomly allocated,
this situation presents a significant opportunity for attacking the system. To improve
security it was useful to be able to remove rights, through the restrict operation,
from this and other capabilities. The ability to remove rights from a capability after
it has been created is a fundamental change from the Password-Capability System.
The restrict operation deletes rights from a capability, preventing those rights being
used by processes that load the capability after the restrict operation has been
performed. Instances of the capability loaded at the time of the restrict operation
are unaffected. The operation enhances the security of the system by pruning rights
present in the capability tree.
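The effect of the restrict operation on loaded and not-yet-loaded instances can be modelled with a short C sketch. The types and functions below (stored_cap, loaded_cap, cap_load, cap_restrict) are hypothetical and do not reflect the kernel's data structures or its system call interface; the sketch assumes only what the paragraph above states, namely that rights are copied at load time and that restrict clears bits in the stored capability.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the capability as stored by the kernel, and the
   snapshot of its rights that a process holds once it has loaded it. */
struct stored_cap { uint32_t rights; };
struct loaded_cap { uint32_t rights; };

/* Loading copies the rights that are present at that moment. */
struct loaded_cap cap_load(const struct stored_cap *c)
{
    assert(c != 0);
    struct loaded_cap l = { c->rights };
    return l;
}

/* Restrict clears the given rights in the stored capability only;
   instances that were loaded earlier keep their snapshot. */
void cap_restrict(struct stored_cap *c, uint32_t remove)
{
    assert(c != 0);
    c->rights &= ~remove;
}
```

An instance loaded before cap_restrict keeps the rights it had, while any load performed afterwards sees the reduced set.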

While writing the screen and keyboard device manager Glui it was found
that device managers needed greater control over processes using capabilities which
allow access to raw devices. A new mechanism to control the scheduling of a process
was introduced, along with a mechanism to limit the use of a capability to a single
process. Capabilities without the SRMULTILOAD right can only be loaded by
processes with a serial number equivalent to the password of the capability. This
provides a method of restricting the use of a capability to a specific
process. This mechanism is teamed with the protected freeze and thaw
operations. These operations allow a process to be frozen and prevent another process
from unfreezing it.
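The load rule for capabilities lacking the SRMULTILOAD right can be written as a small predicate. This is a sketch under stated assumptions: RIGHT_MULTILOAD is a made-up bit value rather than the kernel's constant, struct cap_model is hypothetical, and the text does not say which of the two passwords is compared with the process serial number, so the model carries a single password field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RIGHT_MULTILOAD 0x1u  /* hypothetical bit, not the kernel's value */

/* Hypothetical, reduced model of a capability. */
struct cap_model {
    uint32_t rights;
    uint32_t password;  /* whichever password the kernel compares (assumed) */
};

/* True if a process with the given serial number may load the capability. */
bool may_load(const struct cap_model *cap, uint32_t process_serial)
{
    assert(cap != 0);
    if (cap->rights & RIGHT_MULTILOAD)
        return true;                          /* any process may load it */
    return cap->password == process_serial;   /* bound to a single process */
}
```

Because the password of such a capability must equal a process serial number, it cannot be random, which is why the ability to specify passwords on derivation accompanies this right.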

To make the SRMULTILOAD right useful it was necessary to add the ability to
specify the passwords of a derived capability. This mechanism increases flexibility,
as it allows capabilities with widely known values to be used as mechanisms for
accessing services, and it does not reduce the security of the system.

Another significant difference from the Password-Capability System is that part
of the serial number of a capability under the Walnut Kernel is used to represent
the disk block of the header block of the object. This change has been shown to
have a minimal cost in terms of the ability to guess a capability. At most one bit of
the serial number is lost.

The current fixed queue size round robin scheduling mechanism is inadequate
for a production system. However, a user process which provides storage for the names
of processes to be scheduled, together with long and medium term scheduling, is planned to
overcome the deficiencies of the current scheduler.

Two types of windows have been introduced into the Walnut Kernel. This makes
the programmer's task more complex. All capabilities can be loaded into small
windows, but only a small fraction of the process address space is available for small
windows. Some capabilities cannot be loaded into a large window. The convenience
of the size of large windows is traded for the flexibility of small windows.

Execute only code is not available under the Walnut Kernel, as its target
population of processors does not generally support execute without read in page table
entries. This prevents the use of embedding capabilities in execute only code as a
mechanism of protecting capabilities. Furthermore, it prevents the hiding of
algorithms and implementations from users of library code. Mechanisms that provide
these protections using the available facilities are being investigated.

The Walnut Kernel does not have an adequate backup mechanism. Currently
volumes can be archived at the block level and restored after a failure of the media.
However, although individual users can back up the contents of an object, they
cannot back up an object itself: the set of capabilities derived for the object and the
capability for the object are lost. The partial backup and restore problem is common
to most capability based systems. In addition, there is a need to eliminate
tampering with the contents of a backed up object to prevent manipulation of the money
field.

Unlike the Password-Capability System, which was designed around custom
hardware, the Walnut Kernel was written with portability as a high priority. The
design traded the advantageous features of a specific architecture for a more generic
set of functions. In the case of the i implementation this meant sacrificing the
fine grained control offered by segment registers for the more widely available paging
mechanism.

The Walnut Kernel incorporates a general mechanism for accessing devices by
representing the interfaces to devices as operations performed on blocks of memory
mapped into the kernel's address space. This model of interaction between the
kernel and hardware exactly describes the actions of a multiprocessor system with
specialised IO controllers which write into shared memory blocks. It is also a good fit
for uniprocessor versions of the kernel, as a timer interrupt or a device interrupt can
be used to activate an interrupt service routine which treats the memory block as
memory shared with the kernel. This model has the valuable side effect of allowing
interrupt service routines to be written as if they are separate from the kernel. This
simplifies the task of porting the kernel from system to system, as the device drivers
are clearly identifiable and independent of the kernel code.

A simple but effective memory management model was adopted for managing
the physical memory. Part of the physical memory is reserved for the kernel. The
kernel is loaded into this section of memory and remains resident. The kernel area
is not paged and is present in all process address spaces. The physical memory used
to hold pages of objects is managed by allocating new pages on demand from
the list of free pages. Pages are added to the free list by a procedure in the kernel
which periodically invalidates and disposes of clean pages and writes dirty pages to
disk. The simple expedient of periodically disposing of page tables and maintaining
a last reference time for each page of an object allows the kernel to establish that a
page can no longer be accessed and can therefore safely be removed from the physical memory.

The Walnut Kernel allows a far greater number of capabilities to be loaded by a
process than the Password-Capability System. Also, the address space of a process is
considerably larger in the Walnut Kernel. These facilities are partial compensation
for the loss of fine grained access control that was provided by specialised hardware.

The performance measurements indicate that the Walnut Kernel provides similar
performance to a conventional operating system where the operations are
comparable. This demonstrates that capability based operating systems in general, and
the Walnut Kernel in particular, could become practical alternatives to conventional
operating systems.

The survey clustered operating systems into categories based on the kernel type
and the dominant paradigm of the kernel. From a commercial perspective the most
successful operating system has a monolithic kernel and a file based paradigm.
Small kernel and microkernel based operating systems have emerged as both
production and research systems. Capability based operating systems tended to be
experimental.

KeyKOS demonstrated that a capability based operating system offered features
and facilities demanded by commercial computing users. Of particular importance
to the commercial environment were the high level of security, ease of sharing data,
and the ability to accurately charge for services. Furthermore, a capability based
operating system provided these features as a natural extension of the operating
system.

The concepts of small kernels and microkernels are valuable as they allow
software engineering practices to be applied to systems programming. Critical pieces of
code are isolated, and well defined interfaces are used for communication between
both kernel components and layers of software providing services.

The Walnut Kernel was designed as a small kernel to provide the functionality
required in a commercial environment. The password-capability mechanism is more
flexible than the alternatives of segregation and tagging. Tagged architectures are
demonstrably not cost efficient, while segregated architectures are more complicated
and less flexible. The ability to directly manipulate a capability and treat it as
ordinary data eliminates the need for the kernel to mediate all operations on capabilities.
If the kernel is required to perform all operations on capabilities, the set of
operations is limited to those explicitly provided for by the operating system designer.
The Walnut Kernel allows user code to extend the range of operations available for
manipulating a capability. An example of where this ability might be useful is
communicating capabilities through an untrusted third party. The Walnut Kernel
would allow the most up-to-date encryption algorithms to be employed. Alternative
systems are either unable to provide this facility or the algorithm is fixed in the
kernel.

Single address space operating systems such as Opal provide a more direct link
between capabilities and addresses than the load/unload capability model employed
by the Walnut Kernel. This simplifies the application of the paradigm and reduces
complexity for the programmer. Users of a SASOS list a set of capabilities for
the protection domain that the program is allowed to use. Users of the Walnut
Kernel are required to explicitly load capabilities into their address space. The
Walnut Kernel enjoys greater portability than SASOS operating systems, as an address
space significantly larger than bits is required to adequately support the SASOS
paradigm. The Walnut Kernel functions well in a bit address space.

The container mechanism used in Grasshopper is more general than the loading
of segments provided by the Walnut Kernel. Containers are a collection of segments
of objects. Containers may be nested within other containers. Under Grasshopper
the locus of a process's execution jumps from container to container as it proceeds.
Similar functionality could be emulated under the Walnut Kernel by employing a
set of routines within each Walnut Kernel process that load and unload segments
based on addressing exceptions. However, the implementation would be less elegant
and less efficient.

The hardware proposed is novel in two respects. The hardware distributes the
interprocessor switches, allowing for cost effective system growth, and it eliminates
the requirements for central clocking of the system.

Current multiprocessors such as the CM use centralised interconnects which
allow easy growth up to the capacity of the interconnect. Adding a single processor
when an interconnect is full incurs the cost of either adding a new interconnection
module or replacing the existing interconnect. The cost of the interconnect
dominates the sizing of the system. By distributing the switching hardware we allow
the interconnect to grow as the system expands. This allows the number of
modules to be right sized for the problem, as the cost of the interconnection no longer
determines the most cost efficient size of the system.

Eliminating the requirement for central clocking of the multiprocessor allows greater
flexibility in the physical layout of the system. In addition, it lifts size constraints
on a multiprocessor, allowing systems to be constructed using multiple cages and
even cages separated by large physical distances. This allows systems to be easily
put together for large jobs. The mechanism proposed has the additional benefit of
allowing processors to operate at different clock speeds within the same
multiprocessor. This has significant advantages in that it allows investment in earlier model
processors to be retained without losing the full benefit of incorporating faster
new processors into the system.

The Secure RISC Architecture project incorporates both hardware and software
elements to address the problem of building systems which range in size from single
processor workstations through to large multi-cabinet multiprocessors. The Walnut
Kernel operating system and the hardware were designed to be mutually supportive
to provide performance, security and reliability.

Appendix A

User Level Programmers Guide

This appendix is drawn from the technical report entitled The Walnut Kernel
User Level Programmers Guide. This technical report is available from the
Department of Computer Science, Monash University.

Title: The Walnut Kernel User Level Programmers Guide
Author: Maurice Castro
No:
Revised: November

I present it in support of my thesis.

A Overview

The Walnut Kernel is a capability-based operating system under development in
the Department of Computer Science at Monash University. This operating
system draws on the concepts of, and experience gained from, the Password-Capability
System (1).

The Walnut Kernel employs bit names (Password-Capabilities) for views
onto persistent objects. The random allocation of names within a sparse name space
provides a known level of statistical security for views and the contents of objects.
Associated with each name is a set of rights which entitle the holder of the capability
to access a section of the named object in a specified way.

The Walnut Kernel was designed as a portable operating system, although it
currently runs only on based PCs. Programs are compiled on a FreeBSD
system and transferred onto the target machine on floppy disks. Work is continuing
on the development of the kernel as well as the development of interfaces, shells
and utilities for the system.

This document contains a programmers manual for the Walnut Kernel. The
document is subject to revision as the kernel alters, and currently describes only the
lowest level of the kernel interface.

Acknowledgments

The author would like to acknowledge the following contributions:

- Prof C. S. Wallace, co-author of the kernel
- Mr Glen Pringle, author of many of the system's utility programs
- Mr Carlo Kopp, author of the UNIX compatibility libraries

The Secure RISC Architecture project is supported by a grant from the
Australian Research Council A. Maurice Castro is a recipient of an
Australian Postgraduate Research Award.

1 M. Anderson, R. D. Pose and C. S. Wallace, A Password-Capability system, The Computer
Journal

A Ob jects

All entities controlled by the Walnut Kernel are objects. A Walnut Kernel object is
analogous to a segment in a segmented computer architecture. It comprises an ordered
array of bytes. An individual byte is identified by its offset, a number indexing
the array. The first byte has offset zero. An object is defined by the following
characteristics:

Maximum Offset: The largest addressed offset in the object. Note: this value is
    set on creation and automatically increases as long as there are pages available
    in the allocated space of the object.

Limit: The largest addressable offset allowed in the object.

Maximum Size: The maximum number of bytes guaranteed to be available to an
    object. The number of bytes includes storage for the object's capabilities and
    dope vectors. In practice this value represents the maximum number of pages
    and header pages guaranteed to be available to an object. The number of pages
    is calculated by dividing the maximum size by the page size and rounding
    upwards.

Maximum Capabilities: The maximum number of capabilities that can represent
    this object. Note: this value automatically increases as long as there is space
    available to hold the new derived capabilities.

Money: The amount of money the object has available. Sufficient money must be
    present in an object to pay for its resource consumption.
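The page calculation given under Maximum Size (divide by the page size and round upwards) can be written as a one-line C function. The function name pages_guaranteed is hypothetical, and the page size is passed in as a parameter because it depends on the system.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to hold max_size bytes, rounding upwards.
   Written as quotient plus a remainder test so that it cannot overflow
   the way (max_size + page_size - 1) / page_size can. */
size_t pages_guaranteed(size_t max_size, size_t page_size)
{
    assert(page_size > 0);
    return max_size / page_size + (max_size % page_size != 0);
}
```

With a page size of 4096 bytes, chosen purely as an example, a maximum size of 4097 bytes gives two pages and a maximum size of exactly 4096 bytes gives one.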

Each object has at least one capability that allows access to the object: the
object's Master Capability. Deletion of the object's master capability results in
the deletion of the object. Other capabilities for views of the object are derived
from the master capability or its descendants.

A Capabilities

The Walnut Kernel employs password capabilities to identify access rights to objects.
Each capability (see figure A) consists of a bit identifier composed of four bit
values: a volume number, a serial number, and two passwords. Associated
with each capability is a view, which determines the region of an object a capability
applies to, a set of user rights, and a set of system rights which control how that
capability is to be used.

    bits      bits      bits        bits
    Volume    Serial    Password    Password

Figure A: A Password Capability

A View

A view is the attribute of a capability that defines the region of the object that can
be addressed by the possessor of the capability. Views are contiguous regions and
are defined by an offset from the base of the object and an extent. The view entitles
the user to address part of an object; it does not guarantee that pages are contained
in that region, nor that the pages are readable by user processes.
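A view as described above is an (offset, extent) pair, and deciding whether an object offset falls inside it is a simple range test. The sketch below is illustrative: struct view and view_contains are hypothetical names, and the special case in which a zero limit means "to the end of the object" is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of a view: a contiguous region of an object. */
struct view {
    size_t offset;  /* start of the view, measured from the base of the object */
    size_t extent;  /* number of bytes covered by the view */
};

/* True if object_offset lies inside the view. This says nothing about
   whether a page is present there or readable by a user process. */
bool view_contains(const struct view *v, size_t object_offset)
{
    assert(v != 0);
    /* Subtract only after checking the lower bound, to avoid wrap-around. */
    return object_offset >= v->offset &&
           object_offset - v->offset < v->extent;
}
```

For a view with offset 4096 and extent 8192, offsets 4096 through 12287 are addressable and 12288 is not.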

A User Rights

User rights consist of a set of bits which are managed by the kernel. The kernel
attaches no meaning to the user rights bits. They are intended to be used by user
processes to implement access to services in a way that is analogous to the control
system rights bits have over access to kernel services.

A System Rights

The system rights associated with a capability are encoded in a bit word, basically
as the OR of bits representing particular rights (the SRSEND field is an exception).
For the numeric value of the system rights symbols see figure A.

SRDERIVE: Allow capabilities to be derived from this capability.

SRSUICIDE: Allow this capability to destroy itself and its children.

SRDEPOSIT: Allow the holder of this capability to deposit money into the object.

SRWITHDRAW: Allow the holder of this capability to withdraw money from
    the object.

SRREAD: Allow the holder of this capability to read from the view.

SRWRITE: Allow the holder of this capability to write to the view.

SREXECUTE: Not used.

SRUSER: Allow user processes to use the view.

SRPEEK: Allow the holder of this capability to perform a peek system call on
    the process represented by this capability (see A).

SRMULTILOAD: Allow this capability to be loaded by any process. If this right
    is absent then only processes with a serial number equivalent to the capability's
    password may load this capability.

SRSEND: An bit field which, if nonzero, specifies the subprocess to which
    messages may be sent by using this capability. This field has two special
    values: xff, allow messages to be sent to any subprocess of the process; and
    xfe, disallow messages to subprocess zero but allow messages to be sent to
    any other subprocess of the process.
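The interpretation of the SRSEND field can be sketched as a C predicate. The sketch makes several assumptions that should be checked against the kernel sources: the field is taken to have already been extracted from the system rights word (its position in the word is not modelled), a value of zero is taken to mean that no messages may be sent, and the special values are read as the hexadecimal numbers ff and fe. The names SEND_ANY, SEND_ANY_BUT_ZERO and may_send are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SEND_ANY          0xffu  /* any subprocess of the process */
#define SEND_ANY_BUT_ZERO 0xfeu  /* any subprocess except subprocess zero */

/* send_field is the value of the SRSEND field after it has been
   extracted from the system rights word. */
bool may_send(unsigned send_field, unsigned subprocess)
{
    assert(send_field <= 0xffu);
    if (send_field == 0)
        return false;                    /* assumed: no send right at all */
    if (send_field == SEND_ANY)
        return true;
    if (send_field == SEND_ANY_BUT_ZERO)
        return subprocess != 0;
    return send_field == subprocess;     /* one specific subprocess */
}
```

Any other nonzero value names exactly one subprocess, so may_send(5, 5) holds and may_send(5, 6) does not.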

A Deriving Capabilities

Derived capabilities have equal or lesser rights than their parent capability at
the time of derivation. Suicide right is an exception, as this right may be added to
the children of capabilities which do not hold this right.

The rights of a parent capability may be reduced through the use of the
restrict system call after a child capability has been derived. The child capability is
unaffected by the restriction of the parent capability's rights.
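The derivation rule above amounts to masking the parent's rights, with the suicide right as the single permitted addition. The following C sketch models that rule only; RIGHT_SUICIDE is a made-up bit value rather than the kernel's SRSUICIDE constant, and derive_rights is a hypothetical function, not a kernel call.

```c
#include <assert.h>
#include <stdint.h>

#define RIGHT_SUICIDE 0x2u  /* hypothetical bit, not the kernel's value */

/* Rights of a child derived from `parent` using `mask`. The suicide right
   may be granted even when the parent does not hold it. */
uint32_t derive_rights(uint32_t parent, uint32_t mask, int add_suicide)
{
    uint32_t child = parent & mask;
    if (add_suicide)
        child |= RIGHT_SUICIDE;
    /* Apart from suicide, a child never holds a right its parent lacks. */
    assert((child & ~(parent | RIGHT_SUICIDE)) == 0);
    return child;
}
```

Because the child's rights are computed once, at derivation, a later restriction of the parent leaves the value returned here unchanged.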

A Pro cess Structure

A process in the Walnut Kernel is essentially an object which contains state
information relating to the execution of the process. The minimal information found
within a process object is:

- Subprocess table
- Message slots
- Table of Loaded Capabilities
- Process cash
- Lock words
- Parameter page
- Address map

Of these, only the parameter page and the address map are directly accessible to the
user process. The address map is read-only. The parameter block and the remainder
of the process object are both readable and writable by the user process.

The process structure is detailed in figure A.

A Pro cess Address Space

The address space of a process operating under the Walnut Kernel is composed of
three regions:

Kernel Area: Located at the bottom of the address space and not addressable
    by user processes.

Small Window Area: Located above the Kernel Area, with a page sized
    granularity. Single pages or multiple pages of objects may be mapped into this
    region of the address space by a user process. These mapped regions always
    begin and end on a page boundary.

Large Window Area: Located above the Small Window Area. It has a coarser
    granularity than the Small Window Area. The first large window contains the
    Process Object. All other large windows are allocated by the process.

On a system with a kilobyte page size, large windows have a granularity of
megabytes and small windows have a granularity of kilobytes.

Two distinct paradigms are used to describe how the address space is populated with
objects.

The Password-Capability System used the term Window Registers to describe
a set of segment registers. The Password-Capability System used the upper bits of
the virtual address to indicate which register was in use. We retain this terminology
in the Walnut Kernel. The analogy between the two systems is imperfect, as the
Walnut Kernel supports two classes of window registers and, although the number
of window registers is fixed on the Walnut Kernel, the location of the registers in
the virtual address space is not fixed. Under the Password-Capability System both
of these parameters were constant.

[Diagram not reproduced. From the top of the address space (xffffffff) downwards it
shows the large windows, the process object (regions R, M, P, A and H), the small
windows, and the Kernel Area, with a boundary labelled The Wall.]

    Label  Description                  Constant Name    Value
    R      remainder of process object  PARAMADDRESS x   x
    M      message area                 EXTRAADDRESS     xc
    P      parameter page               PARAMADDRESS     x
    A      address map                                   xf
    H      process header               PROCHDADDRESS    x

Figure A: Process Address Space. This diagram describes the major features
of the address space seen by a process operating on a system with kilobyte pages.
The message area and the parameter block are collectively known as the parameter page.

The second paradigm describes the operation of the system. On a system with
kilobyte pages it views the address space as two address ranges. The address
range from x to xffffff can have objects loaded on kilobyte boundaries.
The address range x to xffffffff has objects loaded on megabyte
boundaries.

A Parameter Page

This page is composed of two parts: the parameter block and the message area.
The parameter block is a structure defined in the file param.h. This block is used
to pass parameters to the kernel when a system call is made. The second area is
used to pass additional information to the kernel and to receive information from
the kernel. The information passed via the message area varies with the type of call.

Parameter Block

The declaration of the parameter block structure is found in figure A.
The fields are named after the function they are used for in the majority of calls.
The values contained in the fields are:

error: The error field contains an integer error value on returning from a system
    call. If the value is zero then the system call completed successfully. If the
    value is greater than zero the system call could not be completed successfully.
    If the return value is negative or greater than an internal kernel
    error has occurred; contact the system's maintainer urgently and report the
    value. The file include/kerror.h contains a translation table which allows error
    values to be converted to ASCII strings.

vol, serial, pass1, pass2: The capability field, composed of vol, serial, pass1 and
    pass2, contains either a capability being passed to a system call or a
    capability being passed back by a system call.

srights: The system rights field contains one of:
    - a bitmap indicating the system rights provided by a capability (figure
      A defines the symbolic and numeric forms of the system rights bits)
    - a mask which restricts the system rights provided to a derived capability
    - a set of limits used in the creation of a process
    - an encoded process state when inquiring about a process state

2 Negative error values are used internally by the kernel to indicate partial completion of a
system call which cannot be completed because of a transient problem.

    /* Parameter Structure */
    typedef struct Paramst {
        Sw error;
        Uw vol;        /* volume ID */
        Uw serial;     /* serial in volume */
        Uw pass1;      /* password 1 */
        Uw pass2;      /* password 2 */
        Uw srights;    /* System rights */
        Uw urights;    /* User rights */
        Sw base;       /* Offset of cap from front of object */
        Sw limit;      /* Max addressing offset from base.
                          Zero means to end of object */
        Sw money;      /* money word */
        Uw type;       /* Type of object */
        Sw maxoff;     /* Max addressing offset in whole object */
        Sw maxsz;      /* Max size of defined content */
        Sw maxcap;     /* Max capabilities now allowed */
        Sw offset;     /* An offset in a capability window */
        Sw subpn;      /* A subprocess number */
        Sw cindex;     /* Index of a capl in a process TLC */
        Uw clocktime;
        Uw reserve;    /* Nonzero shows reserved by subprocess */
    } Param;

Figure A: Parameter Block Declaration

urights The user rights eld contains either

a bitmap indicating the user rights provided by a capability

a mask which restricts the user rights provided to a derived capability

base The base eld contains an oset from the b eginning of a view or on pro cess

creation the time in seconds at which the new pro cess is scheduled to wake

up

limit The limit eld contains one of

the length of a message

the maximum addressable oset of an ob ject

the maximum size of a view

money The money eld contains one of

the amount of money in an ob ject

the amount of money to b e dep osited or withdrawn

the amount of money to b e sent with a message

the amount of money received from a message

typ e The typ e eld contains the ob ject typ e The top bit of this eld is set if

the ob ject is a pro cess The following typ e values are reserved by the kernel

Prototyp e pro cess

ffff Physical memory ob ject

Drive pro cess

Prototyp e pro cess

maxo The maximum oset eld contains the current maximum oset of an ob ject

maxsz The maximum size eld contains the current maximum size of an ob ject

maxcap The maximum capability eld contains the current maximum numb er of

capabilities for a pro cess

oset The oset eld contains an oset into a pro cesss address space

subpn The subpro cess numb er eld contains the destination subpro cess numb er

for a message


cindex The capability index field contains the index into the table of loaded capabilities that a capability occupies.

clocktime The clocktime field, after returning from a system call, contains the current time in seconds. The clocktime field is set to the wakeup time for a process when a wait system call is made.

reserve The reserve field provides both a locking function, which prevents other subprocesses accessing the parameter block, and indicates the type of kernel call being made. The currently available kernel call constants are listed in figure A.

System rights bits

define SRDERIVE x

define SRSUICIDE x

define SRDEPOSIT x

define SRWITHDRAW x

define SRREAD x

define SRWRITE x

define SREXECUTE x

define SRUSER x

define SRPEEK x

define SRMULTILOAD x

Bits relating to send rights

define SRSEND xFF

Figure A System Rights Constants

Message Block

The message area's contents are interpreted differently for each class of call. There are currently four classes of information stored in the message block:

Messages Messages to be sent by the send message system call, and messages recovered by the receive message system call, are stored at the front of the message block.

System states The save register and load register system calls store the register set and other state information for a subprocess at the front of the message block.

Read/Write Data Bytes to be transferred by the external read or external write system calls are stored at the front of the message block.

Initialization information Initial values used to set stack and program counters, the name of an heir and a list of capabilities to be preloaded into a process are stored at the front of the message block.


Action Codes

define KMAKEOBJ

define KMAKECAP

define KDEL

define KDELDER

define KRESIZE

define KSHRINK

define KWAIT

define KLOADCAP

define KUNLOADCAP

define KCAPID

define KMAKEPROC

define KSEND

define KRECV

define KEXTSEND

define KEXTREAD

define KEXTWRITE

define KBANK

define KRESTRICT

define KCAPSTAT

define KRENAME

define KMAKESUBP

define KDELSUBP

define KLOADREG

define KSAVEREG

define KSETTRAP

define KRECVCLOSE

define KACCEPTMAIL

define KCLOSEBOX

define KCOPYOBJ

define KPEEKPROC

define KSETHEIR

Figure A Defined Kernel Call Constants


A The Wall

Every process has a read-only page mapped into its address space known as the Wall. This page contains public information, including the current time and the capabilities of public utilities. A wall manager places information in the wall. The wall currently contains:

Address   Contents
xc†       Scheduler Start Variable
xc†       Physical Object Volume
xc†       Physical Object Serial
xcc†      Physical Object Password
xc†       Physical Object Password
xc        GLui Magic Number
xc        GLui Volume
xc        GLui Serial
xcc       GLui Password
xc        GLui Password
xc        Name Server Set Magic Number
xc        Name Server Set Volume
xcc       Name Server Set Serial
xc        Name Server Set Password
xc        Name Server Set Password
xc        Name Server Set Offset
xcfe      Time in Seconds
xcfe      Time in Microseconds

† These locations are used by the initialization process. After initialization, the capability of the physical object is overwritten by the initialization process and the value of the scheduler start variable is no longer significant.


A Process Structure Conventions

This section covers the conventional layout of a process. Section A outlined the mandatory elements of a process structure.

A The Process Object

The following elements of the process object are visible to the user process and are provided by the kernel:

The Process Address Map

The Parameter Page

The Message Area

They form part of the mandatory component of the process structuring convention used by Walnut Kernel processes.

By convention the following items are located within the process object:

Startup Code Area (optional) This area may contain a small amount of code used in starting a process.

File Descriptor Table (mandatory) This area contains the file descriptors for use by the process. Note: The first elements of the File Descriptor Table are mandatory to allow for standard output, standard input and standard error.

Private Data Pointer Table (mandatory) This area contains pointers to private data. The table is indexed by the capability index of the executing code and is used to locate data used by the executing code.

Default Heap (optional) The default location for the creation of the heap.

Default Stack (optional) The default location for the creation of the stack.

The structure of the process object is outlined in figure A.

A The Process

Conventional processes will be constructed according to the following rules:

The code object will be loaded at address x

The data object will be loaded at address x

Initialized data will be placed at the front of the data object.


[Diagram of the process object, regions listed top to bottom as drawn: Default Stack; Default Heap; Private Data Pointer Table; File Descriptor Table; Startup Code Area; Parameter Page; Process Address Map]

Figure A Process Object. This diagram describes the major features of the process object.

This design allows multiple instances of a process to be created by sharing the code objects and using copies of the data objects. In addition, by placing the initialized data at the front of the data object, it is possible to ensure that the original data object is compact and hence easy to copy. The copy of the data object will expand as required when uninitialized data is accessed.

This arrangement of code and data allows up to Mbytes of code to be supported. With the introduction of shared code libraries, larger programs can be supported.


A Process Creation

This section describes the procedure for creating a process and the initial state of a new process.

A Making Processes

A process is created using the Make Process call covered in section A. This section provides a general introduction to the creation of a process.

Creating a process involves:

Creating a new process object

Creating an address space

Loading the new process object into the address space

Creating subprocess and subprocess

Loading preloaded capabilities into the address space

Setting initial program counter and stack pointer values for subprocess

Setting the wake up time of subprocess

Loading the new process object into the address space of the creating process

This procedure appears as an atomic operation to the process issuing the Make Process system call. If the system call was successful, the master capability for the new process object will be returned and the new process will be loaded at the address given in offset in the parameter block.

At the completion of the Make Process system call the new process object is loaded into the address space of the creating process. If the new process object is larger than megabytes in size, only the first megabytes of the new process object is visible. Thus the process which issued the Make Process system call and the new process have a region of shared memory.

A Initial Process State

Immediately after a process has been created:

The parameter block of the new process will contain:

vol       Volume of master capability for process
serial    Serial of master capability for process
pass      Password of master capability for process
pass      Password of master capability for process
srights   Encoded process creation parameters
urights   User rights of process's master capability
limit     Maximum size of process object (hard limit)
money     Amount of money in process object (process cash)
type      Type of process
maxoff    Maximum offset of view on process object
maxsz     Maximum size of process object
maxcap    Maximum number of capabilities

The message area will consist of a table of preloaded capabilities with the format:

vol       Volume
serial    Serial
pass      Password
pass      Password
base      Start of the loaded window relative to the capability
limit     Size of the loaded window. Zero indicates capability limit
offset    Location of window in the new process's address space
cindex    Index in table of loaded capabilities. Zero for automatic allocation

The process object will be loaded at the location PROCHDADDRESS.

The address space will contain all preloaded capabilities.

Only subprocess and subprocess will exist.

Subprocess will begin executing.

The process creation parameters are encoded in the system rights field:

msb                                                          lsb
bits        bits             bits               bits
Max subp    message slots    Max loaded caps    auto load caps

Max subp The maximum number of subprocesses for the new process, including subprocess

message slots The number of message slots for the new process. As a message slot is reserved for subprocess, the number of message slots must be or greater.


Max loaded caps The maximum number of loaded capabilities for the new process. This number includes the capability for the process.

auto load caps The number of capabilities to be automatically loaded into the new process's address space, including the capability for the new process.
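As a rough illustration of the encoding described above, the following C sketch packs the four creation parameters into a single system rights word, most significant field first. The field widths are not recoverable from the table as reproduced here, so the 8-bit width is purely an assumption, and the names encode_proc_params and proc_param_field are hypothetical helpers rather than part of the Walnut Kernel interface.

```c
#include <stdint.h>

/* ASSUMPTION: four equal 8-bit fields in a 32-bit word. The real widths
 * are not stated in this copy of the text. */
#define FIELD_BITS 8u
#define FIELD_MASK 0xFFu

/* Field positions, counted from the least significant end (lsb = 0). */
enum { AUTO_LOAD_CAPS = 0, MAX_LOADED_CAPS = 1, MESSAGE_SLOTS = 2, MAX_SUBP = 3 };

/* Pack the four creation parameters, "Max subp" in the top field. */
static uint32_t encode_proc_params(uint32_t max_subp, uint32_t message_slots,
                                   uint32_t max_loaded_caps, uint32_t auto_load_caps)
{
    return ((max_subp        & FIELD_MASK) << (MAX_SUBP        * FIELD_BITS)) |
           ((message_slots   & FIELD_MASK) << (MESSAGE_SLOTS   * FIELD_BITS)) |
           ((max_loaded_caps & FIELD_MASK) << (MAX_LOADED_CAPS * FIELD_BITS)) |
           ((auto_load_caps  & FIELD_MASK) << (AUTO_LOAD_CAPS  * FIELD_BITS));
}

/* Extract one field from an encoded word. */
static uint32_t proc_param_field(uint32_t srights, unsigned field)
{
    return (srights >> (field * FIELD_BITS)) & FIELD_MASK;
}
```

A new process can apply proc_param_field to the srights value found in its parameter block to recover the limits it was created with.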

When a process is created, two equal sums of money are deposited into the new process. The sums are deposited into the process cash and the process object respectively. The size of one of the deposited sums is reported in the money field.

It is normal practice for the first action of a process to be the duplication of the information passed in at process creation. It is particularly important to store the capability for the process, as it is not possible to locate the master capability for the process subsequently.

The creation of subprocesses other than subprocess and subprocess is handled by the application using the Make Subprocess call.


A Subprocess Zero

The Walnut Kernel implements two direct methods of communication with the kernel: system calls and messages to subprocess zero of a process. The system call mechanism described in section A allows a process to alter its own state, operate on capabilities and send messages. The subprocess zero mechanism allows a process to control another process's state.

Subprocess zero functions are accessed by sending messages to a process's subprocess zero. The message contains a function identifier and arguments. On receipt of a message to subprocess zero, the kernel interprets the instruction provided and performs the required action. Subprocess zero operations and messages are the highest priority function of a process.

The currently implemented subprocess zero functions are:

Freeze Prevent process from being scheduled.

Thaw Allow process to be scheduled.

Wakeup Set the wakeup time of the specified subprocess to zero.

Cooee Request the process to send a status message using a specified capability.

Protected Freeze Prevent process from being scheduled until all protected freezes on the process have been thawed.

Protected Thaw Allow a process to be scheduled when all other protected freezes have been thawed.

Figure A lists the identifiers and arguments of the messages.

Function       Function ID   Arg     Arg     Arg     Arg
Freeze
Thaw
Wakeup                       subp
Cooee                        vol     ser     pass    pass
Prot Freeze                  magic
Prot Thaw                    magic

Figure A Subprocess Zero Functions and Arguments

A Freeze

On receipt of a freeze message, subprocess zero sets the process state to frozen and causes the process to be removed from the scheduler queue.


A Thaw

When a process receives a message it is placed into the scheduler queue. If the process is frozen, the process is typically removed from the queue after the subprocess zero messages are parsed. On receipt of a thaw message, subprocess zero sets the process state to normal and process execution resumes.

A Wakeup

The wakeup message sets the wakeup time of the nominated subprocess to the current time. This allows a process to start a process that has suspended activity and has closed mail boxes, as the mail box allocated to subprocess zero cannot be closed.

One application of this function is to allow the initialization of data structures within a process object. The process is created with a wakeup time of never, preventing the scheduling of the process. The creating process initializes the required data structures before waking the created process up. At that stage the created process may elect to open its mail boxes, as processes are created with all but subprocess zero's mail boxes closed.

A Cooee

On receiving a cooee message, subprocess zero attempts to send a message using the capability found in the cooee message. If the capability in the cooee message allows transmission to any subprocess of a process, then the message will be sent to subprocess one of the nominated process; otherwise the message will be sent to the subprocess represented by the capability.

The reply message is of the form:

status

volume serial

The message consists of a set of words which represent the Cooee reply identifier, the volume and serial number of the current process, and a process status. The process status is given in figure A.

State                State ID            Value
Normal State         PROCSTATENORMAL
In Kernel Call       PROCSTATEKERNEL
In Read Fault        PROCSTATERFAULT
In Write Fault       PROCSTATEWFAULT
Process Frozen       PROCSTATEFROZEN
Process in Probate   PROCSTATEPROBATE
Process Dead         PROCSTATEDEAD

Figure A Process Status


A Protected Freeze

On receipt of a protected freeze message, subprocess zero sets the process state to frozen, XORs the magic word with a key held in the process state, increments a count held in the process state and causes the process to be removed from the scheduler queue. This prevents other parties from thawing the process unless they know the set of magic words used in the protected freeze operations applied to the process.

A Protected Thaw

On receipt of a protected thaw message, subprocess zero XORs the magic word with a key held in the process state and decrements a count held in the process state. If both the count and key held in the process state are zero, then the process is thawed. If the count is zero and the key is nonzero, then the process is terminated.
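The key and count bookkeeping described for protected freeze and protected thaw can be modelled in a few lines of C. This is only a sketch of the rules as stated in the text: the type and function names are hypothetical, a 32-bit magic word is assumed, and the guard against decrementing a zero count is an addition of this sketch rather than documented kernel behaviour. Interaction with the unprotected Freeze and Thaw messages is not modelled.

```c
#include <stdint.h>

/* Hypothetical state names, used only for this sketch. */
typedef enum { PS_RUNNING, PS_FROZEN, PS_TERMINATED } SketchState;

typedef struct {
    uint32_t    key;    /* XOR of all magic words currently applied */
    uint32_t    count;  /* number of outstanding protected freezes  */
    SketchState state;
} ProcSketch;

/* Protected freeze: freeze, fold the magic word into the key, count it. */
static void protected_freeze(ProcSketch *p, uint32_t magic)
{
    p->state = PS_FROZEN;
    p->key ^= magic;
    p->count++;
}

/* Protected thaw: fold the magic word back out and decrement the count.
 * When the count reaches zero the key decides the outcome. */
static void protected_thaw(ProcSketch *p, uint32_t magic)
{
    p->key ^= magic;
    if (p->count > 0)          /* guard added by this sketch */
        p->count--;
    if (p->count == 0)
        p->state = (p->key == 0) ? PS_RUNNING : PS_TERMINATED;
}
```

Because XOR is its own inverse, the key returns to zero only when every magic word used to freeze the process has also been presented in a thaw, in any order.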


A Subprocesses

Subprocesses are implemented in the Walnut Kernel as threads of execution which share a single address space. This section describes subprocesses and their scheduling.

A Anatomy of a Subprocess

When a process is created, a fixed number of subprocess slots are allocated in the process structure. These slots form the subprocess table, which is used to store the subprocess states.

When a subprocess is created, the creator specifies a priority which is used to determine which subprocess should be scheduled, the starting address of the subprocess and the address of the subprocess's stack pointer. It is the responsibility of the programmer to ensure that the stacks of subprocesses do not overlap.

Subprocesses share the address space of the process and hence have no protection from the actions of other subprocesses of the process.

A Operations on Subprocesses

Subprocesses can be made through the use of the K_MAKESUBP system call and they are destroyed by K_DELSUBP. Messages are sent to subprocesses using K_SEND and K_EXTSEND. There are three types of capabilities which can be used to send messages: capabilities which can send messages to any subprocess of a process, capabilities which can send a message to any subprocess of a process other than subprocess zero, and capabilities which can only send messages to a particular subprocess. The type of capability determines if the subprocess parameter of the send operation is used.

A Scheduling

Subprocesses have the semantics of processes on a time sharing system. That is, when a subprocess of a process is executing, no other subprocess of that process can be executing. On the Walnut Kernel, processes are used to support concurrent execution.

The algorithm for determining which subprocess to run at the beginning of a time slice for a process is as follows:

If a subprocess was executing and there is a nonzero value in the reserve field of the parameter block, resume execution of that subprocess.

Execute the subprocess with the highest priority which is not waiting.

For subprocesses of equal priority, select the first subprocess encountered in the subprocess table.
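The three selection rules above can be sketched as a single scan of the subprocess table. The slot layout, the names SubpSlot and pick_subprocess, the use of -1 for "nothing runnable" and the convention that a numerically larger priority is a higher priority are all assumptions made for illustration; the text does not specify them.

```c
#include <stdint.h>

/* Hypothetical subprocess slot, reduced to what the rules need. */
typedef struct {
    int in_use;    /* slot holds a subprocess */
    int waiting;   /* subprocess is waiting (not runnable) */
    int priority;  /* ASSUMPTION: larger value means higher priority */
} SubpSlot;

/* Returns the index of the subprocess to run, or -1 if none is runnable.
 * last_running is the previously executing subprocess, or -1 if none. */
static int pick_subprocess(const SubpSlot *table, int n,
                           int last_running, uint32_t reserve)
{
    /* Rule 1: a nonzero reserve field pins the interrupted subprocess. */
    if (last_running >= 0 && reserve != 0)
        return last_running;

    /* Rules 2 and 3: highest priority that is not waiting; the strict
     * comparison keeps the first slot encountered among equals. */
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!table[i].in_use || table[i].waiting)
            continue;
        if (best < 0 || table[i].priority > table[best].priority)
            best = i;
    }
    return best;
}
```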


Before performing the algorithm to determine which subprocess to schedule, the mail boxes are scanned. If a new message has arrived for a subprocess, the subprocess is made runnable (not waiting).

Subprocesses can ensure that other subprocesses of the current process are excluded from executing by setting the reserve field to a nonzero value. It is essential that any subprocess attempting to make a system call sets the reserve field to the appropriate value for the system call before accessing other elements of the parameter block. It is also necessary to test or copy all required values from the parameter block before zeroing the reserve field after returning from a system call.


A Messages and Mailboxes

This section describes the processes of sending and receiving messages.

A Sending Messages

Messages are sent using either the K_SEND or K_EXTSEND system calls. A message consists of the contents of the message area. The length of the message is variable (currently up to words may be sent) and it is specified by setting the limit field to the number of bytes to be transferred.

Messages are sent to processes represented by a capability. The capability may be derived to allow messages to be sent to only one subprocess, or to allow messages to be sent to all subprocesses of the process. If the latter type of capability is used then the subpn field contains the destination subprocess number.

A message will only be sent if there is an empty mailbox available to receive the message at the destination process. An error is returned if there are no suitable mailboxes at the destination process.

A Receiving Messages

Messages are retrieved and mailboxes are cleared by issuing a K_RECV system call. A match string can be specified for the receive system call, allowing the user program to control the order in which messages are retrieved from mailboxes.

When there is a message in a mailbox waiting to be received, the wakeup time of the subprocess is set to the current time. This nullifies the effect of any K_WAIT system calls.

A Mailboxes

Mailboxes have independent parameters which determine whether or not they will accept a message: state, prefix and subprocess.

The state of the mailbox:

Open The mailbox is prepared to accept a message that meets the other criteria.

Closed The mailbox will not accept messages.

A message prefix consists of a string of characters:

Nonzero length Only messages starting with the prefix string are accepted. The length of the prefix string is specified in bytes.

Zero length Accept any message meeting the other criteria.

Mailboxes may accept messages for specified subprocesses:

Subprocess Only messages intended for the specified subprocess are accepted.

Subprocess Accept any message meeting the other criteria.

If a message matches the mailbox's criteria and the mailbox is empty, then the message is placed in the mailbox. The criteria are used to ensure that mailboxes are available for particular types of messages. The first available mailbox that accepts the message is used.
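The acceptance test described above (state, prefix, subprocess, and an empty mailbox) can be sketched in C as follows. The structure layout and the names MailboxSketch, mailbox_accepts and deliver are hypothetical, and ANY_SUBP stands in for the wildcard subprocess value, which is not legible in this copy of the text.

```c
#include <stddef.h>
#include <string.h>

#define ANY_SUBP (-1)  /* ASSUMPTION: placeholder for the wildcard value */

typedef struct {
    int                  open;        /* nonzero when the mailbox is open */
    int                  full;        /* nonzero when it holds a message  */
    const unsigned char *prefix;      /* required message prefix          */
    size_t               prefix_len;  /* prefix length in bytes, 0 = any  */
    int                  subp;        /* destination subprocess or ANY_SUBP */
} MailboxSketch;

/* Nonzero if this mailbox would take the message. */
static int mailbox_accepts(const MailboxSketch *b, const unsigned char *msg,
                           size_t len, int dest_subp)
{
    if (!b->open || b->full)
        return 0;
    if (b->subp != ANY_SUBP && b->subp != dest_subp)
        return 0;
    if (b->prefix_len > len)
        return 0;
    return b->prefix_len == 0 || memcmp(msg, b->prefix, b->prefix_len) == 0;
}

/* Place the message in the first mailbox that accepts it.
 * Returns the mailbox index, or -1 if no suitable mailbox exists. */
static int deliver(MailboxSketch *boxes, int n, const unsigned char *msg,
                   size_t len, int dest_subp)
{
    for (int i = 0; i < n; i++) {
        if (mailbox_accepts(&boxes[i], msg, len, dest_subp)) {
            boxes[i].full = 1;
            return i;
        }
    }
    return -1;
}
```

The -1 result corresponds to the case in which the sender receives an error because no suitable mailbox is available at the destination.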

The K_RECV_CLOSE, K_ACCEPT_MAIL and K_CLOSE_BOX system calls are used to manipulate the parameters of the mailbox.

Both the K_CLOSE_BOX and the K_RECV_CLOSE system calls close mailboxes. The K_RECV_CLOSE system call receives a message from a mailbox and then closes the mailbox from which the message was extracted. The K_ACCEPT_MAIL system call opens a mailbox and specifies the parameters which determine the messages the mailbox will accept.


A Exceptions

This section describes the handling of exceptions by processes under the Walnut Kernel. The default behavior of the Walnut Kernel is to terminate any process which encounters an exception. This behavior can be modified by using trap handling subprocesses.

A Types of Exception

The Walnut Kernel detects the following exceptions:

FPFAULT All exceptions relating to errors in arithmetic. This may include floating point exceptions, integer arithmetic exceptions, dividing by zero, overflow and underflow. The types of errors detected by this exception are processor dependent.

OPFAULT This exception is raised when an invalid instruction is parsed by the processor.

ADDRSFAULT This exception is used to catch all errors relating to addresses. It is raised under the following conditions:

An unmapped region of the address space has been accessed.

A write has been attempted on a read-only area of memory.

A read or write has been attempted on a privileged area of memory.

An object could not be automatically expanded to accommodate the attempted access due to the lack of unreserved space on the volume.

DBFAULT This exception is raised whenever a debug exception is raised by the processor. This exception is processor dependent.

ALIGNFAULT This exception is raised on unaligned accesses. This exception is processor dependent.

A Trap Handling Subprocesses

A trap handler is a normal subprocess which has been nominated to receive trap messages for a given subprocess. The K_SETTRAP system call is used to inform the kernel where trap messages should be sent. The set trap system call takes two arguments: the subprocess for which traps are to be handled and the subprocess which will handle the trap.

A subprocess cannot handle its own traps. If a subprocess traps and the trap message is to be sent to the same subprocess, then the process will be terminated.

When an exception occurs in a subprocess which has a nominated trap handler, the subprocess with the fault is marked DEAD, its wake up time is set to NEVER and a message is sent to the trap handler. The format of the message is discussed in section A.

The trap handler can examine and alter the state of the dead subprocess's register sets through the use of the K_LOADREG and K_SAVEREG system calls. The subprocess can be restored to operation through the use of the K_MAKESUBP system call.

A The Trap Message

A five word message is sent (see figure A) to the trap handling subprocess. The words of the message are:

Message Type: this word indicates that the message is the result of an exception. The failure message identifier is xffff

Subprocess Number: the subprocess number of the subprocess in which the exception occurred.

Fault Identifier: a code which identifies the type of exception which occurred (see table A).

Processor Error Code: a processor dependent error code for non-floating point operations.

Floating Point Error Code: a processor dependent error code for floating point operations.

The error codes are processor dependent and are only returned where relevant to the cause of the exception.
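A trap handler might unpack the five words of the trap message along the following lines. This is a sketch only: it assumes 32-bit words, takes the failure message identifier as a parameter because its full value is not legible in this copy, and uses hypothetical names (TrapInfo, decode_trap) that are not part of the kernel interface.

```c
#include <stdint.h>

enum { TRAP_WORDS = 5 };  /* the trap message is five words long */

/* Decoded view of words 2 to 5 of the trap message. */
typedef struct {
    uint32_t subp;     /* subprocess in which the exception occurred */
    uint32_t fault;    /* fault identifier (FPFAULT, OPFAULT, ...)   */
    uint32_t cpu_err;  /* processor dependent error code             */
    uint32_t fp_err;   /* floating point error code                  */
} TrapInfo;

/* Returns 1 and fills *out if msg is a trap message, otherwise 0.
 * failure_id is the kernel's failure message identifier, supplied by the
 * caller since its value is an unknown in this sketch. */
static int decode_trap(const uint32_t msg[TRAP_WORDS], uint32_t failure_id,
                       TrapInfo *out)
{
    if (msg[0] != failure_id)
        return 0;
    out->subp    = msg[1];
    out->fault   = msg[2];
    out->cpu_err = msg[3];
    out->fp_err  = msg[4];
    return 1;
}
```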

xffff | Subprocess Number | Fault Identifier | Processor Err Code | FP Err Code

Figure A Structure of the Failure Message

Mnemonic     Description            Value
FPFAULT      Floating Point Fault
OPFAULT      Opcode Fault
ADDRSFAULT   Address Fault
DBFAULT      Debug Fault
ALIGNFAULT   Alignment Fault

Table A Error Identifier Values


A System calls

All system calls implemented within the Walnut Kernel use the parameter block to contain all the parameters of the call. There is only one parameter block per process. To prevent subprocesses from altering the parameter block while another subprocess is setting up or receiving the results of a system call, it is essential that the reserve field be set to a nonzero value while a subprocess manipulates the parameter block. Setting the reserve field to a nonzero value prevents any other subprocess of the process being run until the reserve field is cleared.

A Procedure

How to make a system call:

Put the call number in the parameter block's reserve field.

Fill in necessary parameters.

Call system call.

After a successful system call has been completed:

Copy any desired information out of the parameter block.

Set the reserve field to zero.

After an unsuccessful system call, as indicated by the error field:

Copy the error code and any other desired information out of the parameter block.

Set the reserve field to zero.
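The procedure above can be wrapped in a small helper so that the reserve field is always set first and cleared last. The sketch below uses a cut-down, hypothetical parameter block (ParamSketch) with only a few fields, assumes Sw and Uw are 32-bit signed and unsigned words, and represents the kernel entry point as a function pointer because the real trap mechanism is not described here; fake_trap is a stand-in used only to exercise the wrapper.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t  Sw;  /* ASSUMPTION: signed 32-bit word   */
typedef uint32_t Uw;  /* ASSUMPTION: unsigned 32-bit word */

/* Cut-down parameter block: only the fields this sketch touches. */
typedef struct {
    Sw error;
    Sw limit;
    Uw clocktime;
    Uw reserve;
} ParamSketch;

/* Hypothetical kernel entry point. */
typedef void (*trap_fn)(ParamSketch *);

/* Make one call following the documented order of operations. */
static Sw do_call(ParamSketch *p, Uw call_no, Sw limit, trap_fn trap,
                  Uw *time_out)
{
    p->reserve = call_no;       /* 1. reserve the block and name the call */
    p->limit = limit;           /* 2. fill in the parameters              */
    trap(p);                    /* 3. enter the kernel                    */
    Sw err = p->error;          /* 4. copy results out while reserved     */
    if (time_out != NULL)
        *time_out = p->clocktime;
    p->reserve = 0;             /* 5. release the block                   */
    return err;
}

/* Stand-in for the kernel: rejects negative limits, reports a fixed time. */
static void fake_trap(ParamSketch *p)
{
    p->error = (p->limit < 0) ? 1 : 0;
    p->clocktime = 1234;
}
```

Copying the error code and any results before clearing reserve matters because, once reserve is zero, another subprocess of the same process may be scheduled and overwrite the block.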


A Available System Calls

This section describes the currently available system calls on the Walnut Kernel and the parameters required for those calls.

Make Object

Name          Symbol       Value
Make Object   K_MAKEOBJ

Input Parameters

vol       Volume on which to create object
srights   System rights
urights   User rights
limit     Highest byte offset of object (hard limit)
money     Initial money
type      Object type
maxoff    Highest byte offset of object (soft limit)
maxsz     Maximum size of object
maxcap    Maximum number of capabilities (including master)

Output Parameters

vol       Master capability volume
serial    Master capability serial
pass      Master capability password
pass      Master capability password
srights   Master capability system rights
urights   Master capability user rights
limit     Highest byte offset of object (hard limit)
money     Initial money
type      Object type
maxoff    Highest byte offset of object (soft limit)
maxsz     Maximum size of object
maxcap    Maximum number of capabilities (including master)

Description

This call creates an object of the size specified on the volume specified. The object will have the rights dictated by the srights and urights fields.

Before using the limit value it is transformed:

BI GLI M I T if l imit

l imit

limit otherwise

To create a new ob ject the following preconditions must b e met l imitxf f

maxof f l imit l imit B I GLI M I T and maxsz B I GLI M I T


Derive Capability

Name                Symbol       Value
Derive Capability   K_MAKECAP

Input Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password
srights   System rights mask
urights   User rights mask
base      Offset from the beginning of existing view
limit     Size of derived view
money     Drawing limit of capability
subpn     New password if subpn
cindex    New password if subpn

Output Parameters

vol       Volume
serial    Serial
pass      Derived capability's password
pass      Derived capability's password
srights   Derived capability's system rights
urights   Derived capability's user rights
base      Cleared by call
limit     Maximum size of derived view
money     Drawing limit of capability
type      Drawing limit of derived capability

Description

This call derives a capability from a given capability. The new capability may have weaker rights and/or a smaller view of an object. Note that the suicide right may be added to a derived capability.

Capabilities derived from a capability without the SRMULTILOAD right always have the same pass as the original capability.

If limit is set to then the view of the derived capability will extend from the base to the end of the view provided by the original capability.

The following preconditions must b e met v iew l imit base and l imit


Delete Capability

Name                Symbol   Value
Delete Capability   K_DEL

Input Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password

Output Parameters

Description

Deletes the capability specified, if the capability has suicide right, and all of its derivatives, if the capability has derive right.

Delete Derived Capabilities

Name                          Symbol     Value
Delete Derived Capabilities   K_DELDER

Input Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password

Output Parameters

Description

Deletes all of the derivatives of the specified capability, if the capability has derive right.


Resize Object

Name            Symbol     Value
Resize Object   K_RESIZE

Input Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password
limit     New limit
maxoff    New maximum offset
maxsz     New maximum size
maxcap    New maximum number of capabilities

Output Parameters

Description

Resizes an object to the values given in limit, maxoff and maxsz. If maxcap is greater than the current number of permitted capabilities then the number of capabilities is increased; otherwise maxcap is ignored.

Preconditions: to be specified.

Shrink Object

Name            Symbol     Value
Shrink Object   K_SHRINK

Input Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password

Output Parameters

Description

Shrinks the object to a size just sufficient to contain its current contents and sets the limits to make this the maximum size of the object. The object's limit, maximum offset, maximum size and maximum number of capabilities are altered.

Preconditions: to be specified.


Wait

Name   Symbol   Value
Wait   K_WAIT

Input Parameters

clocktime   Wakeup time

Output Parameters

Description

Provided there are no outstanding messages, this call puts the subprocess to sleep until either a message arrives or the wakeup time has been reached. The wakeup times of and have special meanings:

Surrender the remainder of time slice.

Set no wakeup time. Awake only when sent a message.

Wakeup times are in seconds and are absolute. Relative wakeup times can be created by adding a value to the time found in clocktime.


Load Capability

Name              Symbol      Value
Load Capability   K_LOADCAP

Input Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password
base      Offset from start of view
limit     Size of window to be loaded
offset    Logical address of load location
cindex    Capability index

Output Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password
srights   System rights of capability
urights   User rights of capability
base      Offset from start of view
limit     Size of window loaded
money     Drawing right or money provided by capability
offset    Logical address of load location
cindex    Capability index

Description

Loads a view, or part of a view, provided by a capability into the process's address space.

To nominate the capability index of the loaded capability, provide a nonzero cindex that refers to an empty slot in the table of loaded capabilities. If cindex is zero then a value will be automatically allocated.

The kernel can be requested to load a capability at a suitable address to contain the view of the object. The following table gives the values of offset and their meanings:

load anywhere, preferably a large window

load anywhere, preferably a small window

load as a large window

load as a small window

All other values of offset are interpreted as specific addresses. The value of offset is truncated to give a page boundary for small windows or a segment boundary for large windows.

Limit gives the size of the window to be loaded. A limit of zero specifies that the limit specified by the capability should be used.


Unload Capability

Name                Symbol        Value
Unload Capability   K_UNLOADCAP

Input Parameters

vol       Volume
serial    Serial
pass      Password
pass      Password
offset    Offset of window to be unloaded
cindex    Index in table of loaded capabilities of capability to be unloaded

Output Parameters

limit     Limit of freed window
offset    Offset of freed window
cindex    Index of freed window

Description

Unloads a capability from the address space of the process. If offset then the capability vol serial pass pass will be unloaded. If offset then the capability located at index cindex in the table of loaded capabilities will be unloaded. Otherwise the capability at the location offset will be unloaded.


Identify Capability

Name Symbol Value
Identify Capability K CAPID

Input Parameters
vol Volume
serial Serial
pass Password
pass Password
offset Offset
cindex Index in table of loaded capabilities

Output Parameters
vol Volume of loaded capability
serial Serial of loaded capability
pass Password of loaded capability
pass Password of loaded capability
srights System rights of loaded capability
urights User rights of loaded capability
limit Limit of loaded capability
offset Offset of loaded capability
cindex Index of loaded capability

Description

Fills in the rights, limit, offset and cindex for a loaded capability. If offset then information for the capability (vol, serial, pass, pass) will be returned. If offset then the information for the capability located at index cindex in the table of loaded capabilities will be returned. Otherwise information for the capability loaded at location offset will be returned.


Make Process

Name Symbol Value
Make Process K MAKEPROC

Input Parameters
vol Volume to create new process on
srights Encoded process parameters
urights User rights of new process
base Start up time for new process
limit Highest byte offset of object (hard limit)
money Money to be transferred to new process
type Type of new process
maxoff Maximum offset of new process object (soft limit)
maxsz Maximum size of new process object (soft limit)
maxcap Maximum number of capabilities for new process (soft limit)
offset Offset at which to load new process object
cindex Index in table of loaded capabilities for new process object

Output Parameters
vol Master capability volume
serial Master capability serial
pass Master capability password
pass Master capability password
urights User rights of new process
limit Limit of new process object
money Money deposited in new process
type Type of new process
maxoff Maximum offset of new process object
maxsz Maximum size of new process object
maxcap Maximum number of capabilities for new process
offset Offset of new process object
cindex Index of new process object in table of loaded capabilities

Description

Make Process creates an object, loads the object into the current process's address space, and fills in the process state information for the new process.

Initially this call creates an object of the size specified on the volume specified, with user rights dictated by the urights field and system rights set to SRPROCESSMASTER.

Before using the limit value it is transformed

B I GLI M I T if l imit

l imit

limit otherwise

The new ob ject is created if the following preconditions are met l imitxf f

maxof f l imit l imit B I GLI M I T and maxsz B I GLI M I T

The object is then loaded into the process's address space at either a nominated location or an automatically allocated location. The location is determined by the value of offset. If offset is either or then the kernel will allocate a suitable large window automatically and load the object at that location; otherwise the object will be loaded at the segment boundary specified in offset.

The capability index of the loaded capability may be nominated by specifying a cindex for an empty slot in the table of loaded capabilities. If cindex is zero then a value will be automatically allocated.

A process is then created with the parameters dictated by the srights field. The srights field is interpreted as four fields of bits:

bits | bits | bits | bits
Max subp | message slots | Max loaded caps | auto load caps
msb ... lsb

Max subp: The maximum number of subprocesses for the new process, including subprocess

message slots: The number of message slots for the new process. As a message slot is reserved for subprocess, the number of message slots must be or greater.

Max loaded caps: The maximum number of loaded capabilities for the new process. This number includes the capability for the process.

auto load caps: The number of capabilities to be automatically loaded into the new process's address space, including the capability for the new process.
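The packing of four fields into srights, most significant field first, can be sketched as below. The field widths are not recoverable from this copy of the guide, so the default of four 8-bit fields is purely an assumption for illustration; `pack_srights`, `unpack_srights` and the field names are hypothetical.

```python
# Field order follows the layout above, from msb to lsb.
FIELD_NAMES = ("max_subp", "message_slots", "max_loaded_caps", "auto_load_caps")


def pack_srights(fields: dict, widths=(8, 8, 8, 8)) -> int:
    """Pack the four fields into one word; widths are an assumption."""
    value = 0
    for name, width in zip(FIELD_NAMES, widths):
        if not 0 <= fields[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value = (value << width) | fields[name]
    return value


def unpack_srights(srights: int, widths=(8, 8, 8, 8)) -> dict:
    """Split a packed word back into its four named fields."""
    fields = {}
    shift = sum(widths)
    for name, width in zip(FIELD_NAMES, widths):
        shift -= width
        fields[name] = (srights >> shift) & ((1 << width) - 1)
    return fields
```

A caller would build the parameter word with `pack_srights` before issuing Make Process; passing different `widths` adapts the sketch to the real layout once it is known.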

The first four words of the message area contain the initial values of the program counter and stack pointer for subprocess. The values are encoded:

message area index for PC
message area initial PC
message area index for SP
message area initial SP

The index is the index of a capability in the table of loaded capabilities. If an index value of zero is supplied, the initial values are treated as logical addresses instead of as a byte offset from the start of a capability.

The next four words contain the capability of the new process's heir. The heir is notified in case of the death of the process. The message sent contains the remaining cash. If this field contains zero then the master capability for the creating process is used as the heir.

The remainder of the parameter page contains a list of capabilities to be preloaded into the new process's address space. The list is composed of records of the form:

vol Volume
serial Serial
pass Password
pass Password
base Start of the loaded window relative to the capability
limit Size of the loaded window. Zero indicates capability limit
offset Location of window in the new process's address space
cindex Index in table of loaded capabilities. Zero for automatic allocation

The creating process will have twice the value indicated in money deducted from its cash. This money will be transferred equally to the new process's cash and the new process's process object.
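The accounting in the preceding paragraph can be written out as a small sketch. The function name and the refusal on insufficient cash are assumptions made for the example; the guide does not say how the kernel reports a shortfall.

```python
def make_process_charge(creator_cash: int, money: int):
    """Return (creator cash after the call, new process cash, new object money)."""
    if money < 0:
        raise ValueError("money must not be negative")
    cost = 2 * money                      # twice the nominated amount is deducted
    if cost > creator_cash:
        # Assumed behaviour: refuse rather than overdraw the creator.
        raise ValueError("insufficient cash")
    return creator_cash - cost, money, money   # split equally between the two
```

For example, a creator holding 100 units that nominates money = 30 is left with 40, with 30 placed in the new process's cash and 30 in its process object.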

The process is scheduled to wake up at the time given in base, with the wakeup times of and having the special meanings:

Wake up immediately
Set no wakeup time. Awake only when sent a message

Information relating to the new process object is returned to the creating process.


Send Message

Name Symbol Value
Send Message K SEND

Input Parameters
vol Volume
serial Serial
pass Password
pass Password
money Amount of money to be sent to process
limit Size of message in bytes
offset Offset
subpn Subprocess number to send message to
cindex Index in table of loaded capabilities

Output Parameters
srights System rights of loaded capability
urights User rights of loaded capability
money Amount of money sent to process
limit Size of message in bytes
offset Offset of loaded capability
cindex Index of loaded capability

Description

Sends a message to a process which is loaded into the address space of the sender. If offset then the message will be sent to (vol, serial, pass, pass), provided its process object is loaded into the sender's address space. If offset then the message will be sent to the process with its process object loaded at index cindex in the table of loaded capabilities. Otherwise the message will be sent to the process with its process object loaded at location offset. The message length is specified in limit, in bytes. The message to be sent is located at the beginning of the message area. A positive amount of money, money, is removed from the sender's cash and sent with the message.


Receive Message

Name Symbol Value
Receive Message K RECV

Input Parameters
limit Size of match string

Output Parameters
money Amount of money received with message
limit Size of message

Description

Recovers a message from a subprocess's message queue. If limit is nonzero then only a message which matches the first limit characters found in the match string will be recovered. The match string is found at the beginning of the message area. The message received is placed into the message area.

If no message is present an error code is returned.
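The prefix matching described for Receive Message can be modelled in a few lines. This is a user-level sketch, not kernel code: the queue representation, the `receive` name, the choice to take the oldest matching message, and the use of None to stand in for the error code are all assumptions.

```python
from collections import deque


def receive(queue: deque, match: bytes = b"", limit: int = 0):
    """Remove and return the oldest (money, body) whose body starts with match[:limit].

    A limit of zero accepts any message. None stands in for the error code
    returned when no suitable message is present.
    """
    if limit == 0:
        return queue.popleft() if queue else None
    prefix = match[:limit]
    for i, (money, body) in enumerate(queue):
        if body[:limit] == prefix:
            del queue[i]          # safe: we return immediately after deleting
            return money, body
    return None
```

With a queue holding a "log:" message followed by a "pay:" message, a receive with the match string "pay:" and limit 4 skips the first message and recovers the second, leaving the "log:" message queued.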

External Send Message

Name Symbol Value
External Send Message K EXTSEND

Input Parameters
vol Volume
serial Serial
pass Password
pass Password
money Amount of money to be sent to process
limit Size of message in bytes
subpn Subprocess number to send message to

Output Parameters
money Amount of money sent to process
limit Size of message in bytes

Description

Sends a message to the process (vol, serial, pass, pass). The message length is specified in limit, in bytes. The message to be sent is located at the beginning of the message area. A positive amount of money, money, is removed from the sender's cash and sent with the message.


External Read Memory

Name Symbol Value
External Read Memory K EXTREAD

Input Parameters
vol Volume
serial Serial
pass Password
pass Password
limit Number of bytes to be read
offset Offset in bytes from start of capability

Output Parameters
vol Volume
serial Serial
pass Password
pass Password
limit Number of bytes read
offset Offset in bytes from start of capability

Description

Reads limit bytes from offset offset in capability (vol, serial, pass, pass). The bytes read are stored at the start of the message area.

External Write Memory

Name Symbol Value
External Write Memory K EXTWRITE

Input Parameters
vol Volume
serial Serial
pass Password
pass Password
limit Number of bytes to be written
offset Offset in bytes from start of capability

Output Parameters
vol Volume
serial Serial
pass Password
pass Password
limit Number of bytes written
offset Offset in bytes from start of capability

Description

Writes limit bytes starting at offset offset in capability (vol, serial, pass, pass). The bytes to be written are stored at the start of the message area.


Bank

Name Symbol Value
Bank K BANK

Input Parameters
vol Volume
serial Serial
pass Password
pass Password
money Amount of money to be transferred from capability to cash

Output Parameters
vol Volume
serial Serial
pass Password
pass Password
srights System rights of capability
urights User rights of capability
limit Size of view in bytes
money Drawing limit available to capability

Description

Transfers money from the cash of the calling process to the capability (vol, serial, pass, pass). Both positive and negative amounts of cash may be transferred. If money is positive then the capability must have deposit right to perform the transfer. If money is negative then the capability must have withdraw right to perform the transfer.
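The sign-dependent rights check for Bank can be sketched as follows. The bit values chosen for the deposit and withdraw rights are hypothetical, as is the decision to allow a zero transfer without either right; the guide specifies neither.

```python
# Hypothetical bit assignments, chosen only for this example.
DEPOSIT_RIGHT = 0x1
WITHDRAW_RIGHT = 0x2


def bank_allowed(srights: int, money: int) -> bool:
    """Check whether a capability's rights permit the requested transfer."""
    if money > 0:
        return bool(srights & DEPOSIT_RIGHT)    # positive amounts need deposit right
    if money < 0:
        return bool(srights & WITHDRAW_RIGHT)   # negative amounts need withdraw right
    return True   # assumption: a zero transfer needs neither right
```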


Restrict Rights

Name Symbol Value
Restrict Rights K RESTRICT

Input Parameters
vol Volume
serial Serial
pass Password
pass Password
srights System rights mask
urights User rights mask

Output Parameters
vol Volume
serial Serial
pass Password
pass Password
srights System rights of capability
urights User rights of capability

Description

Reduces the rights of a capability by performing a bitwise AND of the rights masks supplied with the rights bitmaps of the capability (vol, serial, pass, pass).

The capability named must have suicide right for restrict to operate.
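A minimal sketch of the masking step follows. It assumes, for illustration only, that the suicide right is a bit in the system rights word and gives it a hypothetical value; the guide does not state where the bit lives.

```python
SUICIDE_RIGHT = 0x80   # hypothetical position of the suicide right bit


def restrict(srights: int, urights: int, smask: int, umask: int):
    """Return the reduced (srights, urights) after applying the masks."""
    if not srights & SUICIDE_RIGHT:
        raise PermissionError("capability lacks suicide right")
    # A bitwise AND can only clear bits, so rights are never gained.
    return srights & smask, urights & umask
```

Because the result is an AND of the old rights with the mask, supplying a mask with extra bits set has no effect: bits absent from the capability stay absent.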


Capability Status

Name Symbol Value
Capability Status K CAPSTAT

Input Parameters
vol Volume
serial Serial
pass Password
pass Password

Output Parameters
vol Volume
serial Serial
pass Password
pass Password
srights System rights of capability
urights User rights of capability
base Cleared by call
limit Limit of view of capability
money Withdrawal right of capability
type Type of object
maxoff Maximum offset of object
maxsz Maximum size of object
maxcap Maximum number of capabilities for object

Description

Returns details of capability (vol, serial, pass, pass) and the associated object.


Rename Capability

Name Symbol Value
Rename Capability K RENAME

Input Parameters
vol Volume
serial Serial
pass Password
pass Password

Output Parameters
vol Volume
serial Serial
pass Password
pass Password
base Cleared by call

Description

Changes the passwords of capability (vol, serial, pass, pass) to a new pair of random values.

A precondition to this call is that the capability has suicide right. In addition, the master capability of a process cannot be renamed.


Make Subprocess

Name Symbol Value
Make Subprocess K MAKESUBP

Input Parameters
base Start up time for new subprocess
limit Priority of new subprocess
subpn Subprocess number

Output Parameters
base Start up time for new subprocess
limit Priority of new subprocess
subpn Subprocess number of new subprocess

Description

Creates a new subprocess of the current process. If subpn is not zero and no subprocess of the current process has been allocated that number, then the subprocess's number will be subpn. The priority is set to the least bits of limit. The subprocess is scheduled to wake up at the time given in base, with the wakeup times of and having the special meanings:

Wake up immediately
Set no wakeup time. Awake only when sent a message

The first four words of the message area contain the initial values of the program counter and stack pointer for the new subprocess. The values are encoded:

message area index for PC
message area initial PC
message area index for SP
message area initial SP

The index is the index of a capability in the table of loaded capabilities. If an index value of zero is supplied, the initial values are treated as logical addresses instead of as a byte offset from the start of a capability.


Delete Subprocess

Name Symbol Value
Delete Subprocess K DELSUBP

Input Parameters
subpn Subprocess number

Output Parameters
subpn Subprocess number of deleted subprocess

Description

Deletes subprocess subpn. Note that neither subprocess nor can be deleted.

Load Register Set

Name Symbol Value
Load Register Set K LOADREG

Input Parameters
subpn Subprocess number

Output Parameters
subpn Subprocess number

Description

Copies the structure sysstate at the start of the message area into subprocess table entry subpn.

Save Register Set

Name Symbol Value
Save Register Set K SAVEREG

Input Parameters
subpn Subprocess number

Output Parameters
subpn Subprocess number

Description

Copies the structure sysstate from subprocess table entry subpn into the start of the message area.


Set Trap

Name Symbol Value
Set Trap K SETTRAP

Input Parameters
offset Subprocess number to send trap message to
subpn Subprocess number whose trap is being set

Output Parameters
offset Subprocess number to send trap message to
subpn Subprocess number whose trap is being set

Description

Sets the destination subprocess for trap messages. Subprocess offset is notified of faults in subprocess subpn.

Receive Message and Close Box

Name Symbol Value
Receive Message Close K RECV CLOSE

Input Parameters
limit Size of match string

Output Parameters
money Amount of money received with message
limit Size of message

Description

Recovers a message from a subprocess's message queue and closes the mail box the message is recovered from. If limit is nonzero then only a message which matches the first limit characters found in the match string will be recovered. The match string is found at the beginning of the message area. The message received is placed into the message area.

If no message is present an error code is returned.


Accept Mail

Name Symbol Value
Accept Mail K ACCEPT MAIL

Input Parameters
limit Size of match string
subpn Subprocess for which mail box is reserved

Output Parameters

Description

Opens a mail box for a subprocess and sets the acceptance string for the mail box. The mail box is taken from the pool of closed mail boxes and set to receive messages for a specific subprocess, subpn, or, if subpn is 0xFF, the mail box can be used for any subprocess.

If limit is nonzero then the mail box created will only accept messages which match the first limit characters found in the match string when the mail box is opened. The match string is found at the beginning of the message area.

Close Mail Box

Name Symbol Value
Close Matching Mail Boxes K CLOSE BOX

Input Parameters
limit Size of match string
subpn Subprocess for which mail box is reserved

Output Parameters
base Number of mail boxes closed by operation

Description

Closes mail boxes which match the closing criteria. If subpn equals 0xFF and limit is zero then all user mail boxes will be closed. If limit is nonzero then only user mail boxes with match strings matching the first limit characters of the match string found at the beginning of the message area will be closed. If subpn is nonzero then only user mail boxes for subprocess subpn are closed.


Copy Object

Name Symbol Value
Copy Object K COPYOBJ

Input Parameters
vol Volume (original)
serial Serial (original)
pass Password (original)
pass Password (original)
srights System rights mask
urights User rights mask
base Start of copy relative to beginning of original
limit End of copy relative to base
money Money to be transferred to copy
type Type of copy
maxsz Maximum size of copy
maxcap Maximum number of capabilities of copy

Output Parameters
vol Volume (copy)
serial Serial (copy)
pass Password (copy)
pass Password (copy)
srights System rights of copy
maxoff Maximum offset of copy
maxcap Maximum number of capabilities of copy

Description

Duplicates an object by creating a new object and copying the contents of the original object to the new object. This call copies only the defined pages of an object and hence produces an exact duplicate of the contents of the section of the object referred to by the capability for the original object. The rights fields allow the rights of the copy to be reduced, as the rights mask and the rights fields are combined by a bitwise AND to produce the copy's rights field. The money field indicates the amount of money to be transferred from the process's cash to the new object. The maxsz field specifies the maximum size of the new object. The type field specifies the type of the copy. The base field specifies the start of the copy region, which extends through to limit. If the limit and base fields are zero then the complete object is copied.

NOTE

This call will not duplicate processes.

This call corrupts the first four words of the message area.


Check Process State

Name Symbol Value
Peek Process K PEEK PROC

Input Parameters
vol Volume
serial Serial
pass Password
pass Password

Output Parameters
srights State of process
base Wakeup time

Description

Returns the state and wakeup time of a process given a suitable capability (the capability must have SRPEEK right for the process). The wakeup time is returned in base and the process state in srights. The process state is encoded:

Value State
No such process
No right to inquire
Process normal
Process in kernel
Process in read fault
Process in write fault
Process frozen
Process in probate
Process dead

Set Heir of Process

Name Symbol Value
Set Heir K SET HEIR

Input Parameters
vol Volume of heir
serial Serial of heir
pass Password of heir
pass Password of heir

Output Parameters

Description

Sets the heir of a process to the capability (vol, serial, pass, pass). The heir of a process receives the process's death message and any remaining cash.

Appendix B

Formal Description of Restrict

The effect of the restrict function on the ability to provide data confinement is formally expressed in this appendix. The following notation will be used:

C Capability
R Rights Set of a Capability
M Rights Mask
O Origin of the subtree

Superscripts identify the location of an element relative to the origin of the subtree, O, in the subtree. For example, $C^{O\,3\,2}$ is the second child of the third child of the capability at the origin of the subtree. Figure B illustrates a subtree of the capability tree using the notation.

The notion of the depth of a node is used in the discussion. The formal definition of the depth of a node relative to the origin of the subtree is provided in equations B and B.

The depth of the origin is defined to be zero:

$$\mathrm{depth}(C^{O}) = 0 \qquad (B)$$

The depth of other elements is found using the recurrence relation:

$$\mathrm{depth}(C^{O\,m\,n}) = \mathrm{depth}(C^{O\,m}) + 1 \qquad (B)$$

The rights used in this analysis consist of the union of system and user rights, excluding suicide right (see sections and for the rationale of the suicide right's exceptional behavior), and the ability to access a range of the contents of the object. Derived capabilities have a subset of the parent capability's rights and have access to a subrange of the object's address space. Neither of these criteria is strict, so derived capabilities may have equivalent rights and ranges to their parents.

Figure B: A Subtree of a Capability Tree

In practice the selection of pages for a derived capability is performed by using an offset from the base of the parent capability and an extent. The extent is restricted to be less than or equal to the top of the range of the parent's entitlements. For notational convenience the accessible pages of an object will be considered to be a set of access rights, and the rights mask will be extended to represent access to regions of the object. The restriction that the set be contiguous will be implicit throughout this discussion.

When a capability is derived its rights are determined by applying the logical and operator to the rights of the parent capability and the mask:

$$R^{O\,m\,n} = R^{O\,m} \wedge M^{O\,m\,n}$$

this is equivalent to the set operation

$$R^{O\,m} \cap M^{O\,m\,n} = R^{O\,m\,n}$$

Thus the following property is true at the time of derivation of a capability:

$$R^{O\,m} \supseteq R^{O\,m\,n}$$

In the Password-Capability System the relationship is static and hence remains true. The relationship between the rights of a capability $C^{O\,m}$ and the rights of its descendants $C^{O\,m\,n}$ can be stated:

$$R^{O\,m\,n} \subseteq R^{O\,m} \qquad (B)$$

The rights of capabilities display the properties of a heap, in that capabilities at a greater depth than their ancestors are guaranteed to be no more powerful than their ancestors. This property is used to assure users that in giving a capability to another user, the other user cannot use that capability to generate a capability, or use the capability in a way, which allows the other user to gain access to rights not explicitly conveyed by the capability.
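The heap property can be checked on a small example by modelling rights sets and masks as sets and derivation as intersection. The right names and the `derive` helper are illustrative assumptions; only the intersection rule comes from the text.

```python
def derive(parent_rights: frozenset, mask: frozenset) -> frozenset:
    """Rights of a derived capability: parent rights intersected with the mask."""
    return parent_rights & mask


# Illustrative rights, including page access treated as rights.
origin = frozenset({"read", "write", "send", "page0", "page1"})

# A mask may name rights the parent lacks ("extra"); they are not gained.
child = derive(origin, frozenset({"read", "send", "page0", "extra"}))
grandchild = derive(child, frozenset({"read", "write", "page0"}))
```

Here the grandchild's mask asks for "write", but because its parent no longer holds that right the grandchild does not receive it, so every node's rights remain a subset of its ancestors' rights.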

The introduction of the restrict operator to the Walnut Kernel invalidates the property defined in equation B. To show that the restrict operator does not make the system less able to protect users' interests in restricting information flow, it is necessary to prove that a new criterion exists of equal or greater strength than the heap criterion of equation B.

An enhanced notation is required to handle the description of the restrict operation. This notation introduces a subscript to the string used to describe the position of a capability in the subtree of capabilities. The new subscript denotes the number of restrict operations that have been applied to a node. Figure B illustrates a subtree of capabilities where the restrict operation has been applied to a number of the nodes. The application of the restrict operation generates a new subtree of the original subtree. The origin of the new subtree overlaps the restricted node. No new nodes can be generated from the restricted node; however, its children are unaffected.

When a capability is restricted its rights are determined by applying the logical and operator to the rights of the original capability and the restrict mask. The restrict mask is designated by the letter M:

$$R^{O\,m}_{i\,j} \wedge M^{O\,m}_{i\,j+1} = R^{O\,m}_{i\,j+1}$$

Figure B: A Subtree of a Capability Tree (Enhanced Notation)

this is equivalent to the set operation

$$R^{O\,m}_{i\,j} \cap M^{O\,m}_{i\,j+1} = R^{O\,m}_{i\,j+1} \qquad (B)$$

As restrict may only be applied to the last version of a capability's set of rights, and there is no inverse operation, the following property is true:

$$R^{O\,m}_{i\,j+n} \subseteq R^{O\,m}_{i\,j}$$

Furthermore the property

$$R^{O\,m\,n}_{i\,j\,k} \subseteq R^{O\,m}_{i\,j} \qquad (B)$$

is true, as a tree with the heap property can be constructed that extends from the origin down to the leaf by selecting subtrees with origins listed in the path as members of the tree, instead of the nodes that represent the latest revisions of the restricted capabilities. Figure B illustrates such a tree, derived from figure B for the node $C^{O\,2\,1}_{0\,1\,0}$.

Figure B: A Tree with the Heap Property for $C^{O\,2\,1}_{0\,1\,0}$

The presence of such a tree assures users that there has been no loss of security through the introduction of the restrict operator. Composing equations B and B provides

$$R^{O\,m\,n}_{i\,j+1\,k} \subseteq R^{O\,m\,n}_{i\,j\,l} \quad \text{where} \quad M^{O\,n}_{i\,k} = M^{O\,n}_{i\,l}$$

If the restriction at $m_j$ results in a less powerful capability, that is if $R^{O\,m}_{i\,j+1} \subset R^{O\,m}_{i\,j}$, then the operation may be viewed as a way of trimming the tree of potential branches. The branches eliminated could have held rights in $R^{O\,m}_{i\,j} \setminus M^{O\,m}_{i\,j+1}$.

The ability to trim the tree of potential capabilities enhances the ability of the Walnut Kernel to control access to objects and services.


Appendix C

Hardware Description

This appendix is drawn from a paper presented at ACSC, The Monash Secure RISC Multiprocessor: Multiple Processors Without a Global Clock. This paper appears as:

Title: Monash secure RISC multiprocessor: Multiple processors without a global clock
Author: Maurice D. Castro and Ronald D. Pose
Journal: Australian Computer Science Communications, Proceedings of the Seventeenth Annual Computer Science Conference, ACSC, Christchurch, New Zealand
Date: January
Editor: Gupta, G.
Pages:

I present it in support of my thesis.

APPENDIX C HARDWARE DESCRIPTION

The Monash Secure RISC Multiprocessor: Multiple Processors Without a Global Clock

Maurice Castro* and Ronald Pose†
Department of Computer Science
Monash University
Clayton, Vic, Australia
Phone
Fax

Abstract

The goal of the Secure Monash RISC Multiprocessor Project is to produce a powerful, general purpose, scalable, multiuser multiprocessor computer. The requirement for synchronous clocks can be a major limitation on both physical layout and electrical design. Instead of attempting to provide a global clock over all processors, we are developing a novel design which has clocks local to each processor and a self clocked bus with asynchronous arbitration. The overall system architecture stresses the ease of scalability by integrating a small switch into the basic processor-memory module, effectively distributing the interconnection network hardware across all nodes.

Introduction

During the initial stages of the design of the Monash Secure RISC Multiprocessor it became clear that the existing interconnection networks and bus architectures for multiprocessing computers could not satisfy a number of the design goals of the project adequately. The design required an interconnection structure that was easily scalable, sufficiently versatile to solve general problems, yielded high performance for a large class of problems, and a high degree of fault tolerance.

A novel mesh based interconnection scheme was developed for the bus structure. A consequence of this scheme was a major clock distribution problem: there are multiple paths of differing lengths which must be clocked synchronously. After calculating an approximate size for a processor node board, and hence the size of a small, medium and large system, it was apparent that it would be extremely difficult to provide a clock of the required frequency that would be sufficiently well aligned at the processors' bus interfaces to be useful for transferring data across the interprocessor buses.

To overcome these problems a self clocking bus structure was proposed with asynchronous arbitration. This system allows the retention of the design goal of easy scalability and avoids the requirement that the system have a global clock. This permits the construction of a fully distributed system with a passive interconnection scheme.

The use of FIFOs to decouple a bus clock from a processor clock is not unusual; however, the use of a deep FIFO capable of holding a number of complete transactions, with the aim of preventing the bus being locked by a partially complete transfer, is a notable design feature.

* maurice@bruce.cs.monash.edu.au
† rdp@bruce.cs.monash.edu.au
1 The scheme used for this bus structure will be the subject of a future paper.
2 The FutureBus uses this technique.

Design Goals

The design goals of the Monash Secure RISC Multiprocessor project are:

Scalability: The system will be constructed from modular components, enabling the construction of single processor workstations, multiprocessor workstations, medium size multiprocessors and large clustered multiprocessors.

High Performance: The design will minimize the potential for performance limiting bottlenecks in the bus structure.

Flexibility: The architecture should be sufficiently flexible to support a variety of algorithms rather than being tuned to a specific class of algorithms.

Fault Tolerance: The system should provide for the failure of processors or communications paths in a multiprocessor and provide some means of isolating the faulty component and continuing operation.

Design Decisions

Two key design decisions were made at the beginning of the project which have strongly influenced the design:

Passive interconnection
Single active module type

All the multiprocessor nodes are interconnected by passive backplanes. Although active interconnections offer a wider scope for avoiding the problems of clock distribution, we chose a passive backplane because of its inherent advantages of simplicity and reliability.

A single type of active module offered advantages in the service and design of the machine. In case of failure a module can be unplugged and replaced easily. The design effort is reduced as only a single module needs to be designed.

This arrangement also makes the expansion of the system very simple. Extra modules are purchased and plugged together without requiring any active interconnection components. Unlike currently available massively parallel machines, all the required switching logic is integrated into the basic module. There are no centralized unscalable resources such as active interconnection networks used in other massively parallel machines.

Satisfying the Design Criteria

The design proposed for this project attempts to satisfy the project goals by using an interconnection scheme which incorporates both the features of a bus sharing network and a switched network.

Design Criteria

Scalability

To achieve scalability in small machines a classical SMP (Symmetric Multi-Processor) design is used for communication within a processor board. This gives


cost effective scaling for a small number of processors and allows cache consistency to be achieved using bus snooping.

To build larger systems the processor boards are linked by a mesh-like network. This network is essentially a store and forward network where data is passed from processor board to processor board over a shared external bus. A two part communication protocol is used to ensure that lost messages are accounted for. It is not possible to guarantee cache consistency with this system as only processors on the path of a memory transaction are able to see the contents of that transaction.

Performance

Performance in a multiprocessor system may be limited by several factors. Of most importance to our design were memory bandwidth and memory latency.

The design proposed has high bandwidth as all connections are made with bit wide buses driven at the processors' external speed. Within processor boards latency is low as the shared internal bus yields a latency proportionate to the speed of the processors and the memory. The latency of inter-board communications is higher as transactions crossing multiple processor boards are delayed by the passage through each board.

The latency of external transactions will limit the performance of individual programs; however, as the target operating system is intended to be multiprogrammed, the performance of the overall system will not be limited. By having multiple processors within a node the facilities of the node may be utilized while an individual processor waits for remote memory. When a process is blocked on a memory transaction another process may be allowed to proceed, maintaining system utilization. Furthermore it is possible for the operating system to relocate processes and their memory pages dynamically so that the path length between processors and data is minimized, hence reducing the overall latency and improving performance.

Flexibility

By adopting a mesh-like external connection it is possible to provide reasonable performance for a variety of algorithms. The system as it is being implemented allows for a variety of interconnection topologies with different performance tradeoffs.

Fault Tolerance

A dual ported design is employed which ensures that each processor board has at least two data paths into the external network, permitting redundant access to the shared resources. If the external network is sufficiently redundant then it is possible to allow continued operation given a single processor failure. A bus failure can also be tolerated.

Consequences

To achieve maximum flexibility in determining the configurations of the external network it was necessary to abandon a globally clocked system. This is because we have a large number of small bus segments, each with multiple processors. Since each processor spans two buses it seemed natural to have a single clock. However the logistics of distributing such a clock without violating our design aim of a modular system without any critical centralized resources made it necessary to consider alternative arrangements. A self clocked bus was adopted allowing wide variations in physical path lengths


of the physical buses and eliminating the clock distribution problem.

Multiprocessor Node

The Multiprocessor Node board (Figure) supports a combination of processor or memory modules on the MP bus (Memory Processor Bus). A single Multiprocessor Node board behaves as a classic SMP machine. Using the external ports it is possible to connect to an external network of multiprocessor node boards using a passive backplane.

Each Multiprocessor Node Board has a local clock pulse generator. This is used to provide clock signals to the processor and memory daughter boards, the control logic and the arbiters. This clock is also gated out through the ports to clock the external bus when the port becomes a bus master.

The requirement for a global clock is eliminated by using the FIFOs to decouple the local clocks from the clock found on the external bus.

Functional Description

The Multiprocessor Node Board consists of major functional blocks connected by a state machine, the control logic. The blocks are:

MP Bus

Bus Switching Unit

Port Interface Units

MP Bus

The Memory-Processor bus (MP bus) is a data, address and control signal bus. The data and address paths are bits wide with the data and address signals multiplexed onto the bus. This bus runs using a split bus protocol provided by the processors on the daughter boards plugged into the Multiprocessor Node Board.

The components on the MP bus form a classical shared memory SMP machine.

Bus Switching Unit

To help provide off-board communications the switching unit provides operational states:

Port A connect Port B

MP Bus connect Port A

MP Bus connect Port B

No Connection

Port Interface Units

Each port has a port interface unit which performs the functions of transmitting data onto a bus and receiving data from the bus.

To receive data this unit recognizes relevant information on the bus and accepts it into the input FIFO; otherwise bus traffic is ignored.

To transmit data the port interface unit arbitrates for the bus and then outputs data from the output FIFO.

Operational Description

All addresses in the system are partitioned into regions. The most significant bits of the address determine which Multiprocessor Node Board is to be accessed and the least significant bits determine the address of the memory location on the Multiprocessor Node Board (see Figure). Two Multiprocessor Node board numbers are reserved. Node board number zero always refers to memory local to the node board and the maximum


node board number refers to hardware control memory local to the node board.

Figure: Block Diagram of Multiprocessor Node

The Node Number is used to index into a routing lookup table held in static RAM which is decoded to determine where the memory location can be found.

There are three types of access available to the processor:

Local Memory: Memory is addressed directly over the MP bus.

Remote Memory: Discussed in Section.

Hardware Control Memory: The bus ports are isolated and the routing lookup tables are modified by the processors.

In addition a Memory to Memory DMA transfer facility will be available to facilitate page sized transfers.

Routing

This section will follow the path of a memory access to illustrate the operation of routing data between memory and processor.

Transfers between nodes employ a packet structure. A packet comprises a header, a body containing the data and a packet check sum. The header contains the source and destination addresses, the packet size and the packet type. Packet types include read, write and an indication of whether the destination is a processor or memory. Packets are constructed and interpreted by control logic in the multiprocessor node board.

Local memory operations use the intrinsic addressing mechanism of the processor.

Memory accesses are routed through the network in a manner similar to a packet based store and forward network. When a processor utters an off-board address the high order bits of the address are used to index the MP Bus lookup table. The lookup table contains bits which indicate which port should be used to attempt the access (Figure).
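The packet structure described in the Routing section (a header carrying source address, destination address, size and type, followed by a body and a check sum) might be sketched as below. This is only an illustration: the field widths, the type codes `READ`, `WRITE` and `TO_PROCESSOR`, and the simple additive check sum are all assumptions, since the text does not specify them.

```python
import struct

# Assumed header layout: 32-bit source, 32-bit destination,
# 16-bit body size, 8-bit packet type (big-endian, no padding).
HEADER_FORMAT = ">IIHB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# Hypothetical type codes; the real encoding is not given in the text.
READ = 0x01
WRITE = 0x02
TO_PROCESSOR = 0x80  # set: destination is a processor; clear: memory


def checksum(data: bytes) -> int:
    """Assumed 16-bit additive check sum over header and body."""
    return sum(data) & 0xFFFF


def build_packet(source, destination, packet_type, body=b""):
    """Construct header + body + check sum as one byte string."""
    header = struct.pack(HEADER_FORMAT, source, destination, len(body), packet_type)
    payload = header + body
    return payload + struct.pack(">H", checksum(payload))


def parse_packet(packet):
    """Verify the check sum and return (source, destination, type, body)."""
    payload = packet[:-2]
    (received,) = struct.unpack(">H", packet[-2:])
    if checksum(payload) != received:
        raise ValueError("check sum mismatch")
    source, destination, size, packet_type = struct.unpack(
        HEADER_FORMAT, payload[:HEADER_SIZE]
    )
    body = payload[HEADER_SIZE:]
    if len(body) != size:
        raise ValueError("size field does not match body length")
    return source, destination, packet_type, body
```

A receiver that detects a check sum mismatch would discard the packet; how a lost packet is then accounted for is the job of the two part communication protocol mentioned earlier and is not modelled here.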


MSB                               LSB
Node Number     |     Memory Location

Node Numbers
zero     Local Memory
f f      Hardware Control
x x      Off Board Memory

Figure: Partitioning of Addresses
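The address partitioning shown in the figure can be illustrated with a short sketch. The widths used here (a 32 bit address with an 8 bit node number) are assumptions chosen for the example, as are the names `split_address` and `classify`; only the rules that node number zero means local memory and the maximum node number means hardware control memory come from the text.

```python
from enum import Enum

ADDRESS_BITS = 32                        # assumed address width
NODE_BITS = 8                            # assumed width of the node number field
LOCATION_BITS = ADDRESS_BITS - NODE_BITS
MAX_NODE = (1 << NODE_BITS) - 1          # reserved: hardware control memory


class Region(Enum):
    LOCAL_MEMORY = "local memory"
    HARDWARE_CONTROL = "hardware control memory"
    OFF_BOARD_MEMORY = "off-board memory"


def split_address(address):
    """Split an address into (node number, memory location)."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address out of range")
    node = address >> LOCATION_BITS                   # most significant bits
    location = address & ((1 << LOCATION_BITS) - 1)   # least significant bits
    return node, location


def classify(address):
    """Apply the two reserved node numbers; everything else is off board."""
    node, _ = split_address(address)
    if node == 0:
        return Region.LOCAL_MEMORY
    if node == MAX_NODE:
        return Region.HARDWARE_CONTROL
    return Region.OFF_BOARD_MEMORY
```

Under these assumed widths, address `0x12345678` splits into node number `0x12` and memory location `0x345678`, and is therefore an off-board access.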

Port A LookUp     |     MP Bus LookUp                   |     Port B LookUp
Read In           |     In A   Out A   Out B   In B     |     Read In

Figure: Contents of LookUp Tables
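As a rough sketch of how a port might use its Read In bit and the FIFO high water mark when deciding whether to accept a packet seen on the bus, consider the following. The capacities, the value of `LARGEST_PACKET`, the class and function names, and the representation of lookup tables as sets of node numbers are assumptions made for illustration; the hardware itself uses lookup tables held in static RAM.

```python
from collections import deque

LARGEST_PACKET = 16  # assumed largest permissible packet, in FIFO words


class Fifo:
    """A bounded FIFO that tracks how many words it currently holds."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.packets = deque()  # entries are (destination_node, size)

    def below_high_water_mark(self):
        # Below the mark means the largest permissible packet still fits.
        return self.used + LARGEST_PACKET <= self.capacity

    def push(self, destination_node, size):
        if size > LARGEST_PACKET:
            raise ValueError("packet exceeds the largest permissible size")
        self.packets.append((destination_node, size))
        self.used += size


class Port:
    """One external port with a Read In lookup table and an input FIFO."""

    def __init__(self, read_in_nodes, capacity=64):
        self.read_in = frozenset(read_in_nodes)  # nodes whose Read In bit is set
        self.input_fifo = Fifo(capacity)

    def observe(self, destination_node, size):
        """Accept a packet seen on the bus, or ignore it. Returns True if accepted."""
        if destination_node in self.read_in and self.input_fifo.below_high_water_mark():
            self.input_fifo.push(destination_node, size)
            return True
        return False


def route_head_packet(port, mp_bus_in_nodes):
    """Decide where the first packet in the input FIFO is sent by the switch."""
    destination_node, _ = port.input_fifo.packets[0]
    if destination_node in mp_bus_in_nodes:  # In bit set in the MP Bus table
        return "MP bus memory"
    return "opposite output FIFO"
```

The sketch ignores arbitration and the output side of the port; it only shows that a packet is dropped silently when either the Read In bit is clear or the input FIFO is above its high water mark.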

If the output FIFO on the required port is below the high water mark (the point at which it is guaranteed that the largest permissible packet will fit in the FIFO) and there is no traffic currently passing through the switch, then the switch is connected to the appropriate port. A packet header is constructed and transferred to the FIFO. Data is transferred to the output FIFO. A check sum is added to the FIFO. If the conditions are not met the processor should reattempt the operation later.

When there is data in the output FIFO and there is no bus master an attempt is made to arbitrate for the appropriate bus. The arbiter is discussed in Section. When the port becomes the bus master the packet is broadcast onto the bus.

The high order bits of the destination address of the packet on the bus index the port lookup tables of all ports attached to the bus. If the port's Read In bit is set and the input FIFO is below the high water mark then the data on the bus is read into the FIFO; otherwise the data is ignored.

The node number of the destination address in the header of the first packet in the FIFO is used to index the MP Bus lookup table. If the In bit is set the switch allows the contents of the packet to be directed to the memory on the MP Bus. Otherwise the switch is set to permit the flow of data from the input FIFO to the opposite output FIFO.

Design Features

The MP bus and switched external memory packet transfer allows better utilization of processor memory resources. Both the external and MP buses may be loaded to the level providing optimal utilization of the bus capacity.

The design introduces a memory hierarchy based on the number of hops between nodes. This feature introduces a new degree of flexibility in the management of both memory pages and processes. The throughput of a process is maximized by relocation of the process and/or its data to minimize the memory access time. The optimization of overall system performance is complicated by memory being shared by multiple processors. Peak performance is achieved by balancing processor load, memory load and process average access time.

By employing dual FIFOs on the ports this design attempts to reduce the risk of locking a bus due to traffic from the MP bus to the other bus port. This design decision adds latency to every transfer through a multiprocessor node; however, it significantly increases the bus loading required to cause a bus to be stalled by a FIFO being full. This feature is especially valuable where large packet transfers are expected as it effectively doubles the depth of the FIFOs for flow through traffic, hence halving the risk that a packet will not be accepted due to a FIFO being above the high water mark.

The Arbiter

Each port uses a priority based arbiter to resolve bus master conflicts. Priorities are rotated to ensure fairness.

The arbiter has the properties:

Fairness: By rotating the priorities on each attempt to select a bus master, each board has an equal chance of being the board with the highest priority in the pool.

Guaranteed Result: A bus master is selected every time an attempt is made.

Varying Asynchronous Clocks: Arbiters are synchronous with respect to their local clock.

Conclusion

The design criteria of easy scalability and high bus utilization are readily satisfied by the design described. In addition the multiprocessor nodes, through the use of distributed asynchronous arbitration and a self clocked bus, allow a large number of network designs to be evaluated without the need to redesign the multiprocessor nodes. The use of deep FIFOs increases external and internal bus utilization by postponing the onset of bus saturation.

Although a design lacking global clocking was initially considered to be a less attractive option than a globally clocked design, we have found that the advantages of dispensing with the global clock (greater flexibility, easier bus design) have overshadowed the cost (higher latency in data transfers).

Acknowledgment

The Secure RISC Architecture project is supported by a grant from the Australian Research Council.

Maurice Castro is a recipient of an Australian Postgraduate Research Award.
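Returning to the arbiter described under "The Arbiter": the rotating priority scheme can be modelled in a few lines. This is a software sketch under assumptions, not the hardware design. The class name, the use of integer board numbers, and the choice to rotate by one position per attempt are illustrative; the text states only that priorities are rotated on each attempt.

```python
class RotatingPriorityArbiter:
    """Pick a bus master by priority, rotating priorities on every attempt."""

    def __init__(self, boards):
        # Position 0 holds the board that currently has the highest priority.
        self.order = list(range(boards))

    def arbitrate(self, requesting):
        """Return the winning board, or None when no board is requesting."""
        requesting = set(requesting)
        winner = next((b for b in self.order if b in requesting), None)
        # Rotate after every attempt so each board takes a turn at the top.
        self.order.append(self.order.pop(0))
        return winner
```

When every board requests the bus on every attempt, the winners cycle through the boards in turn, which is the fairness property. A master is chosen on every attempt in which at least one board is requesting, which corresponds to the guaranteed result property.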

References

Patterson D A, Hennessy J L, Computer Architecture: A Quantitative Approach, San Mateo, California, p

Texas Instruments, Futurebus Interface Family Data Manual, Preliminary, September

Di Giacomo J, Digital Bus Handbook, New York, p

MIPS Computer Systems Inc, MIPS R Microprocessor Users Manual, Sunnyvale, California, p

Bolosky J, Fitzgerald R P, Scott M L, Simple but Effective Techniques for NUMA Memory Management, th Symposium on Operating Systems Principles, pp


Appendix D

Glossary

ADC Access Control Descriptors

AOT Active Object Table

APD Active Protection Domain

Copy-on-write A technique which delays the copying of a page until a write to the page occurs. A page is made available to more than one process in a read only mode; when a process attempts to write to the page the original page is duplicated and the new page is made available to the process to write into.

C-list Capability List

Capability A number which uniquely identifies an object and access rights to the object or a section of the object.

DMA Direct Memory Access

DSM Distributed Shared Memory

FAT File Allocation Table

IAS Intermediate Address Space

IPC Interprocess Communication

LRPC Lightweight Remote Procedure Call


Mail box A section of the process object used to hold messages.

Message area The area of the parameter page that does not contain the parameter block.

Message slot A section of the process object used to hold messages.

NUMA Non-Uniform Memory Access. The term is applied to architectures with memory access times varying according to memory location.

OT Object Table

Password capability A capability composed of two components: a unique identifier for an object and a randomly allocated password.

PDX Protection Domain Extension

Physical Memory Table This table describes the current state of each page of physical memory.

Physical Object An object on the physical volume which permits access to the first megabytes of the physical memory of the system. This object is used to provide access to buffers used by low level device drivers.

Physical Volume A dummy volume used to contain the physical object.

PMT Physical Memory Table

Private page tables Second level page tables associated with a specific process. These page tables are used to provide small windows.

Rights A capability has associated with it a set of rights. These rights define the types of access to objects that are conferred by the capability.

RPC Remote Procedure Call

RPD Regular Protection Domain

TLC Table of Loaded Capabilities

Table of Loaded Capabilities A table associated with each process that contains mappings of capabilities to memory locations.

SASA Single Address Space Architecture

SASOS Single Address Space Operating System

UI Unique Identifier

View An attribute of a capability. The region of an object that possession of the capability makes addressable.

Wall A page of memory visible to all Walnut Kernel processes.

Window An area of a process's address space where a view of an object can be loaded.


Bibliography

ABC M P Atkinson, P J Bailey, K J Chisholm, P W Cockshott and R Morrison. An approach to persistent programming. The Computer Journal.

AMM T Agerwala, J L Martin, J H Mirza, D C Sadler, D M Dias and M Snir. SP system architecture. IBM Systems Journal.

And Mark Anderson. A Password Capability System. PhD thesis, Department of Computer Science, Monash University.

APW M Anderson, R D Pose and C S Wallace. A password-capability system. Technical Report, Department of Computer Science, Monash University.

APW M Anderson, R D Pose and C S Wallace. A Password-Capability system. The Computer Journal.

ARG V Abrossimov, M Rozier and M Gien. Virtual memory management in Chorus. Technical Report CSTR, Chorus systemes.

AVW J Armstrong, R Virding and M Williams. Concurrent Programming in Erlang. Prentice-Hall, Hemel Hempstead, Hertfordshire.

AW M Anderson and C S Wallace. Security management in a password-capability system. Technical Report, Department of Computer Science, Monash University.

BFF Alan C Bromberger, A Peri Frantz, William S Frantz, Ann C Hardy, Norman Hardy, Charles R Landau and Jonathon S Schapiro. The KeyKOS nanokernel architecture. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages. USENIX Association.

BFS W J Bolosky, R P Fitzgerald and M L Scott. Simple but effective techniques for NUMA memory management. In Proc of the Twelfth SOSP, pages, Litchfield Park, AZ.

BGJ David L Black, David B Golub, Daniel P Julin, Richard F Rashid, Richard P Draves, Randal W Dean, Alessandro Forin, Joseph Barrera, Hideyuki Tokuda, Gerald Malan and David Bohman. Microkernel operating system architecture and Mach. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages. USENIX Association.

BS Peter A Buhr and Richard A Stroobosscher. The system: Providing lightweight concurrency on shared-memory multiprocessor computers running UNIX. Software-Practice and Experience.

CAC W P Cockshott, M P Atkinson and K J Chisholm. Persistent object management system. Software Practice and Experience.

Cas Maurice Castro. The Walnut Kernel: User level programmers guide. Technical Report, Department of Computer Science, Monash University, revised November.

Cat David Cathro. An I/O subsystem for a multiprocessor. Masters thesis, Department of Computer Science, Monash University.

CG John Crawford and Patrick Gelsinger. Programming the. Sybex, Alameda, California.

Chi Vernon L Chi. Salphastic distribution of clock signals. Technical Report, Microelectronic Systems Laboratory, CB Sitterson Hall, Department of Computer Science, University of North Carolina, Chapel Hill, NC.


CLFL Jeffrey S Chase, Henry M Levy, Michael J Feeley and Edward D Lazowska. Sharing and protection in a single address space operating system. Technical Report, Department of Computer Science and Engineering, University of Washington, Seattle, USA, April. Revised January.

Cor Digital Equipment Corporation. VAX Architecture Handbook. Digital Equipment Corporation.

Cor Microsoft Corporation. QBasic Nibbles. Source code in BASIC.

CP Maurice D Castro and Ronald D Pose. Monash secure RISC multiprocessor: Multiple processors without a global clock. In G Gupta, editor, Australian Computer Science Communications: Proceedings of the Seventeenth Annual Computer Science Conference (ACSC), Christchurch, New Zealand, pages.

CPW Maurice Castro, Glen Pringle and Chris Wallace. The Walnut Kernel: Program implementation under the Walnut Kernel. Technical Report, Department of Computer Science, Monash University. Also released by SERC, CITRI as Technical Report SERC.

Cus Helen Custer. Inside Windows NT. Microsoft Press, Redmond, Washington.

DdBFa Alan Dearle, Rex di Bona, James Farrow, Frans Henskens, David Hulse, Anders Linstrom, Stephen Norris, John Rosenberg and Francis Vaughan. Protection in the Grasshopper operating system. In Proceedings of the th International Workshop on Persistent Object Systems.

DdBFb Alan Dearle, Rex di Bona, James Farrow, Frans Henskens, Anders Linstrom, John Rosenberg and Francis Vaughan. Grasshopper: An orthogonally persistent operating system. Computing Systems, Summer.

DG Joseph Di Giacomo, editor. Digital Bus Handbook. McGraw-Hill, New York.


DLR Alan Dearle, Anders Linstrom and John Rosenberg. The grand unified theory of address space. In Proceedings of the Fifth Workshop on Hot Topics in Operating Systems, pages.

DVH Jack B Dennis and Earl C Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM.

Fab R S Fabry. Capability-based addressing. Communications of the ACM.

FPR Vincent J Fazio, Ronald D Pose and John R Wells. Monash secure RISC multiprocessor: Performance simulation. In R Kotagiri, editor, Australian Computer Science Communications: Proceedings of the Eighteenth Annual Computer Science Conference (ACSC), Glenelg, South Australia, pages.

GC Berny Goodheart and James Cox. The magic garden explained: the internals of UNIX System V release, an open-systems design. Prentice Hall, New York.

Geh Edward F Gehringer. MONADS: a computer architecture to support software engineering. Technical Report Monads Report No, Department of Computer Science, Monash University.

Gie Michel Gien. Microkernel architecture: key to modern systems design. Technical Report CSTR, Chorus systemes.

GL Virgil D Gligor and Bruce G Lindsay. Object migration and authentication. IEEE Transactions on Software Engineering, SE.

Har Norman Hardy. KeyKOS architecture. Operating Systems Review.

HERV Gernot Heiser, Kevin Elphinstone, Stephen Russell and Jerry Vochteloo. Mungi: A distributed single address-space operating system. In G Gupta, editor, Proceedings of the Seventeenth Australian Computer Science Conference, pages.

Hil Dan Hildebrand. An architectural overview of QNX. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages. USENIX Association.

Int Intel. i Processor Programmers Reference Manual. Intel Corporation, Santa Clara, California.

JJa William Fredrick Jolitz and Lynne Greer Jolitz. Porting UNIX to the: A practical approach. Dr. Dobb's Journal of Software Tools, pages.

JJb William Fredrick Jolitz and Lynne Greer Jolitz. Porting UNIX to the: A stripped-down kernel. Dr. Dobb's Journal of Software Tools, pages.

JJc William Fredrick Jolitz and Lynne Greer Jolitz. Porting UNIX to the: The basic kernel. Dr. Dobb's Journal of Software Tools, pages.

KB Ted Kehl and Steve Burns. A self-tuned stoppable clock oscillator. Department of Computer Science and Engineering, University of Washington, Seattle, Washington.

Kee J L Keedy. A comparison of two process structuring models. Technical Report Monads Report, Department of Computer Science, Monash University.

Kee Leslie J Keedy. The Monads view of software modules. In A J H Sale and G Hawthorne, editors, Proceedings of the Ninth Australian Computer Science Conference.

LDdB Anders Linstrom, Alan Dearle, Rex di Bona, J Mathew Farrow, Frans Henskens, John Rosenberg and Francis Vaughan. A model for user-level memory management in a distributed persistent environment. In G Gupta, editor, Proceedings of the Seventeenth Australian Computer Science Conference.

LMKQ Samuel J Leffler, Marshall Kirk McKusick, Michael J Karels and John S Quarterman. The Design and Implementation of the BSD UNIX. Addison-Wesley, Reading, Massachusetts.

Loe Keith Loepere. Mach Kernel Principles. Open Software Foundation and Carnegie Mellon University, edition.

Mac International Business Machines. Personal Computer Hardware Reference Library: Technical Reference. IBM, Boca Raton, Florida, first edition.

Mic Sun Microsystems. System Services Overview, revision a edition.

MIP MIPS Computer Systems Inc, Sunnyvale, California. MIPS R Microprocessor Users Manual.

MSWK K Murray, T Stiemerling, T Wilkinson and P Kelly. Angel: Resource unification in a bit microkernel. Technical Report TCUSARC, City U CS, London.

MvRT Sape J Mullender, Guido van Rossum, Andrew S Tanenbaum, Robbert van Renesse and Hans van Staveren. Amoeba: A distributed operating system for the s. IEEE Computer, pages.

MWO Kevin Murray, Tim Wilkinson, Peter Osmon, Ashley Saulsbury, Tom Stiemerling and Paul Kelly. Design and implementation of an object-orientated bit single address space microkernel. In Proceedings of the USENIX Symposium on Micro-Kernels and Other Kernel Architectures, pages. USENIX Association.

Nor Peter Norton. The Peter Norton Programmers Guide to the IBM PC. Microsoft Press, Redmond, Washington.


OSW P E Osmon, T Stiemerling, A Whitcroft, A Valsamidis, T Wilkinson and N Williams. The Topsy project: a position paper. Technical Report TCUSARC, City U CS, London.

PFR Ronald D Pose, Vincent J Fazio and John R Wells. An incrementally scalable multiprocessor interconnection network with flexible topology and low-cost distributed switching. IEEE Computer Society Technical Committee on Computer Architecture Newsletter, Special Issue on Interconnection Networks for High-Performance Computing Systems, pages.

PH David A Patterson and John L Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, California.

Pos Ronald David Pose. A Capability-Based Tightly-Coupled Multiprocessor. PhD thesis, Department of Computer Science, Monash University.

PPTT Dave Presotto, Rob Pike, Ken Thompson and Howard Trickey. Plan, a distributed system. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages. USENIX Association.

RAa John Rosenberg and David Abramson. The MONADS architecture: Motivation and implementation. In The First Pan Pacific Computer Conference Proceedings, volume, pages.

RAb John Rosenberg and David Abramson. MONADS-PC: a capability-based workstation to support software engineering. In Proceedings of the Eighteenth Annual Hawaii International Conference on System Sciences, volume, pages.

RAA M Rozier, V Abrossimov, F Armand, I Boule, M Gien, M Guillemont, F Herrmann, C Kaiser, S Langlois, P Leonard and W Neuhauser. Overview of the CHORUS distributed operating systems. Technical Report CSTR, Chorus systemes.


RD Sub Ramakrishnan and Larry Dunning. On the complexity of task assignment algorithms. In IX International Conference on Systems Engineering, pages, Las Vegas, NV.

RJO Richard Rashid, Daniel Julin, Douglas Orr, Richard Sanzi, Robert Baron, Alessandro Forin, David Golub and Michael Jones. Mach: A system software kernel. In Proceedings of the th Computer Science International Conference, COMPCON.

Rod Thomas Roden. High-resolution timing: A fast tight timer for PCs. Dr. Dobb's Journal of Software Tools, pages.

RT D M Ritchie and K Thompson. The UNIX time-sharing system. Communications of the ACM, pages.

Sal Jerome H Saltzer. Protection and the control of information sharing in Multics. Communications of the ACM.

SC Julio Sanchez and Maria P Canton. IBM Microcomputers: a Programmers Handbook. McGraw-Hill, New York.

SHFG M Snir, P Hochschild, D D Frye and K J Gildea. The communication software and parallel environment of the IBM SP. IBM Systems Journal.

ST D L Schleicher and R L Taylor. System overview of the application system. IBM Systems Journal.

Tan Andrew S Tanenbaum. Operating Systems: Design and Implementation. Prentice-Hall, Englewood Cliffs, New Jersey.

Tan Andrew S Tanenbaum. Modern Operating Systems. Prentice-Hall, Englewood Cliffs, New Jersey.

Tex Texas Instruments. Futurebus Interface Family Data Manual, Preliminary.

Toy Michael Toy. Worm. Part of Berkeley Unix Distribution. Source code in C.


Tri Walter A Triebel. The DX Microprocessor: Hardware, Software and Interfacing. Prentice Hall, Englewood Cliffs, New Jersey, first edition.

Var Peter D Varhol. QNX forges ahead. Byte, pages.

vRT Robbert van Renesse and Andrew S Tanenbaum. Short overview of Amoeba. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages. USENIX Association.

VSK Francis Vaughan, Tracy Schunke, Bett Koch, Alan Dearle, Chris Marlin and Chris Barter. A persistent distributed architecture supported by the Mach operating system. In Proceedings of the Mach Workshop, pages. USENIX Association.

Wal Walnut Creek CDROM. FreeBSD version. CDROM, Rock Ridge Format. Walnut Creek CDROM, Palos Verdes Mall, Suite, Walnut Creek, CA, USA.