PROCEEDINGS / KONGRESOPSOMMINGS

6th SOUTHERN AFRICAN COMPUTER SYMPOSIUM

6de SUIDELIKE-AFRIKAANSE REKENAARSIMPOSIUM

De Overberger Hotel, Caledon 2 - 3 JULY 1991

SPONSORED by ISM FRD GENMIN

EDITED by M H LINCK, Department of Computer Science, University of Cape Town

TABLE OF CONTENTS

Foreword

Organising Committee

Referees

Program

Papers (In order of presentation)

"A value can belong to many types" B H Venter, University of Fort Hare

"A Transputer Based Embedded Controller Development System" M R Webster, R G Harley, D C Levy & D R Woodward, University of Natal

"Improving a Control and Sequencing Language" G Smit & C Fair, University of Cape Town

"Design of an Object Orientated Framework for Optimistic Parallel Simulation on Shared-Memory Computers" P Machanick, University of Witwatersrand

"Using Statecharts to Design and Specify the GMA Direct-Manipulation User Interface" L van Zijl & D Mitton, University of Stellenbosch

"Product Form Solutions for Multiserver Centres with Hierarchical Classes of Customers" A Krzesinski, University of Stellenbosch and R Schassberger, Technische Universität Braunschweig

"A Reusable Kernel for the Development of Control Software" W Fouche and P de Villiers, University of Stellenbosch

"An Implementation of Linda Tuple Space under the Helios Operating System" P G Clayton, E P Wentworth, G C Wells and F de-Heer-Menlah, Rhodes University

"The Design and Analysis of Distributed Virtual Memory Consistency Protocols in an Object Orientated Operating System" K MacGregor, University of Cape Town & R Campbell, University of Illinois at Urbana-Champaign

"Concurrency Control Mechanisms for Multidatabase Systems" A Deacon, University of Stellenbosch

"Extending Local Recovery Techniques for Distributed Databases" H L Victor & M H Rennhackkamp, University of Stellenbosch

"Analysing Routing Strategies in Sporadic Networks" S Melville, University of Natal

"The Design of a Speech Synthesis System for Afrikaans" M J Wagener, University of Port Elizabeth

"Expert Systems for Management Control: A Multiexpert Architecture" V Ram, University of Natal

"Integrating Similarity-Based and Explanation-Based Learning" G D Oosthuizen and C Avenant, University of Pretoria

"Efficient Evaluation of Regular Path Programs" P Wood, University of Cape Town

"Object Orientation in Relational Databases" M Rennhackkamp, University of Stellenbosch

"Building a secure database using self-protecting objects" M Olivier and S H von Solms, Rand Afrikaans University

"Modelling the Algebra of Weakest Preconditions" C Brink and I Rewitsky, University of Cape Town

"A Model Checker for Transition Systems" P de Villiers, University of Stellenbosch

"A New Algorithm for Finding an Upper Bound of the Genus of a Graph" D I Carson and O R Oellermann, University of Natal

FOREWORD

The 6th Computer Symposium, organised under the auspices of SAICS, carries on the tradition of providing an opportunity for the South African scientific computing community to present research material to their peers.

It was heartening that 31 papers were offered for consideration. As before, all these papers were refereed. Thereafter a selection committee chose 21 for presentation at the Symposium.

Several new dimensions are present in the 1991 symposium:

* The Symposium has been arranged for the day immediately after the SACLA conference.
* It is being run over only 1 day in contrast to the 2-3 days of previous symposia.
* I believe that it is the first time that a Symposium has been held outside of the Transvaal.
* Over 85 people will be attending. Nearly all will have attended both events.
* A Sponsorship package for both SACLA and the Research Symposium was obtained. (This led to reduced hotel costs compared to previous symposia.)

A major expense is the production of the Proceedings of the Symposium. To ensure financial soundness, authors have had to pay the page charge of R20 per page.

A thought for the future would be consideration of a poster session at the Symposium. This could provide an alternative approach to presenting ideas or work.

I would sincerely hope that the twinning of SACLA and the Research Symposium is considered successful enough for this combination to survive. Whether a Research Symposium should be run each year after SACLA, or only every second year, is a matter of need and taste.

A challenge for the future is to encourage an even greater number of MSc & PhD students to attend the Symposium. Unlike this year, I would recommend that they be accommodated at the same cost as everyone else. Only if it is financially necessary should the sponsored number of students be limited.

I would like to thank the other members of the organising committee and my colleagues at UCT for all the help that they have given me. A special word of thanks goes to Prof. Pieter Kritzinger, who has provided me with invaluable help and ideas throughout the organisation of this 6th Research Symposium.

M H Linck
Symposium Chairman

SYMPOSIUM CHAIRMAN
M H Linck, University of Cape Town

ORGANISING COMMITTEE
D Kourie, Pretoria University
P S Kritzinger, University of Cape Town
M H Linck, University of Cape Town

SPONSORS ISM GENMIN FRD

LIST OF REFEREES FOR 6th RESEARCH SYMPOSIUM

NAME                      INSTITUTION
Barnard, E                Pretoria
Becker, Ronnie            UCT
Berman, S                 UCT
Bishop, Judy              Wits
Berman, Sonia             UCT
Brink, Chris              UCT
Bodde, Ryn                Networks Systems
Bornman, Chris            UNISA
Brower, Piet              UOFS
Cherenack, Paul           UCT
Cook, Donald              UCT
de Jaeger, Gerhard        UCT
de Villiers, Pieter       Stellenbosch
Ehlers, Elize             RAU
Eloff, Jan                RAU
Finnie, Gavin             Natal
Gaynor, N                 AECI
Hutchinson, Andrew        UCT
Jourdan, D                Pretoria
Kourie, Derrick           Pretoria
Kritzinger, Pieter        UCT
Krzesinski, Tony          Stellenbosch
Laing, Doug               ISM
Labuschagne, Willem       UNISA
Levy, Dave                Natal
MacGregor, Ken            UCT
Machanick, Philip         Wits
Mattison, Keith           UCT
Messerschmidt, Hans       UOFS
Mutch, Laurie             Shell
Neishlos, N               Wits
Oosthuizen, Deon          Pretoria
Peters, Joseph            Simon Fraser
Ram                       Natal, Pmb
Postma, Stef              Natal, Pmb
Rennhackkamp, Martin      Stellenbosch
Shochot, John             Wits
Silverberg, Roger         Council for Mineral Technology
Smit, Riel                UCT
Smith, Dereck             UCT
Terry, Pat                Rhodes
van den Heever, Roelf     UP
van Zijl, Lynette         Stellenbosch
Venter, Herman            Fort Hare
Victor, Hema              Stellenbosch
von Solms, Basie          RAU
Wagenaar, M               UPE
Wentworth, Peter          Rhodes
Wheeler, Graham           UCT
Wood, Peter               UCT

6TH RESEARCH SYMPOSIUM - 1991
FINAL PROGRAM

TUESDAY 2nd July 1991

10h00 - 13h00 Registration
13h00 - 13h50 PUB LUNCH

14h00 - 15h30 SESSION 1A
Venue: Hassner
Chairman: Prof Basie von Solms

14h00 - 14h30 "A value can belong to many types" B H Venter, University of Fort Hare
14h30 - 15h00 "A Transputer Based Embedded Controller Development System" M R Webster, R G Harley, D C Levy & D R Woodward, University of Natal
15h00 - 15h30 "Improving a Control and Sequencing Language" G Smit and C Fair, University of Cape Town

14h00 - 15h30 SESSION 1B
Venue: Hassner C
Chairman: Prof Roelf v d Heever

14h00 - 14h30 "Design of an Object Orientated Framework for Optimistic Parallel Simulation on Shared-Memory Computers" P Machanick, University of Witwatersrand
14h30 - 15h00 "Using Statecharts to Design and Specify the GMA Direct-Manipulation User Interface" L van Zijl & D Mitton, University of Stellenbosch
15h00 - 15h30 "Product Form Solutions for Multiserver Centres with Hierarchical Classes of Customers" A Krzesinski, University of Stellenbosch and R Schassberger, Technische Universität Braunschweig

15h30 - 16h00 TEA

16h00 - 17h30 SESSION 2A
Venue: Hassner
Chairman: Prof Derrick Kourie

16h00 - 16h30 "A Reusable Kernel for the Development of Control Software" W Fouche and P de Villiers, University of Stellenbosch
16h30 - 17h00 "An Implementation of Linda Tuple Space under the Helios Operating System" P G Clayton, E P Wentworth, G C Wells and F de-Heer-Menlah, Rhodes University
17h00 - 17h30 "The Design and Analysis of Distributed Virtual Memory Consistency Protocols in an Object Orientated Operating System" K MacGregor, University of Cape Town & R Campbell, University of Illinois at Urbana-Champaign

19h30 PRE-DINNER DRINKS

20h00 GALA CAPE DINNER (Men: Jackets & ties)

WEDNESDAY 3rd July 1991

7h00 - 8h15 BREAKFAST

8h15 - 9h45 SESSION 3A
Venue: Hassner
Chairman: Assoc Prof P Wood

8h15 - 8h45 "Concurrency Control Mechanisms for Multidatabase Systems" A Deacon, University of Stellenbosch
8h45 - 9h15 "Extending Local Recovery Techniques for Distributed Databases" H L Victor & M H Rennhackkamp, University of Stellenbosch
9h15 - 9h45 "Analysing Routing Strategies in Sporadic Networks" S Melville, University of Natal

8h15 - 9h45 SESSION 3B
Venue: Hassner C
Chairman: Prof G Finnie

8h15 - 8h45 "The Design of a Speech Synthesis System for Afrikaans" M J Wagener, University of Port Elizabeth
8h45 - 9h15 "Expert Systems for Management Control: A Multiexpert Architecture" V Ram, University of Natal
9h15 - 9h45 "Integrating Similarity-Based and Explanation-Based Learning" G D Oosthuizen and C Avenant, University of Pretoria

9h45 - 10h15 TEA

10h15 - 11h00 SESSION 4
Venue: Hassner
Chairman: Prof P S Kritzinger
Invited paper: E Coffman

11h00 - 11h10 BREAK

11h10 - 12h40 SESSION 5A
Venue: Hassner
Chairman: Prof C Bornman

11h10 - 11h40 "Efficient Evaluation of Regular Path Programs" P Wood, University of Cape Town
11h40 - 12h10 "Object Orientation in Relational Databases" M Rennhackkamp, University of Stellenbosch
12h10 - 12h40 "Building a secure database using self-protecting objects" M Olivier and S H von Solms, Rand Afrikaans University

11h10 - 12h40 SESSION 5B
Venue: Hassner C
Chairman: Prof A Krzesinski

11h10 - 11h40 "Modelling the Algebra of Weakest Preconditions" C Brink & I Rewitsky, University of Cape Town
11h40 - 12h10 "A Model Checker for Transition Systems" P de Villiers, University of Stellenbosch
12h10 - 12h40 "A New Algorithm for Finding an Upper Bound of the Genus of a Graph" D I Carson and O R Oellermann, University of Natal

12h45 - 12h55 GENERAL MEETING of RESEARCH SYMPOSIUM ATTENDEES
Venue: Hassner
Chairman: Dr M H Linck

13h00 - 14h00 LUNCH

FINIS 6th COMPUTER SYMPOSIUM

PAPERS of the 6TH RESEARCH SYMPOSIUM

Design of an Object-Oriented Framework for Optimistic Parallel Simulation on Shared-Memory Computers

Philip Machanick
Department of Computer Science, University of the Witwatersrand, 2050 Wits

Abstract

Parallel simulation, if it is to become a mainstream technology, must become reasonably accessible to programmers without unusual skills. Since low-cost shared memory machines are becoming an increasing possibility, a simulation package designed to perform efficiently on such machines is desirable. This paper presents previous results indicating some performance issues to be taken into account in such a simulator, and presents a design for an object-oriented package, based on features of C++, for a simulator using the optimistic model. The major finding is that C++ is suitable for such a simulation package, although a few difficulties are identified.

keywords: Simulation, Parallel Processing, Object-Oriented Programming
Computing Review Categories: 1.6.J, C.1.2, D.2.2

1. Introduction

Parallel simulation is becoming an increasingly important area, as demands for simulation increase and parallel machines become more affordable. At the same time, if it is to be useful in organizations with a relatively low budget, it should not be significantly more difficult to program than sequential simulation. The starting point for the work reported on here is an investigation of techniques for implementing software efficiently on shared-memory machines, which rely on good caching behaviour to achieve scalability. Such machines are likely to become increasingly common (and affordable) in the future, as the speed gap between affordable high-performance processors (typically RISC) and affordable memory components increases [5]. The Silicon Graphics Iris series is an existing range of machines approximating to the predicted features. There are also strong indications that Sun is to launch a machine in this class soon.

The approach taken here is to design a framework for a specific model of parallel simulation, using object-oriented techniques to present a high-level interface to the simulation programmer.

This paper builds on experience of implementation of a specific parallel simulation in the object-oriented language C++, and presents a design for a general object-oriented library for such simulations. In the previous work [4], the Argonne National Laboratories (ANL) macros [17] were translated to C++ (making minimal changes). This approach was not entirely successful in that the macros are language extensions outside of the compiler, and debugging could be difficult (for example, it was generally necessary to read the expanded macros to interpret error messages). In addition, the macros are general parallel constructs, rather than simulation constructs, and in this sense present a low-level interface to the programmer.

One of the purposes of this paper is to demonstrate that C++ is a sufficiently powerful language to achieve the required functionality without extensions (either via macros or compiler changes). Another purpose is to present a design of a general package for parallel simulation primitives in an object-oriented language, based on performance-oriented research results. Much previous work on parallel simulation has been on unconventional architectures, such as vector processors [18], the Connection Machine [6; 3], Hypercubes [14] and the BBN Butterfly [16; 22].

The rest of this paper presents some background, followed by the design. The next section outlines general development of parallel simulation, with special emphasis on the optimistic model. The following section reports previous work in implementing a parallel simulation in C++. The design follows in the next section. This is followed by a summary of related work. In conclusion, some findings are presented, along with plans for future work.


2. Parallel Simulation

In this paper, 3 models of parallel simulation are considered: conservative, optimistic and time-stepped. Of these, the time-stepped simulations are the least interesting from the point of view of research, because they are relatively easy to implement. The conservative and optimistic methods are both event-driven, and are essentially in competition, while the time-stepped method is usually applicable to different problems. However, some insights derived from research into a time-stepped simulation have proved useful as a basis for this work.

Time-stepped simulations generally proceed by processing a large amount of data at specific discrete time intervals. They may be implemented on a range of architectures reasonably simply. On vector machines, they are implemented by scheduling all the computations of a given type so they can be done simultaneously to maximize vectorization [18]. On a SIMD machine such as the Connection Machine, the work can be distributed reasonably easily over the whole machine [6]. On a shared-memory or distributed memory machine, they can be implemented using barriers to synchronize time-steps.

The conservative method is a distributed implementation of a conventional event-list discrete-event simulation [19]. In the conservative model, each logical process (LP) in the system being simulated is mapped onto a process in the program, and the event list is distributed between LPs. A communication network between the processes supplies synchronization and data transfer. Each LP lp_i has a clock value T_i associated with it, which is the minimum timestamp of the most recent event it has received from each of its neighbours. A process blocks if none of the events in its local event list has a timestamp earlier than T_i.

A big problem with the basic conservative method is that it can deadlock. There are various techniques for deadlock detection, such as sending null messages which have no semantic value, but serve to advance the clock. A conservative simulation can be flooded with null messages. Some techniques exist to reduce the number of null messages, such as cancelling them when another message with a higher timestamp (from the same LP) is encountered [20].

Another problem with the conservative method is that there may be considerable time lost as a process blocks waiting for its neighbours. One approach to reducing this problem is the use of lookahead, in which a given LP knows the latencies, service times etc. of its neighbours and can predict how far into the future they will send a message if they have not as yet processed one. Lookahead can give good results [25], but has the drawback that it generally requires application-specific knowledge [11].

The optimistic method [13] avoids deadlocks and unnecessary delays by allowing an LP to continue processing even if it is ahead of its neighbours in simulation time. Each LP operates in its own virtual time, and the minimum virtual time over all the LPs is called global virtual time (GVT). If a message called a straggler is received which has an earlier timestamp than one already processed, there is a causality error, and any invalid work has to be undone. This is done by sending out antimessages, cancelling the effect of erroneous messages. This cancellation is effected by rolling state back to the time prior to the cancelled message. Old state has to be maintained to make rolling back possible. Saved state which is older than GVT is no longer needed, and is referred to as a fossil. Fossils must eventually be reclaimed, when memory becomes scarce; this is known as fossil collection. One proposed relatively low-cost approach to fossil collection is to clear the dirty bit in the memory manager's page table when a whole page is known to contain fossils (footnote 1). Such a scheme is being investigated in new work on at least one experimental operating system (though from a different starting point): the V operating system (footnote 2).

The mechanism used to realize virtual time using antimessages and rollbacks is called time warp.

The intention is that time warp will use up what would otherwise be idle time, or deadlock detection or avoidance, in a conservative simulation, but it is possible that the overall cost will be higher than the saving. Analytical results show that the transition between cases where time warp is a net win and examples where it is not (when rollback costs more than is saved) is smooth [9]. The implication is that techniques to reduce the amount of rollback are likely to push the technique close to the performance of conservative methods, even in cases more favourable for conservative simulations. Some work has already been done to limit rollback, by placing a limit on how far any LP can go ahead of GVT [23]. There is however some doubt about the value of this approach, called moving time window (MTW), because it is non-discriminating in holding up LPs, regardless of whether the work they are doing "ahead" of GVT will turn out to be useful or not [11].

Another variation is the replacement of the aggressive cancellation strategy by lazy cancellation, in which antimessages are only sent out when it is determined that previous computations did in fact produce erroneous results. This approach results in a performance improvement in most cases, despite the extra overhead of having to check if the state has changed before deciding to cancel [21].

A recent proposal, temporal decomposition, is to allow an LP to be scheduled as a different process after a specific point in virtual time. The break between two versions of the LP is handled conservatively, in the sense that the later version is only allowed to start after there is no chance of rolling back to the previous version. This strategy allows for dynamic adjustments to load balance, even in a statically load-balanced system, by allowing the LP to be assigned to different processors before and after the split [22].

One drawback of optimistic methods is the difficulty in keeping track of state which may potentially need to be rolled back, in a general programming model with arbitrary side-effects [11]. There is a case for a higher level model to support state manipulation, to ensure that state is not changed without the knowledge of the rollback mechanism. Another problem is that some categories of events cannot be rolled back (for example, if they result in a state change external to the simulation, such as output to the screen). The original time warp mechanism special-cases such events [13]. The nonretractable event (NRE) is a useful addition to the model. Such events cannot be rolled back, and so are only scheduled when GVT has caught up with them. Aside from providing a model for events which cause external state changes, they can also be used to implement flow control, including controlling the degree of optimism [1]. In the limit, if all events are nonretractable, the simulation becomes sequential (footnote 3).

There is considerable scope for optimizing both conservative and optimistic methods, so that both come closer to solving problems best suited to the other (e.g., Wagner's paper [25] was a rebuttal of a claim that a specific problem was only suited to the optimistic method). However, the argument that the optimistic method is less dependent on application-specific optimizations [11] is attractive.

For this reason, the new work introduced in this paper is oriented towards the optimistic method. Nonetheless, the experience being drawn on is from work with a time-stepped method; there is sufficient overlap of issues to make this valid.

Footnote 1. David Jefferson: private communication, 1991.

Footnote 2. Kieran Harty: private communication, 1991.

Footnote 3. This is worse than conservative, since a conservative simulator uses neighbours to determine the local clock, whereas an NRE is held up until the global minimum clock has caught up with it. A weaker condition on when the NRE may be scheduled is possible, but is likely to lead to a requirement for the kind of mechanism found in conservative simulators to prevent deadlock.

3. Experience with a Particle-Based Simulator

This section introduces aspects of previous experience with parallel simulation, focussing on aspects relevant to the design presented in this paper. In particular, the overall program structuring techniques which are designed to achieve good caching behaviour are of relevance.

The application is MP3D, a particle-based simulation of a wind tunnel. The simulation was originally implemented on a Cray 2 [18], and the version initially used had been ported to an Encore Multimax with relatively few changes from the vector version. This Encore version has been used as a benchmark example of a program with poor caching characteristics on shared-memory machines [12]. This initial version is called MP3D-0 (actually with minimal changes from the Encore version to convert it to C++ (footnote 4)).

Each timestep, particles are moved (depending on their velocity components), and paired for possible collision. A probabilistic selection function is used to determine whether the collision actually takes place. The n^2 problem of finding collision partners is reduced to Θ(n). The strategy is to divide space into unit-cube cells. After particles are moved, they are randomly paired within a cell, rather than attempting to find nearest neighbours [18]. In MP3D-0, randomization is achieved by having processors contend for particles to process, both in the move and collide phases, and by randomly ordering particles within the array in which they are stored.

There are three major weaknesses in the organization of MP3D-0. The first is the fine-grained approach, which makes poor use of caches, and the second is that its data structures (following the organization originally used for the vector implementation) are in the form of several arrays, each containing a single attribute of a particle, for example, its x spatial co-ordinate. The final problem is the random positioning of particles in arrays, which makes for poor locality.

The first improvement made to MP3D was to group related data together into objects, which are padded and aligned to fit cache blocks. This modification has the effect of avoiding false sharing, in which the same cache block contains unrelated data items, while increasing the prefetch effect of large cache blocks (all the data relating to one particle may be fetched after a single cache miss). The resulting program, MP3D-1, has improved caching behaviour, but is still poor.

Footnote 4. These changes have no measurable impact on performance, though the code size is increased, owing to the larger C++ libraries.
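The padding and alignment step used for MP3D-1 can be illustrated with a minimal sketch. It is not taken from the MP3D source: the `Particle` fields, the 64-byte block size `kCacheBlock` and the helper `same_block` are assumptions chosen for illustration, and the `alignas` specifier is a later C++ feature than the dialect available when the paper was written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache block size; the machines discussed in the paper range
// from 4-byte to 128-byte blocks.
constexpr std::size_t kCacheBlock = 64;

// All attributes of one particle are grouped into a single object.
// alignas makes every Particle start on a block boundary and pads its
// size up to a whole number of blocks, so two particles never share one.
struct alignas(kCacheBlock) Particle {
    double x, y, z;     // position
    double vx, vy, vz;  // velocity components
    int cell;           // index of the cell the particle currently occupies
};

static_assert(sizeof(Particle) % kCacheBlock == 0,
              "a particle occupies whole cache blocks");

// Hypothetical helper: true if two addresses fall in the same cache block.
inline bool same_block(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheBlock ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheBlock;
}
```

With this layout, a miss on any field of a particle brings in the rest of that particle (the prefetch effect), while neighbouring particles in an array sit in different blocks, which avoids false sharing between processors working on adjacent elements.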

The second improvement was to reorganize the program around the spatial locality inherent in the model. The resulting revision, MP3D-2, is a major restructuring of the program. Each processor is given a fixed area of space, containing a collection of cells called a precinct. Each timestep, a processor moves all its particles, interchanges those which are no longer in the same precinct with its neighbours⁵ and then does collisions. Because the randomness of association between particles and processors can no longer be relied upon to ensure collision partners are randomly paired, randomization is done explicitly. Each particle contains the necessary information to find the cell it belongs to (needed once it has been moved), and each cell contains the identity of the precinct that owns it. The key aspect of this implementation is that data stays with the same processor as long as possible, thereby avoiding the high number of cache invalidations of the previous implementations.

MP3D-2, while a specific restructuring exercise, is a basis for evaluating some general techniques, and the suitability of C++ for implementing them. Some of these strategies are:
• increasing processor affinity: ensuring objects and groups of related objects are processed by the same processor as far as possible, rather than reassigned in a very fine-grained way
• separation: the forcing of unrelated data into separate cache blocks, thereby minimizing false sharing
• co-location: placing data which will be accessed at approximately the same time and with similar read-write characteristics (e.g., accessed by one processor) in the same cache block, thereby enhancing the prefetch effect

These strategies have been realized in MP3D-2 by two major techniques: the use of a space directory to support the relating of objects to the precincts (which correspond to a unit of space allocated to a processor), and the use of overloaded memory allocators, which replace the standard C++ allocators.

Simulations show that MP3D-1 has a miss ratio about 5 times lower than MP3D-0, while MP3D-2 improves over MP3D-0 by at least an order of magnitude. The significance of these changes depends on a number of factors, such as the fraction of references which are to shared memory. However, with a tendency towards greater penalties for cache misses (up to hundreds of processor cycles), and large cache blocks [15], such changes in caching behaviour are likely to become increasingly significant. Some designs of fast uniprocessors already have cache blocks of 128 bytes [2], and this is a trend which is expected to continue [15].

It is possible to measure the approximate effects of the differences between the programs by measurements on machines currently available, even if they do not exactly match these characteristics.

Machines on which measurements have been made are a DEC Firefly with MicroVAX CPUs, a Firefly with essentially the same memory system but faster CVAX processors, a 4-processor Silicon Graphics SGI 4D/240S and a faster 4D/380S with 8 processors. The improved program structuring shows up in improved speedup, and more effective use of faster processors. Figures [4] supporting these claims appear in Table 1. Speedup is given relative to the uniprocessor version of the same program, as we are concerned with scalability rather than with algorithm analysis. Speedup for faster processors is given relative to the equivalent run on a slower version of the same machine.

The Silicon Graphics is a good example of the modern trend in shared-memory machines. It has fast MIPS CPUs, 16-byte cache blocks⁶ and a relatively low memory bus bandwidth in relation to the speed of the processors. The Firefly is an older design. Its cache blocks are only 4 bytes (i.e., too small for false sharing to be an issue), and the purpose of the caches is more to reduce bus traffic than to reduce memory latency [24]. Despite these differences between these machines and from expected future architectures, the restructuring of MP3D produces good results on both types of machine, suggesting that the proposed techniques are reasonably general.

4. A Design

The strategy is to start with requirements for optimistic simulators, and to work towards the insights derived from the MP3D exercise (the low level of the design).

The general goals for the design are transparency, efficiency and extensibility. The intention is to provide a general framework for optimistic simulations, incorporating the most common features, while making allowance for the addition of new features. Efficiency is to be achieved by low-level optimizations, which are hidden from the programmer, using the information-hiding capabilities of an object-oriented language.
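As an illustration of a low-level optimization hidden behind a class interface, the following is a minimal sketch of the separation strategy realized through an overloaded allocator. It is not the paper's implementation: the class name Block_aligned, the constant kCacheBlock (assumed to be 64 bytes) and the helper starts_block are hypothetical, and the sketch relies on std::aligned_alloc from C++17, which postdates the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Assumed cache block size in bytes; the real value is machine-specific.
constexpr std::size_t kCacheBlock = 64;

// Hypothetical base class: derived objects are allocated on cache-block
// boundaries and padded to a whole number of blocks, so that two
// unrelated objects never share a block (the "separation" strategy).
class Block_aligned {
public:
    static void* operator new(std::size_t size) {
        // Round the request up to a multiple of the block size.
        std::size_t padded =
            (size + kCacheBlock - 1) / kCacheBlock * kCacheBlock;
        assert(padded % kCacheBlock == 0);
        void* p = std::aligned_alloc(kCacheBlock, padded);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) noexcept { std::free(p); }
};

// Example user class: the programmer still writes ordinary new/delete.
struct Particle : Block_aligned {
    double x = 0, y = 0, z = 0;
};

// Returns true if the address is the start of a cache block.
inline bool starts_block(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheBlock == 0;
}
```

With this in place, `new Particle` returns storage that begins on a block boundary, and the alignment policy can be changed in one place without touching the simulation code.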

⁵ The simulation is constructed so that particles will not move more than one cell, so it is possible to determine in advance which precincts are "neighbours".

⁶ Not as large as some more recent designs, but large enough for false sharing to be an issue.

machine                  procs  MP3D-0              MP3D-1              MP3D-2
                                     s  sp.r  sp.f       s  sp.r  sp.f       s  sp.r  sp.f
SGI 4D/240S                  1     677                 597                 678
                             2     503   1.3           410   1.5           369   1.8
                             4     407   1.7           283   2.1           216   3.1
SGI 4D/380S                  1     544         1.2     437         1.4     569         1.2
(faster CPUs)                2     455   1.2   1.1     315   1.4   1.3     312   1.8   1.2
                             4     409   1.3   1.0     268   1.6   1.1     179   3.2   1.2
                             8     431   1.3           265   1.6           141   4.0
MicroVAX Firefly             1   12839               12700                9567
                             2    8096   1.6          8389   1.5          5116   1.9
                             4    5818   2.2          5810   2.2          2732   3.5
CVAX Firefly                 1    4854         3.9    4161         3.1    3303         2.9
(faster CPUs)                2    4478   1.1   1.8    3964   1.0   2.1    1790   1.8   2.9
                             4    4002   1.2   1.5    3632   1.1   1.6    1014   3.3   2.7

Table 1. Run time and speedup: 3 versions of MP3D on 4 machines. s is time in seconds, sp.r is speedup relative to the uniprocessor case and sp.f is speedup over the equivalent run on a similar machine with slower processors.
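The two speedup columns of Table 1 are simple ratios of run times, which the short sketch below spells out. The function names are hypothetical; the sample values in the usage note are taken from the MP3D-2 column.

```cpp
#include <cassert>
#include <cmath>

// sp.r: speedup relative to the uniprocessor run of the same program
// on the same machine.
inline double speedup_relative(double t_one_proc, double t_n_procs) {
    return t_one_proc / t_n_procs;
}

// sp.f: speedup of the run on the faster machine over the equivalent
// run (same program, same processor count) on the slower machine.
inline double speedup_faster(double t_slow_machine, double t_fast_machine) {
    return t_slow_machine / t_fast_machine;
}

// Round to one decimal place, as the table does.
inline double round1(double v) { return std::round(v * 10.0) / 10.0; }
```

For example, MP3D-2 on the SGI 4D/240S takes 678 s on one processor and 216 s on four, so sp.r = 678/216, which rounds to 3.1; the same four-processor run takes 179 s on the 4D/380S, so sp.f = 216/179, which rounds to 1.2.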

The features in the initial implementation are chosen from those which have been generally accepted, with allowance for addition of some of the less tested innovations.

State changes should generally be made through state manipulation primitives, to ensure that the rollback mechanism is reasonably simple. Nonretractable events are supplied to allow a back door for more general state changes. Although they are a relatively untried concept, a similar mechanism would in any case have to be supplied to handle input and output. In the initial model, aggressive cancellation is used, and lazy cancellation is left as an extension. Moving time window (MTW) is not implemented in the initial model, because there is uncertainty as to its usefulness. There is in any case potential for investigating the claims for nonretractable events as a throttling mechanism for excessive optimism (since NREs are needed anyway).

The remainder of this section presents a high-level view of the design, the approach to implementation in C++, some C++ techniques and an outline of the resulting classes.

4.1 high-level design

The structure of the simulator is broken down into LPs. Each LP contains an input queue, containing events not yet processed, stored in timestamp order. State associated with the LP is locally stored. An event dispatcher schedules and executes events in timestamp order, puts events to be passed to neighbours in an output queue, and makes copies of the events in the antimessages queue. When a straggler arrives, the dispatcher passes control to the rollback mechanism, which decides which antimessages should be scheduled for transmission and local execution. Those for transmission are sent to the neighbours, and those for local execution are used to determine how to roll the local state back. The relationships between the components of an LP are illustrated in Figure 1.

LPs are grouped together, to maximize spatial and temporal locality. Each such group is assigned to a single processor, in the style of the precincts used in MP3D. Load balance is more complicated to determine in an optimistic simulator than in a pessimistic or timestepped simulator. As long as an LP does not run out of internally generated events, or stick on an NRE, it will not be idle. However, it is undesirable that a single LP should get arbitrarily far ahead, and flood the network with antimessages. A definition of load balance in terms of the distance an LP is ahead of GVT is one possibility. This is not ideal, for the same reason that moving time windows are potentially a problem: the fact that an LP is far ahead is not necessarily a problem, if all the work it is doing is valid. However, changing the load balance will not result in idling a processor unnecessarily, as happens in MTW.

One possibility for dynamic load balance is a temporal decomposition scheme. However, this is overkill on shared-memory machines, as transferring state to another processor is accomplished relatively cheaply by passing a pointer, and allowing the caching mechanism to transfer data on demand (saved state for rollbacks in particular will not be transferred if it is not needed). Something very similar to temporal decomposition can in any case be achieved by scheduling an NRE as part of the protocol for moving an LP to a new precinct.

Figure 1. High-level view of a logical process. Only the dispatcher may change state, except in the case of a nonretractable event.
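The logical process of Section 4.1 and Figure 1 can be sketched in a few lines of C++. This is a deliberately reduced illustration under stated assumptions, not the design's actual classes: the output and antimessage queues are omitted, the state is a single integer, the Event payload (a digit appended to the state) is hypothetical, and rollback restores a saved copy of the state rather than replaying antimessages.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Event {
    int timestamp;
    int digit;  // hypothetical payload: appended to the LP state
};

// Orders the input queue so the earliest timestamp is served first.
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        return a.timestamp > b.timestamp;
    }
};

class LP {
public:
    // Deliver an event. A straggler (timestamp earlier than the local
    // clock) triggers a rollback before it is queued.
    void receive(const Event& e) {
        if (e.timestamp < local_time_) rollback(e.timestamp);
        input_.push(e);
    }
    // Dispatcher: executes pending events in timestamp order. This is
    // the only place where the state is changed going forward.
    void run() {
        while (!input_.empty()) {
            Event e = input_.top();
            input_.pop();
            processed_.push_back({e, state_});  // save state for rollback
            state_ = state_ * 10 + e.digit;     // order-dependent update
            local_time_ = e.timestamp;
        }
    }
    int state() const { return state_; }
    int local_time() const { return local_time_; }

private:
    struct Saved { Event event; int state_before; };

    // Undo every processed event later than t and requeue it.
    void rollback(int t) {
        while (!processed_.empty() && processed_.back().event.timestamp > t) {
            state_ = processed_.back().state_before;
            input_.push(processed_.back().event);
            processed_.pop_back();
        }
        local_time_ = processed_.empty() ? 0 : processed_.back().event.timestamp;
    }

    std::priority_queue<Event, std::vector<Event>, Later> input_;
    std::vector<Saved> processed_;
    int state_ = 0;
    int local_time_ = 0;
};
```

If events with timestamps 1 and 3 are executed and an event with timestamp 2 then arrives, the LP undoes event 3, executes 2, and re-executes 3, ending in the same state as if the events had arrived in order.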

This NRE would in effect force the LP to wait until GVT had caught up with it before it was transferred, ensuring that no checkpointed state would need to be copied.

The details of the load balance process will be empirically determined, based on performance measurements. Figure 2 illustrates grouping of LPs in precincts, and how this may change dynamically. If a precinct is [...]

[...] representing the hierarchy of abstraction, from the simulation programmer's view down to machine-specific details. There is a hierarchy of classes, with inheritance links through each of these layers. An overview of the class hierarchy and its relationship to the layers is shown in Figure 3. The layers (with examples of the classes [...]

(a) LP1 and LP3 in the same precinct are found to be "too optimistic" whereas LP2 is "too conservative". LP4 and LP5 are in the same precinct, and appear to have the right amount of work for one processor.

(b) LP1 and LP2 now in the same precinct, and LP3 by itself.

Figure 2. Allocation of LPs to precincts and load balance

• user_level_simulator: higher-level view of the simulation classes. These could be used directly in simple simulations, but more typically would be used as a base for user-derived classes. Classes at this level include Simulation_environment, State, Message and NRE_message. Some of these classes are needed to implement the low_level_simulator classes (e.g., Antimessage is derived from Message).

4.3 some C++ techniques

Inheritance plays an important role in the design. One example is the implementation of antimessages. An antimessage is a part of the low_level_simulator layer, and therefore cannot be redefined by the programmer. However, class Antimessage contains a pointer to a Message object, which may be a derived class created by the programmer, rather than the original class Message. As long as the member functions (C++ terminology for methods) needed by Antimessage have not been redefined in an inconsistent way, Antimessage will function as intended.

One of the ideas used in various parts of the design is the notion of defining⁷ an object, the sole purpose of which is to generate correct initialization and termination, using the implicitly called constructors and destructors. This technique has been used for implementing tracing [7]. The general idea is that an object is defined, resulting in a call of its constructor at the place of definition. At the close of the scope where the definition occurs, the destructor is automatically called. For example, in the C++ fragment:

    { Simulation_environment sim;
      // code to do simulation
    }

the object sim exists between its definition and the close of scope symbol, }. The purpose of this construct is to call the constructor for Simulation_environment objects, which does some global initialization. The destructor call, automatically generated by the compiler, ensures that termination code is run, so that the programmer cannot forget to call it. A similar approach is used for implementing locks (though these are not expected to be used by simulation programmers).

⁷ In C++, as in C, a name may be declared, in the style of a forward declaration in other languages such as Pascal. A full declaration is called a definition.

Each class may contain static data members (class variables, in Smalltalk terminology). This is a useful feature for avoiding the use of global variables. In the design, this feature is used in two contexts: memory management and maintaining global state. The low-level memory management routines, defined in the kernel, have the option of taking a free space list as an argument. Where this feature is used, the free space list is stored as a static data member in the class defining the specific allocator and deallocator. Optional parameters make it possible to use the same low-level allocators, whether the programmer-supplied free space list is to be used or not.

Another useful feature is the capability of overloading the array index operator, [ ]. This makes it possible to implement a variable-sized 3-dimensional array, with the same syntax as a statically declared one. The first two indexing operations return slices of the space directory, with decreasing dimensionality each time, and the third indexing operation returns the element desired. This capability would be useful in simulations where spatial relationships between LPs are a simple function of their positions in space, as in MP3D.

4.4 some C++ classes

This section presents an outline of the external interfaces of the classes.

The kernel defines the following classes:
• Object: defines low-level memory management, including alignment and padding to fit cache blocks.
• Lock_data: defines data structures to implement locks.
• Lock: the constructor for this class sets a lock (identified by an object of class Lock_data), and the destructor releases the lock at the close of the current scope. A typical example looks like this:

    { Lock output (output_lock);
      // code to do some output
    } // lock released here

• Process: its constructor launches a thread, calling the function passed to it as a parameter, and its destructor cleans up after process termination.
• list manipulation and other generic data structure primitives, such as dynamically resizeable 3-dimensional arrays.

The low_level_simulator level defines the following classes:
• Precinct: a precinct contains a list of LPs, and a scheduler to decide which to execute next. It also knows which other precincts are its neighbours, and manages message exchange between itself and them.
• LP: a logical process; this relies on the semantics of Message and State to implement a specific simulation, and Precinct defines communication with LPs on other processors.
• Saved_state: contains a list of changes to state, including identities of messages that caused the changes, stored as antimessages; the rollback controller is contained in this class, as is the fossil collector.
• Anti_message: contains a copy of the original message, plus its own timestamp.

Classes defined in user_level_simulator include:
• State: includes state change primitives and recording changes for rollback. The programmer will have to derive a new class from this one to incorporate the semantics of the state for the actual simulation problem.
• Simulation_environment: one object of this type is defined at the start of the program; as with Lock, its constructor and destructor respectively initialize the program for simulation, and clean up at the end.
• Message: contains a timestamp and identification of its originating LP. Uses member functions (methods in conventional object-oriented terminology) in State to make changes to the state when its own execute() member function is called.
• NRE_message: a nonretractable event message; may make arbitrary state changes, including doing input and output; not scheduled until the precinct detects GVT has passed its timestamp.
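The overloaded index operator described in Section 4.3 can be sketched as follows. This is only an illustration of the idea: the names Array3D, Plane and Row are hypothetical, the storage is a std::vector sized at construction (resizing is not shown), and nothing here is taken from the actual space directory code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Run-time sized 3-dimensional array indexed as a[i][j][k].
template <typename T>
class Array3D {
public:
    // 1-dimensional slice: the third index returns the element.
    class Row {
    public:
        explicit Row(T* base) : base_(base) {}
        T& operator[](std::size_t k) { return base_[k]; }
    private:
        T* base_;
    };

    // 2-dimensional slice: the second index returns a Row.
    class Plane {
    public:
        Plane(T* base, std::size_t nz) : base_(base), nz_(nz) {}
        Row operator[](std::size_t j) { return Row(base_ + j * nz_); }
    private:
        T* base_;
        std::size_t nz_;
    };

    Array3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : ny_(ny), nz_(nz), data_(nx * ny * nz) {}

    // The first index returns a Plane, so dimensionality decreases
    // with each indexing operation.
    Plane operator[](std::size_t i) {
        return Plane(data_.data() + i * ny_ * nz_, nz_);
    }

private:
    std::size_t ny_, nz_;
    std::vector<T> data_;  // elements stored contiguously, k fastest
};
```

A caller can then write `space[1][2][3] = 42;` exactly as for a statically declared array, while the dimensions are chosen at run time.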

layer                  class                    description
kernel                 Object                   low-level allocators
                       Lock                     uses constructor-destructor pair to set, free a lock
                       Process                  uses constructor-destructor pair to launch, clean up after thread
low_level_simulator    Precinct                 list of LPs, list of neighbours
                       Saved_state              changes to state, antimessages
                       Antimessage              copy of message, own timestamp
user_level_simulator   Message                  timestamp, id of original LP, knows how to execute
                       NRE_message              can't be rolled back
                       Simulation_environment   uses constructor-destructor pair to initialize, clean up globally
                       State                    includes state change primitives, recording state for rollback

Figure 3. The class hierarchy
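The relationship between Message and Antimessage in the hierarchy can be sketched as below. The sketch makes several assumptions that are not in the paper: the State stand-in holds one integer, Add_message is an invented user-derived class, and a virtual clone() member function is introduced so that an antimessage can copy a message of a derived class. The paper describes Antimessage as derived from Message; composition is used here only to keep the example short.

```cpp
#include <cassert>
#include <memory>

struct State { int value = 0; };  // stand-in for the design's State class

class Message {
public:
    explicit Message(int timestamp) : timestamp_(timestamp) {}
    virtual ~Message() = default;
    int timestamp() const { return timestamp_; }
    // Derived classes override this to change the state.
    virtual void execute(State& s) const { (void)s; }
    // Hypothetical helper: copies the message with its dynamic type.
    virtual std::unique_ptr<Message> clone() const {
        return std::make_unique<Message>(*this);
    }
private:
    int timestamp_;
};

// A user-derived message, as a simulation programmer might write it.
class Add_message : public Message {
public:
    Add_message(int timestamp, int amount)
        : Message(timestamp), amount_(amount) {}
    void execute(State& s) const override { s.value += amount_; }
    std::unique_ptr<Message> clone() const override {
        return std::make_unique<Add_message>(*this);
    }
private:
    int amount_;
};

// Holds a copy of the original message plus its own timestamp. It only
// uses the Message interface, so it works with any derived class.
class Antimessage {
public:
    Antimessage(const Message& original, int own_timestamp)
        : copy_(original.clone()), timestamp_(own_timestamp) {}
    int timestamp() const { return timestamp_; }
    const Message& original() const { return *copy_; }
private:
    std::unique_ptr<Message> copy_;
    int timestamp_;
};
```

Because Antimessage never names Add_message, the low-level layer stays closed to the programmer while still carrying programmer-defined message types.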

5. Related Work

A number of object-oriented threads packages have been implemented for shared-memory parallel machines. An example is Presto, which has been implemented and evaluated on a 20-processor Sequent Symmetry [8]. Work in this category has mostly aimed at efficiency of low-level primitives, such as locks and launching threads. In this work, the emphasis is more on support for application structuring to achieve good caching behaviour.

Other work in implementation of optimistic parallel simulators on shared-memory parallel machines has not addressed the locality and caching issues raised here. Examples include BBN Butterfly implementations of optimistic simulations [11]. In terms of its programming model, the Synapse simulation toolkit has similar higher-level goals to the design presented here, except it supports the conservative model [26]. Synapse is built on Presto, and therefore does not address new implementation issues at the lower level.

In other work on structuring simulators, attention has been paid to the difficulties of efficiently vectorizing object-oriented code in C++ [10]. This is in contrast to the results on which this work is based, in which object-oriented programming is shown to be suited to cache-based architectures [4].

6. Conclusions

The design as presented here has been partially implemented (mainly the kernel level).

The goal of showing that C++ is a sufficiently powerful language for such a package has been partially met. All the required features are possible to implement, some conveniently and reasonably simply. For example, the inheritance mechanism makes it possible to hide the antimessage mechanism from the programmer, while allowing antimessages to use new classes derived from the standard Message class.

There are however some problems. The most serious concerns the requirement that messages should not make state changes except through the State class, to make rolling back manageable. There is no way this rule can be enforced through the language; a preprocessor or modifications to the compiler would appear to be the only sure ways of enforcement. Since there is a convenient mechanism for implementing general side-effects through NREs, it is possible this limitation will not cause major problems in practice.

Another deficiency is that the class hierarchy does not exactly match the logical breakdown of the problem, because some features which are logically low-level (such as antimessages) need information from higher-level constructs (messages, in this case). Although the language makes such an implementation possible, this situation makes it more difficult to describe the logical hierarchy than is desirable.

The next phase of this research is the implementation of the simulation framework described here. Experience with using it to implement simulations will be the basis for evaluating how serious these problems are.

In addition, performance measurements on both standard benchmarks and real problems will be made to investigate the validity of the overall program structure, especially the aspects which were successful in the MP3D exercise.

C++ has been successful as a basis for one parallel simulation exercise. The design presented here indicates that it is a promising basis for a more general package. In particular, the potential for designing a parallel simulation package that is usable without deep knowledge of the underlying technology, including parallel programming and the mechanisms of optimistic simulation, is attractive. This design supports the view that such a framework is feasible.

7. Acknowledgements

I would like to thank David Cheriton for supporting the underlying work on which this paper is based. Henk Goosen also played a leading role in the architecture-related aspects. Andras Salamon made helpful comments on an earlier draft. The work on which this paper is based was supported in part by DARPA Contract N00014-88-K-0619, the Anderson-Capelli Fund and the Mellon Foundation.

References

[1] JR Agre and PA Tinker [1991], Useful Extensions to a Time Warp Simulation System, Proceedings of the SCS Multiconference on Distributed Simulation, January, 78-85.
[2] HB Bakoglu, GF Grohoski and RK Montoye [1990], The IBM RISC System/6000 Processor: Hardware Overview, IBM Journal of Research and Development 36(1) January, 12-22.
[3] Boris Berkman and Rassul Ayani [1991], Parallel Simulation of Multistage Interconnection Networks on a SIMD Computer, Proceedings of the SCS Multiconference on Distributed Simulation, January, 133-140.
[4] David R Cheriton, Hendrik A Goosen and Philip Machanick [1991], Restructuring a Parallel Simulation to Improve Cache Behavior in a Shared-Memory Multiprocessor: A First Experience, International Symposium on Shared-Memory Multiprocessing, Tokyo (to appear).
[5] David R Cheriton, Hendrik A Goosen and PD Boyle [1991], ParaDiGM: A Highly Scalable Shared-Memory Multicomputer, IEEE Computer 24(2) February (to appear).
[6] L Dagum [1989], A Fast Sorting Algorithm for a Hypersonic Rarefied Flow Particle Simulation on the Connection Machine, Technical Report RIACS 89.44, NASA-Ames Research Center.
[7] Margaret A Ellis and Bjarne Stroustrup [1990], The Annotated C++ Reference Manual, Addison-Wesley, Reading.
[8] John E Faust and Henry M Levy [1990], The Performance of an Object-Oriented Threads Package, ECOOP/OOPSLA Proceedings, October, 278-288.
[9] Robert E Felderman and Leonard Kleinrock [1991], Two Processor Time Warp Analysis: Some Results on a Unifying Approach, Proceedings of the SCS Multiconference on Distributed Simulation, January, 3-10.
[10] DW Forslund et al. [1990], Experiences in Writing Distributed Particle Simulation Code in C++, USENIX C++ Conference Proceedings, 177-190.
[11] Richard M Fujimoto [1990], Parallel Discrete Event Simulation, Communications of the ACM 33(10) October, 30-53.
[12] HA Goosen and DR Cheriton [1990], Predicting the Performance of Multiprocessor Caches, South African Computer Journal (2) May, 31-38.
[13] David R Jefferson [1985], Virtual Time, ACM TOPLAS 7(3) July, 404-425.
[14] David Jefferson, Brian Beckman, Fred Wieland, Leo Blume, Mike DiLoreto, Phil Hontalas, Pierre Laroche, Kathy Sturdevant, Jack Tupman, Van Warren, John Wedel, Herb Younger and Steve Bellenot [1987], Distributed Simulation and the Time Warp Operating System, Proceedings of the 11th ACM Symposium on Operating Systems Principles, November, 77-93.
[15] NP Jouppi [1990], Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers, Proceedings of the 17th International Symposium on Computer Architecture, June, 364-373.
[16] Greg Lomow, Samir Ranjan Das and Richard M Fujimoto [1991], User Cancellation of Events in Time Warp, Proceedings of the SCS Multiconference on Distributed Simulation, January, 55-62.
[17] E Lusk, R Overbeek, et al. [1987], Portable Programs for Parallel Processors, Holt, Rinehart and Winston.
[18] JD McDonald and D Baganoff [1988], Vectorization of a Particle Simulation Method for Hypersonic Rarefied Flow, AIAA Thermophysics, Plasmadynamics and Lasers Conference Proceedings, June.
[19] Jayadev Misra [1986], Distributed Discrete-Event Simulation, ACM Computing Surveys 18(1) March, 39-65.
[20] Bruno R Preiss, Wayne M Loucks, Ian D MacIntyre and James A Field [1991], Null Message Cancellation in Conservative Distributed Simulation, Proceedings of the SCS Multiconference on Distributed Simulation, January, 33-38.
[21] PL Reiher, RM Fujimoto, S Bellenot and DR Jefferson [1989], Cancellation Strategies in Optimistic Execution Systems, Proceedings of the SCS Multiconference on Distributed Simulation, January, 112-121.
[22] Peter Reiher, Steven Bellenot and David Jefferson [1991], Temporal Decomposition of Simulations under the Time Warp Operating System, Proceedings of the SCS Multiconference on Distributed Simulation, January, 47-54.
[23] L Sokol and B Stucky [1990], MTW: Experimental Results For a Constrained Optimistic Scheduling Paradigm, Proceedings of the SCS Multiconference on Distributed Simulation, January, 169-173.
[24] CP Thacker, LC Stewart and Satterthwaite [1988], Firefly: A Multiprocessor Workstation, IEEE Transactions on Computers 37(8) August, 902-920.
[25] David B Wagner [1991], Algorithmic Optimizations of Conservative Parallel Simulations, Proceedings of the SCS Multiconference on Distributed Simulation, January, 25-32.
[26] David B Wagner [1991], The Design of an Object-Oriented Parallel Simulation Environment, Proceedings of the SCS Multiconference on Object-Oriented Simulation, January, 201-208.
