THE ISSN 004-8917 AUSTRALIAN JOURNAL

VOLUME 13, NUMBER 1, FEBRUARY 1981 r

CONTENTS

INVITED PAPER 1-6 Software and Hardware Technology for the ICL Distributed Array Processor R.W. GOSTICK

ADVANCED TUTORIALS 7-12 On Understanding Binary Search B.P. KIDMAN

13-23 Some Trends in System Design Methodologies I.T. HAWRYSZKIEWYCZ

SHORT COMMUNICATIONS 24-25 NEBALL and FINGRP: New Programs for Multiple Nearest- Neighbour Analysis D.J. ABEL and W.T. WILLIAMS

26 Program INVER Revisited D.J. ABEL and W.T. WILLIAMS

27-28 A Comparison between PASCAL, FORTRAN and PL/1 D.J. KEWLEY

SPECIAL FEATURES 29 Book Reviews 30-31 Letters to the Editor 32 Call for Papers

V J

Published for Australian Computer Society Incorporated Registered for Posting as a Publication — Category B “This wouldn't have happened, Fenwick, with a Tandem NonStop™ System," When your down, are you out or business? You can bank on it. Timing couldn’t be worse. No question about it. Your busiest season. Customers pounding for service. When you’re on a Tandem NonStop™ System, your information All it takes is one small failure somewhere in the system. One files and your processes are protected from contamination in a disc or disc controller. One input/output channel. way no other system can match. We’ve built in safeguards other For want of an alternative, the system is down and business is suppliers can only dream about. You’ve heard the war stories about lost. Sometimes forever. restart and restore data base on other systems. They’re not exaggerations, but they can be a thing of the past. W ith a Unless you have a Tandem dispersed processing system. Tandem NonStop™ System. NonStop™ Operation to be sure. This is the only system you can buy which keeps on running, right through a failure that would shut down any other system on the market today. Including a failure in a disc, a disc con­ Grow without penalty. troller, an input/output channel or one of the Start with only the computer system you need processors. right now. You can grow it all the way from a local system to a global network. And never lose Tandem takes even a critical failure one cent on your original investment. in stride. No transaction-in-process is lost or duplicated. Same hardware. Same software. With low cost additions, not replacements. No downtime. It’s worth your interest. No lost records. No lost customers. NonStop" Systems For Complete Information Contact: TANDEM COMPUTERS MANAGEMENT INFORMATION SYSTEMS PTY. LTD. 3 Bowen Crescent, 22 Atchison Street, S.G.I.O. Building, DISTRIBUTED IN Melbourne, Vic. 3004 St. Leonards, N.S.W. 2065 Cnr. Turbot & Albert Sts AUSTRALIA BY: Telephone (03) 267 4133 Telephone (02) 438 4566 Brisbane, Queensland 4000 Telephone (07) 229 3830 Software and Hardware Technology for the ICL Distributed Array Processor

R.W. Gostick*

The ICL Distributed Array Processor (DAP) is a radical departure from conventional computer architectures. Advances in both hardware technology and software technology have been combined to produce a new type of high speed computer. This paper concentrates on the relationship between the hardware and software, and shows how the various components have been integrated within the 2900 system.

INTRODUCTION cessors, which use pipelining'and slaving techniques to Many of the changes in computer technology during improve the throughput of a single processor, the DAP is the past few decades have been considered simply as a an actual array of processors. means of achieving improvements to the classic computer In taking the concept of distribution of processing architecture. Changes such as solid state components re­ within the store, or active store, it is easily shown that placing valves and core have reduced costand improved per­ the most cost effective and flexible system arises from a formance but the basic distinction between processing and large number of very simple processors. The DAP takes a store has remained. Although functional distribution of set of 4096 identical processors, connected as a 64 x 64 ‘intelligence’ has increased, with separate processors for array, each with a standard 4K bit store element to make a input/output and peripheral control, the heart of all 2Mbyte store. The processors are unlike the nodes in a modern mainframes is still the central processor (CPU) with distributed processing system in that the processors all its associated store. perform the same operations at the same time. These For many applications, typically the demanding activities are controlled by a central unit, the Master Con­ scientific applications such as meteorology, it has become trol Unit, (MCU) which handles all aspects of instruction apparent that the physical separation of store and proces­ decoding and address generation. sing limits the overall speed of the ‘Von Neumann’ serial The overall structure, with one control unit and architecture. The DAP design recognises that current hard­ multiple co-ordinated processing units, is generically known ware uses the same physical devices, semiconductor chips, as a Single Instruction Multiple Data stream (SIMD) archi­ for both store and processors, and that the physical and tecture. logical division is no longer mandatory. By associating, i.e. distributing processing power within the store the restric­ 2.2 The Processing Element tions imposed by the store highway speeds may be over­ The philosophy of the DAP, of having a large number come. (See Flanders et al 1977.) of very simple processors, leads to the design of the pro­ To turn this simple idea into a productive computing cessing element (PE) shown in Figure 1. As with all com­ system requires development of both hardware and soft­ puters there are the two components of storage and pro­ ware within the new architecture. This paper shows how cessing. The storage comprises a single 4K-bit RAM, the DAP has combined modern hardware technology with considered as 4096 single bit words. The processor itself, to modern software technology through effective integration preserve full flexibility and hardware simplicity, is built within the 2900 system. around a word length of a single bit. The arithmetic within the PE is performed using the 2. HARDWARE STRUCTURE Q (accumulator) and (carry) registers and a store bit. A 2.1 Overall typical single instruction would perform a full add on these The term ‘array processor’ has been used for several three registers, leaving the sum and carry in the Q and C different computer designs, ranging from conventional respectively. High level operations, such as 32 bit add, are computers with added vector instructions, to attached built up from cycles of these basic one bit instructions. This pipelined units through to the large supercomputers such as the CRAY-1 and CDC Cyber 205. Unlike the vector pro- Table 1. DAP-FORTRAN PRECISIONS “Copyright © 1981, Australian Computer Society Inc. General permission to republish, but not for profit, all or part of REAL 24 Bits-64 Bits this material is granted; provided that ACJ’s copyright notice is INTEGER 8 Bits-64 Bits given and that reference is made to the publication, to its date of CHARACTER 8 Bits issue, and to the fact that reprinting privileges were granted by permission of the Australian Computer Society.” LOGICAL 1 Bit *lnvited Paper The author is with DAP Marketing Unit, International Computers Limited, 322 Euston Road, London NWi 3BD, England. Manuscript received 27 October 1980.

The Australian Computer Journal, Vol. 13, No. 1, February 1981 1 ICL DAP

N E S W For real applications, however, many operations are re­ row highway quired to be performed on subsets of the field or arrays of data. On the DAP these conditional operations are con­ trolled by the activity register. The register acts as a simple multiplexer switch, effectively inhibiting selected PEs from obeying a given sequence of instructions. As shown in Figure 2, each processing element has six major connections, four nearest neighbour connections (N,S,E,W) and two highway connections (row and column). The two distinct types of interconnection have } arithmetic and important roles in the solution of problems. Neighbour con­ logic unit nections are fundamental to the solution of field equations as found in applications such as meteorology. For such equations the value at one point in the field depends on activity carry values at surrounding points. The highway connections are used to communicate between the array and the MCU, and

ripple carry also to provide the valuable ‘global testing’ facilities shown adder below. • out

column highway 2.3 The Master Control Unit ~1 - The master control unit, Figure 2, as its name implies, controls the processing functions of the array. In hardware multiplexer terms many of the facilities of the MCU are identical to those of any standard computer, such as instruction fetch­ neighbours ing, decoding, address generation. Most of the instructions,

MCU however, are not obeyed by the MCU itself, but are broad­ cast to the array. Since these bit level instructions are performed repetitively during bit serial processing, the MCU has a special hardware instruction buffer, with automatic incrementing of bit level addresses under control of a hard­ store (S ) ware DO instruction. 4k x 1 The general purpose 64-bit registers within the MCU perform several important functions. One general class of operations includes those between an array of values and a single (scalar) item. For these cases the scalar is held within the MCU and is broadcast across the highways to the array as required. Figure 1. Simplified Processing Element The global testing facility provided by the MCU is used by both low level subroutines (e.g. for overflow detec­ tion) and by high level statements to control the overall bit serial organisation means that in principle any flow of a program. The MCU can test whether a condition arithmetic format or operation may be programmed. The is true in all or any processing elements in two cycles. The first version of the DAP-Fortran system provided the basic first cycle uses the highway logic to perform a logical AND 32 bit fixed and floating point operations, and this will or OR across the rows of the array. This results in a single shortly be enhanced to the full range of precisions shown row (64 bits) which is then held in one of the MCU in Table 1. The ability to select the precision appropriate to the application has considerable benefits to the user in both storage and speed. In particular data is always stored and accessed in its most compact form, with no require­ Instruction ment for packing or unpacking to suit the actual hardware wordlength as on a fixed wordlength machine. The speed of the DAP, unlike conventional machines, is dependent on the wordlength, and applications using low precision, such Instruction as image processing, can benefit by an order of magnitude Modification in speed by choosing the lowest appropriate precision. At the other end of the spectrum the user is not forced to jump right up to double precision (64 bits) when single Broadcast to array precision (32 bits) has insufficient accuracy. Column highways One of the most important aspects of both the hard­ ware and software system is the use of logical variables, Processing defined and stored as single bit items. The software aspects element are as shown in Gostick (1979a) and these are supported by Row highways array the third actual register of the PE, the A (activity) register. The natural mode of working of the DAP is such that all processing elements are performing the same operation. Figure 2. Master Control Unit

2 The Australian Computer Journal, Vol. 13, No. 1, February 1981 ICL DAP

Distributed array processor

Typical 2900 computer 64 * 64 matrix VME/B system control lani

Conventional store Conventional store main program Fortran Active store (2 Mbytes) 2900 Fortran subroutines (or mixed language routines)

Store access Order code DAP-Fortran controller processor subrou tines

Peripheral controllers Figure 3. 2900/DAP Configuration Figure 4. DAP Application Structure registers. The second cycle tests this register for all TRUE (SCL). Within the system full mixed language program­ or FALSE, and jumps as appropriate. ming is possible, and the DAP software has been designed such that the DAP-Fortran system appears simply as 3. SYSTEM INTEGRATION another . By designing the system One of the major differences between the DAP and in this manner the user sees the 2900/DAP as a mixed lang­ previous SIMD machines, e.g. ILLIAC IV (Slotnick et al uage system and is not concerned with the physical rela­ 1972), is the way in which the DAP has been integrated tionship between the two types of processor. into a modern serial computer. Traditionally back-end Any application which is to use the DAP may be con­ array processors have been connected to their host com­ sidered as having serial and parallel components, which puters, which provide the user interface, by means of a require 2900 and DAP facilities respectively. Figure 4 medium-high speed channel. Since the DAP may be con­ shows how such an application is seen within the DAP sidered as an active storage module, it may be attached system. This is, of necessity, a general view, and the as store rather than as a peripheral. Figure 3 shows how the DAP attaches as a 2Mbyte store module to an ICL 2900 series system. This method of attachment means that the DAP FORTRAN system automatically benefits from all the facilities avail­ PROGRAM able within the 2900 VME/B , with ad­ vantages in both system and application development (see below). In hardware terms the total 2900/DAP system may be considered as a hybrid dual combination, with the DAP DAP RUN-TIME store acting as a common memory for the two types of processor. The 2900 system is responsible for all data man­ SUPPORT agement, input/output and loading, and in this respect the DAP store acts purely as normal memory. Any data within the DAP store may be accessed by the 2900 Order Code Processor and Store Access Controller. When a DAP program is activated the 2900 suspends serial processing DAP MANAGER for that process, and activates the DAP at the appropriate entry point. The system will have previously ensured that all code and data segments required for DAP processing have been loaded into DAP store rather than the other store modules. On completion of DAP processing the DAP returns control in the calling process. VIRTUAL STORE DAP PROGRAM Using this common memory system there are no MANAGER BLOCK specific overheads associated with loading the DAP, as there would be using peripheral connection.

3.1 Software Architecture 2900 DAP Any user of a 2900 system interacts with the VME/B operating system using the System Control Language Figure 5. DAP Run-time Structure

The Australian Computer Journel, Vol. 13, No, 7, February 1981 3 ICL DAP

Files/Libraries Macro/Process ICL Libraries ratios of both size of code and amount of processing represented in the various modules will vary between appli­ DAPFORT cations. In particular any application with no DAP-Fortran DAP subroutines sections will run unchanged on the 2900, using standard SOURCE compiler 2900 facilities, and retaining the normal user interface. By rewriting modules in DAP-Fortran such an application may DAP Assembler DAPASM be converted gradually to take advantage of the DAP Macro Assembler Support power. Format Macros Communication between the serial and parallel com­ ponents of the program is via standard Fortran/DAP-

Fortran COMMON blocks for data, with a simple CALL DAP Consolidator DAPCON and RETURN interface. This ensures maximum efficiency Input Consolidator Software by eliminating physical data transfers. Format Library

3.2 DAP Software Interface The integration of the software required for the DAP VME Object RUNDAP with the VME/B system required very few modifications to Module VME/B, since the majority of the DAP specific facilities Format already existed within the architecture of VME/B. New facilities were added in the form of the VME/B DAP Manager. Fortran program The first requirement is to load the DAP segments SOURCE Compiler into a contiguous area in the DAP store. The DAP Manager arranges for non-DAP segments within the store to be re­ located until sufficient contiguous store is available. The Figure 6. DAP Compiling System DAP program is then loaded and locked into the store. The second requirement is to manage the CALL- formed using 4096 bit (512 byte) string operations. RETURN interface between the serial (2900) and parallel The simulator is designed to use the full DAP com­ (DAP) modules. For each entry point within the DAP piling system, and programs are prepared in the standard program a link is made to the DAP Manager (Figure 5) so manner. At execution time, the user indicates that he that a standard CALL from a high level 2900 procedure wishes to use the simulator rather than the actual DAP is diverted into the DAP Manager. The DAP Manager acti­ hardware. Loading takes place as for a normal DAP pro­ vates the DAP by passing the details of the target entry gram, but the final stage of moving segments into the point into a special set of registers within the DAP store DAP does not take place. The simulator takes the place access controller (DAC). These registers form part of the of the DAP Manager in Figure 5, and when a CALL is made standard 2900 ‘image store’ mechanism. The Manager calls to a DAP entry point the simulator is activated. the 2900 event system to prepare for the end of DAP By building the simulator into the existing system, processing and suspends the host program. all the facilities of the actual DAP software are available, On completion of DAP processing, either normally including the full diagnostic system (see Section 5). The or abnormally, the DAP sets the image store registers speed of the simulator is, of course, slow compared with corresponding to the type of completion, and interrupts the DAP, by a factor of 103 — 104, and hence it is the 2900. The interrupt is passed via the event system regarded as an aid to testing rather than a substitute for either to the calling program, which resumes at the the hardware. CALL point, or else to the diagnostic system (see Section Further facilities have .been built in to the simula­ 4). tor to examine the low level behaviour of the DAP. On final completion of the host program the DAP Tracing is performed automatically at the instruction segments are unlocked, and the store may be reused for level, and at the end of each run histograms may be further DAP or 2900 use. produced showing the number of times each group of one or more instructions has been obeyed. The simulator 3.3 DAP Simulator also gives an estimate of the execution time expected on The feasibility of the DAP was established using a the hardware. pilot 32 x 32 PE array attached to a host emulator. During The simulator is currently being used both inter­ the building of the production DAP hardware and software nally as a design aid for future DAP designs and as a pro­ system it was advisable to have a system to test the soft­ gram development tool for 2900 sites without DAPs. ware independent of the actual hardware. Such a system would have the dual advantages of being able to test the 4. DAP-FORTRAN SYSTEM large software suite and act as a reference for the perfor­ The language and use of DAP-Fortran have been des­ mance of the actual hardware. cribed in several earlier papers (Gostick 1979a, Flanders et A software simulator for the DAP had been written al 1977). This paper concentrates on its implementation, as an earlier tool. The system simulated a general N x N and in particular the way the system takes advantage of and DAP .at the bit level, and was used as the basis for the integrates with the VME/B system. production version of the simulator. The basic action of the simulator is to take the individual instructions from the 4.1 Compiling Strategy DAP program, decode, and ‘obey’ them using standard The DAP hardware, with its bit level processing, pro­ 2900 instructions. Array instructions are typically per- vides great flexibility at the software level. To take advan-

4 The Australian Computer Journal, Vol. 13, No. 1, February 1981 ICL DAP tage of this flexibility it was decided to use a two stage OPEH REPORT IDENTIFY 1 compiling system. At the top level the DAP-Fortran com­ OPEH UK 203 ON 1977/10/20 AT 23:34:55 piler is responsible Tor the syntax and semantics of the INTERRUPT ERROR: -500 language, while at the bottom level a large set of assembler DESCRIPTION: ZERO DIVIDE written subroutines take care of the actual implementation of the arithmetic and other facilities. Figure 6 shows the PROGRAM AT LINE: 533{OFFSET:416) IN PROCEDURE:FOURTL actual stages used in the creation of a DAP program (N.B. OF MODULE:FOURTL the SCL system hides the actual stages from the novice user). SUMMARY OF ROUTE LEADING TO THE ERROR (REVERSE ORDER) The output from the compiler, instead of code, is a set of macro calls to the assembler. These macros, called the FORTRAN SUBPROGRAM FOURTL{MODULE FOURTL) AT LINE 533 FORTRAN SUBPROGRAM FORI2S(MODULE FOR12S) AT LINE 421 assembler macro format (AMF), correspond loosely to the FORTRAN MAIN PROGRAM FFT(MODULE FFT) AT LINE 45 instructions on a conventional multiple address machine. A typical call is of the form END OF ROUTE SUMMARY REPORT OF CURRENT STATE OF PROGRAM ADD MM X = M1 Y = M2 R=M3A=1L = 32T=E FORTRAN SUBPROGRAM FOURTL(MODULE FOURTL) AT LINE 533 ICENT 0 IDIM which is the call to add two matrices of 32 bit (L=32) float­ ISIGN 1 ISYM N ing point (T=E) words, pointed at by registers M1 and M2, 0 NCURR NFACT 0 NFCNT and store the result in the address pointed to by M3 under NPREV 0 NREM control of the activity register (A=1). On a serial 3 address NWORK 97 machine this corresponds to a register-to-register add, but FORTRAN SUBPROGRAM FORI2S(MODULE FOR12S) AT LINE 421 on the DAP complete matrices are being added. I * 1 ISGN - 1 K = 0 N - 32 The AMF is interpreted by the assembler, which uses NSTRNS - 1 NTYPE - 0 the set of built-in macros to generate either direct code, for FORTRAN MAIN PROGRAM FFT(MODULE FFT) AT LINE 45 simple cases such as assignment, or calls to specific sub­ A « 0.0 AA « 3.535534 routines for complex operations. The library of sub­ I FORM « 0 routines caters for all the combinations of operations, nu­ JDIM = 1 meric formats and word lengths, and hence represents a NDIM = 1 NFSYM - 0 large body of code. NTOT = 0 Splitting the overall compilation system into these J =1 differing stages gives several major advantages during the NCLBT * 0 early life of a product such as the DAP. For the initial N1 - 33 development the overall task could be broken down into CN = (0.0, 0.0) several distinct products, with clearly defined interfaces, which simplified the management of the project. Similarly, Figure 7. OPEH Oiagnostic Report significant enhancements such as the multiple data lengths could also be treated as separate developments in the compiler and subroutines. that when a program fails the same quality of diagnostics Investigations into new compiler techniques or new should be provided, regardless of the language or languages optimisations could be carried out by replacing one or used for the program. A typical high level report is shown more of the system supplied AMF macros during the in Figure 7. assembly stage, and only if the tests were successful would The OPEH system is simple in concept, and power­ the compiler or production AMF macros be changed. ful in its realisation. The use of the system depends on The DAP assembler language, APAL, is used to write information provided by the various compilers. By default, the low level subroutines and to process the compiled AMF. each compiler produces along with the loadable code It is also available to users, although the facilities in DAP- (Object Module Format) a diagnostic module for each Fortran necessitate descending to APAL only on rare program or procedure. The diagnostic module contains occasions, usually when writing small utility functions. information which relates the code to the original source of the program. When a program is running normally these 4.2 Implementation and Diagnostics diagnostic modules remain on disc with no consequent To aid the user working with several languages on the overhead on the system. If a program aborts, or if the 2900, many of the facilities provided by the various com­ program itself generates a request for diagnostics, two pilers are common. To support this at the implementation components of OPEH are activated. The first, language level, VME/B provides a common compiler environment independent, component analyses the cause of failure and (CCE), which handles many of the aspects of the user examines the state of the program stack. The current interface. Facilities such as the SCL interface, reading files, instruction address indicates the active module and hence generating listings and error messages all use the CCE, the relevant diagnostic module. This module in turn indi­ which eases the task of the developer as well as the final cates the language used to generate the code module. user. Since all compilation for the DAP is performed by Specific to each language in the system are language de­ the 2900, the CCE can be used for the DAP compiling pendant modules within OPEH. These modules are used to system. analyse the state of each active subroutine in terms of the A further facility common to high level languages original language, with the appropriate data types and on the 2900 is the diagnostic system, OPEH (Object relevant formats for diagnostic printing. Program Error Handler). The objective of the system is At each stage of the process the user has total control

The Australian Computer Journal, VoL 13, No. 1, February 1981 5 ICL DAP

of the system through options which can be set at compile tional computer. The implementation has taken full advan­ time, execution time or even dynamically during the run. tage of the advanced software facilities available within the Options range from inhibiting all printout, through to prin­ host computer to provide a full high level system. The close ting the values of every variable in every active routine in connection between the serial and parallel hardware and the program. A further option allows the user to handle his software systems optimise throughput of the total system own error recovery for some or all classes of errors (the while retaining the familiar user interface. Errortrap facility). The task of interfacing the DAP compiling system to REFERENCES the OPEH system is similar to that for any other language. 1. FLANDERS P.M., HUNT, D.J. REDDAWAY, S.F. and The DAP-Fortran compiler and assembler generate diagnos­ PARKINSON, D. (1977), “Efficient high speed computing tic records along with the loadable OMF. with the Distributed Array Processor” in High Speed Com­ puter and Algorithm Organisation. Academic Press, p. 113. When the DAP program meets an error condition the 2. GOSTICK, R.W. (1979a), “Software and Algorithms for the interrupt generated in the 2900 contains a simple error Distributed Array Processor” ICL Tech Jnl, Vol. 1, p. 116. code. Dependant on the type of error and the options 3. GOSTICK, R.W. (1979b), "Supercomputer diagnostics”, currently in use the DAP OPEH modules are loaded and the Infotech State of the Art Report on Supercomputers, p. 135. 4. SLOTN1CK, D.L., BOUKN1GHT, W.J., DENENBERG, S.A., software stack within the DAP is examined before the 2900 MclNTYRE, D.E., RANDALL, J.M. and SMAEH, D.L. stack. By making use of the OPEH system in this way, the (1972), “The 1LLIAC IV system”, Proc IEEE, Vol. 60, No. 4 DAP immediately has the benefits of modern diagnostics, p. 369. and the DAP system appears as another component of the 5. DAP Fortran Language ICL TP6918 (1979). mixed language system, as required in Section 3.1. The DAP language dependant modules within OPEH BIOGRAPHICAL NOTE and the DAP-Fortran and APAL languages contain specific R. W. Gostick graduated from Imperial College in facilities for array processing and bit level processing. 1971 with a B.Sc in Physics and subsequently gained a Details of these facilities are given in (Gostick 1979b, ICL Ph.D in Applied Physics. In 1975 he joined ICL as a Con­ 1979). sultant on scientific computing. Since 1977 he has worked SUMMARY as a Consultant with the Distributed Array Processor The DAP has used the concept of SIMD architecture Marketing Unit, concerned with applications of the DAP to provide processing power within the store of a conven­ to scientific computing.

6 The Australian Computer journal, Vol. 13, No. 7, February 1981 On Understanding Binary Research

B.P. Kidman*

A formal approach to specifications and verification leads to the identification of underlying basic search schemes and hence to precise understanding of different versions of binary search including uniform binary search. Programs based on these schemes are simple, general and adaptable. Keywords and phrases: Binary search, program understanding, program verification. CR categories: 1.3, 3.74, 5.24.

1. INTRODUCTION 2. BASIC SEARCH SCHEMES Binary search is a well known technique and many Identification different versions of the algorithm have been published Stated loosely, the given problem is to search table R and used (Knuth, 1973). The underlying principle is for an element (with key) equal to the search key x. More quite simple, but that the precise details can be trouble­ precisely and generally, the ordered table or table segment, some is demonstrated in inadequacies or errors in pub­ R[b:t], may be empty and may or may not contain more lished programs, further discussed subsequently. More­ than one element with a particular key; the search for an over, the adaptability, the assumptions and even the element (with key) equal to x shouldjocate the index of correctness of any given program are not obvious from the element if found, and also the position where x would visual inspection. be inserted into the ordered table. The value of a more formal approach to specifica­ Bottenbruch (1962) first published a program along tions and correctness is examined in this light. Precise for­ the lines of these algorithms, which work by keeping upper mulation of the loop invariant is itself useful documenta­ and lower table indices, i and j say, to define the segment tion particularly as its form is closely related to those of the table remaining to be searched. At each step, m is aspects of the program which seem most difficult to get selected in the middle of i. .j, x is compared with R[m] and right. Identification of basic search schemes is based, in part, on the underlying invariant assertions. Correct pro­ grams can be derived easily from the basic schemes, and if scheme(1) s cheme(2) necessary, tailored to specific requirements and tuned for efficiency while maintaining correctness. Verification is i:=b; j :=t; i: =b-*l; j :=t+l; straightforward. while Kj+l do while ib then if i>b-l then more of Alagic and Arbib (1978), Hantler and King (1976), found:=x=R[i-1] f ound:=x-R[i] else found:“false; else found:“false; or Kidman (1978), for example. We express algorithms in Pascal, and assertions in the scheme(3) notation of predicate calculus; for reasons of clarity examples are presented as general Pascal programs rather {b>=0} i:~b ; j : — c-M ; than more general schemes. In logical assertions we abbre­ while l for implication. j:=m else i: =m-f 1 “Copyright © 1981, Australian Computer Society Inc. end; General permission to republish, but not for profit, all or part of if i>b then this material is granted; provided that ACJ’s copyright notice is found:=x=R [i-1] given and that reference is made to the publication, to its date of else found:“false; issue, and to the fact that reprinting privileges were granted by permission of the Australian Computer Society.” Figure 1. Basic scheme programs *The author is with the Department of Computing Science, University of Adelaide, Adelaide, SA 5001. Manuscript received 20 June 1980 and revised 18 August 1980.

The Australian Computer Journal, Vol. 13, No. 1, February 1981 7 Binary Search

i or j adjusted accordingly. The search loop terminates assumption: ordered(R,b,t) & I2 & i xb—1 => x>R[i] {12} schemes (see Figure 1) differing in the relations between & (i+j)/2b—1 {12} general schemes but use the specific relational operator < in & b i<(i+j)/2—1 {axiom, i for a table with reversed ordering; using < causes the The following axioms for integer division are assumed element with the highest index to be selected from a group in the proofs with equal keys, using < that with the lowest index. In all schemes, j-i is reduced at each step of the scheme (1) : i i<(i+j)/2 i < (i+j)/2 < j operator may be used for computing m in schemes (1) and scheme (3) : i i < (i+j)/2 < j. (2), but scheme (3), which appears, in one sense, to be a combination of (1) and (2), is associated with the calcu­ The axioms for schemes (1) and (2) will hold for any inte- lation of m by the operation L (i+j)/2 J. We note that in - division operator, that for scheme (3) holds for many programming languages (including some popular i+j)/2 Jonly. implementations of Pascal), the integer division operator R Also for all three schemes, from this same basis it truncates the quotient towards zero, leading to a different follows that j—i is reduced over each loop iteration, and result quotient when i+j is negative. For this reason, in the hence we can establish formally that loop execution ter­ scheme (3) Pascal program in Figure 1, the table indices are minates. For example, over the path whose verification con­ restricted to non-negative values by the initial constraint dition was given above, the resetting of j to (i+j)o7V2 b 0. It should be emphasised that the other symmetric always results in reducing the search interval (see the “combination” of schemes (1) and (2) is only a valid ter­ second axiom). minating program if m is calculated as I (i+j)/2 ”1. 3. MODIFICATIONS TO BASIC SCHEMES Any of the basic schemes (Figure 1) can be trans­ Loop invariants and verification formed into various program schemes in which loop Loop invariants for these programs are given below. iteration is somehow terminated when the table element In each assertion the first two terms closely parallel the tested happens to equal x, before the binary search is com­ conditional resetting of i and j in the program loop, and in pleted; many published algorithms are of this kind. Below expressing the current state of the search they can be we categorise these and in Figures 2 and 3 give program viewed as the principle components of the invariant. With examples based on scheme (2) of Figure 1. The programs reference to scheme (2), in an intermediate state of the could, of course, have been based on schemes (1) or (3). search we have R[i] < x < R[j], on the assumption that (a) Loop termination is determined by an additional con­ both branches of the conditional have been exercised. dition kept in a boolean flag (Wirth, 1973). (b) Loop termination is controlled by an explicit test for IV. (j. xb => x>R[i—1 ]) equality in the loop control condition (Wirth, 1976). & jb & (b ixb—1 =>x>R[i]) special value of j-i is set which will terminate the & jb—1 & (bixb=>.x>R[i-1) the underlying search controlled as in the basic & jb & (bij-1, and from the last term of I2, i while not found and (ixb—1 =>;x>R[i]. {assertion IA2} repeat begin ra: = (i+j)div 2; m: = (i+j )div 2; This, with the fact that R is ordered, is sufficient for us to if xR[m] then i: The validity of these assertions is simple enough to i: “m {assertion IB2) check by visual inspection; for illustrative purposes we else found:=true until (i“j-l) or (x“R [m] ); give, for (2), one of the formal verification conditions (for end; found: =x«R[m] ; the path inside the loop corresponding to x

8 The Australian Computer Journal, Vol. 73, No. 7, February 7987 Binary Search

(c) (d) has enabled some simplification of the assertions. It is also possible to express uniform binary search i:«b-l; j:=t+l; i:=b-l; j:-t+l; in a form analogous to the basic schemes by excluding the while 1 ! {assertion ID2) Verification of uniform binary search begin m: = (i+j)div 2; begin ra:“(i+j)div 2; if xR[m] then to regard the second and third as the principle terms. if x=R [m] then j:=m i^m end else goto 2 IU1 ; ordered(R,b,t) end; end; & (m+f+1 xb => x>R[m—f—1 ]) & (m+fb—1+oddn) 2:{found} & (f>0) & (found => x=R[m]) Figure 3. Adaptations (c) and (d) of scheme (2). & ~odd(m+t+f)

We note again their close relationship to the program loop IU2 : ordered(R,b,t) statement. IB2 is identical to 12 of the basic scheme which & (m+f+1)< => xb => x>R[m—f—1 ]) (b) to ensure that m is defined for the control test; thus, & (m+fb—1+oddn) unlike the while programs, (b) will not correctly handle an & (fSK)) empty table. The invariants for (a), (c) and (d) now include & ~odd(m+t+f) the term i>b—1=>x>R[m) rather than i>b—1=>x>R[m] of the basic scheme (2). With reference to the program in Figure 4, on termin­ ation f=0 or found, and, in general, excepting boundary IA2 : j• xb—1 => x>R[i] cases, with not found we have & found => x=R[m] R[m—1 ] < x < R[m+1 J IB2 : like I2 Thus, as R is ordered, the search will be completed after IC2 : i (j xb-1 => x>R[i] one further comparison of x with R[m], as expected. Note & b (i=j => x=Rfml) that the second bounds condition in the invariant, m—f^b—1+oddn, includes specific reference to whether or l,D2 : j xb—1 => x>R[i] not the number of table elements is odd; in fact, on loop exit f=0 and m^b—1+oddn, so that with an even number of 4. UNIFORM BINARY SEARCH table elements (oddn=0), m may go out of table index In uniform binary search (Knuth, 1973), the search bounds. The last term in the invariant expresses a further interval is defined, not by its upper and lower index subtle characteristic of the final value of m (with f=0), bounds, but rather in terms of one index m, marking the namely, that m is one of t,t—2, .... middle of the unsearched table segment, and the half width For the program in Figure 5, the condition on loop f, of that segment. While the basic principle of this is clear, exit is making the necessary adjustments to m and f requires extreme care. At the start of the k-th step, roughly f R[m—1] 0) do trolled by the key comparison; an adde'd complication is a {assertion IU1} possible out of bounds table reference requiring the setting begin of a sentinel in table R when the number of elements is if x=R[m] then found:=true even. else begin For ease of understanding and verification we have one:=ord(odd(f)); f:=f div 2; transformed Knuth’s algorithm into a structured Pascal 1f x=b) then found:=x=R[m]; former by an adjustment in the variable "one” (of value 1 or 0). The initial assumption b

The Australian Computer Journal, Vol. 13, No. 1, February 1981 9 Binary Search and hence the two appropriately guarded key compari­ {b£t} sons after loop exit. In verifying these programs we f:=t-b+l; oddn:=ord(odd(f)); assume the following axioms for integer N f:=f div 2; m:=b+f+oddn-l; while f>8 do N>0 => N/2 + N/2 + ord(odd(N))=N {assertion IU2) N>0=>,(ord(odd(N))=1) V (ord(odd(N))=0) begin N>0 => odd(M+N)=odd(M—N)=odd(M+ord(odd(N)). one:=ord(odd(f)); f:=f div 2;

On this basis and with the initial assertion, ordered(R,b,t), if x=b then found:=x=R[m]; being irrelevant because the consequence is shown to hold if not found and (m>b) then directly; oddf stands for ord(odd(f)). As f is reduced over begin found:=x=R[m-l]; m:=m-l an execution of the loop, it is obvious that the loop must end; terminate. Verification of the program in Figure 5 is similar. Figure 5. Uniform binary search without equality test. for path “x0 & ~found & x x x>R[m— f/2— oddf—f/2—1] {1U1,axiom} the algorithm or program does. On the other hand, in & m-f/2-oddf+f/2 < t {IU1 ,f>0,axiom} uniform binary search we have found that the principle & m—f/2—oddf—f/2 >b—1 +oddn {axiom} components of the loop invariant (for example & (b f/2>0 {f>0} “x x=R[m] {~found} the program (for the examples above, “if x0,axiom} m:=m—f—one” in which “one” may be 0 or 1). It is sug­ gested that this slight mismatch may be related to diffi­ for path “x>R[m] ” culties in intuitive understanding of details of the pro­ assumptions: IU1 & f>0 & ~found & x>R[m] gram. conclusions: ordered (R,b,t) {lU1} & . . .=> xx>R[m+f/2+oddf—f/2—1 ] { IU1,x>R[mj,axiom} The principle components of the loop invariants & m+f/2+oddf+f/2 < t {IU1,axiom} of binary search loops can be regarded as reflecting the & m+f/2+oddf—f/2 > b—1+oddn { IU1 ,f>0,axiom} state of the search both for intermediate and final states. & (b f/2X) (fX)} For standard forms of the algorithm we can think in terms & found => x=R[m] {~found} of a simplified invariant, such as the following which is & ~odd(m+f/2+oddf+t+f/2) { IU1,f>0,axiom} applicable to scheme (2) for path “x=R[m] ” R [ i 1 0 & ~found & x=R[m] conclusions: IU1 with true {x=R[m]} with i=j—1 on loop termination; for uniform binary replacing found search (Figure 5) we have

5. DISCUSSION R[m—f—1 ] < x < R[m+f+l J Comparison of programs and proofs The need for care over detail is characteristic of pro­ with f=0 on termination. Hence in the former case one gramming and the binary search algorithms are no excep­ extra table access after loop exit finalises the search, and tion in this regard. The details of uniform binary search, in the latter case two. When the search loop includes a test despite the simple underlying principle, are exceptionally for equality, the informal invariants have < replaced by hard to understand and verify by visual inspection and < and there are consequent changes to search finalisation. tracing. This difficulty is not parallelled in the correct­ While this informal invariant gives us an overview of ness proof presented in outline above, which, though what the programs do, to understand fully the detailed relatively longer than for more standard versions, is never­ form and behaviour of a particular program we need to theless, straightforward. The loop invariant is similar in look at the formally stated and rigorously established form but contains an extra term, and there are two dif­ logical assertions given earlier. In these assertions the rela­ ferences in the proof of the invariance of these assertions tions between x and the table elements are embedded in for uniform binary search; firstly variables in all terms of logical implications which reflect whether or not the the invariant are subjected to change along both flow paths branches of the conditional statement inside the loop have in the loop, and secondly the table ordering is required at been executed; the informal invariant, in fact, only applies this stage of the proof. after both branches have been executed.

70 The Australian Computer Journal, Vol. 13, No. 1, February 1981 Binary Search

Program details in basic schemes the loop statement (Figures 2 and 3), on search failure, One of the particular troublespots of binary search loop execution proceeds as in the basic schemes resulting programs lies in the initialisation of the boundary indices in a total of L table accesses (L+1 for (b)), but on success, i and j in scheme (2) and scheme (3) programs. The there is approximately one less table access on average for initialisation in our basic schemes is arranged so that the moderately large N (Flores and Madpis, 1971). Note that search will cover the whole table without making any this is counting the two array references in the programs assumptions about the search key with respect to the of Figures 2 and 3 as one table access. However, the loop table. The value of looking at the formal statement of the statements and the loop control tests in these programs loop invariants is that the preconditions of the implica­ involve more processing than the simpler basic schemes of tions in these assertions involve the initial values specifi­ Figure 1, so that in many situations the basic scheme cally. Often published programs based on scheme (2) do programs will be more efficient than the programs which not conform in this regard. For example, Dijkstra (1978) involve one less table access. demonstrates how to develop a scheme (2) program from In uniform binary search the bottom level of the a simplified invariant when i and j are initialised to b and t, decision tree is probed (outside of the loop in our pro­ not b—1 and t+1 as in Figure 1. This gives a correct pro­ grams) on all unsuccessful paths irrespective of whether gram because of an initial assumption, R[b] xb => x>R{i] manner (see above) consistent with the relational operator used in the comparison of x with the table entry. On loop termination, if i=b then x

The Australian Computer journal, Vol. 13, No. 1, February 1981 11 Binary Search

DIJKSTRA, E.W. (1978): “Programming Language Systems”, ed. WIRTH, N. (1976): “Algorithms + Data Structures=Programs”, M.C. Newey, ANU Press, Canberra, p. 1. Prentice-Hall Inc., Englewood Cliffs, New Jersey. FLORES, I., and MADPIS, G. (1971): “Average Binary Search Length for Dense Ordered Lists”, Comm. Assoc. Comp. Mach,, vol. 14, p. 602. HANTLER, S.L. and KING, J.C. (1976): “An Introduction to BIOGRAPHICAL NOTE Proving the Correctness of Programs”, Assoc. Comp. Mach. Barbara Kidman was awarded a B.Sc. degree with Computing Surveys, Vol. 8, p. 331. First Class Honours in Physics from the University of KERNIGHAN, B.W. and PLAUGER, P.J. (1978): “The Elementsof Programming Style”, McGraw-Hill, New York. Adelaide in 1948. Over the period 1949-57 she was engaged KIDMAN, B.P. (1978): "An Introduction to Program Verification”, in research in biophysics, first at the University of Oxford Proc. 8th Australian Comp. Conf., Canberra, Vol. 2, p. 877. (1949-51) and then at the University of Adelaide. She ob­ KNUTH, D.E. (1973): “The Art of Computer Programming”, Vol. tained a Ph.D. degree from the University of Adelaide in 3, Addison-Wesley Publ. Company, Reading, New York. McKEEMAN, W.M. (1974): “Compiler Construction”, Lecture 1955. From 1966 to 1970 she worked as a computer pro­ Notes in Computer Sc., Vol. 21, p. 251. grammer at the University of Adelaide. She was appointed WEGBREIT, B. (1977): "Constructive Methods in Program Verifica­ lecturer in Computing Science in 1971 and later became tion”, IEEE Trans, on Software Engineering, Vol. SE-3, senior lecturer. Her research interests are the theory and p. 193. WIRTH, N. (1973): “Systematic Programming”, Prentice-Hall Inc., practice of programming, programming languages and Englewood Cliffs, New Jersey. Computer Science education.

i2 The Australian Computer Journal, Vol. 13, No. 1, February 1981 Some trends in System Design Methodologies

I.T. Hawryszkiewycz*

Criteria for effective design methodologies are defined and some design methodologies com­ pared against these criteria. The criteria call both for disciplined set of integrated design stages as well as for techniques that link program and data base design in a semantically consistent way thus reduc­ ing program complexity. Some shortfalls of existing methodologies in meeting these criteria are discussed and suggestions made for future development. Such suggestion includes calls for closer integration of program and data base design, and for the development of adaptable systems that allow designers to adapt the computer interface to the semantics of the user lever. Keywords and phrases: data analysis, design methodologies, database logical structure, systems design, software engineering. CR categories: 3.50, 4.33, 4.34.

INTRODUCTION tures both at the user and at the computer system level and There has been considerable effort in the last few the effect of interposing the design methodology semantics years to improve the system development process and between these two levels. Little theory or few design guide­ ensure that new systems meet user requirements within lines exist in this area. Some problems that can result from agreed time and cost estimates. Such efforts are expected such semantic relationships are outlined in this paper and to continue given greater investment in new and increasing­ shortcomings of existing methodologies discussed. ly more complex systems together with the increasing ratio Such shortcomings indicate possible future trends. of software development to hardware cost. General consen­ One possibility of avoiding semantic problems is to develop sus is that many problems in systems development are computer systems that are adaptable to a variety of user caused by semantics; the use of conceptual schema in three level data — the difficulty of correctly capturing all user require­ base architectures is one example of this trend. By adapt­ ments in clear and unambiguous ways, ing the semantics at the computer system level to that of — the lack of standards in reducing user requirements to the user level eliminates the need for any intermediate a system design, and methodology semantics. — unclear project control and administration. Apart from such fundamental semantic issues, design Consequently improvements in system development methodologies must also fit within the general system usually concentrate on these three areas. This paper deals development environment. To do this they should be divis­ with the second of these areas; it considers the role of ible into well defined steps that form a controlled develop­ design methodologies in standardising design processes. To ment cycle. Design methodologies must then satisfy criteria do this, design methodologies provide a set of design steps determined by fundamental semantic considerations as well through which a designer can proceed. Standard rules are as those imposed by the environment in which design takes made available for conversion from one step to the next to place. Let us commence with a general framework that eliminate problems that often arise with ad-hoc describes the latter. conversions. The state of development of design methodol­ ogies is still in its infancy with a number of methodologies THE GENERAL FRAMEWORK proposed but none yet proven complete or superior in any A design methodology is here considered to be one of way. Perhaps one reason for this is that the principles of the three components that make up system development; these design process and information systems structure are not components are yet developed; hence theory to evaluate methodologies • the system development cycle, which defines the does not yet exist. It is then not generally clear what reporting stages through which a project proceeds. methodology is best suited for a given problem; for The tasks at each stage together with their inputs example, using an Entity-Relationship based methodology and outputs are defined and the documentation at with a given problem may yield a different design than.say each stage is specified; this documentation is subse­ a reduction methodology with the’same problem. quently used in reviews that precede management To evaluate methodologies in this context requires an approval to commence following stages. understanding of the relationship between semantic struc- • a project management system to monitor the progress of a project and take corrective action whenever “Copyright © 1981, Australian Computer Society Inc. problems arise. The project management system is General permission to republish, but not for profit, all or part of this material is granted; provided that ACJ’s copyright notice is closely integrated with the system development cycle given and that reference is made to the publication, to its date of as it uses the reports produced at each stage of system issue, and to the fact that reprinting privileges were granted by development. permission of the Australian Computer Society.” *The author is with the School of Information Sciences, Canberra CAE, Belconnen, ACT 2616. Manuscript received 31 October 1980 and revised 23 December 1980.

The Australian Computer Journal, Vol. 13, No. 1, February 1981 13 Design Methodologies

Table 1. Criteria to evaluate Design Methodologies vant to computing systems design, where it is necessary to represent both the events and the structure of a given 1. Specifications at all levels should be complete both in information system. Apart from modelling both the events — the level of detail, and — the semantic power necessary to express the and the data structure the seventh criteria calls for the design. design methodology to create data structures which allow 2. A formal notation should be available at each level to events to be modelled by programs that clearly and unambiguously specify the design at the • are not unnecessarily complex, and level. • satisfy processing time and response requirements 3. The formal notation at the level should be clearly understood by the designers and/or users involved at during execution. that level. 4. There should be continuity between levels providing DESIGN METHODOLOGIES the designer the ability to formally convert one level Three terms are used here to describe a design to the next. 5. The design should use principles based on sound methodology — process, technique and specification level. theory to provide clear and acceptable criteria to make The design methodology is seen as a process made up of design choices. 6. Both events and data structure must appear in the any number of specification levels. Each specification level specification. uses some technique to elaborate the design, and some lang­ 7. A linkage must be maintained between the events and uage is used at each level to formally describe this design; data structure throughout the design process. • the language is part of the techniques used at that level. The concept of specification level and language is applicable to any design process, formal or informal. Most • a design methodology that encompasses the tech­ early computer system design processes consisted of only niques used to design the system. The goal of such two levels, namely methodologies is to integrate the design techniques — the user level where the user problem is described in into a rigorous software engineering process that terms natural to the user, and reduces the user specification to a computer-based — the computer level, which requires the user problem information system through a number of design to be expressed, using the language of some particu­ levels. lar system. The general trend is to integrate these three compon­ Experience has found design processes composed of ents both vertically and horizontally; vertically by inte­ these two levels to be wanting; translation from one level to grating all the tasks within each component and horizon­ the other is complex with consequent introduction of tally by integrating tasks between components. errors. Hence, as in engineering design, the trend in compu­ One way to achieve such integration is to commence ter system design has been to introduce additional levels of with the design methodology by grouping the design specification to convert user specifications to a system techniques into levels that mark off some well-defined design in incremental and more controllable steps. design objective. These levels are then adapted to the stages It is, however, not clear what the optimum set of of the system development cycle and subsequently to a pro­ levels is to be nor what techniques are to be used at each ject management system. specification level. On the one hand the goal must be to reduce the number of specification levels as each level CRITERIA FOR EVALUATING DESIGN requires both documentation and learning the technique METHODOLOGIES and associated language thus adding to the workload; on The effectiveness of a design methodology in serving the other hand a sufficient number of levels must be provi­ this role can be judged in terms of criteria consistent with ded to ensure an ordered reduction from user requirements the general framework. to the system. As a result there is a large variety of design In the first instance, a design process must possess the stages. Atkinson2, for example, reports a survey that shows necessary administrative support functions to link it to the ranges of 4 to 11 for reporting stages and 40 to 216 for SDLC or project management function that make up the activities in the system development process. Perhaps one environment. This will to some extent depend on the reason for such variation is the dependence of the design organisation that is using the design methodology. Such process on the chosen techniques as some techniques organisations usually have a set of established procedures require more steps than others to produce a design. Under that must be followed in a development process and the these circumstances there seems to be little point in design methodology must conform with them. proposing an ideal set of levels and an empirical approach Design methodologies must also, however, satisfy a appears to be attractive when choosing a design methodol­ second set of criteria — those that measure their cohesive­ ogy. One should not, however, leave the reader with the ness in integrating a set of techniques into an ordered and impression that any empirical set of levels are the ideal. effective design process; criteria for this purpose are sug­ They may suit today’s state of knowledge and technology gested in Table 1. but an alternate set of levels may prove attractive given The first four of these criteria are general and applic­ alternate technologies or design techniques; some such able to any design process/computer or otherwise. So is the approaches are discussed later. fifth, namely an underlying theory. In terms of theory, computer system design is relatively immature when com­ THE EMPIRICAL APPROACH pared to engineering design processes, which are founded In the empirical approach, system design methodolo­ on well understood principles such as those aerodynamics, gies can in the first instance by subdivided into three mechanics or circuit theory; these provide well-founded categories criteria on which alternative design choices can be made. — problem specific methodologies — those concerned The last two criteria in Table 1 are particularly rele- with some particular problem kind

14 The Australian Computer Journal, Vol. 13, No. 1, February 1981 Design Me thodologies

INFORMATION ANALYSIS ----- > LOGICAL DESIGN ------^ PHYSICAL DESIGN .add pointers .decide on block structure DIVISION

DIVISION

vSETl

[Si) FIND SECTION RECORD. DEPARTMENT

DEPARTMENT

SET2

(S2) FIND ACCOUNT RECORD.

SECTION SECTION ACCOUNT

.SET3 /SET4

LINK STORE LINK INSERTING INTO SETS SET3, SET 4. ACCOUNT DESIGN STRATEGY

DATA FLOW CONSTRUCT APPLY COST DIAGRAM ACCESS PATH FUNCTION

compute . logical record access . storage requirements . response estimate

Figure 1. Entity-Relationship based methods

— computer system methodologies — those oriented schema and associated program structure charts, and towards a particular computer system — physical design to choose the physical access that in — general methodologies — these are independent of a some way satisfy performance requirements. problem kind or computer system. Given this empirical framework and the set of criteria Problem specific methodologies deal with well- in Table 1, some general trends can be detected in design defined problems; some examples of these are discussed by methodologies. For example, most methodologies now Leavenworth and Sammet11. Computer system oriented formalise data flow analysis by using graphical techniques methodologies on the other hand deal with particular (criteria 2), which on the most part are meaningful to both systems; Raver and Hubbard14 describe a method for designer and user (criteria 3). Examples include data flow analysis and design of IMS databases. diagrams (De Marco4, Gane and Sarson5) or the SADT This paper is primarily concerned with methodolo­ diagrams described by Ross15. Most of these methods use gies that are independent of both the problem kind and a top-down elaboration to capture the. level of detail as particular computer system. Most writers describe such required by criteria 1. Data flow analysis techniques are methodologies in terms of four empirical design levels, primarily concerned with event analysis; hence criteria 6 namely and 7 usually get secondary consideration; data structures — data flow analysis to determine the events and the in data flows, are tabulated in data dictionaries; criteria 7 data flows associated with them is met by maintinaing cross-references between events at — information (or data) analysis to provide a formal data structures through “WHERE-USED” reports. model of the organisations data structure Reduction to the logical data level on the other hand — logical design to develop an initial data structure and shows more variation. It is at these steps that users require-

The Australian Computer Journal, Vol. 13, No. 7, February 1981 75 Design Methodologies

LOCATE-ACCOUNT contain data on actual expenditures by the sections. FIND ACCOUNT RECORD This kind of information structure is typical of many DO SECTION-SEARCH flow structures through organisations as for example, job flow through a job shop, where expenditure incurred at the SECTION-SEARCH end of each number of different processing stages is recor­ DO UNTIL END-OF-SET4 = TRUE ded. BEGIN The conversion from the information structure to the FIND NEXT RECORD IN SET4.* data structure can be formalised although the conversion IF NOT END-OF-SET4 THEN rules obviously depend on, and are constrained by, the database management system used. Figure 1 illustrates BEGIN conversion rules to network structures; here FIND OWNER RECORD IN SET3 SET • each entity becomes a record type DO SECT—TO-DIVISION-MOVE • a1 :N relationship becomes a set that contains the two END entity records, and SECT—TO—DIVISION-MOVE • a M:N relationship becomes an association structure FIND OWNER RECORD OF SET2 SET composed of a LINK record that maintains the inter­ FIND OWNER RECORD OF SET1 SET section data in the M:N relationship. DO PRINT In most methodologies, events at the logical level are linked to the logical access structure by access paths; Figure 2. Finding Responsible Divisions however there are usually no formal rules to convert event descriptions to the access path. The formal syntax for the merits are first expressed by the semantics of the particular access path is also generally under development. A cost analysis technique; they are then converted to a logical function is then applied to estimate performance and this is structure with further constraints imposed by the semantics used to iteratively optimise the design. Figure 1 includes an of the data model of an available dbm system. Furthermore access path for adding intersection data for a given any mismatches between the semantics of the user structure ACCOUNT and SECTION. Two initial steps S1 and S2 find and the semantics of subsequent levels usually manifest the appropriate SECTION and ACCOUNT record and then themselves in complex programs that express event step S3 is used to STORE the intersection data. The cost semantics in program logic, which is consistent with the function depends on the dbm system and some agreed data semantics imposed by the methodology and later the method for estimating storage accesses for each kind of step data model. This will be discussed later. are used in the analysis. Teorey and Fry18 and Davenport^ It is this interrelationship between user semantics, the describe some documentation procedures that can be used semantics of information analysis and those of the data to arrive at such estimates. model and its subsequent effect on program structures that Viewed against the criteria of Table 1 such makes comparison between methodologies difficult. This is methodologies particularly so as there is both a lack of clear semantic • have a formal and relatively well understood for­ theory of user information structures or the equivalence malism for representing data structure both at the between these structures and various semantic and data information analysis and logical levels models. Hence care must be exercised when evaluating • use linkages between the data and event structures at success claims of various methodologies to ensure that such the logical level to develop design data that is used to success was achieved in a general sphere rather than being a optimise performance requirements rather than to fortuitous match between the methodology semantics and develop simple program structures the semantics of a particular problem. Within this environment two broad analysis tech­ niques can be identified, namely DIVISION • E-R (Entity-Relationships) based methodologies, and • data reduction methodologies either based on — event reduction, or — data reductions.

E-R BASED METHODOLOGIES DEPARTMENT As shown in Figure 1, a design based on E-R tech­ niques commences by describing data by entities and relationships together with their attributes. Figure 1 shows four entity kinds - DIVISION, DEPARTMENT, SECTION and ACCOUNT. There are three relationships that show that SECTION • one DIVISION can supervise N DEPARTMENTS • one DEPARTMENT can supervise N SECTIONS • ACCOUNTS are associated with SECTIONS; each M ACCOUNT can be associated with many sections and M each section can be associated with any number of j ACCOUNT accounts. Thus, say, a CAPITAL account can be associated with all sections as can a LABOR account. The relationships between sections and accounts Figure 3. Expanded Requirements

16 The Australian Computer Journal, Vol. 13, No. 1, February 1981 Design Methodologies

DIVISION units, irrespective of type, and hierarchial reporting struc­ LINK2 tures. The data structure semantics have relationships that SET1 distinguish between organisational units. Such differences SET3B SET4B require unnecessary code complexity to express event SET6 DEPARTMENT semantics in terms of data semantics. It is possible to LINK1 reduce such complexity by using data semantics that more SET 2 naturally express data structures. As an example, Figure 6 SET3A SET4A' models the same user requirements using functional depen­ SECTION ACCOUNT dence (F-D) diagrams together with the role concept8. For convenience only the key attributes together with func­ SET 3 SET 4 tional relationships are modelled. The role concept is used LINK to express • the hierarchial relationship between units; a Figure 4. Reducing Figure 3 to a Network Structure MANAGED-UNIT and MANAGER-UNIT roles are introduced (their domain is the same as ORG- UNIT); these represent ORG-UNIT roles of being lack formal conversion rules from event description at managed and managing. A functional dependency the data flow level to program code at the logical from MANAGED-UNIT to MANAGER-UNIT models level, and the unit to which MANAGED-UNIT reports. have a set of semantics that requires users to express • the association between accounts and units is mod­ their requirements by entity-relationship concepts. elled by the roles USED-BY-UNIT (whose domain is It is perhaps interesting to consider this last comment the same as ACCOUNT) and ACCOUNT-USER in light of the interaction between the semantics used in (whose domain is the same as ORG-UNIT); a func­ analysis, the conversion rules to logical structures and the tional dependency from each combination of final program produced. As an example consider the pro­ ACCOUNT-USER and USED-BY-UNIT models the gram structure to find those DIVISIONS whose SECTIONS expenditure incurred by the unit or an ACCOUNT. incurred expenditures on a given ACCOUNT. The corres­ In Figure 6 DIV-NO, DEPT-NO, and SECT-NO are ponding program structure is shown in skeleton form in the identifiers of the different types of organisational unit. Figure 2. Here For reader’s convenience a tabular representation of • the account is located the structure is given in Figure 7. The logical network struc­ • all SECTIONS associated with an account are then ture, which is derived from this information model is found through their association with the ACCOUNT shown in Figure 8. Here entities and relationships are again record through sets SET-4, and converted to record types; the richer model semantics now, • the responsible DIVISION is then found by finding however, require additional conversion rules. The types of owners in sets SET-2 and SET-1. unit are converted to record types, (DIVISION, DEPART­ This code looks relatively straightforward. However, MENT, SECTION) as are roles (MANAGER and consider the kind of complications that result with ex­ MANAGED). The relationship between role and role tension to the user structure shown in Figure 3. Now there is a more realistic picture where some SECTIONS may report directly to DIVISIONS without an inter­ LOCATE-ACCOUNT. FIND ACCOUNT RECORD. mediate DEPARTMENT. Expenditure may now also be DO SECTION-SEARCH. directly incurred by DIVISIONS and DEPARTMENTS as DO DEPARTMENT-SEARCH. well as SECTIONS. DO DIVISION-SEARCH. Applying the earlier reductions to the E-R diagram SECTION-SEARCH. in Figure 3 yields the logical data structure shown in DO UNTIL END-OF-SET4 = TRUE Figure 4. There are now different relationship records, BEGIN LINK, LINK1 and LINK2 one for each of the M:N FIND NEXT RECORD IN SET4. relationship between an ACCOUNT and type organisa­ IF NOT END-OF-SET4 THEN BEGIN tional unit. There is also an additional set, SET6, to rep­ FIND OWNER RECORD IN SET-3 SET. resent direct reporting links from SECTIONS to IF SUPERIOR = "DEPT" DO SEC-TO-DEPT-MOVE. DIVISIONS. IF SUPERIOR = "DIV" DO SEC-TO-DIV-MOVE. Such additional structures obviously effect programs END that correspond to events. To find the DIVISIONS whose SEC-TO-DEPT-MOVE. subordinates incurred expenditures on an ACCOUNT the FIND OWNER RECORD OF SET2 SET. program in Figure 2 must now be converted to that shown DO DEPT-TO-DIVISION-MOVE. in Figure 5. It includes a different paragraph for each of the DEPT-TO-DIVISION-MOVE. different relationships between ACCOUNT and a kind of FIND OWNER RECORD OF SETl SET. organisational unit. In addition different hierarchial DO PRINT. searches to a division must be made; these depend on DEPARTMENT-SEARCH. whether a SECTION or DEPARTMENT is the unit that incurred an expense. . similar to SECTION-SEARCH The program is now much more complex. It may then be argued that this program complexity results DIVISION-SEARCH because of a mismatch of data structure and event semantics. The event semantics define expenses incurred by Figure 5. Finding Responsible Division from Figure 3

The Australian Computer Journal, Voi. 13, No. 1, February 1981 17 Design Methodologies

MANAGER-UNIT DIV-NO

MANAGED-UNIT »| ORG-UNIT 4. J DEPT-NO

SECT-NO

ACCOUNT-USER

USED-BY-UNIT *| ACCOUNT

Figure 6. Using Functional Dependence

DIV1

1 1 DIVISIONS DEPT DEPT 2 f | NAME BUDGET UNIT-NO r——1 DIV1 150 1 SEC2 SEC3 SEC4 SEC1

DEPARTMENTS

NAME PERSONNEL UNIT-NO REPORTING DEPT1 62 2 MANAGER MANAGED DEPT 2 55 3 1 2 1 3 SECTIONS 1 4 2 5 NAME FUNCTION TARGET UNIT-NO 2 6 SEC1 MACHINE 17 4 3 7 SEC2 ASSEMBLE 105 5 SEC 3 PAINT 32 6 SEC4 PLAN 17 7 ACCOUNT-USE ACCOUNT-USER ACCOUNT-USED AMOUNT ACCOUNTS -BY-UNIT ACCOUNT -NAME 2 LABOR 123 LABOR 1 MATERIAL 75 MATERIAL 1 LABOR 80 6 MATERIAL 315

Figure 7. A Tabular Representation of Figure 6

18 The Australian Computer Journal, Vol. 13, No. 1, February 1981 Design Methodologies

ACCOUNT ORG-UNIT

BETA SETB. TYPE

LINK

DIVISION DEPARTMENT SECTION

MANAGER MANAGED HIER Figure 8. Network Data Structure based on Functional Dependence

then also express the events in terms of such seman­ player is expressed by set membership of role within role tics again requiring complex translation given mis­ player (sets M-R and M-D). Entity types are related to their matches between event and data semantics. generic parent through multi-member sets (TYPE in Figure For example, Figure 11 describes the information 8). Readers will note that ACCOUNT only plays one kind structure of Figure 3 using DeMarco’s method. The inputs of role; hence no distinction is made between the entity to the system create accounts, divisions, departments and and the role records in the logical structure. The new pro­ sections and also define hierarchial relationships between gram, with the same function of that in Figure 5, is now them and the expenditures. There is only one output — the shown in Figure 9. The code in paragraph FIND- expenditure to an account by all units associated with a EXPENDITURES expresses the semantics of association division. The specification structure is influenced by the of accounts with organisational units without distinguish­ repeating group structure semantics of most data diction­ ing between type of unit. Thus the need to search three aries. Figure 11 also illustrates the results after DeMarco’s relationships between units and accounts is replaced by reductions are applied to the output requirement. The one search of relationship between ACCOUNT and ORG- result is four files; a program similar to that in Figure 5 UNIT. Similarly different hierarchial searches are replaced would need to search these files to answer the query. by one recursive procedure FIND-DIVISION. The program However, as each file now contains the responsible division, remains unchanged if additional units are introduced into the hierarchial search of Figure 5 is no longer necessary. the organisation. In fact the program in Figure 9 applies This is symptomatic of most event reduction processes both to the information structure shown in Figure 1 and which tend to orient files to reports by including the Figure 3. Hence there is more data independence. What majority of items required in one report in the minimum we have done so far is illustrate the strong relationship files. that exists between the semantics used in information Methodologies that produce one file for each report analysis and the programs that model of events. In appear attractive especially where requirements are for a general, most methodologies do not consider this rela­ small number of reports as in batch systems. Their useful­ tionship and concentrate on data structure design given ness in producing suitable designs where the report require­ the information analysis semantics; any mismatches ments are large and ad-hoc is however questionable. between data and event semantics must be subsequently implemented using program code. This is also true of LOCATE-ACCOUNT. reduction methods. FIND ACCOUNT RECORD. DO UNTIL END-OF-SET = TRUE REDUCTION METHODS BEGIN The most common approach is to reduce data FIND NEXT RECORD IN SETA. structures to 3NF relations. The same alternatives as IF NOT END-OF-SET DO FIND-EXPENDITURES. END. those used in E-R reductions are also possible here; for example Figure 7 and Figure 10 illustrate two alter­ FIND-EXPENDITURES. natives 3NF representation for the requirements in BEGIN Figure 3. Again it is up to the program to convert the FIND OWNER RECORD OF SETB SET. event semantics into semantics that match the chosen IF ORG-UNIT-TYPE = "DIVISION" THEN PRINT ELSE FIND-DIVISION structure. PRINT. There is also a class of reduction method that END commence with event descriptions and reduce these descriptions to data structures. Perhaps the most FIND-DIVISION. popular of these methodologies in this country is DO UNTIL ORG-UNIT-TYPE = "DIVISION" that developed by DeMarco4. Here events are des­ BEGIN FIND FIRST MANAGER RECORD WITHIN M-R SET. cribed by a data dictionary formalism, which is FIND OWNER RECORD OF HIER SET. reduced to a data structure using a prescribed set of FIND OWNER RECORD OF M-D SET. steps. However, the data dictionary formalism requires END. events to be expressed in terms of the data dictionary semantics, which are repeating groups. Programs must Figure 9. Finding Responsible Division from Figure 8

The Australian Computer journal, Vol. 13, No. 1, February 1981 19 Design Methodologies

DIV-DEP ACC-DEP DIVISION DEPARTMENT ACCOUNT DEPT AMOUNT DIV1 DEPT1 LABOR DEPT1 123 DIV2 DEPT2 ACC-DIV DIV-SEC ACCOUNT DIVISION AMOUNT DIVISION SECTION MATERIAL DIV1 75 DIV1 SEC1 LABOR DIV1 80

DEP-SEC ACC-SEC DEPARTMENT SECTION ACCOUNT SECTION AMOUNT DEPT1 SEC2 MATERIAL SEC3 315 DEPT1 SEC3 DEPT2 SEC4 replaces ACCOUNT-USE in Figure 7 replaces REPORTING in Figure 7

Figure 10. An Alternate 3NF Reduction

Summary of Current Methodologies of these are used to optimise database design given an In summary design methodologies have introduced initial structure; examples include Alter1 or Mitoma and discipline in the design process in the main satisfying cri­ Irani13. teria 1 to 4 in Table 1. At the same time it may be noted Formal techniques to convert specifications from one that additional development is necessary to overcome some level to the next also exist; most generate programs for one of their shortfall especially those identified by criteria 5 to particular processing method. Some are restricted to par­ 7. Such shortfalls have been here identified as ticular data structures. Jackson’s method10 is one of the • insufficient attention to semantic relationships better known methods; it generates programs that produce between data structure and programs with resulting reports from certain classes of data structures. Other repor­ program complexity; this is often caused by an ted examples of this class of generator includes an inter­ inflexible set of semantics used by each methodology active program generator for IDMS files described by and • the workload involved in the conversion between levels. WRITE DIV + ACCOUNT + DIV-EXPENDITURE DEPT + ACCOUNT + DEPT-EXPENDITURE PROPOSALS FOR NEW METHODOLOGIES SECT + ACCOUNT + SECT-EXPENDITURE Given this, it can be expected that future develop­ ACCOUNT + account attributes ments will be oriented towards • filling the gaps in existing methodologies DIV + division attributes • automating some levels of design, and DEPT + department attributes • introducing new techniques to overcome the semantic SECT + section attributes DIV + DEPT ) constraints imposed by particular methodologies. ------I impact of hierarchial DIV + SECT-IN-DIV ) relationship between AUTOMATED AIDS DEPT + SECT units Automated aids appear as an attractive alternative in the first instance. However, aids are usually extensions of READ manual design processes, and as such exhibit the same ’ ACCOUNT + {DIV + DIV-EXPENDITURE + {DEPT + DEPT-EXPENDITURE semantic problems. j------+ tSECT + SECT-EXPENDITURE}} Most aids are limited in their applicability and are L + {SECT-IN-DIV + SECT-EXPENDITURE}} restricted to some particular problem, or a particular computer system or only a subset of design level. Most available aids are related to one design level or at most have ACCOUNT + DIV + DIV-EXPENDITURE ACCOUNT + DIV + DEPT + DEPT-EXPENDITURE a limited scope of conversion between two adjacent speci­ REDUCES fication levels. The PSL/PSA system for example17 con­ 1__ » ACCOUNT + DIV + DEPT + SECT + SECT-EXPENDITURE TO centrates mainly on user requirements; its goal is to validate ACCOUNT + SECT-IN-DIV + SECT-EXPENDITURE requirements specifications. There are also aids that con­ centrate on levels other than the requirements level. Many Figure 11. Reduction Process

20 The Australian Computer journal, Vol. 13, No. 1, February 1981 Design Methodologies

Enterprise Level Command Standard Code USER LEVEL - pseudo code ENTITY (el) in SI Rl-PRESENT = FALSE. FOUND = TRUE. INITIALIZE IDENTIFIED BY KEY-ITEM-1 = 'x' MOVE 'x' TO KEY-ITEM-1. Get Account [WITH PROPERTIES DATA-ITEM-1 = 'w' WHILE FOUND = TRUE AND Rl-PRESENT DATA-ITEM-2 = ' z'] = FALSE PERFORM FRI. DO-WHILE there-are-more-transactions FRI. FIND NEXT DUPLICATE WITHIN Rl RECORD C Get-next-transaction Rl | record type J IF FOUND THEN Rl-PRESENT = TRUE - 4 Find unit [IF FOUND AND IF NOT DATA-ITEM-1 l DO FIND-RESPONSIBLE-DIVISION Standard Representation of Entity = 'w' END OR NOT DATA-ITEM = ' z' THEN Rl-PRESENT = FALSE]. FIND-RESPONSIBLE-DIVISION REPEAT-UNTIL unit='division' ENTITY (e2) IN R2 the currency of Rl has been estab- — £ find supervising-unit [WITH PROPERTIES DATA-ITEM-1 ='w'} RELATED TO ENTITY (el) BY command RELATIONSHIP R1-R2. FOUND-1 = FOUND-2 = TRUE. ENTERPRISE LEVEL [WITH PROPERTIES DATA-ITEM-2 ='z'J R2-PRESENT = FALSE. WHILE FOUND-1 AND FOUNI>2 AND NOT INITIALIZE R2-PRESENT PERFORM FIND-R2. FIND ENTITY ACCOUNT WHERE ACCOUNT-ID = '777' 1 Rl 1 | R2 j FIND-R2. WHILE FOUND-1 AND NOT R2-PRESENT DO-WHILE-there-are-more-transactions PERFORM FIND-Rl-R2. r FIND NEXT TRANSACTION RELATIONSHIP Sl\ /S2 IF FOUND-1 THEN -- ► J FIND ENTITY UNIT RELATED TO TRANSACTION FIND OWNER RECORD OF S2 SET. DO FIND-RESPONSIBLE-DIVISION IF FOUND-1 AND FOUND-2 THEN L R2-PRESENT = TRUE. END { LINK j fIF FOUND-1 and FOUND-2 THEN IF NOT DATA-ITEM-1 = 'w' FIND-RESPONSIBLE-DIVISION THEN R2-PRESENT = FALSE.] REPEAT-UNTIL unit = 'division' Standard Representation FIND-R1-R2 FIND ROLE MANAGED OF ENTITY UNIT of Relationship FIND NEXT RECORD WITHIN SI. IF FOUND-1 THEN R2-PRESENT = TRUE. FIND ENTITY MANAGER RELATED TO MANAGED. [IF FOUND-1 FIND UNIT TAKING ROLE MANAGER. THEM IF NOT DATA-ITEM-2 = 'z' END THEN R2-PRESENT = FALSE]

Note: FOUND-1 and FOUND-2 are flags SHORT PROGRAM set by data base system. FOUND-1 is true if a member INITIALIZE record is found in SI and MOVE '777' TO ACCOUNT-ID. false otherwise; FOUND-2 is FIND ACCOUNT RECORD true if an owner of S2 is found and is false otherwise. DO-WHILE there-are-more-set-members C FIND NEXT LINK RECORD WITHIN SETA SET. * i —► -j FIND OWNER RECORD OF SETB SET. L DO FIND-RESPONSIBLE-DIVISION Figure 13. Sample translation from Enterprise Level Commands to END Standard Code FIND-RESPONSIBLE-DIVISION REPEAT UNTIL unit = 'division' BEGIN sion to code. One such possibility is illustrated in Figure f FIND NEXT MANAGED RECORD WITHIN M-D SET. 13. Here a sample set of enterprise level commands is ---- » ) FIND OWNER RECORD OF HIER SET. i FIND OWNER RECORD OF M-R SET. given together with standard code, using the CODASYL END network proposal, to which each such command is trans­ lated. The first command is to find an entity with a given Figure 12. Event Specification Reduction value of KEY-ITEM-1. Here standard representations are used for each enterprise concept; the standard represen­ tation for an entity kind is the record type. The FIND Horvath9. These methods are not general but apply to par­ ENTITY command is converted to a FIND RECORD ticular computer systems; hence information analysis must command addressed to the record type together with tests use terms applicable to particular computer systems. for the existence of such records. An option is allowed to find those entities that have a given value for the properties; IMPROVING EXISTING METHODOLOGIES these also convert to standard code. The second command Let us then look at the possibility of improving is to FIND ENTITY of one kind, which is RELATED to an existing processes. At first one may conjecture a more entity of another kind through a relationship. The two ordered approach for logical design conversion with more entity kinds are represented by record type R1 and R2; the emphasis on the linkage between program and database standard representation for the relationship is two sets, S1 design. To do this it is necessary to develop techniques to and S2, and the association record, LINK. The correspon­ reduce events in parallel with data reductions. One such ding standard code traverses from record R1 (which must possibility is illustrated in Figure 12. Here user require­ have earlier been found through the code that corresponds ments are initially expressed in pseudocode, which is to the FIND ENTITY command) through sets S1 and S2 to converted to an enterprise level semantics and subsequent­ R2 records. Further details of such commands are given in ly to “short programs”. This corresponds to data conver­ Reference 6. Mackenzie12 also describes a similar system, sion from E-R diagrams to their logical structure. A set of which translates programs using a relational model to pro­ commands is used to describe events in terms of entities grams using a CODASYL-like network model. and relationships to match the E-R data reduction. These developments however do not solve the prob­ The conversions illustrated in Figure 12 are attrac­ lems caused by the semantic mismatches, which were tive as they lead to the possibility of automatic conver- described earlier. To overcome such problems requires a

The Australian Computer Journal, Vol. 13, No. 1, February 1981 21 Design Methodologies

DATA BASE requests DIVISION, PROPERTIES; (DIV-NO, BUDGET)); ADD ADMINISTRATOR USER INTERFACE semantic INTERFACE ENTITY (ENTITY-NAME: DIVISION, PROPERTIES: (DIV-NO: DIV1, BUDGET: 150); The feasibility of the adaptable system rests on the availability of a set of con­ defines cepts that can be used to describe arbitrary semantic or model model data models. The internal level in Figure 14 is a represen­ Semantic level tation of these concepts; the data model definition and DEFINITION to define a qperation definition are then defined as transformations in OF SEMANTIC OF OPERATIONS data or STRUCTURE semantic model terms of these concepts. This feasibility has been proven7 by a system that uses two concepts. A structure abstrac­ tions is used to represent data model structures such as INTERPRETER RELATION, RECORD and so on and occurrence abstrac­ of user operations into internal tions to represent occurrences of these structures. The structure operation internal level implements these abstractions by trees with specialised nodes. Data model operations are then defined in terms of tree transformations using a special language; INTERNAL STRUCTURE details are given in Reference 7.

CONCLUSION Figure 14. An Adaptable System A set of criteria for evaluating design methodologies was defined and some shortfalls of existing methodologies when compared to these criteria were identified. In par­ general semantic method with applicability to a wide set ticular the interaction of user methodologies and data of user problems. Data models are not usually sufficient for this purpose; most debates16 on data models conclude model semantics can sometimes have undesirable effects on the final designs especially program structures. It was that any single data model lacks the richness to adequately proposed that an adaptable system, which provides an model large classes of user requirements. Perhaps rather interface in terms of user semantics, be developed to than relying on one data model the goal should be to minimise such effects. develop design methodologies that permit semantic flexi­ bility at the user level. These methodologies in turn require REFERENCES computing systems that can be adapted to'the semantics 1. ALTER, S. “A Model for Automatic File and Program De­ chosen by the user. sign in Business Application Systems”, Communication of the ACM, Vol. 20, No. 6, June 1979, pp. 345-353. ADAPTABLE SYSTEMS 2. ATKINSON, R.W. “Structured Approach aids Implemen­ The general structure of an adaptable system is shown tation of Business Systems", Canadian Datasystems, May 1979, pp. 44-53. in Figure 14. Here the user chooses the semantic terms to 3. DAVENPORT, R.A. “Logical Database Design — From En­ express the user requirements. The database adminis­ tity Model to DBMS Structure”, Australian Computer trator then defines an interface that allows user require­ Journal, Vol. 12, No. 1, February 1980, pp. 2-14. ments to be expressed in the same semantic terms. Thus, 4. DeMARCO, T. “Structure Analysis and System Specifica­ tion”, Yourdon Press, 1978. suppose E-R concepts are chosen to express the problem; 5. GANE, C., SARSON, T. “Structured Systems Analysis, the dba then develops an interface that includes commands Prentice-Hall, 1979. that allow entities and relationships to be defined as the 6. HAWRYSZKIEWYCZ, I.T. “Generating Programs for Ab­ user data model and operations expressed using syntax stract Specifications”, School of Information Sciences, Canberra College of Advanced Education, Technical Report consistent with E-R concepts. No. 7, 1980. To do this a high level definition facilities must be 7. HAWRYSZKIEWYCZ, I.T. “Alternate Implementations of provided to the dba; these include a the Conceptual Schema”, Information Systems, Vol. 5, No. 3 • a language to define a semantic or data model, and 1980, pp. 203-217. 8. HAWRYSZKIEWYCZ, I.T. “Data Analysis - What are the • a language to define basic operations on this model. Necessary Concepts”, Australian Computer Journal, Vol. 12, An interpreter uses these definitions to transform an No. 2, February 1980, pp. 2-14. internal structure consistently with the basic operations 9. HORVATH, P.J. “DESP: A COBOL Program Generator for initiated by users. Readers may note the resemblance IDMS Data Bases and Serial Files”, Database, Winter-Spring, 1980, pp. 46-55. between this approach and the ANSI/SPARC architecture; 10. JACKSON, M.A. “Principles of Program Design”, Academic the semantic level in Figure 14 being similar to the concep­ Press, New York, 1975. tual level in this architecture. 11. LEAVENWORTH, B.M., SAMMET, J.E. “Overview of Non­ Given these definition facilities the dba would define procedural Languages”, Proceedings ACM SIGPLAN Sympos­ ium on Very High Level Languages, Santa Monica, California, operations such as March 28-29, 1979. OPERATION: DEFINE ENTITY KIND (definition in 12. MacKENZIE, H.G., “An Approach to Implementing a Rela­ terms of definition language specifying that the user tional Database System”, M.Sc. Thesis, ANU Department of nominate a parameter ENTITY NAME and any Computer Science, September 1979. ' 13. MITOMA, M.F. and IRANI, K.B. “Automatic Data Base PROPERTIES names). Schema Design and Optimisation”, Proceedings Conference OPERATION: ADD ENTITY (definition in terms of the on Very Large Data Bases, Boston, 1975, pp. 286-320. definition language specifying that user nominate 14. RAVER, N., HUBBARD, G.U. “Automated Logical Data actual property value). Base Design: Concepts and Applications”, IBM Systems Journal, No. 3, 1977, pp. 287-312. The user would then execute these operations as 15. ROSS, D.T., SCHOMAN, K.E. Jnr., “Structural Analysis for follows: DEFINE ENTITY KIND (ENTITY-NAME: Requirements Definition”, in Software Design Techniques

22 The Australian Computer Journal, Vol. 13, No. 1, February 1981 Design Methodologies

eds. Freeman, P. and Wasserman, A.E. IEEE Computer BIOGRAPHICAL NOTE Society, California, 1977. pp. 86-95. 16. RUSTIN, R. (Editor) “Data Models: Data-Structure-Set Igor Hawryszkiewycz received his B.E. and M.E. versus Relational”, Proceedings of the ACM SIGMOD degrees in Electrical Engineering from the University of Workshop on Data Description, Access and Control, Ann Adelaide and his Ph.D. in Computer Science from the Arbor, Michigan, May 1-3, 1974. Massachusetts institute of Technology in 1973. He first 17. TEICHROW, D„ HERSHEY, E.A. "PSL/PSA: A Computer- Aided Technique for Structured Documentation and Analysis worked at the Research Laboratories, and subsequently the of Information Processing Systems”, IEEE Transaction on Data Processing Branch, of the PMG Department (now Software Engineering, January 1977. Telecom), researching computer networks and database 18. TEOREY, T.J., FRY, J.P. "The Logical Record Access systems. Currently he is Principal Lecturer for Information Approach to Database Design”, ACM Computing Surveys, Vol. 12, No. 2, June 1980, pp. 179-211. Systems at the Canberra College of Advanced Education. His interests are in systems design, database systems and distributed systems.

The Australian Computer Journal, Vol. 13, No. 1, February 1981 23 Short Communications NEBALL and FiNGRP: NEW PROGRAMS FOR We have adopted the formulation MULTIPLE NEAREST-NEIGHBOUR ANALYSES

D.J. Abel and W.T. Williams* *D.J. Abe/ is with CSIRO Division of Computing Research, Townsville, Qld 4814. W.T. Williams is with the Australian It should be noted that this ej- is not precisely equal institute of Marine Science, Townsville, Qld 4810. Manu­ to the average of_or for repeated sampling. (Clearly our script received 8 August 1980 and revised 23 September method is comparing two lists which have differing first 1980. elements.) We have, however, determined experimentally that the difference in these two quantities is of the order NEBALL assumes the prior existence, for ji elements (r^=S! 10~4 for even small n (ji = 25). 150), of an inter-element distance matrix of order n. Each element is then associated with the complete ordered set of its (n — 1) This expression has a singularity at r = n/2. It will be nearest neighbours. For every pair of elements the number of obvious that computation of expression (1) involves, inter matches in successive subsets of r_(.r_= 1 to ri) is compared with alia, the computation of 1n(n—2r+s). However, if r > n/2 random expectation, and a measure of discrepancy, ZV>is written then, for small values of s, the expression inside the brack­ away. It is shown that A,- is constrained within the limits ± 0.25. Program FINGRP accepts the matrix of Ay values and submits it to ets may be negative; this is simply because, once the half­ a non-hierarchical classificatory procedure at externally-specified way mark is reached, there must be some matches. The levels of similarity. The result is a non-exclusive set of homogen­ appropriate procedure is to use the quantity (n—2r+s) as a eous groups of elements. flag; if for any combination of r and s it becomes negative, Keywords: Multiple nearest neighbours, minimum spanning tree, classification, ordination. no contribution is made to the summation. CR categories: 3.12, 5.19, 5.32, 5.4. There will be available then a vector of er values dependant only on £ and, for each pair of elements, a INTRODUCTION vector of or values. We denote the quantity (or — er) by It is well known that classical ordination techniques, Ar. For comparison we have elected to use the Kolmorogov- such as principal component analysis and its analogues, Smirnov procedure (see, e.g., Siegel 1956, p. 128), and to encounter difficulties if the data-set under study contains write away the signed value of Ar for which |Ar| is maxi­ maked discontinuities or non-linear relations. As a result mum. We shall now show that the value of Ar is con­ biologists have increasingly turned to graph-theoretic strained between the limits ±14. approaches for the study of variation in complex systems. For two identical elements, we have empirically The earliest of these was the minimum spanning tree (Prim determined that the maximum value of Ar will occur at 1957). This uses only the first nearest neighbour of each r = n/2 (if n is odd it will occur at (n—1 )/2, but this need element; if the system contains an intricate network of not convern us). The numerator of expression (1) at this cross-relations it is apt to produce a richly-branched tree, value then becomes with many peripheral elements, that is difficult to interpret. Improved results have recently been obtained by computing a network based on the first two nearest neighbours (Williams, Burt, Pengelly and Robinson 1980; Williams 1980), but this too may prove unsatisfactory for very com­ but this plex systems. In one application of this technique (Clifford and Williams 1980) a puzzling situation was resolved by appeal to a list of the first seven neighbours. This suggests an obvious extension: to devise algorithms which use, for each of n. elements, the complete ordered set of (n — 1) neighbours; each element would then be examined in the which, by appeal to the coefficient of xr in the expression context of every other element in the system. In this of (1 + x)r (1 + x)r_1, communication we first describe a new program NEBALL which implements such a procedure.

THEORETICAL The ordered list of neighbours for each element will be a permutation of the integers 1 ton (each element being regarded as its own first nearest neighbour). Consider now (We are indebted to Mr Russel John, of the CSIRO Division two such lists and let us for the moment denote them by of Mathematics and Statistics, for this proof.) If we now (a;, . .., an) and (b;, ..., bn). For each integer rjn 1

30 The Australian Computer journal, Vol. 13, No. 1, February 1981 Letters to the Editor

then it is easy to see how you can have made the rather part to the high reliability and modifiability that this biased and uninformed attack in your response to my methodology offers. Individual client’s requirements differ original letter. to varying degrees but maintenance to standard programs is I believe that the series of events, which started with a safe and easy task. the type of articles being printed in the Journal and cul­ However, the concept of state-control is more impor­ minating in the editor’s emotional proposed reply to my tant as a design tool than as a maintenance aid. ‘Michael- letter, indicates the extent of the gulf that extends between Jackson-type’ structured techniques are of little use in the editor’s viewpoint of acceptable articles and the view­ multi-path interactive programming environments. point of the majority of the Australian computing industry. John Southgate, All that is being attacked is the balance of published Industrial Year Student, NSWIT articles between worthwhile research papers for all data Wacher Partners Pty Ltd processing personnel and esoteric papers similar to those which have been published. Your present response does It is indeed encouraging to find concepts previously very little to assist this debate. I look forward to hearing considered the proper domain of computer science being from you in due course. applied to the world of commercial data processing as G.E. McGuiness described in a recent ACJ article by Juliff. There is .no doubt that this technique when properly used enhances EDITOR’S FOLLOW-UP COMMENTS both program development, reliability and maintainability. I am very pleased to see that Mr McGuiness actually It is, however somewhat disconcerting to note the complete shares my view that publication of research papers of “all lack of references in that article. I have been teaching the kinds’’ is a task worthy of the Society's resources. Unfor­ file updating technique described in Example 3 since 1976 tunately, he says at the same time that he is against to classes at the University of New South Wales and within “esoteric’’ research papers. One cannot have it both ways. the ACS Professional Development Seminars in Basic Since i have already discussed Mr McGuiness’ main Systems Analysis. A preliminary version of the algorithm point, the balance of material in the Journal, in my Novem­ was published in February 1978 (Grouse 1978) and subse­ ber Editorial, there is no need to repeat the remarks here, i quently included in “Basic Information Systems Analysis” will just make the point that the Journal’s contents reflect by Brookes, Grouse, Lawrence and Jeffery, published by the supply of material, if practitioners would not write the University of New South Wales. The latter text has been papers and submit them to the Journal for consideration, used as notes for the above-mentioned Professional then there would not be any published. Readers might like Development Seminars for two years. to refer to some earlier discussions on the matter, on p. 113 of the August 1979 issue. References GROUSE, P.J. (1978), “Flowbacks — A Technique for Structured TWO COMMENTS ON PAPER BY J ULIFF Programming”, SIGPLAN Notices of the ACM, 13, 2, (Feb 1978). pp 46-56. I refer to the essay “Program Control by State Trans­ JULIFF, P. (1980), “Program Control by State Transition Tables”, ition Tables” by Juliff, in which the author states that he Austral. Comput. J., 12, 146-152. could find no piece of work which covers this topic. P.J. Grouse The book “Compiler Design Theory” written by P.M. Associate Professor, Lewis et al, and published in 1976 by Addison-Wesley is one Department of Information Systems, such work which gives an excellent presentation of this and The University of New South Wales similar program construction models such as finite state recognisers and pushdown stack machines. AUTHOR’S REPLY Juliff, in his examples, defines program state explicit­ I am indebted to both Prof. Grouse and Mr Southgate ly with an identifier CURRENT-STATE which is used to for providing readers with references to additional material index a 2-dimensional transition table. An equally valid on this topic. I have been aware of Prof. Grouse’s treatise model is to define the states implicitly i.e. for different lo­ on Flowbacks for some time and should have acknowledged cations in the program, test the input symbol and go direct­ it in my paper. Like Mr Southgate, i have also used this ly to a program location (without accessing any table) that technique extensively in on-line interactive software and will process the symbol. agree with his observations as to its value as a design too! This of course negates the ease of maintenance that a and aid to maintenance. transition table offers, but offers advantages in speed of I also acknowledge private correspondence from Mr execution and memory economy, where a high number of A. Henzell which provided the further references: states exist and a high percentage of state symbol combina­ DAY, A.C., The Use of Symbol-State Tables tions result in a transition to the same error state. Austral. Comput. j., 13, 1970; A Programmer’s View of Automata, At Wachers, we have been using state transition tables BARNES, B.H., Computing Surveys, Vol. 4,1972. to control commercial on-line interactive screen programs. P. Juliff The success of software implemented in this way is due in Prah ran CAE

The Australian Computer Journal, Vol. 13, No. 7, February 1981 CALL FOR PAPERS SPECIAL ISSUE ON DATABASE MANAGEMENT The Australian Computer Journal will publish a special issue on "Database Management” in November 1981. The issue will be under the Guest Editorship of Professor A.Y. Montgomery, Department of Computer Science, Monash University. Research papers, tutorial articles and industry case studies on all aspects of computer databases and information systems are welcome, either as full papers or as short communications. Prospective authors should write to: Professor A. Y. Montgomery, Guest Editor ACj Special issue on Database Management Dept, of Computer Science, Monash University, Clayton, Victoria 3168 Australia to notify him of their intention to submit material for the issue and provide a brief summary of their intended contribution. To allow adequate time for processing, full papers must be received by 1 August 1981, and short communications by 15 August 1981. Material received after the deadlines may be considered for publication in later issues of the Journal. Papers should be prepared in accordance with the guidelines published in the May 1980 issue of the Journal.

CALL FOR PAPERS

SPECIAL ISSUE ON SOFTWARE ENGINEERING

The Australian Computer Journal will publish a special issue on "Software Engineering” in early 1982. Research papers, tutorial articles and industry case studies on all aspects of the subject will be welcome, and both full papers and short communications will be considered. Prospective authors should write to: Professor P. C. Poole, Guest Editor, A CJ Special Issue on Software Engineering, Department of Computer Science, University of Melbourne, Parkville, Victoria, 3052, Australia. to notify him of their intention to submit material for the issue and provide a brief summary of their intended contribution.

32 The Australian Computer Journal, Vol. 13, No. 1, February 1981 rnui

foam

W $0

u r * nag sirra

This is the 200m2, Snaplock Partitioning was also as anti-static carpet. access floor installed in installed at the same time as Cemac also has a patented, June 1980 at the Perth offices the floor. adjustable pedestal base which of Woodside Petroleum Limited. Cemac Tate’s all-metal ensures a level floor on a It supports computers used access floor system offers four sloping slab. So, for access to process data for the panel sizes, with or without flooring, or complete office North-West Shelf gas field stringers and the widest range interior layout (floors, development. of finishes-high pressure systems furniture, ceilings, The floor uses Series 800 laminate, wood parquet, cork partitions and task or ambient metal panels on 300mm and vinyl asbestos - as well lighting), call Cemac Interiors. pedestals and has a Heritage II, monolithic, anti-static carpet Brochures and details from Cemac Interiors: finish. Licensees' Sydney 2903788 Adelaide_____ 45 3656 Melbourne 4198233 Hobart______295444 Brisbane_____ 2215099 Perth______444 7888

CEIN0022 The Australian Computer Journal is an official publication of the CONTRIBUTIONS: All material for publication should be sent to Australian Computer Society Incorporated. the Editor for processing. Prospective authors may wish to consult manuscript preparation guidelines published in the May 1980 issue. OFFICE BEARERS: President: G.E. Wastie; Vice-Presidents: The paragraphs below briefly summarise the essential details. R.C.A. Paterson, M.G. Jackson; immediate past president: A.R. Benson; National treasurer: R.A. Hamilton; Chief executive officer: R.W. Rutledge, PO Box N26, Grosvenor Street, Sydney, 2000, telephone (02) 267-5725. Types of Material: Four regular categories of material are published: Papers, Short Communications, Letters to the Editor and Book EDITORIAL COMMITTEE: Editor: C.K. Yuen, CSIRO Division of Reviews. Generally speaking, a paper will discuss significant new Computing Research, PO Box 1800, Canberra, ACT, 2601. results of computing research and development, or provide a com­ Associate Editors: J.M. Bennett, T. Pearcey, P.C. Poole, A.Y. prehensive summary of existing computing knowledge with the aim Montgomery, J. Lions. of broadening the outlook of Journal readers, or describe impor­ tant computing experience or insight. Short Communications are SUBSCRIPTIONS: The annual subscription is $18.00. All subscrip­ concise discussions of computing research or application. A letter to tions to the Journal are payable in advance and should be sent the Editor will briefly comment on material previously appearing in (in Australian currency) to the Australian Computer Society Inc., the Journal or discuss a computing topic of current interest. PO Box N26, Grosvenor Street, Sydney, 2000. A subscription form may be found below.

PRICE TO NON-MEMBERS: There are now four issues per annum. Refereeing: Papers and Short Communications are accepted if The price of individual copies of back issues still available is $2.00. recommended by anonymous referees, Letters are published at the Some already out of print. Issues for the current year are available discretion -of the Editor, and Book Reviews are written at the at $5.00 per copy. All of these may be obtained from the National Editor’s invitation upon receipt of review copies of published books. Secretariat, PO Box N26, Grosvenor Street, Sydney, 2000. No trade All accepted contributions may be subject to minor modifications discounts are given, and agents should recover their own handling to ensure uniformity of style. charges. Special rates apply to members of other Computer Societies and applications should be made to the Society concerned.

MEMBERS: The current issue of the Journal is supplied to personal Proofs and Reprints: Page proofs of Papers and Short Communi­ members and to Corresponding Institutions. A member joining part­ cations are sent to the authors for correction prior to publication. way through a calendar year is entitled to receive one copy of each Fifty copies of reprints will be supplied to authors without charge. issue of the Journal published earlier in that calendar year. Back Reprints of individual papers may be purchased from the Printer numbers are supplied to members while supplies last, for a charge of (Publicity Press). $2.00 per copy. To ensure receipt of all issues, members should advise the Branch Honorary Secretary concerned, or the National Secretariat, promptly of any change of address. Format: Papers, Short Communications and Book Reviews should MEMBERSHIP: Membership of the Society is via a Branch. be typed in double spacing on A4 size paper, with 2.5cm margins Branches are autonomous in local matters, and may charge dif­ on all four sides. The original, plus one copy (preferably two for ferent membership subscriptions. Information may be obtained Papers) should be submitted. References should be cited in standard from the following Branch Honorary Secretaries. Canberra: PO Box Journal form, and generally diagrams should be ink-drawn on 446, Canberra City, ACT, 2601. NSW: Science House, 35-43 tracing paper or board with stencil or Letraset lettering. Papers and Clarence St, Sydney, NSW, 2000. Qld: Box 1484, GPO, Brisbane, Short Communications should have brief Abstracts, Keyword lists Qld, 4001. SA: Box 2423, GPO, Adelaide, SA, 5001. WA: Box and CR categories on the leading page, with authors’ affiliations as F320, GPO, Perth, WA, 6001. Vic: PO Box 98, East Melbourne, a footnote. The authors of an accepted Paper will be asked to Vic, 3002. Tas: PO Box 216, Sandy Bay, Tas, 7005. supply a brief biographical note for publication at the end.

This Journal is Abstracted or Reviewed by the following AUSTRALIAN COMPUTER services:

JOURNAL Publisher Service Subscription/Change of Address Form ACM Bibliography and Subject Index of Current Computing Literature. Name...... ACM Computing Reviews. Current Address...... AMS Mathematical Reviews. CSA Computer and Information Systems Abstracts. Data Processing Digest. □ Please enrol me as subscriber for 1981, I enclose $1 8.00 ENGINEERING INDEX INC. Engineering Index. □ Please record my new address as shown above. 1 attach INSPEC Computer and Control Abstracts. below the mailing label for the last received issue. INSPEC Electrical and Electronic Abstracts. SPRINGER- Zentralblatt fur Mathematick und ihre Grenz- VERLAG gebiete. ATTACH LABEL HERE Copyright © 1981. Australian Computer Society Inc.

Production Management: Associated Business Publications, 28 Chippen Street, Chippendale, NSW, 2008. Tel: 699-5601,699-1154. Send all correspondence regarding subscriptions to PO Box AH advertising enquiries should be referred to the above address. N26, Grosvenor Street, Sydney, 2000, Australia. Photocopies of this form acceptable. Printed by: Publicity Press Ltd, 29-31 Meagher Street, Chippendale, NSW, 2008.