DESIGN OF A

VARIABLE HIGH LEVEL LANGUAGE COMPUTER

USING PARALLEL PROCESSING

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of the Ohio State University

By James Donald Mooney, B.S.E.E., M.Sc.

The Ohio State University 1977

Reading Committee:

Prof. K.J. Breeding, Chairman
Prof. M.T. Liu
Prof. R.B. McGhee

Adviser
Department of Electrical Engineering

ACKNOWLEDGMENTS

I wish to thank my adviser, Dr. Kenneth Breeding, for his help and encouragement over the extended period of preparing this dissertation. Thanks to Dr. Breeding and Dr. M.T. Liu for some fruitful discussions and good advice, and also to Dr. R.B. McGhee for his service on the reading committee.

I am grateful to The Ohio State University for supporting me in most of my graduate studies with a University Fellowship and a Graduate Research Associateship.

This work was also supported in part by the U.S. Air Force Office of Scientific Research, under Grant No. AFOSR-77-3400.

Thanks are due to my employer, Dymo Graphic Systems, for providing time to complete my studies and the use of equipment and facilities.

My last and deepest thanks are to my wife Joani, for years of patience and encouragement, for extensive help in preparing the drawings, and for being there to make it all worthwhile.

VITA

November 29, 1946 . . . . Born - Jersey City, New Jersey

1968 ...... B.S.E.E., University of Notre Dame, Notre Dame, Indiana

1969 ...... M.Sc., The Ohio State University, Columbus, Ohio

1969-1971 ...... Graduate Research Associate, Computing Center, The Ohio State University, Columbus, Ohio

1971-1977 ...... Systems Programmer, Programming Manager, Dymo Graphic Systems, Wilmington, Massachusetts

FIELDS OF STUDY

Major Field: Electrical Engineering

Studies in Digital Systems. Professor Kenneth J. Breeding

Studies in Computer Science. Professor Ming T. Liu

Studies in Control Theory. Professor John Bacon

Studies in Mathematics. Professor Henry D. Colson

TABLE OF CONTENTS

ACKNOWLEDGMENTS ...... ii

VITA ...... iii

LIST OF TABLES ...... vii

LIST OF FIGURES ...... viii

Chapter Page

I. INTRODUCTION...... 1

1.1 Background and Motivation ...... 1

1.2 Organization of the Dissertation .... 4

II. SURVEY OF PREVIOUS WORK ...... 7

2.1 High-Level Language Machines ...... 7

2.2 Related Research ...... 26

2.2.1 Languages and Theory ...... 26

2.2.2 Computer Architecture ...... 29

III. OVERVIEW ...... 33

3.1 Design Considerations ...... 33

3.2 The Proposed System ...... 39

IV. THEORETICAL MODEL ...... 44

4.1 The Language Class ...... 44

4.2 The Directed Graphs of a Grammar .... 47

4.3 Further Definitions ...... 54

4.4 Syntactic Analysis ...... 62

TABLE OF CONTENTS (continued)

Chapter Page

4.5 The Execution Tree ...... 69

4.6 Semantic Processing ...... 73

4.7 Execution ...... 80

4.8 Extensions to the Model ...... 92

4.9 Summary ...... 97

V. PROCESSOR FOR A SMALL LANGUAGE ...... 99

5.1 Language Definition and Representation . 99

5.2 Analysis Section ...... 117

5.2.1 General Features...... 118

5.2.2 The Analysis Control Unit .... 120

5.2.3 The Token Processors...... 126

5.3 Execution Section ...... 139

5.3.1 General Features...... 139

5.3.2 The Operand Evaluators...... 141

5.3.3 The Execution Processors...... 149

5.4 Example of Program Flow ...... 151

5.5 Simulation and Performance ...... 159

5.6 Summary ...... 165

VI. PROCESSORS FOR REAL PROGRAMMING LANGUAGES . 166

6.1 FORTRAN IV ...... 166

6.1.1 General ...... 167

6.1.2 Token Classes ...... 169

6.1.3 Syntax and Analysis ...... 170

TABLE OF CONTENTS (continued)

Chapter Page

6.1.4 Execution ...... 178

6.1.5 Summary ...... 179

6.2 ALGOL 60 ...... 179

6.2.1 General ...... 179

6.2.2 Token Classes ...... 181

6.2.3 Syntax and Analysis ...... 182

6.2.4 Execution ...... 187

6.2.5 Summary ...... 189

6.3 Other Languages ...... 190

VII. CONCLUSIONS AND FUTURE DIRECTIONS ...... 193

7.1 Summary and Evaluation ...... 193

7.2 Suggestions for Future Work ...... 198

APPENDIX A: THE UPPER BOUND PROBLEM ...... 200

APPENDIX B: MEMORY SYSTEM STRUCTURE ...... 214

APPENDIX C: LANGUAGE DEFINITION TABLES ...... 222

APPENDIX D: ANALYSIS MODULE PROGRAM LISTING.... 237

APPENDIX E: EXECUTION MODULE PROGRAM LISTING . . . 259

APPENDIX F: FORTRAN IV SYNTAX GRAPHS ...... 284

APPENDIX G: ALGOL 60 SYNTAX GRAPHS ...... 291

REFERENCES ...... 296

LIST OF TABLES

Table Page

2.1 Chronology of HLL Machines ...... 25

5.1 Execution Processor Functions for SLANG . . 112

5.2 ACU Signal Descriptions ...... 124

5.3 ACU Register Equations ...... 125

5.4 TP Register Equations ...... 129

6.1 Extended Primitive Functions for FORTRAN ...... 180

6.2 Extended Primitive Functions for ALGOL . . 190

A.1 Count for FORTRAN Graphs ...... 212

LIST OF FIGURES

Figure Page

2.1 Anderson ALGOL 60 System ...... 8

2.2 Bashkow et al FORTRAN machine ...... 12

2.3 Tree-Structured Store of Iliffe ...... 15

2.4 Direct PL/I Processor of Sugimoto .... 16

2.5 Cellular APL Computer ...... 18

2.6 SYMBOL Computer System ...... 20

2.7 Fournier's GPM System ...... 24

3.1 Proposed System Block Diagram...... 40

4.1 Grammar Graph Format 49

4.2 Example Graph System ...... 51

4.3 Equivalent Grammar Graphs ...... 55

4.4 Further Example of Equivalence ..... 56

4.5 Token Processor Syntactic Recognition ...... 66

4.6 An Execution Tree ...... 71

4.7 Primitive Steps in Tree Building .... 75

4.8 TP Flow with Semantic Processing .... 79

4.9 Operand Evaluator Basic Flowchart .... 83

4.10 Function Decomposition ...... 89

4.11 Operand Evaluator Complete Flowchart . . 90

5.1 Graph Syntax for SLANG ...... 102

5.2 Tree Cell Prototypes for SLANG ..... 107

LIST OF FIGURES (continued)

Figure Page

5.3 Factorial Program in SLANG ...... 113

5.4 Data Structure for Factorial Program ...... 114

5.5 New and Modified Graphs for SLANG ...... 116

5.6 Analysis Section Block Diagram ...... 119

5.7 Analysis Control Unit Flowchart ...... 122

5.8 ACU Signal Diagram ...... 123

5.9 Token Processor Block Diagram ...... 128

5.10 Token Recognizer Flowchart ...... 133

5.11 Semantic Processor Flowchart ...... 134

5.12 Task Generator Flowchart ...... 135

5.13 Execution Section Block Diagram ...... 140

5.14 Operand Evaluator Overall Flowchart .... 142

5.15 OE Normal Process Flowchart ...... 143

5.16 OE Conditional Process Flowchart ...... 144

5.17 Operand Evaluator Block Diagram ...... 145

5.18 Execution Processor General Flowchart . . . 152

5.19 Snapshot of Analysis ...... 157

5.20 Simulator Block Diagram ...... 160

6.1 FORTRAN Preprocessing Example ...... 168

6.2 Execution Tree for a DO-loop ...... 173

6.3 Subprogram Definition and Call ...... 176

6.4 Johnston Contour Model ...... 183

6.5 Execution Tree for Johnston Algorithm . . . 184

LIST OF FIGURES (continued)

Figure Page

A.1 Splitting of Partial Paths ...... 203

A.2 Non-closure of Graphs ...... 204

A.3 Modification to IGLIST ...... 209

A.4 Modified Expression Graphs ...... 211

B.1 Memory System Block Diagram ...... 215

B.2 Data Entities ...... 218

B.3 Structure Cells ...... 220

CHAPTER I

INTRODUCTION

1.1 Background and Motivation

Shortly after the earliest stored-program computers were developed, it became clear that their potential as problem solvers depended on effective communication between man and machine. The earliest programs had to be expressed manually in binary machine code. Before long, assemblers appeared, allowing programs to be written in a reasonable notation. But the usefulness of the computers took a giant step forward with the development of the first high-level language interpreters, beginning with FORTRAN.

From then on, the goal was to express problems in a notation convenient for the subject, and have the computer itself do the translation.

As computing evolved, a vast array of high-level languages was conceived and implemented on various computers. Until recently, however, computing hardware was scarce and extremely expensive. It was inconceivable to redesign the hardware to support new languages when software compilers could produce the same results.

Economics also made it impractical to build new computers which required very complex logic in hardware, or differed radically from earlier designs. Almost all machines built conformed to the principles of a "von Neumann" design. They processed a linear sequence of instructions through a single central processor, allowing no parallel activity.

Any possibility of designing hardware to interpret the new languages was also blocked by a lack of theory to guide the design. Early FORTRAN compilers were amorphous monsters, with all statement types treated as unrelated special problems. Formal language theory was still beyond the horizon.

However, the lack of theoretical foundations made the software compilers likewise unmanageable and trouble-prone, and because of the need, a formal theory of language was soon born. Progress was made in formalizing the syntax (structure) of programming languages, and compilers became more organized. Much more recently, techniques for modeling the semantics (meaning) of languages have also begun to appear.

Most importantly, declines have occurred, gradually and then dramatically, in the costs of computer hardware. LSI technology is now making possible ever increasing logic complexity in smaller and cheaper chips. By comparison, the cost of writing software is now much more significant.

This decline in hardware costs has led to the building of more elaborate computer systems and a freer exploration of novel architectures. Much work is now being done in systems which achieve high processing speed by parallel activity in many independent processors.

There is also renewed interest in hardware designs much better suited to high-level languages, to simplify or eliminate compiling. Many such designs are being proposed. However, most are dedicated to only a single language, and often this language is designed or greatly modified to suit the hardware machine.

More variations in memory design are also being studied as they become more economical. Large size associative memories are starting to appear. Memories of this type can be accessed according to the contents of the entries rather than by their relative position.

The unifying effects of progress in language and computing theory, together with continuing declines in hardware cost, should now make possible hardware systems which can process a wide variety of high-level languages, providing advantages over compiler-based methods. Such systems should be able to switch easily between languages. Moreover, the languages should include currently popular programming languages with little or no change. A few such systems have been proposed.

We will seek in the present work to draw together this need for multilingual hardware and the possibilities of a nonstandard architecture with associative storage and many parallel activities. This will lead to the design of a fast system which will directly process high-level languages, whose definition can be changed at will. Among possible advantages of such a machine over conventional architectures are high speed of execution, elimination of time for compiling, and direct monitoring for error conditions to provide more reliable programs without software overhead.

1.2 Organization of the Dissertation

The next chapter, Chapter II, is concerned with a historical survey of high-level language processors and related research. Previous work is described in a number of areas relating to the present research.

Chapter III presents an overview of the system to be proposed. This chapter also discusses some of the questions and tradeoffs that were considered in developing the design.

Chapter IV presents the theoretical model of the proposed language processor. A number of useful concepts are defined, and a class of languages is established which will be accepted by the basic model. A model for the analyzer is presented, followed by models for the extraction and execution of the semantic content of the language. Various ways of extending the model are then discussed.

Chapter V is concerned with the detailed development of a physical design sufficient to process programs in a simple language. The language definition is presented along with a suitable form for its representation. We then develop a register-level design for the major components of the analyzer, the Analysis Control Unit and Token Processors, including several subsystems which these units will contain.

A design for the execution control section is similarly proposed. Flow of an example program through the complete system is described. The design is then validated, and its performance analyzed, using a software simulation.

Chapter VI extends the basic design to handle specific problems in implementing two common languages, FORTRAN IV and ALGOL 60. Necessary enhancements to the system are described. These are required by the semantic features of the languages, and their failure to conform to a theoretical syntax class in all details. The implications of selected features in other common languages are also discussed.

Chapter VII gives an evaluation and summary of the overall work, and presents some suggestions for further research.

Finally, some necessary discussions are deferred to the Appendices. Appendix A considers the problem of determining how many Token Processors are sufficient for any program in a given language. Appendix B presents a possible structure for the Storage Control Unit.

CHAPTER II

SURVEY OF PREVIOUS WORK

2.1 High Level Language Machines

This section will review some significant previous work in the development of machine architectures oriented toward execution of high-level languages. The treatment given here is not exhaustive. An effective survey of progress up to 1973 is given by Carlson [30].

Consideration of machine design to support high-level programming languages is almost as old as the languages themselves. Probably the earliest such design was for the NCR 304 (1957), described by Yowell. This machine was actually built. It provided some powerful high-level operations, such as a "merge" instruction.

Proposals for machines fully oriented to particular languages appeared around 1961. In that year Anderson presented a design for a computer that could execute programs in ALGOL 60. A diagram of Anderson's system is given in Figure 2.1.

Anderson's scheme begins with the program text in a special memory, the "program memory." Programs are scanned by the "symbol counter" for individual tokens.

When a token is accumulated in the "window register" it goes into one of three basic stacks: control, operator, and operand. Execution actions are sequenced from these stacks.

Figure 2.1 Anderson ALGOL 60 System

All data is in a separate "Value Memory," and addresses are in an "address table." This table gives the current associations between variable names and storage locations. Dormant associations which may exist because of ALGOL's block structure are kept in the value memory. The means of swapping these associations was not described.

The details of the Anderson design are sketchy, and the machine was not implemented. It is significant because it was the first design for a machine which directly executed a high-level language.

Also in 1961, the Burroughs B5500 was introduced. The B5500 was a commercial, general-purpose machine which departed considerably from conventional architecture to provide support for high-level languages, especially block-structured languages like ALGOL.

Among the concepts pioneered by the B5500 was a hardware stack used to store operands and intermediate results. Instead of a sequential stream of instructions, the Burroughs used a Polish string of "syllables" as its machine language. It was thus well suited to executing the "intermediate language" that newer compilers were producing on their way to normal machine language.

In 1963, Mullery et al offered a design for a processor called ADAM for executing a specially designed high-level language. This processor provided for variable-length data and introduced control symbols within the data to describe its structure. This concept was later used and expanded in the SYMBOL system described below.

In 1964, Hodges presented a machine having as its instruction set the list-processing language IPL-V.

The most widespread language at this time was FORTRAN. This language was not as well behaved as ALGOL. However, it was soon considered for hardware implementation due to its popularity. In 1965 Melbourne and Pugmire reported the design of a small, microprogrammed machine whose effective machine language was essentially FORTRAN. A two-pass software translation would perform certain conversions, such as converting identifiers to storage addresses and translating expressions to Reverse Polish form. This phase also prepared tables and marked the end of DO ranges. This was like an assembly phase, much simpler than a high-level compiler.

The machine had fixed length data which could be operators, numbers, or characters. The design included a special keyboard for program input, with keys reserved to represent the FORTRAN keywords. Thus problems of source code interpretation were greatly simplified.

In 1967, Bashkow et al reported on a proposed design for a FORTRAN machine. Although this system was not implemented, the design was presented in considerable detail. It was a hardwired system capable of interpreting a program directly without preassembly. However, to simplify the design, they restricted consideration to a limited subset of FORTRAN. The subset had no FORMAT statement, which probably could have been better implemented in software. More seriously, it made no provision for functions, subroutines, or COMMON blocks.

The design of Bashkow's system is shown in Figure 2.2.

The machine separates translation and execution as two distinct modes. In the LOAD mode programs are read in completely and translated into machine code. The machine code is close to the original text and consists of program code, data storage, and a symbol table. The load phase resolves symbol references, and converts statement type designators to special codes. When the END statement is encountered, the system switches to EXECUTE mode, and the program code is interpreted and executed by various circuit modules depending on the statement type.

Figure 2.2 Bashkow et al FORTRAN machine

Most of the systems up to this point were complete specially designed machines. In 1967 Weber [7] presented an implementation of EULER using microprogramming on an IBM 360/30. EULER is a highly dynamic extension of ALGOL with constantly changing storage requirements.

Weber's system implemented a subset of EULER with only integer arithmetic, no garbage collection, and omitting several other features. The system had two major phases in microcode: a translator which converted source programs to a Reverse Polish intermediate form, and an interpreter which executed this form. These phases were supported by input-output routines coded in machine language.

The EULER machine divided storage into a Program Area, Stack, and Variable Area. The program area contained the intermediate code as a sequence of operators and operands, where each operator is followed by a fixed number of operands depending on its type. The stack is for execution-time control and data storage. It is partitioned into blocks as required by the dynamic language structure. Data and control entities in the stack and variable area are self-describing, or "tagged"; this means that a fixed field in every data word identifies its contents as a value, pointer, control, etc.
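To make the tagging idea concrete, the following minimal sketch (in Python, purely for illustration) models a tagged word and a fetch that checks the tag at run time. The tag names and the dictionary store are assumptions of the sketch, not details of Weber's actual encoding:

    # Sketch of tagged storage: a fixed field in every word says what
    # the word holds. Tag names here are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Word:
        tag: str       # "value", "pointer", or "control"
        payload: int   # the data bits themselves

    def fetch_value(store, addr):
        # The tag makes this check possible at run time, with no
        # compile-time type information at all.
        w = store[addr]
        if w.tag != "value":
            raise TypeError(f"word at {addr} holds a {w.tag}, not a value")
        return w.payload

    store = {0: Word("value", 42), 1: Word("pointer", 0)}
    assert fetch_value(store, 0) == 42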

McKeeman in 1967 considered the influence of language structure on machine design. Some researchers were considering the more general question of advanced facilities to support a variety of languages. In 1968 Iliffe published a volume describing his "Basic Language Machine." The approach taken here was to analyze the structures and operations actually used by programmers and develop a machine design oriented to these structures. The BLM featured a "Tree-Structured Store" as shown in Figure 2.3. This system used blocks of "codewords" to define and locate the components of a program environment, e.g., code, input-output buffers, tables, etc. Each pointer could identify a unit of data or another codeword block which extended the hierarchy.

The Basic Language Machine also featured tagged data as in the EULER machine. This concept was later expanded by McMahan and Feustel.

In 1969 Sugimoto proposed a processor for a new and complex language, PL/I. This system involves a software translation phase, the "PL/I reducer," which converts source text into a list-structured machine language, the "Direct Processor Input Language (DPIL)." This language is executed by the hardware interpreter called the "Direct Processor." A block diagram of the Direct Processor is given in Figure 2.4.

The DPIL is a series of lists and tables describing the program statements and variables. A basic list of statements is linked to more detailed descriptors of instructions in Reverse Polish form. Several lists contain variables, constants, pointers, and control data.

Figure 2.3 Tree-Structured Store of Iliffe

Figure 2.4 Direct PL/I Processor of Sugimoto

The Direct Processor is capable of performing execution steps in parallel. The DPIL includes flags to identify operations that may logically take place in parallel. The "Instruction Issuing Unit (IIU)" reads DPIL through the "Memory Unit Controller (MUC)" and passes commands to the "Operation Unit Controller (OUC)." This unit can activate any of several special-purpose processors for arithmetic, logical operations, data conversion, etc. The results are buffered in several "Working Storage" units and passed back to the MUC. The working storage registers use a tagged representation, with fields for data and for attributes.

In 1970 McFarland reported on the HYDRA computer. This design was a direct processor for an extensive, specially developed language named TPL. The processor had four separate units which operated in pipeline fashion, each passing data to its successors. These units translated instructions, broke them down into basic operators, and executed them.

Also in 1970 Thurber and Myrna presented their Cellular APL computer. APL was an attractive candidate for hardware processing. It is designed to be interpreted rather than compiled, and it supports conceptually parallel processing of array elements, which hardware could implement in fact. A diagram of Thurber and Myrna's system is given in Figure 2.5.

Figure 2.5 Cellular APL Computer

A principal feature of the system is the "Matrix Logic-in-Memory Unit (MLIM)", a 32 by 32 array of parallel processing units. This unit communicates with local storage units of similar size, forming a parallel array of accumulators and arithmetic units.

Program text and data are read in through unspecified input-output control and an unspecified preprocessing unit which breaks down text into suitable parallel operations. These operations are interpreted in microcode in the "Instruction Memory Unit." The system also has 32 "Vector Accumulators" for reduction operations.

In 1971 Love and Savitt reported on ASP (Association Storing Processor). This project involved development of a language oriented to associative processing techniques, and design of a machine to interpret this language.

A major development in 1971 was the first report on the SYMBOL system by Rice and Smith. Later reports on this system are given by Laliotis. SYMBOL became the first direct high-level language processor to be fully designed and built and put to actual use.

The SYMBOL machine is an interpreter for the SYMBOL language, a specially designed block-structured language based on ALGOL and PL/I. A diagram of the system is given in Figure 2.6.

Figure 2.6 SYMBOL Computer System

The machine is partitioned into 8 independent processing units. Program text is read in through the "Translator," which builds name tables, etc., and converts the program to a Polish string form. The "Central Processor" carries out processing and execution of this data. A "System Supervisor" provides overall control and operating system functions.

Access to the SYMBOL memory is routed through a "Memory Controller" which provides character strings and operations on them as its basic primitives. There is a "Memory Reclaimer" which handles garbage collection when necessary, and other units for input-output processing. This system is described in great detail in the literature.

In 1972 the Burroughs B1700 was introduced. Like Iliffe's BLM, this system provides general high-level hardware operations. It then is extended with firmware to interpret several alternate high-level languages with different orientations. Finally, user languages like FORTRAN, COBOL, etc. are translated by software into the most suitable firmware language.

Shapiro presented in 1972 a design for a SNOBOL processor. This system was a basic von Neumann machine with extended operators and data types suitable for SNOBOL, to make compiling and execution as efficient as possible.

Some important advances were brought together in 1973 with the first Symposium on High-Level Language Computer Architecture at the University of Maryland. Among systems reported at this time were SYMBOL, Herriot's GLOSS, the DHLLP of Bloom, and special-purpose systems for aerospace applications. GLOSS draws on emerging concepts for modeling of program processes, especially the work of Johnston [24]. The machine uses an unlimited number of independent virtual processors and implements a model sufficient to handle the semantics of many well-known programming languages. Actual languages are translated into the GLOSS language by a software translator; this translation would lose no semantics and is in fact invertible.

DHLLP is an interpreter for a subset of ALGOL. A significant feature of this machine is that it accepts programs at the source language level and interprets them directly and incrementally; there is no Polish string or intermediate form whatever.

Also in 1973 Wade and Schneider presented a design for a simple machine having operators suitable for high-level languages; Chevance proposed a design for a COBOL processor; and Hassitt et al [27] reported a microprogrammed implementation of APL on the IBM 360/25.

In 1975 Sylvain and Vineberg reported on their Array Machine. Chu published the first complete textbook on high-level language machine architecture.

Also in 1975, Fournier presented his "Grammar-Programmable Machine." This introduced the concept of grammar-programming. A direct interpreter for a high-level language resides in writable microcode, and this interpreter can be reloaded for other languages or even extended dynamically during execution. This appears to be the first proposal for a direct processor whose language definition could be easily changed.

A logical flow diagram of the GPM is shown in Figure 2.7. The diagram breaks a view of the system into three levels; the intermediate "Grammar level" is the semi-permanent storage of the language definition. The machine consists of four principal units: a Scanner for recognizing tokens in the source code; an Analyzer for syntax analysis and conversion to a Polish stream of commands; a Data Processor for execution; and a Memory Processor. The analyzer is developed in detail. Language definitions are represented as a "Syntax Network." This model is then used to control state transitions to determine allowable interpretations for each input token. A basic set of semantic operators is assumed; this set and the Data Processing mechanism are not detailed. The machine was shown suitable for ALGOL and SNOBOL through simulation.

The chronology of high-level machine proposals and designs as described above is summarized in Table 2.1.

Figure 2.7 Fournier's GPM System

Table 2.1 - Chronology of HLL Machines

YEAR  SYSTEM            LANGUAGE  MP/HW  IMPLEMENTED
1957  NCR 304 (Yowell)  general   HW     yes
1961  Anderson          ALGOL     HW     no
1961  B5500             general   HW     yes
1963  ADAM              special   HW     no
1964  IPL-VI            IPL-V     HW     part
1965  Melbourne et al   FORTRAN   MP     no
1967  Bashkow et al     FORTRAN   HW     no
1967  Weber             EULER     MP     yes
1968  Iliffe (BLM)      general   HW     no
1969  Sugimoto          PL/I      HW     part
1970  McFarland         special   HW     no
1970  Thurber, Myrna    APL       HW     part
1971  ASP               special   HW     no
1971  SYMBOL            special   HW     yes
1972  B1700             many      both   yes
1973  GLOSS             many      HW     no
1973  Bloom (DHLLP)     ALGOL     HW     no
1973  aerospace         special   HW     no
1973  Wade et al        general   MP     part
1973  Chevance          COBOL     HW     no
1973  Hassitt et al     APL       MP     yes
1975  Array Machine     general   HW     no
1975  Fournier GPM      many      both   no

Column 3 of this table gives the language implemented if it is a direct processor for a single, well known language. Otherwise, it indicates if it has general features for high level languages, implements a special language designed for the machine, or is a direct processor for many languages. Column 4 specifies if the design is primarily microprogrammed, hardwired, or both. Column 5 indicates if the machine has been implemented. The information in this table is inferred by the author from the literature as published. The implementation status of some systems may have changed.

2.2 Related Research

Progress in the development of high-level-language oriented machines has been aided and influenced by work in many related fields. Among these are program language development and analysis, formal language theory, and computer architecture. This section will review some work of interest in these areas.

2.2.1 Languages and Theory

Clearly, a major influence on the development of machines to process high-level languages is the languages themselves. A good history and detailed description of programming languages through 1967 is given by Sammet. FORTRAN, first appearing in 1954, was undoubtedly the earliest language to gain extended use on more than one machine. FORTRAN developed by evolution, with little understanding at first of the relation between language features and their implementation.

In the early 1960's two important languages, ALGOL and COBOL, were developed to meet growing needs for scientific and business computing, respectively. Each was developed by a committee in an organized effort. Significantly, each used a formal notation to describe the syntax of the language. The languages took on some overall structure, and compiler writing was easier.

Many languages have been designed to meet problem-oriented needs of various users. Those that have gained widespread use have been those that did not present major difficulties in implementation. Language and architecture development have long been symbiotic, with the most successful achievements in one area being guided by what was necessary or possible in the other.

Recent language developments, though, have been launched with more confidence in our ability to prepare adequate translators and support systems in software, even though running on conventional hardware. These languages require not only translators but extensive software present during execution to manage special needs. PL/I uses constant dynamic storage management. Many languages, e.g., LISP 1.5, use lists, trees, and assorted complex data structures as basic building blocks. Some important languages are extensible, such as SNOBOL4 and ALGOL 68. These languages can add new data types and even new operators during execution. A system to keep up with them must be able to change its language definition as well.

Language researchers have noted the failure of hardware to support language needs. Sammet is surprised by the absence of high-level language machines. Griswold encourages efforts toward direct implementation of SNOBOL. Pratt considers implementation problems (in software) of a number of languages, and points out many problems that could be solved with better architecture.

The development of programming languages was aided by, and required, parallel progress in formal language theory. It was recognized that programming languages, unlike natural languages, must be clear and unambiguous. The first achievement was a theory for the structure (syntax) of languages and grammars. This was begun by the work of Chomsky. A good reference for grammar theory as of 1969 is Hopcroft and Ullman. The revised ALGOL report introduced BNF notation as a means of formally defining a language syntax. These efforts have led to methods for defining recognizers (either software or hardware) to identify and classify valid program components in a language. These methods can be relatively independent of the language itself.

To interpret a program we must not only recognize a structure but assign the correct meaning to that structure.

The problem of language semantics is only beginning to yield to theoretical analysis. The IBM Vienna Laboratory has worked on a formal model for semantics and used it to prepare a formal definition for PL/I. This model is based on a tree structure, with nodes representing "objects" and branches serving as "selectors" to indicate relationships. The method is fully described and extended by Lee.

Other approaches to semantic definition are reported, for example, in Rustin.

Other theoretical work has included modeling of programs dynamically during execution. Johnston [24] introduced the "Contour Model" which displays the relationships in block-structured processes such as ALGOL programs both statically and by "snapshots" of particular points in execution time. Several other efforts in this area are reported in Tou and Wegner.

Semantic theory and program modeling have had little influence so far on processor designs.

2.2.2 Computer Architecture

The central concepts of computer architecture have in large part remained constant since the earliest systems, based on ideas of von Neumann. The so-called "von Neumann architecture" uses a single central processor, a memory, arithmetic unit, and input-output channels. The CPU reads instructions one at a time from the memory and executes them.

In early machines the types of instructions available varied widely and instructions were implemented in an ad hoc manner. An important milestone was the development of microprogramming, first proposed by Wilkes in 1951.

With this technique, instructions could be built up in an orderly fashion from primitive building blocks such as register transfers. More recent machines use microprograms of considerable power, and in some cases the programs can be changed. Microprogramming has been effective in implementing a number of high-level languages.

Research into machine architectures that differed from that of von Neumann has been motivated primarily by a need for increased speed. The idea that a machine could perform faster by having a number of different units working at the same time was recognized early, but until recently hardware costs were too high to allow many efforts to build such machines. The only serious attempt to construct a machine with a large number of independent processors was the ILLIAC IV. This system was to have 256 identical processors; 64 were actually built.

Early efforts were directed to speeding up a single processor by "pipelining" and lookahead techniques. In these methods parts of a processor could be handling successive instructions in assembly line fashion. A survey of efforts in this area is given by Ramamoorthy and Li.

A major limiting factor in machine speed is the time spent accessing memory. A significant development gaining speed in this area was the "associative" or "content-addressed" memory. An associative memory system has logic in each storage cell. Entries are accessed by the content of part of their cell rather than by physical location, and all entries having certain contents can be fetched directly in a single operation, without searching. This greatly reduces the number of memory accesses needed for certain problems such as table lookup.
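As a minimal illustration of why this helps, the following Python fragment models a content-addressed fetch. The cell format (high byte a symbol identifier, low byte a value) is a made-up example, and the loop merely simulates what the hardware does in every cell simultaneously:

    # Software model of an associative fetch: each cell compares a masked
    # field of its contents against a key; in hardware all comparisons
    # occur at once, so a table lookup is one memory operation.
    def associative_fetch(cells, mask, key):
        return [c for c in cells if c & mask == key]

    # 16-bit cells: high byte = symbol id, low byte = value (hypothetical).
    cells = [0x0A01, 0x0B02, 0x0A03]
    assert associative_fetch(cells, mask=0xFF00, key=0x0A00) == [0x0A01, 0x0A03]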

An associative memory was actually built as early as 1956 using cryogenic cells. Small memories have been built with many other technologies. These memories have been quite expensive and could not be built in large sizes. In recent years, large declines in both the cost and physical size of digital hardware have led to new and extensive research in parallel processors and associative memory techniques. Recent progress is described in a number of surveys, e.g., Thurber and Wald and Feng.

Very large associative memories are now possible. Anderson reports on ECAM, a memory which combines smaller modules to achieve up to 250K words of 4K bits each, over a billion bits in total. Many associative processors have been reported, which perform not just lookup but logical and arithmetic processing within each memory cell. STARAN and PEPE are two which have been produced commercially.

To some degree, rapid progress in associative and parallel machine architectures has left languages and programs behind. A few applications of these processors are obvious, such as simultaneous arithmetic on elements of an array. More generally, though, the manner of using parallel processors effectively to work together on different parts of the same problem is unsolved. Some work has been done on analyzing and transforming programs to expose opportunities for parallel processing. Tjaden and Flynn considered general methods of identifying independent instructions. Kuck et al and Lamport studied parallelism in arithmetic expressions and DO loops. Stone among others considers how to direct compiler analysis of arithmetic expressions to achieve the most efficient parallel, rather than stack-oriented, execution. This subject is surveyed by Kuck.

CHAPTER III

OVERVIEW

3.1 Design Considerations

The task of designing a machine to interpret high-level languages requires many decisions based on alternate approaches to the design. This section discusses some of the choices that we make and reasons for them.

The major goals of the project are two: to present a system whose language definition is variable over a large class of languages, and to explore uses of parallelism in the interpretation. A variable language definition enhances the flexibility of the system and its potential as a full replacement for compiler-driven machines. This concept has been proposed only occasionally, and has not yet been implemented. Use of parallelism can greatly improve the speed of a system, as well as bring the system's actual behavior closer to its logically desired behavior in some program situations. The major factor limiting this exploration has been cost. This economic problem has now been greatly reduced, and parallel architectures are being extensively investigated. However, work in this area for high-level language processors is limited.

We will restrict consideration to programs that may be expressed as a string of (ASCII) characters. This disallows only certain specialized "languages" having a multidimensional input form.

With any system, program interpretation is a two-phase mixture of analysis and execution. The meaning of a suitably-selected portion of program text must first be discerned, then acted upon. For greatest throughput, analysis should proceed through the text in the sequence in which it will be executed, passing incremental commands off for execution. While this is ideal, it presents many problems, since parts of a program may need to reference and even transfer to other parts not yet seen by the analyzer. Unlike many systems, we do not wish to restrict consideration to languages designed for "one-pass" interpretation.

For this reason, analysis in our system will process a program in its input sequence and in convenient units, and execution will not proceed with a step until all the necessary information is available. The basic unit considered may be a statement; often we will assume that it is the entire program module.

Chu classifies high-level language processors according to the degree of translation required of the source text. The most desirable is held to be the "Direct" processor which executes programs strictly in their original form, with no intermediate translation. We will take the view that use of an intermediate form is less harmful if, as we postulate, adequate fast storage is available. Moreover, it enables us to identify opportunities for parallel execution. Thus we adopt an intermediate form having an indefinite lifespan. No part of the source text will be analyzed more than once.

The proposed system will have analysis and execution sections that function independently. Within these sections we plan to use a parallel approach. In analysis we will scan the input sequentially, and as each new unit of text may support several possibilities for its interpretation, we will try to explore all these possibilities at once. In execution we wish to perform steps in parallel where opportunities are discovered. Examples are evaluating terms to be added, or simultaneous arithmetic on array elements. This motivates use of a tree structure (supported by symbol tables) as the basic intermediate program form.

A further opportunity for parallel activity lies in the use of associative storage. While not essential to a high-level language processor, use of an associative memory offers significant advantages in many steps of the interpretation, especially table management. Advantages can accrue not only in greater speed, but also in chances to support program reliability and error checking without overhead penalties. Disadvantages of associative storage have been high cost and the infeasibility of large memories. Both of these problems are being overcome.

Researchers have considered ways of analyzing programs to discover more global, context-dependent opportunities for parallel execution. For example, iterations in a DO loop could be performed simultaneously under certain conditions. Unfortunately, discovering these opportunities in most languages requires an extensive analysis normally done in a software preprocessor. It is not currently feasible in a system like ours which seeks to eliminate such preprocessing. We believe a more fruitful approach to this problem lies in new language constructs which allow programmers to make opportunities for parallel processing explicit in case a suitable machine is available.

An important consideration for a complete design of a hardware language processing system is the possibility for diagnostic reporting, error detection and recovery, etc., compared to a conventional system. These considerations are essential in a practical system, but they will not be emphasized in the present work.

A major issue in a system with variable language definition is how the language should be represented. All information on the language's syntax and semantics must be encoded in some way in a fast local storage. We will try to reduce this information to tables to the greatest extent possible. This provides for a general language-independent structure. However, in some cases the language definition will contain microcode to be executed by certain special-purpose processors. We do not wish to restrict the language set to a common base or orderly form which does not exist in reality.

The overall syntax of a language can be based on well-known definitional systems. Recognition of individual tokens, as well as deletion of extraneous text, is a less orderly process that must be handled in an ad hoc manner. Tabulation of semantic properties is an uncharted territory. We have tried to organize processes for building an execution tree to drive a fairly universal set of execution processors. All cases cannot possibly be covered. We will provide escape mechanisms for implementing specialized processes in software when special hardware is not available.

The distinction between syntax and semantics is a hazy one which must be considered for each language. For example, the priority of different operators in an expression is often built into and enforced by the syntactic definition, although the issue is more of meaning than of form. While convenient, this approach cannot be used in extensible languages, such as SNOBOL4 or ALGOL 68, where operators and their priorities may be changed. In these cases we must resort to a "priority table" and treat priority as a semantic question, as sketched below.
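The following Python sketch shows what treating priority as a table-driven semantic question means in practice. The priorities and the flat-expression evaluator are illustrative assumptions only; the point is that the table, unlike a fixed syntax, can be changed while the system runs:

    # Operator priority kept in a mutable table rather than baked into
    # the syntax, as an extensible language would require.
    PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}
    APPLY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
             "*": lambda a, b: a * b, "/": lambda a, b: a / b}

    def evaluate(tokens):
        # classic operator-precedence evaluation driven by PRIORITY
        vals, ops = [], []
        def reduce():
            op = ops.pop(); b = vals.pop(); a = vals.pop()
            vals.append(APPLY[op](a, b))
        for t in tokens:
            if t in PRIORITY:
                while ops and PRIORITY[ops[-1]] >= PRIORITY[t]:
                    reduce()
                ops.append(t)
            else:
                vals.append(t)
        while ops:
            reduce()
        return vals[0]

    assert evaluate([2, "+", 3, "*", 4]) == 14
    PRIORITY["+"] = 3    # redefine the priority of +, as ALGOL 68 permits
    assert evaluate([2, "+", 3, "*", 4]) == 20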

We will postulate the existence of a program which can accept information about a particular language in a commonly available form, and generate a definition well suited to the language and to our system. The efficiency of this program will not be an issue, although the efficiency of the resulting language definition is important. We have not constructed such a program; language definitions used in this work have been manually prepared.

Finally, some discussion is in order about the use of the proposed system and the environment in which it may operate. A high-level language processor is probably not useful for large programming systems, or programs that will run continuously for an extended time such as process controllers. The language and method of translation is irrelevant in such systems, as they require efficiency only during execution. Our system will be most useful for stand-alone programs written to solve a specific problem.

We will not be concerned with complex linkages between program modules. We must, however, provide for programs with accompanying subroutines, as well as libraries of subroutines stored in a convenient form.

Input and output also cannot be ignored. We provide for I-O formatting and translation as particular execution tasks. Actual input and output will be assumed to consist of sending or accepting small units of data (characters) on a specified channel. We will not consider problems of real-time data acquisition, and we do not anticipate such a use. We will also not consider problems of interrupt-driven I-O, although that may be a valid extension.

Our system will be viewed as a single-user machine, and its overall executive control will not be emphasized.

The philosophy that justifies providing extensive hardware for one task also suggests that it will not be too important to time-share that hardware among several users.

3.2 The Proposed System

The system to be proposed in this dissertation is a high-level language processor which is distinguished from most previous systems by two key characteristics.

First, the definition of the high-level language is stored in microcode and is dynamically variable over a wide class of languages; second, parallel processing elements are used to explore different analysis paths simultaneously and to evaluate multiple operands for execution. The system is motivated by goals of high speed and language flexibility; hardware costs are assumed to be low.

A simplified block diagram of the system is shown in Figure 3.1. The system has four major sections: Language Definition, Analysis, Execution, and Storage.

Figure 3.1 Proposed System Block Diagram

The Language Definition Section is a set of semi-permanent data stored in a fast local associative memory for rapid access. This data in total defines both the syntax and semantics of the language being accepted at any moment. The data contains substantially all the information needed to effect a mapping from the input character stream (program) to the actions of the execution section. Some information is in the form of tables, and some can be viewed as sequences of microcode for certain special processors. This data is not normally changed during processing, but it may be changed to switch languages or to implement languages which are dynamically extensible.

The Language Definition Tables and the program text form the inputs to the "Analysis Section." The program text is a continuous stream of characters in a suitable alphabet. The text is first passed through a preprocessor which may deal with comments, line headers, line endings, etc. The remaining text is then presented to an array of identical processing units termed "Token Processors." At any moment, a subset of these processors will be active, each monitoring the input stream for occurrence of a specific token (variable name, number, keyword, etc.).

Those token processors which succeed in finding their token then perform a state transition directed by the language tables. This activity includes modifying a developing output data structure and activating one or more new Token Processors. The output data structure is called an Execution Tree, and consists of semantic directives. This tree, along with a common symbol table, is made available to the Execution Section.

The execution section consists of a central Control Unit and a variety of independent special-purpose processing units which can carry out execution steps. Every execution processor is viewed as performing a function on a set of operands and returning a single result. The individual units may have tasks of wide ranging complexity, e.g., allocate storage, multiply, read in an array, interpret a FORMAT statement, etc.

Execution units may be hardwired or they may be programmed microprocessors. Some of the highest-level processors may be designed to support a specific language. Provision is also made for selection of extended microcode sequences for general-purpose processors.

Driven by the execution tree, the execution control unit first invokes processors to perform the highest-level functions. These units then call others to evaluate their operands in parallel. Beyond the parallel activity made explicit by the execution tree, some processors are implicitly required to invoke others for low-level activities, e.g., symbol lookup, mode conversion, storage allocation. Where necessary, a number of units may be included to perform identical functions. Overall, the execution section handles all input of user data, processing of this data, and generation of output.
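The control discipline just described can be sketched in software as follows. Each tree node names a function and its operand subtrees; the operands are dispatched together before the function is applied. The tuple format and the thread pool are conveniences of the sketch, not features of the hardware design:

    from concurrent.futures import ThreadPoolExecutor
    import operator

    POOL = ThreadPoolExecutor(max_workers=8)

    def execute(node):
        # A node is either a constant (a leaf) or a tuple
        # (function, operand, operand, ...).
        if not isinstance(node, tuple):
            return node
        fn, *operands = node
        # dispatch every operand subtree at once, modeling the parallel
        # Operand Evaluators (adequate for shallow trees; a real
        # implementation would avoid blocking pool workers on recursion)
        results = list(POOL.map(execute, operands))
        return fn(*results)

    # execution tree for (2 + 3) * (10 - 4); the two inner
    # subexpressions are evaluated in parallel
    tree = (operator.mul, (operator.add, 2, 3), (operator.sub, 10, 4))
    assert execute(tree) == 30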

The Analysis and Execution sections work and communicate through the Storage Section. This section contains a storage control unit and a large associative memory. The storage control unit filters memory access requests to prevent conflicts. It also aids in managing a variety of useful data structures, including strings, lists, stacks, queues, and tables.

CHAPTER IV

THEORETICAL MODEL

This chapter will develop a theoretical model for the major sections of the language processor. Models for analysis and execution of programs will be presented in detail. Section 4.1 is concerned with specifying the basic language class to be considered. Section 4.2 introduces a means of representing grammars as directed graphs. Section 4.3 provides a series of definitions to build a base for the model.

Section 4.4 presents the model for syntactic analysis of programs. Section 4.5 gives the form of the semantic data structure, the Execution Tree. Section 4.6 extends the model for semantic processing. Section 4.7 covers execution of the resultant data structure.

Section 4.8 discusses several possible extensions to the basic models. Finally, Section 4.9 is a summary of the chapter.

4.1 The Language Class

This section defines the class of languages to be treated by the model. We start by stating some basic definitions. Much of the basic material on grammars is covered in detail by Hopcroft and Ullman and will be stated only briefly.

Definition 4.1. A Language L is a set or "alphabet" of distinct symbols, together with a set of ordered sequences or "strings" of symbols in this alphabet. The strings need not be of finite length or number. The strings are termed sentences in the language.

Definition 4.2. A Production is a rule of the form

s1 -> s2

providing for the replacement of string s1 by string s2. In general, s1 is imbedded in a larger string. s1 is said to directly produce s2.

Definition 4.3. A Grammar G for a Language L is a 4-tuple: G = (Vn, Vt, P, S) where Vt is the set of symbols in the alphabet of L; Vn is a set of symbols distinct from those in Vt; P is a set of productions; S is a specific symbol in Vn.

The symbols in Vt are called Terminal Symbols. The symbols in Vn are called Non-terminal Symbols. No symbol is in both Vt and Vn. The symbol S is called the Goal Symbol.

Definition 4.4. A Grammar-Variable (GV) is a non-terminal symbol in a grammar.

Definition 4.5. A string W of symbols in Vt U Vn is said to be generated by a grammar G if there exists a sequence of productions in P which produces W, starting from the goal symbol S. The sequence of productions is called a derivation of W.

Definition 4.6. A sentential form is a string of symbols in Vt U Vn, generated by a grammar G. If the string contains only symbols in Vt, it is a sentence.

Definition 4.7. A Leftmost Derivation is a derivation such that, at every step, the grammar-variable replaced has no grammar-variable to its left in the sentential form from which the replacement was made. 46

We now have the basic terminology for describing any grammar. We next define some restrictions on the class of grammars which we will consider.

Definition 4.8. A Context-free Grammar is a grammar G in which all of the productions are of the form A -> W where A is a single Grammar-Variable and W is a string of symbols in Vn U Vt.

Definition 4.9. A Context-Free Grammar is Left- Recursive if there exists a grammar-variable A and a sequence of one or more productions, such that applying the productions, starting with A, produces a string having A as its leftmost symbol.

Definition 4.10. A Context-Free Grammar is in Greibach Normal Form if all of the productions are of the form A -> bW where b is a terminal symbol, and W is a string of grammar variables. Further, no production may produce the empty string except perhaps S -> ε, where ε denotes the empty string.

Definition 4.11. A grammar is ambiguous if there exists a sentence generated by the grammar which has two or more distinct leftmost derivations.

The grammars which we will consider will be context-free and must not be left-recursive. A theorem due to Greibach shows that any context-free grammar can be converted to a grammar in Greibach Normal Form. Such a grammar produces only strings whose leftmost symbol is a terminal, and cannot be left-recursive.

It is necessary to require that if α and β are distinct strings in L, α is not a leftmost substring of β. To ensure this, we augment G and L by introducing a new terminal symbol z, the end symbol, not formerly in Vt. The end symbol must appear as the final symbol of every string in L, and it may occur nowhere else in the grammar.

We also require that the empty string does not appear in any productions. The Greibach Theorem shows that this requirement can be satisfied. We will be more stringent and not allow the production S → ε; thus there are no empty sentences in our language. In addition, every grammar-variable in G must appear on the left side of some production, and every GV except the goal symbol S must appear on the right of some production. Finally, G must not be ambiguous.

In what follows, the term "grammar" will imply a grammar with all these properties, unless otherwise stated.

4.2 The Directed Graphs of a Grammar

In this section we introduce a method for representing a grammar as a set of directed graphs. To do this we need some further definitions, including some basic concepts of graph theory.

Definition 4.12. A Directed Graph is a set of points called nodes and a set of directed line segments called arcs, each beginning at a node termed the initial node and ending at a node called the terminal node.

Definition 4.13. A Connected Path in a graph is a sequence of arcs such that the terminal node of each arc is the initial node of the next arc in the sequence.

Definition 4.14. A Grammar Graph is a directed graph with a unique node called the entry node, such that any other node can be reached by a connected path starting at the entry node.

In what follows, a Grammar Graph will be called simply a graph. It will be designated by the script letter 𝒢 with an identifying subscript; the subscript alone may be used where no confusion can arise. We will represent graphs pictorially as shown in Figure 4.1. Nodes are represented by small circles, and arcs are represented by arrows connecting two nodes. The normal direction of arcs is left to right or top to bottom. Nodes will be numbered for identification; the numbers have no other significance. In graph 𝒢_A, node 5 is designated as A5; the arc from node 5 to node 6 is designated A5-6.

As developed below, we will often associate information with arcs. This will be represented by symbols above or close to the arcs.

We will use graphs to represent productions in a grammar. To do this we need the following definition:

Definition 4.15. For a given grammar-variable A in a grammar G, let P be the set of productions as defined in Definition 4.3. Then the A-partition of P, designated P_A, is the set of productions of G having A on the left side.

Figure 4.1 Grammar Graph Format

P_A can be represented as a graph. To construct the graph, begin with a single node, the entry node. For each production in P_A, draw a sequence of arcs separated by nodes, starting from the entry node, having one arc for each symbol on the right side of the production.

If P_A consists of the productions

A → abA
A → cdaBb
A → cbBcd

then an equivalent graph is 𝒢_A in Figure 4.1.

Definition 4.16. A Graph System for a grammar G is a set of graphs derived from the productions of G, containing one graph for each grammar-variable in G.

We next present a simple grammar and an associated Graph System to illustrate these concepts and others which will follow. The grammar is defined as follows:

Vn = {S, A, V, E, P}

Vt = {a, b, c, d, e, f, g, z}

P, the productions, are:

S → Az      S → abAz
A → VcE
V → a       V → adef
E → P       E → PgE
P → dEf     P → e       P → V

The Goal Symbol is S.

A Graph System associated with this grammar is shown in Figure 4.2. Every arc is associated with (contains) exactly one symbol in Vt ∪ Vn. Some of these graphs have been transformed by rules to be described later.
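To make this representation concrete, the following sketch encodes the example graph system as ordinary Python dictionaries. The arc layout follows Figure 4.2 as far as the text reveals it (the g-loop in E, the merged exit node P4, the complex exit node V2); the remaining node numbering is an assumption for illustration. Later sketches in this chapter reuse this encoding.

```python
# A graph: node -> list of (symbol, destination) arcs. Lower-case
# symbols are terminals; upper-case symbols are grammar-variables.
# Entry and exit nodes are recorded separately.
GRAPHS = {
    "S": {"entry": "S1", "exits": {"S5"},
          "arcs": {"S1": [("a", "S2"), ("A", "S4")],  # S -> abAz, S -> Az
                   "S2": [("b", "S3")],
                   "S3": [("A", "S4")],
                   "S4": [("z", "S5")]}},
    "A": {"entry": "A1", "exits": {"A4"},
          "arcs": {"A1": [("V", "A2")],               # A -> VcE
                   "A2": [("c", "A3")],
                   "A3": [("E", "A4")]}},
    "V": {"entry": "V1", "exits": {"V2", "V5"},       # V2 is complex
          "arcs": {"V1": [("a", "V2")],               # V -> a, V -> adef
                   "V2": [("d", "V3")],
                   "V3": [("e", "V4")],
                   "V4": [("f", "V5")]}},
    "E": {"entry": "E1", "exits": {"E2"},             # E2 is complex
          "arcs": {"E1": [("P", "E2")],               # E -> P, with the
                   "E2": [("g", "E1")]}},             # PgE loop transform
    "P": {"entry": "P1", "exits": {"P4"},             # P4: merged exit node
          "arcs": {"P1": [("d", "P2"), ("e", "P4"), ("V", "P4")],
                   "P2": [("E", "P3")],
                   "P3": [("f", "P4")]}},
}
```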

Next we will state several definitions relating to classes of arcs and nodes, and their representation.

Definition 4.17. A Terminal Arc is an arc containing a Terminal Symbol.

Definition 4.18. A Non-Terminal Arc is an arc containing a non-terminal symbol (Grammar-Variable).

Figure 4.2 Example Graph System

Definition 4.19. An Entry Node is the unique node in a graph 𝒢_A corresponding to the starting point of all the rules in P_A. This is a restatement of the definition in 4.14. Entry nodes will always be represented by a solid circle.

Definition 4.20. An Exit Node is a node which corresponds to the end point of one or more productions. An arc containing the rightmost symbol on the right side of some production must enter this node. Exit Nodes will always be represented by a double open circle. The set of all such nodes in a given graph is the Exit Node Set for that graph.

Definition 4.21. An Internal Node is a node which is neither an Entry Node nor an Exit Node. Internal Nodes are represented by a single open circle.

An entry node cannot also be an exit node; else P_A would have to contain the production A → ε.

In Graph A of Figure 4.2, arc A2-3 is terminal; arcs A1-2 and A3-4 are non-terminal. In Graph V, node V1 is the entry node; nodes V2 and V5 are exit nodes; and nodes V3 and V4 are internal nodes.

Definition 4.22. The Final Node of a Graph System, N_fin, is the unique exit node for the graph of the Goal Symbol S.

There is only one exit node in 𝒢_S because of the end symbol introduced above. There are no arcs leaving this node and only one entering it; that one contains the end symbol. In Graph S of Figure 4.2, the final node is S5.

We defined above (Def. 4.13) a Connected Path in a graph. We now define certain types of paths and their relation to sentences in the grammar. This will lead to concepts of equivalence of graphs and transformations.

Definition 4.23. An Augmented Connected Path in a graph 𝒢_A in a Graph System is formed from a connected path in 𝒢_A by zero or more applications of the following rule: for some non-terminal arc in the path containing grammar-variable Z, replace the arc by a connected path in 𝒢_Z, beginning on the entry node of 𝒢_Z and ending on some exit node of 𝒢_Z.

Definition 4.24. A Valid Path through a graph 𝒢_A is an Augmented Connected Path beginning on the entry node of 𝒢_A, ending on some exit node of 𝒢_A, and including only terminal arcs. A valid path is associated with (generates) exactly one string, consisting of the symbols associated with each arc in the path, in order. A valid path in 𝒢_S generates a sentence in the grammar.

Definition 4.25. Two valid paths are Syntactically Equivalent if they generate the same string.

Definition 4.26. Two graphs are Syntactically Equivalent with respect to a Graph System if, for any valid path generated by either graph, there is a syntactically equivalent valid path generated by the other. Two graph systems are Syntactically Equivalent if the graphs 𝒢_S in each are syntactically equivalent.

It is possible to transform graphs within a graph system in a number of ways which preserve syntactic equivalence. Among these transformations are the following:

a) combine equivalent branches

b) replace right recursion with loops

c) combine and eliminate variables

d) merge certain exit nodes

These transformations are illustrated in graphs A through D, respectively, in Figure 4.3. It can be seen by inspection that these graphs are syntactically equivalent.

Another example is shown in Figure 4.4. In this figure 𝒢_A is the graph shown earlier in Figure 4.1. If 𝒢_A' is defined as shown, then 𝒢_A' is syntactically equivalent to 𝒢_A in the context of this particular graph system. Note that with a different definition for 𝒢_B, this equivalence would no longer hold.

4.3 Further Definitions

This section presents some additional definitions which will be useful. The first ones provide a classification for exit nodes.

Definition 4.27. A Simple Exit Node is an Exit Node which is not the initial node of any arc in the same graph.

Figure 4.3 Equivalent Grammar Graphs

Definition 4.28. A Complex Exit Node is an Exit Node which is the initial node for one or more arcs in the same graph.

It is possible to introduce complex exit nodes by transformations such as those described above. In Figure 4.2, node V5 is a simple exit node, while node V2 is complex. P4 is a simple exit node which was merged from three separate nodes.

Associated with nodes of a graph are some useful sets of terminal symbols which define allowable paths through that node.

Definition 4.29. The Next Terminal Set (NTS) for a given node N in a graph system is the set of all terminal symbols associated with arcs which may appear immediately following node N in some valid path through the goal symbol graph 𝒢_S in that graph system.

Definition 4.30. The Initial Terminal Set (ITS) for a graph 𝒢_A is the Next Terminal Set for the entry node of 𝒢_A.

Definition 4.31. The Exit Terminal Set (ETS) for an Exit Node, with respect to a graph system, is the Next Terminal Set for that exit node.

The reason for distinguishing the Exit Terminal Set will be seen shortly. We will describe how terminal sets can be identified, and then provide illustrations from the figures.

Let W ⊆ (Vn ∪ Vt) be the set of all symbols which appear on arcs whose initial node is N. Then the NTS for node N consists of the union of

a) all terminals in W;

b) all terminals in the Initial Terminal Set for the graph associated with each grammar-variable in W.

This definition holds for any N which is not an Exit Node. The definition is recursive, but it cannot be circular, since G was required not to be left-recursive: W cannot contain or lead to a graph having N as its initial node. Note that if N is the entry node of a graph 𝒢_A, then the NTS for N, which is the ITS for 𝒢_A, specifies all terminals which may occur as the leftmost symbol of some string produced by the grammar-variable A.

Referring again to Figure 4.2, graph 𝒢_S has terminal a in its Initial Terminal Set: a is on the terminal arc leaving S1; it is also the complete ITS for V and thus for A. Graph P has terminals d, e, and a in its ITS. The Next Terminal Set for node S2 contains only b.
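As a sketch of this rule, the following functions compute the NTS and ITS over the encoding given earlier. The lower-case test for terminals and the guard set are conveniences of the sketch, not part of the model; the guard is what makes the recursion terminate, which the model guarantees by forbidding left recursion.

```python
def next_terminal_set(graphs, gname, node, expanding=None):
    """NTS (Def. 4.29): terminals on arcs leaving `node`, plus the ITS
    of every grammar-variable on such an arc."""
    expanding = set() if expanding is None else expanding
    nts = set()
    for sym, _dest in graphs[gname]["arcs"].get(node, []):
        if sym.islower():                      # terminal arc
            nts.add(sym)
        elif sym not in expanding:             # non-terminal arc: take the
            expanding.add(sym)                 # ITS of that graph
            entry = graphs[sym]["entry"]
            nts |= next_terminal_set(graphs, sym, entry, expanding)
    return nts

def initial_terminal_set(graphs, gname):
    """ITS (Def. 4.30): the NTS of the graph's entry node."""
    return next_terminal_set(graphs, gname, graphs[gname]["entry"])
```

For the example system, initial_terminal_set(GRAPHS, "P") yields {'a', 'd', 'e'} and next_terminal_set(GRAPHS, "S", "S2") yields {'b'}, matching the sets described above.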

An Exit Terminal Set is identified in a somewhat similar manner. However, there is now the complication that some of the possible successor arcs are in different graphs than the one containing the exit node. In fact, if the exit node is in graph 𝒢_A, then wherever there exists an arc containing A in the entire graph system, all arcs which immediately follow it must be considered as successor arcs to the exit node.

Let N be an exit node of graph 𝒢_A, and let Z be the set of all non-terminal arcs throughout the graph system which contain A. Let M be the set of destination nodes for all arcs in Z. Finally, let W ⊆ Vt be the union of the Next Terminal Sets for all nodes in M. Then W is the Exit Terminal Set for N.

This definition is also recursive; it may in fact be circular, since we have no guarantee that graphs in our grammar are not right-recursive. However, when tracing an ETS we will simply bypass any troublesome areas: if M leads to a node which is again an exit node of 𝒢_A, that exit node will be omitted. We justify this by noting that if we reconsidered the same node again and again we would discover no terminals not found on the first pass, and we are interested only in the union of the terminals which can be found.

To illustrate these ideas we will trace the ETS for the exit nodes of graph V in Figure 4.2. The nodes in M are A2 and P4. A2 is not an exit node, and contributes terminal c. P4 is again an exit node, with destination node set E2. E2 covers g and exits to nodes P3 and A4. From P3 we get f. A4 exits to S4 and gives z. The complete Exit Terminal Set for V2 and V5 then contains c, g, f, and z.
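Continuing the sketch, an ETS can be traced mechanically by the same bypass rule: a graph whose exit has already been expanded is skipped, so right recursion cannot cause the search to circulate.

```python
def exit_terminal_set(graphs, var, visited=None):
    """ETS (Def. 4.31): union of the NTS of every destination node of
    an arc containing `var`; exit-node destinations are traced one
    graph higher, bypassing graphs already visited (right recursion)."""
    visited = {var} if visited is None else visited
    ets = set()
    for gname, graph in graphs.items():
        for node, arcs in graph["arcs"].items():
            for sym, dest in arcs:
                if sym != var:
                    continue
                ets |= next_terminal_set(graphs, gname, dest)
                if dest in graph["exits"] and gname not in visited:
                    visited.add(gname)
                    ets |= exit_terminal_set(graphs, gname, visited)
    return ets
```

With the example system, exit_terminal_set(GRAPHS, "V") returns {'c', 'g', 'f', 'z'}, reproducing the trace just given.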

To identify a point on a particular Valid Path, it is important to know both a particular node and the route which has led us to that node. Specifically, to continue along the path, we must know where to go when an exit node is encountered. This question is addressed in the next definition.

Definition 4.32. A Return Node Stack (RNS) is a stack of nodes associated with a given arc in a valid path, specifying nodes to transfer to when successive exit nodes are reached while continuing to traverse the path.

When constructing a valid path through A, the Return Node Stack is initially empty for all arcs in the path. If a valid path through B is substituted for a non-terminal arc Am-n in A, then node An is placed on top of the existing stack for all arcs in the path being substituted.

We have seen that every valid path through the goal symbol graph 𝒢_S generates a string which is a sentence in the grammar G. Moreover, since G is unambiguous, every sentence is generated by exactly one valid path through 𝒢_S.

Definition 4.33. A Partial String in a language L is a terminal string which forms a left substring of one or more sentences in L.

Definition 4.34. A Partial Path through a graph 𝒢_A is a leftmost segment of a Valid Path through 𝒢_A.

Definition 4.35. The Completion Set of a Partial Path α through 𝒢_A is the set of all Valid Paths through 𝒢_A for which α is a leftmost segment.

Definition 4.36. The Length of any Connected Path including only terminal arcs is the number of (not necessarily distinct) arcs in the sequence. The length of a connected path containing non-terminal arcs is a range; it spans the lengths of all possible Augmented Connected Paths (Def. 4.23) containing only terminal arcs which can be formed from the given path. This range is not necessarily finite.

A partial path is associated with a partial string. A partial string may have several partial paths; however, no two of these may lead to valid paths which generate the same string.

We will illustrate these concepts with Figure 4.2. A sample Valid Path through S consists of the following arcs:

S1-2; S2-3; V1-2; A2-3; P1-4(e); E2-1; V1-2; S4-5

This path generates the string abacegaz, which is a valid sentence in the grammar represented. The third arc was initially S3-4; this was replaced by a valid path through A. A1-2 was replaced by a valid path through V, etc. There is a loop in this path, since the arc E1-2 is traversed twice. Any portion of this path starting with S1-2 is a valid partial path, and the corresponding portion of the sentence is a valid partial string. The Return Node Stack after appending each arc in this path is given as follows:

ARC        RNS
S1-2       —
S2-3       —
V1-2       A2, S4
A2-3       S4
P1-4(e)    E2, A4, S4
E2-1       A4, S4
V1-2       P4, E2, A4, S4
S4-5       —

The length of this valid path is 8 since it contains only terminal arcs. The length of a path containing, say, the single arc A3-4 is unbounded, since in expanding E we may traverse the loop any number of times.

Some of the concepts presented here can provide methods for classifying graphs, leading to a measure of complexity for a given grammar. This measure is of value in reducing the hardware requirements of a physical system. This subject is considered further in Appendix A.

4.4 Syntactic Analysis

The system which we are modeling performs a two-phase process: recognition of a valid sentence in a grammar, followed by interpretation of its associated meaning. This section develops the mechanism for the recognition phase.

The process of syntactic recognition for a given string α is as follows. The (terminal) symbols in α will be examined, one at a time, from left to right. After each successive symbol is scanned, we create all of the partial paths in 𝒢_S which generate the string of symbols scanned so far, and which have non-empty completion sets. If there are no such paths, recognition has failed. If there is one which is a valid path in S, recognition is complete. We will use the following definitions:

Definition 4.37. The Current Node is the last node of a partial path.

Definition 4.38. Extension of a partial path is the act of appending an arc to the path which forms a new partial path.

Definition 4.39. The Syntactic State of a Partial Path is its Current Node together with the Return Node Stack (Def. 4.32) for the last arc on the path.

After a new symbol is scanned, each new partial path is formed by extension of a previous partial path. For each previous path we identify all distinct terminal arcs containing the new symbol which may validly be appended to the path. If there are none, the old path is removed from further consideration. If there are one or more, each extension forms a new partial path.

The information needed to select valid extensions is contained in the Language Definition. This is an information structure which contains the graph model of the grammar being used. The mechanism we now introduce to perform the recognition is a unit called a Token Processor.

A Token Processor (TP) carries out a single extension step for a partial path as described above. A recognition system contains a number of identical token processors, each capable of acting independently. For the abstract model we assume that this number is infinite; in later chapters we will consider what happens if it is not. In the course of its task, a TP may activate one or more other TP's for succeeding tasks.

Overall control of analysis is accomplished by an Analysis Control Unit. This unit has three tasks:

a) Start the first Token Processor;

b) Fetch input symbols as needed;

c) Detect final success or failure and take action.

The operation of the Token Processors will be presented in detail. In the complete model the TP's not only perform recognition but also carry out semantic activity and generate information for the Execution Section. We will ignore these functions for now and discuss syntactic recognition only. The goal of the complete analysis system is then to reach a yes-or-no decision on the validity of a given input string. Each TP, after performing its step, must either

a) fail, and start no new TP's,

b) start one or more new TP's, or

c) report final success.

Analysis proceeds until no TP's are active (failure) or until one reports success (final success). We have required that the grammars be unambiguous. For this reason it will not be possible for two TP's to report success, or for one to report success while others are still active.

We have also required that valid strings be terminated by a unique end symbol. The model is so structured that the input may be a continuous stream of many valid strings delimited by end symbols; they will be detected and processed separately. We will not allow the input to be exhausted; rather, if there is no input available, dummy symbols will be generated ("end-of-file") which will not be recognized and will lead quickly to failure.

The syntactic process for a single Token Processor is presented in the flowchart of Figure 4.5. We will describe the process with reference to this figure.

When initially set up for a task, a TP is provided two units of information: a Goal Symbol and a Return Node Stack. The processor is then pending. When the next input symbol is available, all ready TP's begin their tasks. The input symbol does not change while any processor is active.

Figure 4.5 Token Processor Syntactic Recognition

When a Token Processor is activated, it first compares the input symbol to its Goal Symbol. If they don't match, the processor fails immediately.

If the symbols match, the TP accesses the top node on its node stack; this is the node to which this TP has now successfully extended its partial path. This node is used as an index into the Language Definition. The data available for this node is a set of zero or more packets, each consisting of a goal symbol and a Partial Node Stack.

Each packet represents a terminal arc which may be appended to the current path. The top node of the Partial Node Stack is the destination node for this arc. The data also specifies whether the current node is an exit node.

For each packet, if any, the TP now does the following:

a) Secures a new TP for a future task. We will not specify a mechanism for doing this. However, any TP which does not already have a next task may be selected, as all will complete their present task with the current symbol. The best candidate is the current TP itself.

b) Sends the Goal Symbol from the packet to the new TP.

c) Creates a new stack by copying the current node stack, minus the top entry. Copies the Partial Node Stack from the packet on top of this stack. Attaches the complete new stack to the new TP.

When all packets have been processed, if the node was not an exit node, the TP is done.

If the node was an exit node, the top entry is now popped off the Node Stack. If the stack is empty we have reached the final node; the TP reports success and terminates.

If the stack was not empty, the new top becomes the current node. This node is used to access new data from the Language Definition. The entire process repeats as necessary.

For illustration, consider again the valid path described above which generated the string abacegaz in the grammar of Figure 4.2. Suppose we have scanned the first three symbols, reaching node V2. We then have a Return Node Stack of (A2,S4) as noted above.

In graph V there is one arc leaving node V2, and thus there is one packet in the tables for this node. This packet contains the goal symbol d and the single node V3. For this data a TP is set up, and assigned symbol d and node stack (V3, A2, S4). If a d is found, V3 will become the current node of a new partial path, while (A2, S4) will again be the Return Node Stack.

Node V2 is an exit node, so the operation continues. To set up more paths we consider the current node to be A2 (the top of the stack) and the RNS to be (S4). In the tables there is one packet for node A2; it contains goal symbol c and partial stack (A3). A second TP is set up for this symbol. If a c is found (as it will be), A3 will become the current node and (S4) will be the RNS. Finally, A2 is not an exit node, so the operation is done.

In an implementation, it would be possible to set up the new tasks in storage for virtual TP's. A smaller number of physical units could then do the work, if the full number was not available.
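The following sketch simulates this recognition process in software, over the graph encoding introduced in Section 4.2. Each (goal symbol, node stack) pair plays the role of one Token Processor; a real system would read extension packets from precomputed Language Definition tables, where here they are derived from the graphs on the fly. Recovering the graph name from the first character of a node name is an assumption of the sketch.

```python
def expand(graphs, gname, node, stack, tasks):
    """Create one TP task per terminal arc reachable from `node`,
    descending into grammar-variable arcs and pushing return nodes."""
    for sym, dest in graphs[gname]["arcs"].get(node, []):
        if sym.islower():                        # terminal arc: new task
            tasks.append((sym, [dest] + stack))
        else:                                    # non-terminal arc: descend
            entry = graphs[sym]["entry"]
            expand(graphs, sym, entry, [dest] + stack, tasks)

def recognize(graphs, goal, sentence):
    """True if `sentence` (ending in the end symbol) is in the language."""
    tasks = []
    expand(graphs, goal, graphs[goal]["entry"], [], tasks)
    for ch in sentence:
        new_tasks = []
        for sym, stack in tasks:                 # each task is one TP
            if sym != ch:
                continue                         # this TP fails
            node, rest = stack[0], stack[1:]
            while True:
                gname = node[0]                  # graph from node name
                expand(graphs, gname, node, rest, new_tasks)
                if node not in graphs[gname]["exits"]:
                    break
                if not rest:                     # final node reached
                    return True
                node, rest = rest[0], rest[1:]   # pop the Return Node Stack
        tasks = new_tasks
    return False
```

With the example system, recognize(GRAPHS, "S", "abacegaz") reports success, and after the third symbol the task list contains exactly the two TP's (goals d and c) set up in the walk-through above.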

4.5 The Execution Tree

The goal of analysis is both to recognize a valid sentence and to extract the semantics of that sentence so it can be acted upon (executed). For this purpose the Analysis Section builds a data structure which represents the action to be taken. This structure is the means of communication from the Analysis Section to the Execution Section. We now describe the form of this structure, which is called the Execution Tree.

Definition 4.40. An Execution Tree is a data structure consisting of a set of cells organized in the form of a rooted tree. Each cell contains the following information components: a Status Flag, a Value component, a Function code, and zero or more Operands.

The action of the Execution Section will be composed of a collection of asynchronous steps. Each step performs a function on a set of operands and produces a result. The operands and results are units of information, and the function is a transformation on these units. The nature of the units or functions is not further specified. The Execution Tree is a rooted tree structure of nodes or cells, each corresponding to an execution step. The form of this tree is illustrated in Figure 4.6.

The Status Flag in each cell is normally zero. It may be set to other values to indicate that the cell is being evaluated or to indicate other special information. The Value component at certain times contains a value representing the cell and the subtree for which it is the root. These components are discussed further in Section 4.7. The Function component identifies in most cases a function or operation to be performed. Generally, this function takes a fixed number of operands. The remaining components, if any, are an ordered list of the operands for the step. An operand may be either a primitive value or a subtree; if a subtree, then that subtree must be evaluated (or executed) to produce the value for the operand. During execution the system starts at the root cell and tries to evaluate all the operands independently. When they all have values, the function is applied. Its result then becomes the value for the cell, and for the subtree of which it is the root.
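One possible concrete rendering of such a cell, used by the execution sketches later in this chapter, is the following. Representing values as arbitrary Python objects and function codes as strings is an assumption of the sketch, since the model leaves the units of information unspecified.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass(eq=False)        # eq=False: cells compare by identity (pointers)
class Cell:
    """An Execution Tree cell (Def. 4.40). An operand is either a
    primitive value or another Cell, i.e. a subtree to be evaluated."""
    function: str                              # function code, e.g. "ADD"
    operands: List[Any] = field(default_factory=list)
    status: int = 0                            # 0 = normal; nonzero marks
                                               # "being evaluated", etc.
    value: Optional[Any] = None                # filled in when evaluated
```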

Associated with an Execution Tree is a set of data structures called Symbol Tables.

Figure 4.6 An Execution Tree

Definition 4.41. A Symbol Table is a list of Information Units, each specifying an association between a character string and an interpretation for that string.

A special function code in a cell of an Execution Tree is used to associate that cell with a symbol table. This function takes two operands: a pointer to a symbol table and a pointer to a descendant cell. These special cells may appear at any point within the tree. The interpretation is that whenever execution is proceeding within the subtree linked to that cell, the information in this symbol table is included in the active environment.

In Figure 4.6 the F's are functions and the v's are primitive values. The F's and v's may all be distinct or any of them may be identical. This does not affect the model. Cell 5 contains a function of no operands; Cell 3 has four operands.

We will allow (but not require) identical cells or subtrees to be merged in certain cases, although the result is not strictly a tree. This corresponds to the fact that a cell may need to be evaluated only once, although its result is used by several higher cells, if its own operands do not vary between successive uses. In Figure 4.6, Cell 7 forms an operand subtree for both Cell 3 and Cell 4.

In what follows we will normally use the term "Tree" to mean an execution tree as defined above. In depicting trees, the Status Flag and Value component will usually be omitted.

4.6 Semantic Processing

With the form of the Execution Tree established, we can consider the semantic aspects of the analysis process.

The goal of the analyzer is to build a tree which suitably represents the sentence that will be recognized. This is done by a series of primitive semantic steps, each of which adds a part to the developing tree. These semantic steps build the structure; they should not be confused with the execution steps which will then act upon it.

The tree is initially null. It grows through semantic steps which create new cells and place entries into cells. We will assume that these steps take place in a well-ordered sequence. At a given time, several trees may be under construction independently. Those that survive will be combined in the final tree.

Modifications are normally made at a terminal cell or leaf of the tree, by inserting an operand or attaching a new cell. An operand to be inserted may be a value; it may also be a complete subtree that has been formed independently. It is also required in some cases to attach a new cell at the root; the new cell becomes the new root and the entire existing tree is attached to it. These are the only allowable modifications. There will be no radical changes to subtrees that have already been built.

At any given time only one leaf cell is a candidate for modification. We will need two pointers to manage the tree structure:

Definition 4.42. The Root Pointer (RP) is a pointer to the root cell of the tree being constructed.

Definition 4.43. The Leaf Pointer (LP) is a pointer to the single leaf cell, and component within that cell, where modification is currently permitted.

We now present four primitive steps to be performed on the tree. All modifications of the tree will be composed of these four steps.

a) Add Root Cell. Create a cell and insert a function component. Insert the Root Pointer as operand 1. The pointer to this cell becomes the new Root Pointer. The Leaf Pointer now points to operand 2 in this cell.

b) Add Leaf Cell. Create a cell and insert a function component. Get the value currently pointed to by the Leaf Pointer; insert this as operand 1. In place of this value put a pointer to the new cell. Set the Leaf Pointer to point to operand 1 of the new cell.

c) Store in Leaf. Place a value in the component pointed to by the Leaf Pointer.

d) Advance Leaf Pointer. Advance the Leaf Pointer to point to the next component in the same cell. We assume a cell has as many components as necessary.

Figure 4.7 Primitive Steps in Tree Building: a) original; b) advance LP, store; c) add root cell; d) add leaf cell

The effect of these primitives is illustrated in Figure 4.7. There will also be semantic primitives that do not affect the tree directly, but that affect the selection of succeeding steps. These primitives may access separate data structures and may even force the entire recognition process to fail for semantic reasons. These situations will be described shortly.
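A minimal sketch of the four primitives, in terms of the Cell structure sketched earlier, follows. The Leaf Pointer is modeled as a (cell, operand index) pair; seeding the null tree with a placeholder cell is an assumption made so that the first Store in Leaf has somewhere to go.

```python
class TreeBuilder:
    def __init__(self, function="LIST"):
        self.rp = Cell(function, [None])    # Root Pointer (seed cell)
        self.lp = (self.rp, 0)              # Leaf Pointer: (cell, index)

    def add_root_cell(self, function):
        cell = Cell(function, [self.rp, None])  # old root is operand 1
        self.rp = cell
        self.lp = (cell, 1)                     # LP -> operand 2

    def add_leaf_cell(self, function):
        parent, i = self.lp
        cell = Cell(function, [parent.operands[i]])  # old value -> op 1
        parent.operands[i] = cell                    # link new cell in
        self.lp = (cell, 0)                          # LP -> operand 1

    def store_in_leaf(self, value):
        cell, i = self.lp
        cell.operands[i] = value

    def advance_leaf_pointer(self):
        cell, i = self.lp
        while len(cell.operands) <= i + 1:  # cells grow as needed
            cell.operands.append(None)
        self.lp = (cell, i + 1)
```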

We can now reconsider the model for analysis and the Token Processors, with semantic processing included. With every arc of a graph we now associate a Semantic Sequence as well as a symbol.

Definition 4.44. A Semantic Sequence is a sequence, possibly null, of primitive operations on an execution tree.

For terminal arcs the sequence is performed when the arc is appended to a partial path. For non-terminal arcs the sequence is performed when a complete valid path substituted for the arc has been appended. The sequence is generalized to have the form of a (structured) program, with loops and alternation permitted.

The semantic primitives operate on a set of data which is associated with the particular partial path being processed. This data has three main parts:

Definition 4.45. The Local Result (LR) is either a value derived from the single terminal just appended, or a Root Pointer to a subtree just built from another graph.

Definition 4.46. The Cumulative Result (CR) is the partial subtree derived within the current graph as of the current node.

Definition 4.47. The Auxiliary Data (AD) is a set of information used and modified by semantic primitives to affect the flow of control. This set includes in particular the Leaf Pointer for the current subtree. Its further contents are not specified.

Each Semantic Primitive takes this set of data as input and may modify the Cumulative Result and the Auxiliary Data. It also may modify the subtree in memory pointed to by the original CR. Some primitives may affect the selection of the next primitive, and some may force the current partial path to fail.

With this information we can now provide a new definition for the "state" of a path:

Definition 4.48. The Complete State of a partial path is specified by the following information: a) The Syntactic State (Def. 4.39); b) For each node in the Syntactic State, an associated Cumulative Result, Auxiliary Data, and Semantic Sequence; c) A Local Result.

The process of finding a valid path through a particular graph is also the process of building a tree. The tree will always be pointed to by the current CR. The tree for the graph starts as a null tree at the entry node, and is independent of any tree known to a larger path of which this path will become a part. The act of replacing a non-terminal arc with a valid path through another graph now implies attaching a complete subtree, built within that other graph, to a leaf of our current tree.

We can now present the operation of a complete Token Processor, with semantic processing included. A flowchart for the full process is given in Figure 4.8.

The process begins as before. The active TP compares the input symbol to its goal, and fails if they do not match. If they do match, a value associated with this symbol by the Language Definition, or by the process to date, is taken as the current Local Result.

The top node on the Node Stack then becomes the current node. In addition, semantic data is fetched from the top entry of the stack. This data includes the CR, AD, and a Semantic Sequence. The processor then executes the Semantic Sequence, producing a modified CR and AD. This corresponds to extension of the partial path for the input symbol.

The semantic sequence may signal the TP to fail. If so, it stops operation and releases itself. Otherwise the Language Definition is accessed and each packet is processed as before. When building the stack, each entry includes a node, a semantic sequence, and a CR and AD.

Figure 4.8 TP Flow with Semantic Processing

The node and semantic sequence are specified in the definition packet. Recall that the Partial Node Stack always contains at least one entry, the next node in the current graph. If it has more than one entry, then the bottom entry is in the current graph, while the other entries represent starting points for subgraphs to be processed. For this reason, each of these higher entries receives an initially null CR and AD. The bottom entry receives the current CR and AD from this process.

When all the arcs for the current node are processed, the TP is finished, unless the node is an exit node. If so, the stack is popped to the next entry. Unless the stack is empty, we then fetch the information from the new top entry and repeat the process. At this point, the old CR becomes the new LR. This is now a subtree which may be inserted intact in a tree at the higher level.

If the stack is empty, the TP reports success. At this point, the CR contains the root pointer for the complete Execution Tree for the sentence which has been recognized. This pointer will be delivered to the Execution Section, which will interpret the tree.

4.7 Execution

The Execution Section receives the Execution Tree and interprets it, producing the results desired by the program. We now develop a model for this section.

The Execution Section consists of a number of independent processors in two classes. The first class contains a large number of identical processors called Operand Evaluators (OE's). The job of these units is to prepare a function for execution by causing all of its operands to be evaluated. The second class consists of dissimilar units termed Execution Processors (EP's). Each EP is capable of performing a particular function on a set of operands, corresponding to a function code in the Execution Tree. We assume that for every such function code, there is at least one EP which can process it.

Execution is begun by activating an OE and supplying it with a pointer to the root of the tree. "Executing" the program for the tree is equivalent to evaluating this root cell. A cell is evaluated by first evaluating its operands, then applying the function specified in the cell to produce a result. Evaluating the operands is controlled by an Operand Evaluator. Performing the function itself is the job of an Execution Processor.

The task of the OE is to evaluate the operands. If they are values already, the job is done. Those that are subtrees must be converted to values to proceed. A subtree is evaluated by invoking another OE to start the evaluation of its root cell; the result eventually produced is then the value of the operand.

It is thus necessary for OE's to call others and wait for them to complete. When all these other OE's have caused values to be produced, the OE then invokes a suitable EP for its own process, to obtain a value and complete its task. We assume an unlimited number of OE's, so there will always be enough available for subtasks. If this is not the case then the OE's must be able to stack their jobs in some manner, saving partial data at one level to process a job at a lower level. We do not consider this here. The number of OE's must at least equal the longest chain in the tree at any time to prevent deadlock.

For acquiring results, the means of communication is the Execution Tree itself. When a given cell, which may be the root of a subtree, has been evaluated, the value is placed in the Value component of the cell and the status is set to indicate that a value is available. If this value is certain not to change it may remain in the cell; otherwise it will be deleted after it is accessed. The status flag is also used to indicate that a subtree is in the process of evaluation, and that a second evaluation should not be started.

The normal operation of an OE, then, has three phases: a) check the operands and start other OE's for any operand subtree which is unevaluated; b) monitor the root cells of these subtrees until values are returned; c) locate an EP for the proper function and start it on the current cell. A flowchart for this activity is given in Figure 4.9.

Figure 4.9 Operand Evaluator Basic Flowchart

The flowchart describes the operation of an OE in the normal case. One or more operands are evaluated simultaneously, invoking further OE's for the purpose. After evaluation, a separate EP is called to apply the function, and the OE is then done. In addition to such normal processing, there are cases in which the OE itself must recognize special functions and apply them directly, or take other special actions. The OE must recognize three special functions: LIST, SEQUENCE, and COND.

LIST is simply a dummy function. The OE evaluates all the operands, then returns the first one as a result. No EP is used. SEQUENCE is similar, but now the operands are evaluated completely in the order given, waiting for each result before evaluating the next. This is the means of introducing sequential processing into the basically parallel execution tree.

COND is a primitive conditional function with three operands. The first operand is evaluated first, returning a Boolean value. If the value is TRUE the second operand is then evaluated, and its value is the result. If the value is FALSE the third operand is taken instead.
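The following sketch renders an Operand Evaluator as a Python coroutine over the Cell structure given earlier: asyncio.gather stands in for the pool of parallel OE's, and a table of ordinary functions stands in for the Execution Processors (both assumptions of the sketch). LIST, SEQUENCE, and COND are handled in the OE itself, as in the model; returning the first operand's value from a SEQUENCE mirrors LIST and is an assumption, since the text does not say which value a SEQUENCE yields.

```python
import asyncio

# Stand-ins for Execution Processors, keyed by function code.
EPS = {"ADD": lambda a, b: a + b, "GT": lambda a, b: a > b}

async def evaluate(cell):
    """Operand Evaluator (Figs. 4.9, 4.11): return the value of `cell`."""
    if not isinstance(cell, Cell):
        return cell                             # already a primitive value
    cell.status = 1                             # mark: being evaluated
    if cell.function == "COND":                 # primitive conditional
        test, if_true, if_false = cell.operands
        chosen = if_true if await evaluate(test) else if_false
        result = await evaluate(chosen)
    elif cell.function == "SEQUENCE":           # strictly ordered
        values = [await evaluate(op) for op in cell.operands]
        result = values[0]                      # like LIST (assumption)
    else:                                       # evaluate operands in parallel
        values = await asyncio.gather(*(evaluate(op) for op in cell.operands))
        if cell.function == "LIST":
            result = values[0]                  # dummy function: no EP used
        else:
            result = EPS[cell.function](*values)  # dispatch to an EP
    cell.value, cell.status = result, 0
    return result

# asyncio.run(evaluate(root)) "executes the program" for the tree `root`.
```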

An additional special problem which must be dealt with by the OE's is branching. Ordinarily, execution of a tree continues in a standard fashion, with each step beginning when the needed operands are available, or when the previous step in a sequence is complete. In branching the sequence is disrupted; normal processing must be aborted and must resume at a different point in the tree.

The branching mechanism for our Execution Section works as follows. A function type GOTO is provided, which returns a special value. This value is a pointer to a destination in the tree, and includes a flag identifying it as a branch rather than a normal value. When an OE receives a "branch value" as a result of evaluating some operand, it suspends further activity and checks to see if the value is a pointer to one of its own operands. If not, it immediately passes on the branch value as its own result and terminates. If so, it resumes operation by evaluating the operand pointed to by the branch.

There are two immediate observations on this method of branching. First, the process is meaningful only if the destination is a component of a SEQUENCE cell. Second, branching will only succeed if the destination can be found by moving directly upward in the tree. This is not a completely general mechanism. However, in practice both of these restrictions are manageable and in fact are useful in enforcing the restrictions of most common languages.

These languages organize code into modules, blocks, or other possibly nested groups, and do not allow arbitrary branches into the middle of these groups. Branches within a group, or to a higher level, are freely allowed, and this is exactly what our mechanism provides.
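As a sketch of this mechanism, the SEQUENCE case of the evaluator above can be extended to honor branch values. GOTO is modeled here as a function whose result is a BranchValue wrapping a destination cell; the wrapper class and the identity test against our own operands are assumptions consistent with the description.

```python
class BranchValue:
    """Result of GOTO: a pointer to a destination cell, flagged as a
    branch rather than a normal value."""
    def __init__(self, dest):
        self.dest = dest

async def evaluate_sequence(cell):
    """SEQUENCE with branching: a branch to one of our own operands
    resumes evaluation there; any other branch is passed upward."""
    i, result = 0, None
    while i < len(cell.operands):
        result = await evaluate(cell.operands[i])
        if isinstance(result, BranchValue):
            targets = [j for j, op in enumerate(cell.operands)
                       if op is result.dest]     # pointer identity test
            if targets:
                i = targets[0]                   # resume at the destination
                continue
            return result                        # not ours: propagate upward
        i += 1
    return result
```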

When a non-special function occurs, an Execution Processor is called upon to apply it. Each EP is responsible for applying a function to a set of operands. The function is a transformation which produces a value to be stored in the cell being processed. In addition to producing a result value, a function in general causes Side Effects. These are modifications to the environment that occur when the function is performed. Side effects consist of storage modification and input/output. Storage modification involves accesses to working storage used for program processing; this is done by most processors. Input/output involves acquiring units of data from input channels and sending units of data to output channels. There are an unspecified number of such channels available to the Execution Section. These side effects as a body comprise the normally desired effects of program execution.

Our model does not limit or specify either the nature of the functions provided or the manner of evaluation. It is permissible that different EP's may use widely different processing methods. We expect, in fact, that an implementation would use a variable mix of hardware, firmware, and software on an assortment of dedicated or shared processors. More complex functions peculiar to a language would certainly require software, which may be provided as part of the Language Definition.

We assume that there is at least one EP capable of performing any function required. For more common functions, a number of similar EP's may be provided.

In most cases, the EP is responsible for returning a result value to the cell for which it was called. When a result has been returned in the root node of the Execution Tree, execution is complete. All OE's and EP's are necessarily inactive at this point. No final actions are required, as all side effects have been performed. The Section is available to accept another Execution Tree from the Analysis Section.

We have said that the manner of evaluation of functions is not specified in the model. However, in some cases the function designator may represent a process which actually will require several EP's for execution. In these cases a process of decomposition is applied to the function.

Definition 4.49. Decomposition is the process of substituting a subtree based on a model for a single function cell prior to evaluating the cell. A function which yields to decomposition is a Decomposable function.

A function which is decomposable implicitly represents a subtree whose form is known to the system. Decomposition is handled by the Operand Evaluators. When a decomposable function is present, the OE first evaluates its operands as usual; it then elaborates the function by replacing the cell with a copy of the subtree model, into which the operands are inserted. The OE then begins again to process the root cell of the resulting structure. The effect of decomposition is illustrated in Figure 4.10. Figure 4.10a shows the original tree; cell 3 is being processed and contains a decomposable function. Figure 4.10b is the model for the function. Figure 4.10c shows the tree after operand evaluation but before decomposition. Figure 4.10d is the tree after decomposition; evaluation now proceeds as before from cell 3. Decomposition may occur repeatedly at any level of evaluation.
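A sketch of the substitution step follows, again over the Cell structure. The dissertation leaves the insertion rule abstract; here the model subtree marks its insertion points with placeholder strings "op0", "op1", and so on, which is purely an assumption of the sketch.

```python
import copy

def decompose(cell, model):
    """Replace `cell`'s decomposable function by a copy of its model
    subtree, inserting the already-evaluated operands at placeholders.
    The OE then restarts evaluation at this same cell (Fig. 4.10d)."""
    body = copy.deepcopy(model)

    def insert(c):
        for i, op in enumerate(c.operands):
            if isinstance(op, Cell):
                insert(op)                       # recurse into the model
            elif isinstance(op, str) and op.startswith("op"):
                c.operands[i] = cell.operands[int(op[2:])]

    insert(body)
    cell.function, cell.operands = body.function, body.operands
```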

A revised flowchart for the Operand Evaluators, including special functions and decomposition, is given in Figure 4.11.

In performing "side effects" and processing working storage, the EP's make extensive use of a common, large associative memory. Our model assumes that processors access this memory through a Storage Control Unit which can provide high level activities such as space allocation, table lookup, queue management, data formatting, etc. In Appendix B we consider the form of a controller providing suitable features.

Figure 4.10 Function Decomposition: a) initial state; b) model for F; c) operands evaluated; d) after decomposition

Figure 4.11 Operand Evaluator Complete Flowchart

Figure 4.11 (continued)

4.8 Extensions to the Model

This section considers several extensions which can usefully be made to the basic model. Some of these extensions will be necessary in a physical implementation for real programming languages.

The first extension we will need to make is to expand the concept of a "terminal" or Token in the model. So far a token has always been a single symbol in the input alphabet. We will now allow a token to be a string of symbols. The goal of the Token Processor then becomes recognition of the entire string before extending its partial path.

With this extension it is no longer necessary that a TP perform its complete task before the input symbol advances. Instead a TP can be in a "pending" state, where the input so far matches the goal token so far, but the outcome is not certain until more input is seen. We cannot assume that each TP will be available for a new task on the next symbol. They must signal readiness when certain of success or failure.

The token string which a TP is trying to match may be a unique string, with a model available for simple comparison. However, we will also allow the string to be a Token Class. This is a set of distinct strings, any one of which will satisfy the matching goal. The strings in the set are well-specified, but we do not restrict the manner of their specification. The TP will now be required to invoke a special subunit termed a Token Recognizer after each input symbol, to determine if a string in the specified class has been recognized. The algorithm for the Token Recognizer is not specified in the model.

So far we have made no restrictions on the set of strings in a single Token Class. In particular, one string may be a leftmost substring of another string in the class. In this case there is uncertainty: should the shorter string be accepted, or should we wait to see if the longer string will occur? One solution would be to have the TP both accept the shorter string and keep looking for a longer one, launching appropriate path extensions for each case.

For our model, however, we will assume that in all cases the longest string which can be matched in the class is the one desired. This is indeed the situation in virtually all programming languages. To decide when the longest string has been matched, we will allow the TP to "look ahead" one symbol. If the next symbol will not match a longer string, then the current string is taken.
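A sketch of the longest-match rule with one-symbol lookahead follows. The class is represented here by a predicate that reports whether a string is a member, could still grow into one, or neither; this three-way interface is an assumption of the sketch, since the model does not restrict how a Token Class is specified.

```python
def longest_match(classify, text, start):
    """Scan forward while the class could still be satisfied; return the
    longest accepted token, deciding via one symbol of lookahead.
    classify(s) -> "member", "prefix" (could still grow), or "no"."""
    end, accepted = start, None
    while end < len(text):
        verdict = classify(text[start:end + 1])
        if verdict == "no":                # the lookahead symbol cannot
            break                          # extend any member string
        if verdict == "member":
            accepted = end + 1             # longest match so far
        end += 1
    return None if accepted is None else text[start:accepted]
```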

When the Goal Symbol was recognized in the basic model, a simple mapping obtained a suitable LR to be used for semantic processing. We now charge the Token Recognizer with providing the LR for the string recognized. It must, of course, carry the whole semantic content of the particular string. The LR may simply be a pointer to a copy of the token in its original form, or it may carry further information digested from that token.

The model has assumed that the number of TP's is unlimited, and is unconcerned with their efficiency of use. Several extensions can be made to reduce the number of TP's required, at a cost of added complexity in the individual units.

In the basic TP model, the successful TP always starts another for every possible arc from its new node. Many of these paths will fail immediately, since the input symbol will not match the goal. With symbol lookahead, the original TP might be able to pretest the next symbol against the new goals, and only start TP's that might succeed. With only single-character tokens, in fact, the comparison would be easy, and a new TP would be required only when a new path extension is definitely valid.

With string tokens and token classes, such checking is still possible but more complicated. We must now ask whether the next single character is compatible with any of the tokens in the goal set, or not. While this could indeed reduce the number of TP's needed, we will not pursue it in our implementation. However, we will return to this extension when we consider bounds on the number of TP's required in Appendix A.

Another possible extension would allow a single TP to watch for a set of distinct tokens, only one of which could possibly occur. Each token in the set would be associated with its own separate node stack to be used if the token is matched. We would then require separate TP's only when there were separate stacks (paths) for the same goal token. One TP would do the work of many, and the number of TP's would be reduced.

Again this would be quite straightforward for single-character tokens. For token strings we find an analogy in programming languages such as FORTRAN, where a statement may be introduced by one of many keywords. The question is whether to provide a TP for each possible keyword or only one for the set. The savings in the number of TP's is clear, but the complexity of each is increased and the parallelism is reduced. In our implementation we will assume a separate TP for each possible path.

Our model has been developed for languages which are described by a somewhat restricted type of context-free grammar. Actual programming languages do not always conform to this assumption, and we would like to include languages that have at least limited anomalies which keep them from being "purely" context-free. Rather than extend the language class, we have already included "escape" mechanisms to handle such anomalies. Chief among these are the unspecified form of the Token Recognizer (above) and the provision for semantic primitives which maintain unspecified data structures and can cause recognition failure. We will see in later chapters that these methods suffice to model some real languages.

Other extensions which can be considered would have the effect of extending the boundaries between the model and its environment, or of modifying some assumptions that have been made about unlimited resources. Topics that might reasonably be covered in an expanded model include:

a) Preprocessing that may be applied to the input stream to deal with extraneous symbols such as line endings, blanks, comments, etc.

b) Structure of the Storage System and methods of access to it.

c) Provisions for releasing storage that is no longer required.

d) Interconnection methods for communication between processors.

e) Allocation of tasks among a limited number of processors, avoiding deadlock.

f) Methods to measure and optimize the system performance, including resources used and time expended.

All of these issues and more must be dealt with in a practical system. However, we take the view that they raise problems best solved in the context of a particular implementation. To include them in the abstract model might needlessly restrict the model when different circumstances affect various implementations. All of these subjects are dealt with in the detailed implementation in Chapter V.

Storage Control is discussed and modeled in Appendix B.

The model on which we base our system, then, includes an Analysis Section with Token Processors and a Control Unit; an Execution Section with Operand Evaluators and Execution Processors; and the Execution Tree as the means of communication. The Token Processors detect and respond to tokens in an input stream. These tokens may be single symbols, strings, or members of a Token Class. The TP's can see the current symbol and the next symbol at any time. However, they do only limited interpretation of the next symbol.

4.9 Summary

This chapter has developed an abstract model for a class of languages and a system for processing strings in these languages. First we specified the language class of interest as a subset of the context-free languages. A method for representing grammars as sets of directed graphs was developed. Included was a classification of arcs, nodes, and graph transformations. We defined partial and complete paths through graphs and the concept of a node stack, associated with the steps of parsing a string.

We then presented a model of a mechanism for recognizing valid strings in these languages. This mechanism includes Token Processors, an Analysis Control Unit, and a Language Definition. The model was extended to extract the semantic content of the string recognized, producing a data structure called the Execution Tree. We described the representation of semantics in the language model, and the steps performed by TP's in building the tree.

A model for an Execution system was then developed. This system interprets the tree to perform the actions of processing and input/output which comprise program execution. The components of this system, Operand Evaluators and Execution Processors, were described.

Finally, possible extensions to the model were considered. The main extension is one which generalizes tokens as strings and token classes.

CHAPTER V

PROCESSOR FOR A SMALL LANGUAGE

This chapter develops the design and implementation of a language processing system able to process one specific, non-trivial language. Section 5.1 defines the language used: a simple language, but one capable of expressing a meaningful program. A full representation of this language through Language Description Tables is presented. Section 5.2 develops a register-level design for the Analysis Section of a processor sufficient to handle the language. Section 5.3 continues this design with the Execution Section. Section 5.4 then describes in detail the operation of the machine in processing sample program segments in the language. Section 5.5 presents a software simulation for the machine and discusses measures of its performance. Finally, Section 5.6 is a summary of the chapter.

5.1 Language Definition and Representation

This section presents the definition of a Small LANGuage (SLANG) which will be used to illustrate the operation of the language processor. SLANG is a complete language which can be used to express meaningful algorithms. It is far less complex than any language in practical use, but it gives rise to many of the problems of interpretation that will arise in more powerful languages.

We will follow the notations and terminology introduced in Chapter IV. The basic syntactic elements which make up a SLANG program are fixed tokens and token classes. There are two variable-length token classes:

NAME (n): a string of letters and digits, in which the first character must be a letter. The string may be from 1 to 15 characters long.

INTEGER (i): a string of digits, from 1 to 8 characters in length.

These token classes will follow the longest-match rule. The grammar will ensure, for example, that an integer cannot be followed directly by a different token starting with a digit.

Besides the token classes, SLANG has a set of fixed tokens which we will often call keywords. These include the character strings READ, WRITE, DCL, ARRAY, IF, and END, and the single characters (, ), +, -, *, /, >, =, :, ;, and comma.

The input alphabet is assumed to be standard ASCII character codes. The normal meaning of the terms letters and digits applies. Only upper case letters are accepted. The characters Return (end-of-line), Blank, and Tab may occur anywhere in the program and will be ignored by the grammar.

Detection of each token produces a Local Result (Def. 4.45). For token classes, this result is an index to the string itself in a string table. Note that we do not evaluate integers when first detected, but leave this task for the Execution Section. For keywords, a suitable result is obtained from a table. This result may associate an execution function with the symbol; for example, the keyword + yields the function code ADD as a result. A mechanism for recognizing these tokens and generating the result will be described in the next section.
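Anticipating that mechanism, here is a software sketch of token detection for SLANG, written as a conventional scanner rather than as per-TP goal matching. The regular expressions implement the two token classes with the longest-match rule; the keyword table illustrates keyword Local Results, but only ADD for + is given by the text, so the other function codes are assumptions.

```python
import re

NAME    = re.compile(r"[A-Z][A-Z0-9]{0,14}")   # letter first, 1-15 chars
INTEGER = re.compile(r"[0-9]{1,8}")            # 1 to 8 digits
KEYWORDS = {"READ": "READ", "WRITE": "WRITE", "DCL": "DCL",
            "ARRAY": "ARRAY", "IF": "IF", "END": "END",
            "+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV",
            ">": "GT", "=": "EQ"}

def next_token(text, pos, string_table):
    """Return (token, local_result, new_pos); None when input is gone."""
    while pos < len(text) and text[pos] in " \t\n":
        pos += 1                               # Blank, Tab, Return ignored
    if pos >= len(text):
        return None
    for cls, pattern in (("n", NAME), ("i", INTEGER)):
        m = pattern.match(text, pos)
        if m:
            s = m.group()
            if s in KEYWORDS:                  # fixed token, not a NAME
                return (s, KEYWORDS[s], m.end())
            string_table.append(s)             # LR = index into string table
            return (cls, len(string_table) - 1, m.end())
    ch = text[pos]                             # single-character keyword
    return (ch, KEYWORDS.get(ch, ch), pos + 1)
```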

Using these tokens as primitives, we will now define the syntactic structure of SLANG. There are eleven Grammar-Variables in the SLANG grammar. Each is represented by a graph in the graph system. The full graph system is shown in Figure 5.1. The basic format for these graphs was introduced in Section 4.2. In the SLANG graphs the grammar-variables are named by upper case letter strings, and these names appear on non-terminal arcs. Terminal arcs contain token class designators (n or i), or keywords. Since some keywords are also upper case letter strings, they are quoted to avoid confusion.

In these graphs we have used two notational extensions to simplify the appearance. They are both illustrated in the graph STM. First, arc STM3-4 is shown with a stack of six items rather than one. This should be viewed as equivalent to six arcs from STM3 to STM4, one for each item. Second, we have shown an arc from STM1 to STM3 containing the special symbol (the null symbol). The interpretation of this arc is that all arcs leaving node STM3 should have duplicates leaving node STM1; i.e., the set of arcs STM3-4 should be matched by a duplicate set from STM1 to STM4.

[Figure 5.1 Graph Syntax for SLANG: the graphs for the eleven grammar-variables PROG, STM, READ, WRITE, IFS, DCL, ADCL, ASG, EXP, TERM, and VAR]

This grammar defines a complete program (sentence) in the language. PROG is the goal symbol, and PROG4 is the unique final node, N_fin. Informally, it can be seen that a program consists of a series of statements separated by semicolons. The statements resemble familiar program language statements, and could be dialects of FORTRAN or PL/I. For this reason we can guess their intended meanings, and undoubtedly our guesses would be correct. Formally, however, we have established nothing about the meaning or semantics of our language; the graphs have defined only its syntax or form.

To establish a suitable semantic definition we must proceed from our informal understanding of the desired effect of each statement to a specification of the form of the data structure (Execution Tree and Symbol Table) to be transmitted to the Execution Section for any given program. First, we describe informally the action to be taken for each GV.

PROG: Execute the statements in the program. The statements will be executed sequentially and not overlapped. Statements will normally be taken in the order given, but certain statements may force a change in sequence (branching).

STM: Execute the indicated statement. If a label is given, store it in the symbol table, associated with the identity of this statement for possible branching.

READ: Accept input from the channel designated by the integer in parentheses. Input is assumed to consist of integer numbers, one per line. Store the input values, one at a time, in successive variables in the list.

WRITE: Generate output on the specified channel. The output is integer values from the successive variables in the list. Output will be issued as digit strings, one per line.

IFS: Perform the logical comparison indicated between two variables. If the condition is satisfied, branch to the statement whose label matches the indicated name.

DCL: Define a list of names as representing integer variables. Reserve storage for them.

ADCL: Define a list of names as representing one-dimensional integer arrays of the size specified. Reserve storage for them.

ASG: Evaluate the expression and store its value in the location reserved for the stated variable.

EXP: Evaluate the expression by combining terms according to established arithmetic rules. Use left-to-right precedence. Assign multiply (*) and divide (/) higher priority than add (+) and subtract (-).

TERM: Determine the value of an integer or variable, or evaluate the expression in parentheses.

VAR: Make available the present value, or the location for storage, of the designated simple variable or array element.

The above paragraphs present our somewhat arbitrary decisions on how we want programs in this language to perform. We have decided to have only integer values and integer arithmetic. The external input/output format is completely arbitrary and will be the province of a particular execution processor; in another language this processor might be driven by extensive formatting statements.

In this invariable grammar we could have built operator precedence into the syntactic structure, but we have not done so. This is in anticipation of other languages where this precedence may be dynamically variable.

Based on these descriptions we will next present prototype forms for the execution tree or subtree, and symbol table entries, generated by each GV. In so doing we make some further decisions about the exact capabilities of SLANG, and the division of labor between the analysis and execution sections.

The prototypes for each grammar-variable are shown in Figure 5.2. We consider them each in order. PROG defines a series of subtrees to be (normally) evaluated sequentially. Accordingly, it generates a cell with a SEQUENCE function. The remaining components of the cell are an unspecified number of pointers, each specifying an STM subtree to be evaluated.

[Figure 5.2 Tree Cell Prototypes for SLANG: prototype tree cells for PROG (SEQ of STM subtrees), STM, READ (SEQ of RSET and READ cells), WRITE (SEQ of WSET and WRITE cells), IFS (COND with relational and GOTO subtrees), DCL and ADCL (LIST cells), ASG, EXP (operator cells over TERMs), TERM, and VAR (n or EVAR)]

The STM graph does not generate a cell for the tree, simply passing that role on to the various categories of statement. It may, however, generate a symbol table entry for the label. This entry has three components: name, "label" type designator, and a pointer to the statement. The pointer actually identifies the immediate parent cell (in this case PROG) and its pointer to STM. It will be used to reset the evaluation of the parent cell upon branching.

READ and WRITE have similar structures. We view them as a sequence of activities; the first activity is to set up the channel on which transfer will be performed. Each item in sequence after this reads or writes a single integer between that channel and the specified variable.

DCL and ADCL are likewise similar. Note first that it would be possible for the analysis section to actually execute these functions when first encountered, building a symbol table entry and also allocating storage, since all storage is static. We have chosen not to do this, but to pass declarations on like any other executable form to the Execution Section. Accordingly, they do not affect the symbol table, but will produce cells in the tree. Each statement specifies a list of declarations, and there is no reason that the items in the list cannot be declared in parallel or in any convenient order. Thus we generate a LIST cell with an unspecified number of pointers to cells with the Declare (DCL) and Array-declare (ADCL) functions.

IFS generates a conditional (COND) function cell with three operands. The first operand points to a subtree for the relational expression which must evaluate to TRUE or FALSE. Operand 2 is the TRUE alternative and specifies a branch (GOTO) function. This function has as operand the given name, assumed to be a label name, and will cause a suitable EP to look up the label and make the necessary adjustment in the evaluation sequence. Operand 3 is the FALSE alternative and is null.

ASG generates a cell with the Assign (ASG) function. The first operand defines the variable into which a value must be stored. The second points to an expression subtree to be evaluated.

EXP and TERM will produce for a complete arithmetic expression the familiar tree representation. The manner of incorporating precedence is discussed below.

VAR will produce a simple name for a simple (unsubscripted) variable. If a subscript is given it will produce a cell with the Evaluate-array (EVAR) function. This function will invoke an EP to retrieve the particular array element value and location.

We now have both a syntax and a semantic specification for our language. We have enough information to determine the sequence of primitive operations for any valid program, and the precise output produced for any valid input.

We have not defined what the system should do if the program or the data input is invalid. For this chapter, at least, we assume that it will simply stop. It will also stop after one complete valid program has been processed.

We have also said nothing about the internal representation of data values during execution. We do not need to consider this while defining the language from an external point of view.

We have specified the abstract form of the intermediate representation for any program, and this has identified the specific capabilities needed in the Execution Processors for this system. A summary of the required EP's is given in Table 5.1. This list does not include SEQUENCE, LIST, and COND which are processed by the Operand Evaluators.

Figure 5.3 displays a complete sample program in SLANG. This program reads in an integer N and computes N-factorial. It then prints out N and the result. Figure 5.4 presents the complete tree and symbol table for the factorial program at the end of analysis.

We know the form of the tree desired for any grammar-variable; we still must consider how to generate it.

Table 5.1 Execution Processor Functions for SLANG

FUNCTION  OPERANDS  ACTION
RSET      1         Set up a channel for reading
READ      1         Read integer from the channel
WSET      1         Set up a channel for writing
WRITE     1         Write integer to the channel
ASG       2         Store value B in location A
DCL       1         Enter simple variable name in symbol table, reserve storage
ADCL      2         Enter array name A, of size B, in table, and reserve storage
EVAR      2         Look up and fetch element B of array A
GOTO      1         Cause a branch to the named label
ADD       2         Add A + B
SUB       2         Subtract A - B
MPY       2         Multiply A * B
DIV       2         Integer divide A / B
GT        2         Test A > B; return Boolean result
EQ        2         Test A = B
LT        2         Test A < B

001  DCL K,X,N;
002  K=0; X=1;
003  READ (1) N;
004  LOOP: K=K+1;
005  X=X*K;
006  IF (K<N) LOOP;
007  WRITE (2) N,X;
008  END

Figure 5.3 Factorial Program in SLANG

This will lead to certain modifications in the graphs en route to specification of the complete Language Definition Tables. The creation of the tree must be specified as a sequence of semantic primitives, attached to the graphs at convenient points when the needed data results are available.

In Chapter IV we specified four semantic primitives for building the tree. These primitives are:

1. Add a cell at the root;

2. Add a cell at the leaf pointer;

3. Store an operand at the leaf pointer;

4. Advance the leaf pointer.

To complete the set for SLANG we need three more primitives:

[Figure 5.4 Data Structure for Factorial Program: the complete execution tree (a main SEQ cell with LIST, ASG, READ, COND, and WRITE subtrees) and the symbol table entry for the label 'LOOP']

5. Test the priority of the result item (operator). This will be done by checking a priority table in the language definition. This is compared with the priority of the previous operator, carried in the Auxiliary Data for each path. The test will update the AD and may cause a branch in the semantic sequence. If the operator has higher priority, it will be inserted as a cell at the root. Otherwise, it will be inserted as a cell at the leaf.

6. Store Label. The result item (label name) will be stored in a new entry in the symbol table, with the "label" designator and a pointer to the parent cell.

7. Branch unconditionally in the semantic sequence.

It may be noted that we have a set of semantic primitives not unlike instructions for a special-purpose CPU. This is indeed one mechanism by which they may be carried out.
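As one concrete (if purely software) rendering of primitives 1 through 4, consider the following sketch. The Cell and TreeBuilder names, and the use of a list for the operand fields, are inventions of the sketch; the machine carries the same information in storage cells and in the CR and leaf-pointer registers.

    class Cell:
        """One Execution Tree cell: a function code plus an ordered operand list."""
        def __init__(self, func):
            self.func, self.ops = func, []
        def __repr__(self):
            return f"({self.func} {' '.join(map(repr, self.ops))})"

    class TreeBuilder:
        """CR is the current subtree root; the leaf pointer lives in the AD."""
        def __init__(self):
            self.cr, self.leaf = None, None     # leaf = (cell, operand index)

        def _put(self, item):
            cell, ix = self.leaf
            while len(cell.ops) <= ix:          # grow the operand list as needed
                cell.ops.append(None)
            cell.ops[ix] = item

        def add_at_root(self, func):            # primitive 1
            cell = Cell(func)
            if self.cr is not None:
                cell.ops.append(self.cr)        # old root becomes first operand
            self.cr, self.leaf = cell, (cell, len(cell.ops))

        def add_at_leaf(self, func):            # primitive 2
            cell = Cell(func)
            self._put(cell)
            self.leaf = (cell, 0)

        def store_operand(self, value):         # primitive 3
            self._put(value)

        def advance_leaf(self):                 # primitive 4
            cell, ix = self.leaf
            self.leaf = (cell, ix + 1)

    # Building the subtree for statement 005 of Figure 5.3 (X=X*K):
    b = TreeBuilder()
    b.add_at_root("ASG"); b.store_operand("X"); b.advance_leaf()
    b.add_at_leaf("MPY"); b.store_operand("X"); b.advance_leaf()
    b.store_operand("K")
    print(b.cr)     # -> (ASG 'X' (MPY 'X' 'K'))

The final print shows the subtree that statement 005 of the factorial program should produce.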

We must also observe that it was found necessary that every cell in the tree correspond to a specific grammar-variable. This is violated in a number of GV's of our grammar, specifically READ, WRITE, DCL, ADCL, and IFS which all have compound prototypes. We must now augment the grammar with several additional GV's which are needed for semantic reasons only. Every GV will then generate at most one cell in the tree. Figure 5.5 illustrates the modification required to the original graphs, and the new graphs to be added.

We can now provide the specifications of the complete Language Definition Tables for SLANG. The tables will have the following divisions:

[Figure 5.5 New and Modified Graphs for SLANG: modified graphs for READ, WRITE, ADCL, and IFS, and the new graphs DCEL, ADCEL, RSEL, WSEL, LGEX, and GOTO]

MAJOR STATEWORD TABLES. An entry for every arc in the system, classed by starting node. Each entry specifies the next terminal in the path, next node, semantic sequence, and a partial node stack. The entry also designates exit nodes.

TOKEN STATEWORD TABLES. These tables give state information for recognizing the specified token classes.

KEYWORD STRINGS. Specifies the strings to be recognized as fixed tokens.

ASCII CLASS TABLE. Identifies ASCII characters in special classes, i.e., letters and digits.

SEMANTIC SEQUENCES. The sequences of semantic primitives associated with each node transition.

PRIORITY TABLE. Specifies priority for certain operators.

KEYWORD VALUES. Specifies the value or Local Result to be returned for certain keywords.

TOKEN SEMANTIC SEQUENCES. Specifies semantic primitives used to identify tokens.

A complete representation of the tables for SLANG is given in Appendix C. These tables are the ones actually used in the simulation to be described later in this chapter.

5.2 Analysis Section

This section develops an implementation of the Analysis Section of a language processor which can process the language SLANG. We will start with a block diagram and proceed to develop register-level descriptions for most of the section. Later material will demonstrate the adequacy of the design.

5.2.1 General Features

A full block diagram for the language processor was given in Figure 3.1. In Figure 5.6 we give an expanded diagram for the Analysis Section. The diagram shows the three components of the Analysis Section: Preprocessor, Analysis Control Unit, and Token Processor Array. Connections to the storage system are also shown. The Storage Control Unit manages all accesses to main storage. Its structure is discussed in Appendix B. The Language Definition Control is functionally similar but manages access to the physically separate, perhaps faster, store containing the Language Definition Tables.

All of the units are shown as connected to a single master bus. This is neither the fastest nor the most reliable method for intercommunication; but it is one conceptually simple method and will be assumed throughout the design. Each entity in the system has a unique identification code and can issue messages on the bus with the sender and intended receiver specified. We will assume that our bus has enough lines to carry any data item needed, with identification, as a single parallel unit. An actual machine may have fewer lines and break data into serial parts; it may have distinct busses for separate functions; and it may utilize direct connections and switching networks where necessary.

[Figure 5.6 Analysis Section Block Diagram: the Preprocessor, Analysis Control Unit, and Token Processor Array on the master bus, with the Language Definition Control, Main Storage Control, and connections to the Execution Section and external control]

The Preprocessor receives external program text and control signals. Its function is to "clean up" the text when possible by eliminating obviously extraneous material. For SLANG the only function of the preprocessor is to delete line endings. We could also ask it to delete blanks, but we will not, to show that the same function can be handled by the Token Processors.

In a general system the Preprocessor would handle interpreting line endings, processing line "headers" in some languages (e.g. FORTRAN), deleting comments, etc. It would also keep track of the original line numbers as an aid for returning diagnostic information. None of these functions arise in the present system. The Preprocessor will not be considered further here.

5.2.2 The Analysis Control Unit

The Analysis Control Unit (ACU) monitors the token processors and provides them with text data. Though functionally simple, it is the only unit in the entire system which may be viewed as a "central control" unit. We will describe the ACU in detail.

The functions to be performed by the ACU may be summarized as follows:

1. When a start signal is given, send a code to the Token Processor Array. This code will be intercepted by the first TP, which will access the starting node for the grammar. It will then set up the appropriate units to begin processing.

2. Fetch characters from the preprocessor into two registers representing the current character (CCHAR) and the next character (NCHAR). When a new current character is available, strobe the TP array to cause units with active tasks to begin processing.

3. Monitor the TP array until no units are busy, then fetch the next character and repeat the cycle. Don't fetch more characters if no tasks are active, or if a "success" signal has been raised.

4. If no more tasks are active and there is no "success" signal, stop processing because recognition has failed.

5. If a TP signals "success," obtain the pointer to the Execution Tree. Pass this value and a signal to the Operand Evaluator Array in the Execution Section, which will begin processing.

Figure 5.7 describes the functioning of the ACU in a flowchart. It is not difficult to express these functions in a hardware design. Figure 5.8 presents a diagram of the ACU with all required signals, registers, and interconnections. Signals marked with an arrow (↑) are pulses, and others are levels. Signals marked with an asterisk (*) are multi-bit data items; the number of bits is not specified.
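The same cycle can also be rendered compactly in software. In the sketch below the three helper objects and all of their method names are assumptions made for illustration; they stand in for the PPREQ/PPRDY, TPGO/TBUSY/TACTV/SUCC, and OEREQ signalling described in Tables 5.2 and 5.3.

    def acu_cycle(preprocessor, tp_array, oe_array):
        """One run of the Analysis Control Unit (after Figure 5.7)."""
        tp_array.start_initial_task()        # function 1: seed the first TP
        nchar = preprocessor.next_char()
        while True:
            # function 2: advance the character window and strobe the TP's
            cchar, nchar = nchar, preprocessor.next_char()
            tp_array.strobe(cchar, nchar)
            while tp_array.busy():           # function 3: wait for quiescence
                pass
            if tp_array.success():           # function 5: pass tree pointer on
                oe_array.start(tp_array.result_pointer())
                return True
            if not tp_array.tasks_active():  # function 4: recognition failed
                return False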

Table 5.2 describes each of the signals involved. Column 1 gives the signal name. Columns 2 and 3 give the unit which generates the signal and the unit which receives it, if applicable. Column 4 gives the type of signal: pulse, 1-bit level, multi-bit data, or a register. Column 5 describes the function of the signal. Abbreviations used are EXC (External Control); ACU (Analysis Control Unit); PRE (Preprocessor); TPA (Token Processor Array); and OEA (Operand Evaluator Array).

[Figure 5.7 Analysis Control Unit Flowchart]

[Figure 5.8 ACU Signal Diagram]

Table 5.2 ACU Signal Descriptions

NAME    FROM  TO   TYPE   FUNCTION
CLK     EXC   ACU  pulse  System clock
ANLGO   EXC   ACU  pulse  Start processing
AFAIL   ACU   EXC  pulse  Signal recognition failed
PPREQ   ACU   PRE  pulse  Request text input
PPRDY   PRE   ACU  pulse  Text input ready
PADATA  PRE   ACU  data   Input character
TPREQ   ACU   TPA  pulse  Set up a TP's task
TPGO    ACU   TPA  pulse  Start all active TP's
TBUSY   TPA   ACU  level  At least one TP is busy
TACTV   TPA   ACU  level  At least one TP has an active task
SUCC    TPA   ACU  pulse  A TP has reached the final state
ATDATA  ACU   TPA  data   Starting data, and characters
TADATA  TPA   ACU  data   Result pointers
OEREQ   ACU   OEA  pulse  Set up an OE task
AODATA  ACU   OEA  data   Tree pointer for OE
ACST    ACU   -    reg    ACU internal state
SUCCF   ACU   -    reg    Success flag
CCHAR   ACU   -    reg    Current character
NCHAR   ACU   -    reg    Next character
RSULT   ACU   -    reg    Result pointer

Table 5.3 ACU Register Equations

↑ANLGO·ACST0                    ⇒  0→NCHAR; 0→ATDATA; 0→SUCCF; ↑TPREQ; 1→ACST
↑CLK·ACST1                      ⇒  NCHAR→CCHAR; ↑PPREQ; 2→ACST
↑CLK·PPRDY·ACST2                ⇒  PADATA→NCHAR; 3→ACST
↑CLK·ACST3                      ⇒  NCHAR,CCHAR→ATDATA; ↑TPGO; 4→ACST
↑CLK·ACST4·TBUSY′·SUCCF′·TACTV′ ⇒  ↑AFAIL; 0→ACST
↑CLK·ACST4·TBUSY′·SUCCF′·TACTV  ⇒  1→ACST
↑CLK·ACST4·TBUSY′·SUCCF         ⇒  RSULT→AODATA; ↑OEREQ; 0→ACST
↑SUCC                           ⇒  1→SUCCF; TADATA→RSULT

Table 5.3 presents a set of register-transfer equations to accomplish the functions of the ACU. The unit has five internal states, and the equations are straightforward.

The ACU has not been built, nor has it been modelled at the logic element level. We do not claim that this is the most efficient possible design, or that it may not contain timing faults. However, we expect that the information given would make a working model possible without major difficulty.

5.2.3 The Token Processors

We now turn our attention to the Token Processor Array. This array consists of a set of identical units, the Token Processors. The TP's are viewed as communicating with each other and with the rest of the system through a central bus. The function of a TP is to watch for a token in the language which, if found, would extend a valid partial path through the grammar. The abstract Token Processor was developed and described in Sections 4.4 and 4.6, and its activities were flowcharted in Figure 4.8. Our implementation is based on that flowchart.

An obvious question to be raised is how many TP's must be present in the array. During running of various programs in SLANG on a simulation of the system, we have found no instance where more than 8 TP's were required.

In Appendix A we discuss this question in some detail, and argue informally that this is a reasonable upper limit, and that there exists such a limit for any given programming language. We have not proved this rigorously, or obtained a general algorithm for determining such an upper limit. However, we predict that perhaps 25-30 TP's are adequate for almost any language, and that by providing certain additional capabilities in the TP's we can greatly reduce this number.

Each individual TP, when active, performs the algorithm of Figure 4.8 upon receiving the start signal TPGO from the ACU. We will partition a TP into a control section to perform the overall algorithm, and several subsections to perform specific subtasks. This division is shown in the block diagram of a single TP in Figure 5.9. In addition to the control section, the TP contains sections to recognize the goal token, to execute semantic primitives, and to generate new tasks for other TP's.

Figure 5.9 also shows the principal signals and registers used in a TP. Table 5.4 gives a set of register equations for the TP control section. In what follows the functioning of the control section will be described as defined by these equations.

[Figure 5.9 Token Processor Block Diagram: control section with the flags and registers ACTV, BUSY, TSTAT, NCHRH, CCHRH, LR, CNODE, EXNODE, SSEQ, CR, AD, and LINK, plus the Token Recognizer (TKRGO/TKRDN), Semantic Processor (SEMGO/SEMDN), and Task Generator (TSKGO/TSKDN) subsections]

Table 5.4 TP Register Equations

↑TPREQI·ACTV                         ⇒  ↑TPREQO
↑TPREQI·ACTV′                        ⇒  DATAI→STAKP,TSTAT; 1→ACTV
↑TPGO·TPCST1·ACTV                    ⇒  ↑TKRGO; 1→BUSY; DATA→NCHRH,CCHRH; 0→FAIL; 2→TPCST
↑TKRDN·TPCST2·TKRES_f                ⇒  0→BUSY; 0→ACTV; 1→TPCST
↑TKRDN·TPCST2·TKRES_p                ⇒  0→BUSY; 1→TPCST
↑TKRDN·TPCST2·TKRES_m                ⇒  STAKP→STAKH; 0→ACTV; 3→TPCST
↑CLK·TPCST3                          ⇒  STAKH→DATAO; ↑MREQ; 4→TPCST
↑CLK·MRDY·TPCST4                     ⇒  DATAI→CNODE,EXNODE,SSEQ,CR,AD,LINK; 5→TPCST
↑CLK·TPCST5                          ⇒  ↑SEMGO; 6→TPCST
↑SEMDN·TPCST6·FAIL                   ⇒  0→BUSY; 1→TPCST
↑SEMDN·TPCST6·FAIL′                  ⇒  CNODE+mask→DATAO; ↑LREQ; 7→TPCST
↑CLK·LRDY·TPCST7                     ⇒  8→TPCST
↑CLK·TPCST8                          ⇒  op code→DATAO; ↑LREQ; 9→TPCST
↑CLK·LRDY·TPCST9·null′               ⇒  DATAI→STWRD; ↑TSKGO; 10→TPCST
↑TSKDN·TPCST10                       ⇒  8→TPCST
↑CLK·LRDY·TPCST9·null·EXNODE′        ⇒  0→BUSY; 1→TPCST
↑CLK·LRDY·TPCST9·null·EXNODE·LINKZ   ⇒  CR→DATAO; ↑SUCC; 0→BUSY; 1→TPCST
↑CLK·LRDY·TPCST9·null·EXNODE·LINKZ′  ⇒  LINK→STAKP; CR→LR; 3→TPCST

1. If the TP is not active (has no current task) and receives the request signal TPREQ from the ACU, it accepts a new task, taking parameters from the data lines. The parameters define a goal token, and a stack for the current state of the system. The TP then blocks the request signal from being transmitted to further units. Note that although it is not ACTIVE, the TP may still be BUSY winding up its previous task.

2. If a TP is active, the TPREQ signal is passed through to other units.

3. When ready to start on a new task, a TP's internal state TPCST is set to 1.

4. When each new character is ready, the ACU sends a TPGO signal and puts the current and next character on the data lines. If active, each TP sets its BUSY flag, sets its state to 2, picks up the characters (CCHRH,NCHRH), and clears the semantic failure flag (FAIL). It also starts the Token Recognizer.

5. The Token Recognizer tests the new characters according to its goal and its processing history. It returns one of three states: full match, partial match, or match failure. If a full match, it obtains a suitable value as the LR.

6. If recognition failed the TP is deactivated. If there was a partial match the BUSY flag is cleared and the state set to 1, but ACTV is not cleared as we are waiting for more input on this task.

7. When there is a full match, control goes to state 3. ACTV is now cleared and the unit can accept a next task (which it may itself generate).

8 . The TP issues a memory request for the top cell as indicated by the stack pointer. The resulting data is divided for its several functions: CNODE, the new current node; EXNODE, a flag set if CNODE is an exit node; SSEQ, a pointer to the semantic sequence to be executed; CR and AD, the working semantic data; and LINK which points to the next entry, if any, in the chain.

9. The control section then issues SEMGO to start the semantic processor which will perform any semantic primitives indicated by SSEQ. When done, the unit replies with SEMUONE. This processing modifies CR and AD.

10. If semantic processing has set FAIL, the unit terminates. Otherwise, it now requests the Language Definition Control Unit to perform a match on all statewords for the node CNODE.

11. The TP requests the next matched word from memory. If one is received it places its components in the STWRD buffer. These components specify a token, a node, a semantic sequence, and a stack which may have further node-semantics pairs. This information tells the TP how to build a new potential state from the current one. The control issues TSKGO, which starts the Task Generator to request a new task with this information.

12. Step 11 is repeated until there are no more matched statewords. Then if EXNODE is not set, the unit terminates. If EXNODE is set, this was an exit node and we must "pop" the stack. If LINK is null, however, the stack is empty and we have reached the final node of the grammar. The unit signals success (SUCC) and puts the CR on the data lines. This is a pointer to the Execution Tree, and will be passed as the result.

13. If the stack was not empty, the LINK becomes the new stack pointer and the CR becomes the new LR. The entire process is then repeated from step 8.
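The whole per-character activity of a TP can be summarized in software. The sketch below condenses steps 4 through 13; the task, table, and callback objects and all of their method names are inventions of the sketch, standing in for the registers and bus signals just described.

    FAIL, PARTIAL, MATCH = range(3)

    def tp_step(task, cchar, nchar, tables, emit_task, signal_success):
        """One activation of a Token Processor for a new input character."""
        status = task.recognizer.feed(cchar, nchar)       # steps 4-5
        if status == FAIL:
            return None                                   # step 6: task dies
        if status == PARTIAL:
            return task                                   # step 6: await more input
        lr, entry = task.recognizer.local_result, task.stack   # full match (step 7)
        while True:
            # step 8: unpack the top stack cell
            cnode, is_exit, sseq, cr, ad, link = entry.unpack()
            cr, ad, failed = tables.run_semantics(sseq, lr, cr, ad)  # step 9
            if failed:
                return None                               # step 10
            for sw in tables.match_statewords(cnode):     # steps 11-12: spawn tasks
                emit_task(sw.token, sw.extend(entry, cr, ad))
            if not is_exit:
                return None
            if link is None:                              # final node of the grammar
                signal_success(cr)                        # CR points to the tree
                return None
            entry, lr = link, cr                          # step 13: pop and repeat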

This description of the details of TP control is a simplification in several respects. We have been sloppy in describing communication with the associative memories, omitting data masks and function codes. We have not dealt with contention among signals from different units on the bus. We have not been precise about the timing relations required. This simplistic approach will continue, lest too much detail obscure the concepts of the design. Further problems can be saved for the actual builder of a machine.

The remaining sections of the Token Processor are the Token Recognizer, Semantic Processor, and Task Generator. Schemes for the operation of each of these sections are shown in the flowcharts of Figures 5.10 through 5.12. These are essentially the algorithms we employ in the software simulation. We will not pursue these sections to the level of register equations, but will describe their functioning in narrative below.

[Figure 5.10 Token Recognizer Flowchart]

[Figure 5.11 Semantic Processor Flowchart]

[Figure 5.12 Task Generator Flowchart]

The Token Recognizer operates on a token descriptor which is initially provided in the task request. This descriptor identifies a set of Token Statewords in the Language Description Tables. If the token of interest is a fixed keyword, the descriptor also includes an index for that keyword in a string table. The input character could also be a special "start" code; in this case the unit signals immediate recognition. This is the mechanism for initiating a program.

A goal of our system design is to keep the Token Recognizer sufficiently general to handle almost any conceivable token type. The scheme we now describe should not be considered the only possible one, but it appears to be usable for any practical token types we know of. The set of Token Statewords for a given state contains a number of individual statewords which are intended to be accessed in order. Each stateword contains seven elements as follows:

1. Character Designator
2. Mask
3. Comparator designator
4. Compare sense flag
5. Token Semantics
6. Next token state pointer
7. Link to next stateword in set

The character designated in element 1 is selected and ANDed with the mask. This character may be the current or next character or one of several temporary characters recorded by the processor. It is assumed that the external characters have been concatenated with a set of flag bits obtained from a Class Table in the Language Definition Tables; these bits identify significant classes of characters, such as digits and letters. The mask may select some set of these bits or the character value itself.

Element 3 specifies the item to which the character is to be compared. This may be a specific value, a flag pattern, or the next character in a keyword string. Element 4 indicates whether "match" or "no match" is the success condition.

If success occurs, the semantic action in element 5 is performed and the token state is set from element 6. Otherwise, the next stateword is fetched. The last stateword in the set specifies a default and will always succeed.

In the case of a token class, matching proceeds until success or failure occurs and the significant characters are assembled by the token semantics in a string buffer. In case of success this string is then stored in a central string table and its index is returned as the LR. No interpretation is done on these strings.

In the case of a keyword, the keyword string is first placed in the string buffer and then gradually shifted out. On full match the keyword index is used to access a Keyword Code Table to fetch the appropriate Local Result.

This recognition scheme can handle extraneous characters such as blanks at any point. It can count characters, and it can make context-dependent comparisons such as for string delimiters. The Token Statewords needed to describe the SLANG tokens can be seen in Appendix C.
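The stateword-walking step itself is simple enough to express directly in software. In the following sketch the Stateword fields mirror the seven elements listed above, characters are modelled as integers carrying class-flag bits, and all names are the sketch's own; the keyword-string comparator of element 3 is omitted for brevity.

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Stateword:
        char_sel: str                    # element 1: 'CUR', 'NEXT', or a temp
        mask: int                        # element 2: selects value or flag bits
        compare_to: int                  # element 3: value or flag pattern
        match_is_success: bool           # element 4: sense of the comparison
        semantics: Optional[Callable]    # element 5: token semantic action
        next_state: Optional[int]        # element 6
        link: Optional["Stateword"]      # element 7: next stateword in the set

    def token_state_step(sw, chars, ctx):
        """Walk one stateword set; the last word is a default and always
        succeeds.  `chars` maps designators to flag-extended characters."""
        while sw is not None:
            matched = (chars[sw.char_sel] & sw.mask) == sw.compare_to
            if matched == sw.match_is_success or sw.link is None:
                if sw.semantics:
                    sw.semantics(ctx)    # e.g. append character to string buffer
                return sw.next_state
            sw = sw.link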

The Semantic Processor uses the indicator SSEQ to select a sequence of semantic primitives (microinstructions) in the Language Definition Tables. The SSEQ may be null; otherwise it points to a sequence of instruction codes terminated with a null code. The primitives provided are those described in Section 5.1. Some of them take an argument which specifies a particular execution function code; these codes are given in Table 5.1, and will be described further in Section 5.3. These primitives may be hardwired, or they may be handled by a simple form of microprocessor. They all act on the LR, CR, AD, and FAIL flags as their arguments.
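Interpreting such a sequence amounts to a tiny instruction loop, sketched below. The list-of-pairs encoding, the primitives table, and the regs object are assumptions of the sketch, not the table format actually used (for that, see Appendix C).

    def run_semantic_sequence(seq, primitives, regs):
        """Interpret one semantic sequence: (code, argument) pairs ending in a
        null code.  Primitive 7 (unconditional branch) returns a new index;
        the others act on the LR/CR/AD/FAIL registers held in `regs`."""
        pc = 0
        while seq[pc][0] is not None:           # a null code ends the sequence
            code, arg = seq[pc]
            target = primitives[code](regs, arg)
            pc = target if target is not None else pc + 1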

The Task Generator is responsible for preparing state information for a next task and issuing a request for some TP to process the task. The request is a one-way signal. It is not necessary for the requesting processor to know if the task was accepted, or when, or by which TP. At this point it would be possible to place the task on a queue if there is a shortage of TP's for processing.

The unit operates by concatenating the partial stack in the stateword with the current state, producing a new state descriptor. This and the specified goal token comprise the task request. The token is obtained from the initial stateword. If a partial stack is present, the unit will fetch its elements one by one. The algorithm does not copy the previous state, but allows multiple pointers to it for each new state. The Storage Control is expected to handle any difficulties arising from these pointers.

When a new state is constructed, the unit enters the updated semantic data in the appropriate cell. The token designator and stack pointer are then placed on the data lines, and a task request is issued.
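A sketch of this chaining follows. The StackCell and Task names are illustrative; in particular, exactly which new cell receives the updated CR and AD is a detail the flowchart leaves open, and the version below attaches them to the topmost new cell.

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass(frozen=True)
    class StackCell:
        node: Any
        sseq: Any
        cr: Any = None
        ad: Any = None
        link: Optional["StackCell"] = None   # shared tail: the old state is
                                             # pointed to, never copied

    @dataclass
    class Task:
        goal: Any                            # goal token for the new TP
        stack: StackCell                     # complete-state descriptor

    def generate_task(stateword, cur_state, cr, ad):
        """Chain the stateword's partial stack (top first) onto the current
        state and attach the updated semantic data."""
        tail = cur_state
        for node, sseq in reversed(stateword.partial_stack[1:]):
            tail = StackCell(node, sseq, link=tail)
        top_node, top_sseq = stateword.partial_stack[0]
        return Task(stateword.token, StackCell(top_node, top_sseq, cr, ad, tail))

Because the tail is shared, many tasks may point at the same STM4 and PROG2 cells, exactly as observed in the example of Section 5.4; the Storage Control must therefore manage multiply-referenced cells.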

5.3 Execution Section

This section describes an implementation of the second major section of the language processor, the execution section. We will not formulate complete register equations for this section, but will describe the flow of activity in the units involved and discuss the principal signals required.

5.3.1 General Features

A block diagram for the Execution Section is shown in Figure 5.13. The Language Definition and Storage Control units are repeated for clarity and are the same as in Figure 5.6. Again we show the major intercommunication as being carried on a single bus, although in practice a more complex network may be preferred.

[Figure 5.13 Execution Section Block Diagram: the Operand Evaluator Array and Execution Processor Array on the master bus, with input and output channels on the EP's and connections from the Analysis Section and ACU]

The major components of the section are the Operand Evaluator Array and the Execution Processor Array. The former is a set of identical processing units used to control the possibly parallel evaluation of operands for a function. The latter is a set of dissimilar units which perform distinct semantic functions in the language. This section is based on the abstract model of Section 4.7.

5.3.2 The Operand Evaluators

The Operand Evaluator Array consists of units called Operand Evaluators which essentially perform the functions flowcharted in Figure 4.11. For implementation of SLANG we do not require any decomposable functions, and we will omit this capability here. The operation of an individual OE is expanded in the flowcharts of Figures 5.14 through 5.16. A diagram of one OE with its principal signals and registers is given in Figure 5.17. The following description explains the steps in processing by the OE's, as depicted in Figure 5.14.

[Figure 5.14 Operand Evaluator Overall Flowchart]

[Figure 5.15 OE Normal Process Flowchart]

[Figure 5.16 OE Conditional Process Flowchart]

[Figure 5.17 Operand Evaluator Block Diagram]

1. The signal OEREQ to start the OE is issued by the ACU or by another OE. If a unit is not busy and sees this signal, it accepts the task and suppresses further distribution of the request. The data lines provide a cell pointer (CELLP) and a Symbol Table pointer (SMTAB) as arguments. The OE sets a flag to BUSY, and the flag remains set throughout the task. Unlike the TP's, which are synchronized on a common input of program text, the OE's have no synchronization but the basic system clock.

2. The operand index OPIX is set to the first operand. This selects an operand value from the cell to be processed.

3. The main store is accessed to fetch the current cell into a register. This cell contains a flag, a result field, a function code, and an ordered set of operands. The flag is set to indicate that evaluation is in progress.

4. If the function code indicates the COND function, the special process in Figure 5.16 is invoked. Otherwise, we use the normal process of Figure 5.15. These processes are detailed below. Either process will return a value or a branch pointer in OERES.

5. If the result is a value, the processor is done and sets its busy flag off. It does not change the flag and result fields in its current cell, as this will normally be done by an Execution Processor which is now working on the same cell.

6. If the result is a Branch Pointer, then a branch has been initiated by a lower level process. The current process is then cancelled unless it is the SEQUENCE which can satisfy the branch request. The process compares its cell pointer with the branch pointer. If they match, it uses the given index to reset OPIX and repeat the process from step 4. If they do not match, it passes on the branch pointer as its own result and terminates processing.

The "Normal Process" of Figure 5.15 is invoked for all functions but COND. This process involves the following steps:

1. Fetch the current operand, as indexed by OPIX, into CUROP. Also increment OPIX to point to the next operand. If a function has a great number of operands and the cell buffer is of limited length, it may be necessary to read in another cell. 147

2. If CUROP is a null operand (meaning end-of-list) then proceed to step 6.

3. If CUROP is a string index indicating a primitive token, then issue a request to a special Execution Processor, SEVAL, to evaluate the string. To do this the unit looks the string up in the symbol table identified by SMTAB. If it is a variable, it should be found due to a prior declaration. If not, we attempt to interpret the string as a constant, and place its constant value in the table. If this fails we store it in the table as an unrecognized form. In any event, SEVAL ensures that the string will now be found in SMTAB. It is not necessary for the OE to wait for this action.

4. If CUROP is a cell pointer we now must cause the indicated subtree to be processed. The OE issues a request for another OE to evaluate this subtree. If the function being processed is SEQUENCE, we then wait for this evaluation to complete. This is done by monitoring the flag in the root cell until a result is returned. If the result is a branch, the process terminates. Otherwise we clear the flag and result since the subtree must be re-evaluated if encountered again at a different point in processing.

5. Repeat from step 1 as long as operands remain.

6. If the function was SEQUENCE there is no EP required, and furthermore, all evaluation is complete. We store OERES, the result of the last evaluation in the sequence, in the result field of the main cell and return.

7. Otherwise, we reset OPIX to zero and proceed to poll the operand flags in turn, waiting until each evaluation is complete. If the function is LIST we clear each result since no EP will be used. If any evaluation produces a branch pointer we return this as the final result.

8. When all operands are evaluated, if the function was not LIST, an EP must be invoked to process the function. The OE issues the signal EPREQ and passes the function code and the arguments CELLP and SMTAB to the data lines. It then returns. The OE assumes that the request will eventually be honored, and does not wait for a signal that it is complete.

For the special function COND there are exactly three arguments. The first must evaluate to a TRUE or FALSE result; the second or third are alternate responses to this value. The process for this function is as follows:

1. Select the current (first) operand in CUROP. Start a new OE to evaluate this operand.

2. Monitor the flag and wait for evaluation to complete.

3. If the result is FALSE, increment OPIX so the third operand will be selected; otherwise allow the second operand to be selected. Clear the first operand result.

4. Select the operand. If it is zero return immediately. Otherwise evaluate it. Clear the result and return.
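The simulator of Section 5.5 renders this whole evaluation discipline as a recursive subroutine, and a sketch in that style is given below. It is sequential where the hardware is parallel, it leaves the value-versus-location distinction for string operands to the EP routines, and it models a GOTO result as a ('BRANCH', seq_cell, index) tuple; the Cell class and all other names are the sketch's own.

    from dataclasses import dataclass, field

    @dataclass
    class Cell:
        func: str
        ops: list = field(default_factory=list)

    def evaluate(cell, eps):
        """Recursive, sequential rendering of Figures 5.14-5.16."""
        if cell.func == "COND":                 # three operands: test, T-arm, F-arm
            arm = cell.ops[1] if evaluate(cell.ops[0], eps) else cell.ops[2]
            return evaluate(arm, eps) if arm is not None else 0
        ix, vals = 0, []
        while ix < len(cell.ops):
            op = cell.ops[ix]
            r = evaluate(op, eps) if isinstance(op, Cell) else op
            if isinstance(r, tuple) and r[0] == "BRANCH":
                if cell.func == "SEQ" and r[1] is cell:
                    ix, vals = r[2], []         # satisfy the branch: reset OPIX
                    continue
                return r                        # pass the branch pointer upward
            vals.append(r)
            ix += 1
        if cell.func in ("SEQ", "LIST"):        # no EP for these functions
            return vals[-1] if vals else 0
        return eps[cell.func](vals)             # invoke an Execution Processor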

Simulation has indicated that about four or five OE's suffice for any reasonable SLANG programs. The minimum required is related clearly to the depth of the Execution Tree, and thus to the level of nesting of expressions or other constructs in the program. Block-structured languages will have more nesting. The maximum depends on the program and has no absolute bound. Some number, perhaps 25-30, must be selected after empirical checking of the types of programs to be run. More could be added later if necessary.

An attempt to evaluate a large number of operands in parallel also drains the supply of OE's. In a practical system some queuing mechanism must be employed, so these requests may be fulfilled in sequence if there are insufficient units to process them in parallel.

5.3.3 The Execution Processors

The final major component of the language processor is the Execution Processor Array. The EP's are diverse units, each designed for a specific semantic activity. A request from an OE is directed to a particular EP capable of performing the requested function. A cell pointer and symbol table are provided as parameters.

Some of the EP's are designed to communicate with the external world through input/output channels. These processors accept data input and produce data output.

Implementation of the EP's is expected to be as diverse as their functions. In particular, certain complex functions may be trapped by a common microprocessor. This unit could access instruction sequences in the Language Definition Tables to perform functions impractical to implement in special hardware.

The EP functions required for SLANG were given in Table 5.1. All of these are simple enough and general-purpose enough for hardware implementation. We will describe the action of these functions in somewhat more detail, but will not give circuitry for their actual implementation.

RSET: Obtain control of the channel as designated by its argument, and initiate the channel for reading. Return a zero result.

READ: Read a character string, assumed to be a digit sequence, on the selected channel. Convert the digit string to an integer value. If the argument is a pointer, fetch the value (which is a location). If it is a string look it up in the symbol table and find its location. Store the input value in this location. Return the value as the result.

WSET: Obtain control of the channel as designated by its argument, and initiate the channel for writing. Return a zero result.

WRITE: Look up the argument if a string, or fetch its value. Convert the value to a digit character string. Output the character string and an end- of-line character on the selected channel. Return the value as a result.

ASG: Fetch the value of argument 2, or look it up in the symbol table. Evaluate argument 1 and obtain its location. Store the value in the location. Return the value as result.

DCL: Enter the argument variable name in the symbol table. Reserve a word of storage. Return a zero value.

ADCL: Enter the argument 1 variable name in the symbol table. Reserve a block of storage with length as given by argument 2. Return a zero value.

EVAR: Look up argument 1 in the symbol table and find an array pointer. Get the value of argument 2. Determine the location of the element. Return it as the result.

GOTO: Look up the argument in the symbol table as a label entry. Return the branch pointer as result.

ADD, SUB, MPY, DIV: Perform the arithmetic operation on the value of the two arguments. Return the result.

GT, EQ, LT: Perform the logical comparison on the value of the two arguments. Return the Boolean result. 151

In every case, when the operand is a pointer, the EP clears the value after fetching it. In every case the EP produces a result, and stores its result in the cell indicated by the cell pointer. A flowchart for this general activity is given in Figure 5.18.
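As a software stand-in for a handful of these units, the sketch below implements several Table 5.1 functions over the operand-list convention of the earlier evaluate sketch. Channel setup (RSET and WSET) is elided in favor of fixed streams, the val helper plays the role of SEVAL's variable-or-constant interpretation, and every name here is an invention of the sketch.

    import sys

    def make_eps(store, inp=sys.stdin, out=sys.stdout):
        """Build a table of EP routines over a dict-like symbol store."""
        def val(x):                       # SEVAL-style: variable name or constant
            if isinstance(x, int):
                return x
            return store[x] if x[0].isalpha() else int(x)
        def read(a):                      # READ: store one input integer in a[0]
            store[a[0]] = int(inp.readline())
            return store[a[0]]
        def write(a):                     # WRITE: one digit string per line
            out.write(f"{val(a[0])}\n")
            return val(a[0])
        def asg(a):                       # ASG: store value B in location A
            store[a[0]] = val(a[1])
            return store[a[0]]
        def dcl(a):                       # DCL: reserve storage, return zero
            store.setdefault(a[0], 0)
            return 0
        return {
            "READ": read, "WRITE": write, "ASG": asg, "DCL": dcl,
            "ADD": lambda a: val(a[0]) + val(a[1]),
            "SUB": lambda a: val(a[0]) - val(a[1]),
            "MPY": lambda a: val(a[0]) * val(a[1]),
            "DIV": lambda a: val(a[0]) // val(a[1]),   # integer divide
            "GT":  lambda a: val(a[0]) > val(a[1]),
            "EQ":  lambda a: val(a[0]) == val(a[1]),
            "LT":  lambda a: val(a[0]) < val(a[1]),
        }

With an expression subtree such as (MPY 'X' 'K') and a store {'X': 1, 'K': 2}, evaluate returns 2; the GOTO function, which must return a branch pointer drawn from the symbol table's label entry, is omitted here.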

5.4 Example of Program Flow

This section presents an example of the flow of analysis for a program segment in SLANG. The segment to be parsed corresponds to line 7 of the example factorial program (see Figure 5.3):

WRITE (2) N,X;

We will concentrate on the analysis of this line in the context of analyzing the entire factorial program. The data structure we seek to produce is the one shown in Figure 5.4.

Initially, before any input is seen, the ACU invokes a TP for the initial state of the grammar. This TP then sets up seven new TP's with the following goal tokens and node stacks:

TP1  n        (STM2,PROG2)
TP2  n        (VAR2,ASG2,STM4,PROG2)
TP3  'WRITE'  (WRITE2,STM4,PROG2)
TP4  'READ'   (READ2,STM4,PROG2)
TP5  'DCL'    (DCL2,STM4,PROG2)
TP6  'ARRAY'  (ADCL2,STM4,PROG2)
TP7  'IF'     (IFS2,STM4,PROG2)

[Figure 5.18 Execution Processor General Flowchart]

In the above, the TP's are the Token Processors, followed by the goal token and the node stacks which describe the Syntactic State (Def. 4.39) of each of the seven tasks. Associated with each node in the stacks, but not shown above, is a Semantic Sequence, a Current Result, and Auxiliary Data, forming the Complete State (Def. 4.48). All of the CR and AD values are initially null.

At the conclusion of processing the first statement in the program, the system will again attain this configuration with two differences. First, there is one additional processor:

TP8 'END' (PROG3)

Second, associated with the PROG2 node for all the TP's is semantic data. The CR points to a SEQUENCE cell which was created after the first statement. The Leaf Pointer in the AD points to successive entries in this cell. After each further statement this configuration is repeated. When the WRITE statement is reached, the LP points to operand eight, since seven entries in the sequence have already been made.

At this point processing of the WRITE statement begins. The "W" in "WRITE" is advanced to the current character (CCHAR), and "R" becomes the next character (NCHAR). The analysis proceeds as follows:

1. INPUT = WR. TP's 1-3 have partial recognition, as the input is consistent with a variable name and with the keyword "WRITE." TP's 4-8 all fail.

2. INPUT = RI. TP's 1-3 remain active.

3. INPUT = IT. TP's 1-3 remain active.

4. INPUT = TE. TP's 1-3 remain active.

5. INPUT = E/space. TP's 1 and 2 remain active. "Space" will be acceptable within a name, but will be ignored. TP3 now signals complete recognition and generates a new task:

TP3  '('  (WRITE3,STM4,PROG2)

In setting up this task, TP3 has executed the semantic sequence associated with node WRITE2. These sequences are shown in the Language Definition Tables of Appendix C. This sequence generated a cell with a SEQUENCE function, which is now inserted as the CR associated with WRITE3.

6. INPUT = space/(. TP1 and TP2 now signal complete recognition of the valid name "WRITE." TP1 advances to node STM2, performing a semantic sequence which saves "WRITE" as its CR and a possible label. It sets up a new task to search for a colon:

TP1  ':'  (STM3,PROG2)

Note that although NCHAR is already known to be a left parenthesis, not a colon, this task will still be set up. This is a tradeoff for reduced complexity of the TP's as discussed earlier.

TP2 moves to node VAR2 and sets up a search for '('. Since VAR2 is an exit node, it also sets up a possible transition to ASG3 with goal '='. Finally, TP3 signals partial recognition, since spaces may be intermixed with keywords. The status is now:

TP1  ':'  (STM3,PROG2)
TP2  '('  (VAR3,ASG2,STM4,PROG2)
TP3  '('  (WRITE3,STM4,PROG2)
TP4  '='  (ASG3,STM4,PROG2)

It may be noted at this point that the cells in storage representing nodes STM4 and PROG2 in the several node stacks are the same cells, shared through multiple links.

7. INPUT = (/2. TP2 and TP3 have full recognition; TP1 and TP4 fail. TP2 sets up a possible transition to node VAR4. Its current CR is the variable name "WRITE." TP3 sets up a possible transition to node WRITE4 through the GV "WSEL." Its current CR is a SEQUENCE cell with no operands. The new configuration is:

TP1  i  (VAR4,ASG2,STM4,PROG2)
TP2  i  (WSEL2,WRITE4,STM4,PROG2)

8. INPUT = 2/). Both TP's succeed. Both set up searches for right parentheses. TP1 generates a CR cell with function EVAR and operands "WRITE" and "2." TP2 generates a CR cell with function WSET and operand "2." It then makes a further transition to state WRITE4, entering this cell as the first operand in the SEQUENCE cell for WRITE. The new configuration is:

TP1  ')'  (VAR5,ASG2,STM4,PROG2)
TP2  ')'  (WRITE5,STM4,PROG2)

9. INPUT = )/space. Both TP's succeed. TP1 generates a transition to VAR5, which is only an exit node; it proceeds to state ASG2, returning the EVAR cell as LR. The transition to ASG2 generates a cell with function "ASG" as the CR at the ASG level. The EVAR cell pointer is inserted as the first operand. So far, the semantic output of this path is prepared to make an assignment into element 2 of an array called "WRITE."

TP2 generates a transition to WRITE5 which leaves its CR undisturbed. The configuration is now:

TP1  '='  (ASG3,STM4,PROG2)
TP2  n    (VAR2,WREL2,WRITE6,STM4,PROG2)

10. INPUT = space/N. Both TP's find partial success and remain active.

11. INPUT = N/comma. TP1 fails. TP2 succeeds, reaching state VAR2, which is an optional exit node; the next state WREL2 is an exit node only, and WRITE6 is an optional exit node as well. Thus TP2 generates the configuration:

TP1  '('  (VAR3,WREL2,WRITE6,STM4,PROG2)
TP2  ','  (WRITE5,STM4,PROG2)
TP3  ';'  (STM5,PROG2)

At this point it is useful to summarize the interrelation of the nodes and the semantic data presently associated with each. The system structure is illustrated in Figure 5.19. In TP1, at the level of the GV "VAR," the CR is simply the string "N". WREL has a null CR; if the path through VAR completed, it would receive "N" as its initial CR. WRITE has a subtree pointing to a SEQUENCE cell which contains a complete "WSET" cell as operand 1; its LP points to operand 2. The STM CR is null, as no part of its semantics has been completely assembled. The PROG CR is the main SEQUENCE cell into which earlier statements have been entered as subtree operands.

[Figure 5.19 Snapshot of Analysis: the node stacks of TP1 (VAR3, WREL2, ...), TP2 (WRITE5), and TP3 (STM5), sharing the STM4 and PROG2 cells, with the WRITE SEQUENCE subtree and the main PROG SEQ cell holding the previous statements]

In TP2 at the WRITE level, the CR points to a SEQUENCE cell in which two operands have already been entered: "WSET 2" and "WRITE N". The LP points to operand 3. Note that this is in fact the same SEQUENCE cell pointed to by the WRITE level of TP1. It may seem that a conflict will arise here, as both paths try to store into the same operand. Fortunately the path of TP1 is about to fail and will not override operand 2, which TP2 has already set. In fact the non-ambiguity of the language ensures that we can structure our tree so that such conflicts never arise.

If this conflict were possible, it could also be prevented by duplicating at most one cell (the SEQUENCE cell) when the WRITE path diverged.

Levels STM and PROG, in TP2, are the identical nodes as in TP1. Level STM in TP3 points to the complete WRITE sequence, making three different pointers to this subtree. Level PROG in TP3 is again the same as in TP1 and TP2.

12. INPUT = comma/X. TP1 and TP3 fail. TP2 succeeds. The next configuration has one processor:

TP1  n  (VAR2,WREL2,WRITE6,STM4,PROG2)

13. INPUT = X/;. TP1 succeeds and generates three paths. The new configuration is exactly as in step 11 above. The entire system status is again as shown in Figure 5.19, except that a cell with function "WRITE X" has been entered as operand 3 of the WRITE sequence, and all of the LP's have been advanced one entry.

14. INPUT = ;/space. The space is from the next line, as line endings were suppressed by the preprocessor. TP3 succeeds, advancing to STM5 and thus to PROG2. The WRITE Sequence cell with its three operands is now entered in the PROG-level sequence. The successful TP now sets up eight new tasks as described at the beginning of this section. Processing continues with the next statement.

5.5 Simulation and Performance

The complete processing system as described in this chapter was simulated in software on a GRI-99 minicomputer. The simulator was written in assembly language using a structured language preprocessor developed by the author. An attempt was made to organize the programs in a fashion as similar as possible to the intended machine organization. In addition to an overall control module, the simulator contains modules for analysis, execution, storage control, and language definition.

Although our computer does not offer associative storage, the calls to the storage control unit were structured as they might be in a real machine and then interpreted by the storage module. All accesses to simulated memory are made through this module.

The complete simulator accesses language definition tables in a separate module. Program input is taken from a disk file and preprocessed in the control module. Data input/output is via the terminal and controlled by the execution module. A block diagram of the simulator software is given in Figure 5.20.

The simulator, of course, cannot function in parallel. Instead the analyzer proceeds by sequentially executing the task step for all active TP's after each new input character. Next tasks are entered in a queue and assigned to TP's after all processing is complete. The Execution Module structures the OE's as a recursive subroutine, and always evaluates operands sequentially. Program listings for the Analysis and Execution Modules, respectively, are given in Appendices D and E.

[Figure 5.20 Simulator Block Diagram: the control module (preprocessor, error handling) feeding the analysis and execution modules, with the language definition and storage control modules serving the SLANG definition tables and the simulated associative memory]

The simulator was run with the Language Definition Tables for SLANG as given in Appendix C. A number of test programs in SLANG were written and executed under the system, including the factorial program of Figure 5.3.

The most reasonable performance measures for a system such as we have developed here are the cost and complexity of hardware required for a given level of capability, and processing speed. We have only a part of a practical system. We have assumed only correct input and cannot discuss error handling or auxiliary services. While we must show our system to be theoretically correct, we have not provided fault-tolerant design and cannot discuss reliability. We have only a single-user machine and cannot discuss overall throughput or utilization levels.

It is, however, appropriate to discuss the extent of the hardware required. We have assumed hardware costs low but not zero. The principal hardware components of our system are the Token Processors, Operand Evaluators, Execution Processors, Storage, and interconnection networks. We have discussed the number of TP's required, and we develop this further in Appendix A. The number depends on the grammar and not on the program complexity.

We have estimated that 25-30 TP's of the current design should suffice for most reasonable languages. It is not unreasonable that each TP could be fabricated as a single LSI chip, although speed may be compromised to limit the number of pins.

Similarly, the Operand Evaluators could be provided as 25-30 LSI chips. Their number limits to some extent the complexity of the programs which can be processed, but if some degree of sequential processing of potentially parallel steps is tolerated, this number should suffice for most reasonable programs. We have made no empirical study of programs of high complexity, and the number is only an estimate.

The execution processors required are determined by the richness of functions in the language. The minimum in this area is a single microprocessor which interprets all functions as required. Hardwired implementations of particular functions may be added as they are justified.

The simulator includes a pseudo-associative memory composed of 96-bit cells. In running the SLANG programs we were generally surprised by the limited amount of storage required for analysis. The program includes mechanisms for releasing cells no longer required, and a measure of the largest number of cells used. The factorial program required but 60 cells, and we might estimate that not more than 1-2K cells would be required for working storage in the analysis of most reasonable grammars and programs. This storage, of course, does not include data storage used by the running program, which may be arbitrarily large. It does, however, include the complete execution tree and symbol table, placing a ceiling on acceptable program length.

The language definition tables are conceived as being stored in a separate, perhaps faster, associative memory. The tables for SLANG required less than 200 cells. More complex languages would require many times this number.

Finally, the interconnection network must provide communication among perhaps 200 separate units, all physically close, and was described as a single bus with 100 or more lines. Many implementations are possible, and each has its characteristic costs and speed.

In discussing the processing speed of the system, it is reasonable to assume that the storage control will have very high utilization and will be the limiting factor on speed. We have assumed that only one access operation on the memory is possible at one time, although multi-access memories have been developed. All of the processing units must access memory cells at various stages of operation.

Accordingly we propose that the number of storage accesses provides a good estimate of the running speed to process a given program. Each access is taken to be essentially a request for a basic memory operation: match cells, read a previously matched cell, or write. We assume each of these consumes an equal amount of time. Present-day associative memories may have cycle times from under 500 nanoseconds to 100 microseconds or more, depending on their structure. Although the literature is surprisingly silent, a reasonable operation time for a bit-serial, 100-bits-per-word AM seems to be about 10 us.

In running the simple factorial program to calculate "7-factorial," there were just under 2000 such references. Of these, about two-thirds occurred during the analysis phase. This would give an estimated running time of 20 ms for interpretation and execution, exclusive of any input/output delays. For comparison, a similar program already in machine language would run on our computer in 1-2 ms. However, language processing even with an assembler would consume several seconds exclusive of overhead. Our processor cannot compete for programs of extended life cycle, which will be compiled once, filed in object form, and run repeatedly or continuously. For typical compile-and-go programs, though, our hardware interpreter will be many times faster.
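The arithmetic behind these figures is elementary; the short computation below (in present-day Python) reproduces them, taking the 10-microsecond cycle time and the access counts quoted above as given:

    # Estimated running time from the storage-access count.
    ACCESS_TIME_US = 10                           # assumed AM operation time
    total_accesses = 2000                         # observed for "7-factorial"
    analysis_accesses = total_accesses * 2 // 3   # about two-thirds in analysis

    total_ms = total_accesses * ACCESS_TIME_US / 1000.0
    analysis_ms = analysis_accesses * ACCESS_TIME_US / 1000.0
    print("total %.1f ms, analysis %.1f ms" % (total_ms, analysis_ms))
    # prints: total 20.0 ms, analysis 13.3 ms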

5.6 Summary

In this chapter we have presented an implementation for a complete language processor developed around a small language called SLANG. We first defined the language used, and developed a complete set of language definition tables. We then specified some hardware for a processor sufficiently powerful to handle this language.

As a model, the simple Analysis Control Unit was developed completely at the register level, with flowcharts, signal lists, and register transfer equations. Subsequent components were developed in less detail, with at least flowcharts provided and the principal signals identified.

In this manner we described implementations for the Token Processors, Operand Evaluators, and Execution Processors.

We then described in detail the successive steps in analyzing a sample program segment in SLANG. Finally, we described a complete simulation of a processor for SLANG and presented measures of its performance.

CHAPTER VI

PROCESSORS FOR REAL PROGRAMMING LANGUAGES

This chapter extends the ideas presented in the previous chapter for the language SLANG to the implementation of some common actual programming languages. The practical usefulness of the system depends to a great degree on its ability to deal with complete languages that are currently in widespread use. We will not present complete design details of a system for processing these languages, but we will discuss the major problems which the languages present, and show through selected examples how our system could deal with these problems.

Section 6.1 is a detailed treatment of a method of implementation for FORTRAN IV. Section 6.2 gives a similar development for ALGOL 60. Section 6.3 then discusses some selected problems presented by other significant languages, especially PL/I, SNOBOL4, and ALGOL 68. We bear in mind that our goal is a single processor which can be configured for any of these languages by changes to the Language Definition Tables.

6.1 FORTRAN IV

This section considers the problem of implementing ANSI Standard FORTRAN IV on our language processor. The goal is a system which will interpret and execute all standard FORTRAN programs. In what follows, we assume the reader is familiar with the ANSI FORTRAN definition.

6.1.1 General

FORTRAN lines start with a fixed-format header, and may be comment lines, continuation lines, or labelled or unlabelled initial lines of statements. Accordingly we will simplify the task by requiring the preprocessor to deal with these headers and comments, process line endings, and pass on labels in a free-format "metasyntax" for storage by the analyzer. We will not discuss the structure of the preprocessor. Figure 6.1 shows the form of a short FORTRAN program before and after preprocessing.

Blanks in FORTRAN are ignored everywhere, even inside token strings, except in Hollerith strings. The preprocessor will not be smart enough to identify Hollerith strings, and so must let the blanks pass. They will be dealt with by the Token Recognizers.

We will continue with our assumption that all input programs are correct. We will not discuss error recovery or diagnostic output. We do include mechanisms for error detection, but the system response is not defined.

a) Original Text:

C FACTORIAL PROGRAM
      INTEGER K,X,N
      X=1
      READ (1,100) N
      DO 10 K=1,N
      X = X*K
   10 CONTINUE
      WRITE (2,101) N,X
      STOP
  100 FORMAT(I5)
  101 FORMAT(13H THE COUNT IS,I3,
     1 /,14H THE RESULT IS,I5)
      END

b) After Preprocessing:

INTEGER K,X,N# X=1# READ(1,100) N# DO 10 K=1,N# X = X*K# [10] CONTINUE# WRITE(2,101) N,X# STOP# [100] FORMAT(I5)# [101] FORMAT(13H THE COUNT IS,I3,/,14H THE RESULT IS,I5)# END#

( # = end-of-statement)

Figure 6.1 FORTRAN Preprocessing Example

We depart from the ANSI FORTRAN standard in only two respects, necessitated by our unconventional storage system. The storage sequence specified for arrays will not be observed. We will not define any relationship between data entities of different type, or allow them to be equated or share storage through COMMON or EQUIVALENCE statements.

6.1.2 Token Classes

FORTRAN has a number of fixed tokens or keywords, and other tokens which can be partitioned into classes. The token classes we will recognize are: Name, Integer, Real Number, Octal Integer, Format String, and Hollerith String.

A Name is a string of alphanumerics beginning with a letter; an Integer is a string of digits. These are similar to token classes in SLANG but for one issue: the first letter of a name may imply its data type. The Token Recognizers will not be concerned with this since they preserve the entire string for future analysis. This problem will be dealt with by an Execution Processor.

A Real Number will be any acceptable real or double-precision number form. Complex numbers need not be distinguished as tokens.

An Octal Integer is a digit string and appears only in STOP and PAUSE statements. The Token Recognizer will need to append a code to this string when storing it to distinguish it from a decimal integer.

We consider a Format String to be simply any string up to the end-of-statement code. FORTRAN allows format strings to be input dynamically during execution. For this reason we avoid all syntax checking on these strings by the analyzer; this task will be reserved for a special execution processor, probably implemented in software.

The final class, Hollerith Strings, is unique because its format is context sensitive. Having detected an integer followed by "H", we must immediately evaluate the integer to find the format (length) of the rest of the token. Our Token Recognizers can handle this problem in a straightforward manner.
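The scan itself is simple once the count is evaluated. The following sketch, in present-day Python, shows the count-and-consume logic a Token Recognizer would implement; the function name and interface are inventions for illustration:

    def scan_hollerith(text, pos):
        # Scan a Hollerith constant such as '13H THE COUNT IS' starting
        # at text[pos]; return (string, next position) or None on failure.
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if pos == start or pos >= len(text) or text[pos] != "H":
            return None                    # not a Hollerith constant
        length = int(text[start:pos])      # evaluate the count immediately
        pos += 1                           # skip the 'H'
        if pos + length > len(text):
            return None                    # count runs past the statement
        return (text[pos:pos + length], pos + length)

    # scan_hollerith("13H THE COUNT IS,I3", 0) yields (' THE COUNT IS', 16)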

6.1.3 Syntax and Analysis

FORTRAN is not a completely context-free language. In several cases, the interpretation of a statement is dependent on an earlier statement such as a type declaration. Nevertheless we can formulate a grammar which is "almost" context-free and express it as a set of syntax graphs. A complete set of graphs defining a FORTRAN IV "program," including subprograms, is given in Appendix F (actually the grammar is for the metasyntax output by the preprocessor). These graphs form the basis for our analysis.

As in SLANG, the object of analysis is to process program text with respect to these graphs, producing an execution tree. Many statement types are comparable to SLANG statements or straightforward extensions of them, and these will not be discussed. We will concentrate on statements which raise significant new issues to be resolved during analysis. These include declarations, context-dependent statements, DO statements, function and subroutine handling, and equivalence and interprogram communication.

Declarations and Context-Dependency. All storage in FORTRAN is considered to be static. Variables are declared explicitly by type declarations or implicitly by appearing in assignment statements. As in SLANG, we have the choice of allocating storage by the analyzer directly, or passing the declarations on as statements for execution. In FORTRAN we choose to perform storage allocation as semantic tasks under the analyzer for two reasons:

1. FORTRAN includes DATA statements which define initial values for variables. We must allocate space to store these values.

2. Several statements and variable references depend for their interpretation on earlier declarations. For example, the statement:

A(X,Y) = X + Y

is an assignment statement if A was declared to be an array by a DIMENSION statement; otherwise it is a statement function. Also, it is helpful if not necessary to distinguish variables declared as logical when analyzing expressions.

It is thus necessary to compile a dynamic symbol table as text is processed for analysis. It is also necessary to provide semantic primitives to look up a symbol and determine its attributes. These primitives will then cause a path to fail if the attributes are not as desired.
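A minimal sketch of such a primitive follows, assuming a simple dictionary-based table; the names here are illustrative, not the machine's:

    symbol_table = {}          # name -> set of attributes, e.g. {"ARRAY"}

    def declare(name, *attrs):
        symbol_table.setdefault(name, set()).update(attrs)

    def path_survives(name, required_attr):
        # A TP path continues only if the symbol has the needed attribute.
        attrs = symbol_table.get(name)
        return attrs is not None and required_attr in attrs

    declare("A", "ARRAY")
    assert path_survives("A", "ARRAY")        # assignment interpretation holds
    assert not path_survives("A", "LOGICAL")  # logical-variable path fails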

DO Statements. The FORTRAN DO statement is a special form of the controlled iteration statements found in most languages. Unlike block-structured languages, the code sequence to be iterated is not cleanly bracketed by the syntax, and we have only the label for the final statement as contained in the DO statement. However, the DO ranges are always well nested, and this suggests a structure for execution.

Figure 6.2 presents a program segment containing a DO-loop, and a corresponding portion of the execution tree. We have represented the DO by the component activities which must be performed. It would be equally possible to provide a single function code "do" or "iterate" with parameters; this would then be handled by the execution section as a decomposable function as described in Chapter IV.

Figure 6.2 Execution Tree for a DO-Loop

Structuring the DO range as a subsequence in the main processing sequence has useful properties which will also apply to block-structured languages. In the GOTO or branching mechanism described in earlier chapters, a transfer will only be honored if the destination is in a sequence which is also an ancestor of the GOTO itself.

An attempt to transfer into a DO from outside it will not work, and processing will terminate. Jumping out of a DO-Loop will cause no difficulty.

If the awkward "Extended DO" concept is allowed, we must also be able to get out and THEN back into a loop, and some additional record keeping is required. We will not discuss this further.

We have the desired form for our DO prototype, but the question of how to achieve it remains. Our expanding set of semantic primitives must keep a stack to record nesting levels. Into this stack will be placed the end label for each DO statement. Each label encountered must then be checked against the top of the stack. If a match is found, at least one DO-loop ends here; the proper semantics are then issued, the stack is popped, and the test is repeated.
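This record keeping reduces to a few operations on a label stack, as in the sketch below (present-day Python; emit_end_of_do stands for the loop-closing semantics and is hypothetical):

    do_stack = []                    # end labels of currently open DO ranges

    def open_do(end_label):
        do_stack.append(end_label)   # 'DO 10 K=1,N' pushes '10'

    def statement_label_seen(label, emit_end_of_do):
        # At least one DO ends here for each matching stack top.
        while do_stack and do_stack[-1] == label:
            do_stack.pop()
            emit_end_of_do()         # issue the loop-closing semantics

    open_do("10"); open_do("20"); open_do("20")
    statement_label_seen("20", lambda: None)   # closes both inner loops
    assert do_stack == ["10"]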

Functions and Subroutines. FORTRAN provides a subroutine capability through statement functions, and through function and subroutine programs. We treat these subroutines in a uniform manner: they may have input and output parameters, and each instance of execution produces a value for the subroutine itself. When executing, the subroutine has normal access to the current symbol table.

Figure 6.3 shows the basic structure for subprogram definition and call. When defined, a subprogram is formed as an independent tree and linked into the symbol table along with its name. A table for parameter values is also provided, and all parameter references are indexed through this table.

A subprogram call invokes an execution function to evaluate the subroutine, and includes a name and a list of actual parameters. These parameters will be linked to the subroutine during execution, as described later. Evaluation of the call returns a result value which may or may not be used.

We have assumed that all subroutines used are defined within the program being run. However, it is a straightforward matter to provide a starting symbol table, complete with information on where to get subroutine prototypes from mass storage as needed. With associative main storage, tree structures can be freely saved and restored as needed. Thus predefined subroutines and functions can be made available.

Figure 6.3 Subprogram Definition and Call

Equivalence and Interprogram Communication. In FORTRAN, like most languages, it is possible to define separate program modules and link them together for execution. Some information must be shared between modules, and some information must be kept private by each.

The information involved is essentially contained in the symbol table. We now must provide an additional field in the table which partitions it into subtables, one for each program module and one for common data. Each module may access its own subtable and the common subtable.

The entries in the external table will be subprogram names and COMMON blocks. A COMMON block will be structured as a non-homogeneous array or structure (cf. PL/I). We will not allow the sequence of data types in a common block to vary between references. The analyzer will define COMMON blocks during the first module which references them.

FORTRAN also provides an EQUIVALENCE declaration between variables. We implement this for variables of the same type by placing equivalent pointers in the symbol table.

It is not allowed for variables of different type and will be ignored. Use of EQUIVALENCE for storage efficiency is assumed to be unnecessary.

6.1.4 Execution

For FORTRAN, a number of new functions must be provided through the Execution Processors, both for calculations and for program control. We must provide for addition, subtraction, multiplication and division of real and complex quantities as well as integer; the complex operations will be decomposable into real ones. An exponential function must also be provided. We assume no actual distinction need be made between real and "double-precision" operations. Instead, all operations will be performed with high precision.

With multiple data modes we now have need for mode checking and conversion. The assignment operator must check the mode of the value to be assigned and convert it if necessary. Conversion between integer and real is required. In addition, we must convert real constants from character string to internal form.
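The mode test and conversion in an assignment might look as follows; the mode tags and function are illustrative only:

    def assign(target_mode, value_mode, value):
        # Hypothetical EP behavior for '=' with integer and real modes.
        if value_mode == target_mode:
            return value
        if target_mode == "INTEGER" and value_mode == "REAL":
            return int(value)        # truncate toward zero, as FORTRAN requires
        if target_mode == "REAL" and value_mode == "INTEGER":
            return float(value)
        raise TypeError("no conversion defined")

    assert assign("INTEGER", "REAL", 3.9) == 3
    assert assign("REAL", "INTEGER", 7) == 7.0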

We require the logical operations AND, OR, and NOT. The six logical relations must be implemented for relational expressions.

An EP must be provided to start subroutine execution and bind actual parameters to the subroutine.

Functions are necessary to perform input and output as directed by Format Strings, and to perform auxiliary I/O functions for REWIND, BACKSPACE, and ENDFILE.

The "Formatted I/O" functions are both logically complex and unique to FORTRAN. While other languages provide for I/O formatting, they do not use the FORTRAN syntax or philosophy. For this reason it would be sensible to implement these functions in software or firmware. All other functions mentioned are conceptually s mpler and of wide application. They should be provided as intrinsic functions in our system,

6.1.5 Summary

The functional capabilities which our system must possess to implement FORTRAN IV, beyond those required for SLANG, have been discussed above. These capabilities can be expressed as primitive operations which various subsystems must be able to perform. The principal operations required are summarized in Table 6.1.

6.2 ALGOL 60

This section considers the implementation of ALGOL 60. The basis for the language is the Revised ALGOL report. We assume familiarity with this report.

6.2.1 General

ALGOL uses a free format, context-free syntax. Line endings are meaningless and may be stripped by the preprocessor. Blanks are ignored except in strings and may be handled by the token recognizers. Comments have a specialized structure and must be recognized as special tokens and handled in the main syntax graphs.

Table 6.1 Extended Primitive Functions for FORTRAN

CLASS            FUNCTION                USAGE
Token Semantics  Evaluate Integer        Hollerith Strings
Major Semantics  Put entry in Sym. Tbl.  Declarations
                 Seek entry in S.T.      Resolve ambiguity
                 Create Parameter List   Subprograms
                 Store into stack        DO loop setup
                 Compare stack entry     DO range checking
                 Pop stack               DO loop termination
EP Functions     Bind Parameters         Subprograms
                 Test data mode          Assignments
                 Convert data mode       Assignments
                 Convert string          Many
                 Real Arithmetic         Expressions
                 Complex Arithmetic      Expressions
                 Logic Operators         Expressions
                 Logic relations         Expressions
                 Formatted I/O           Read and Write
                 Auxiliary I/O           REWIND, etc.

ALGOL has no standard for input and output. We assume these are handled by calls to appropriate external procedures.

6.2.2 Token Classes

ALGOL includes keywords which are considered by the definition to be unique and indivisible. In practice, these are either treated as reserved words or quoted in some manner. Either convention is acceptable to our token recognizers.

Other significant token classes required are variable name, integer, real constant, and character string. The first three are not unlike their FORTRAN counterparts.

A character string is any quoted string. Left and right quotes are distinct and strings may be nested. The token recognizer can easily keep a count and record the levels of nesting.
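The count-keeping scan is sketched below in present-day Python; the left and right quote characters are stand-ins, since the machine character set is not fixed here:

    def scan_nested_string(text, pos, lq="`", rq="'"):
        # Scan a string with distinct, nestable quotes beginning at
        # text[pos] == lq; return (body, next position) or None.
        if pos >= len(text) or text[pos] != lq:
            return None
        depth, start = 1, pos + 1
        pos += 1
        while pos < len(text) and depth > 0:
            if text[pos] == lq:
                depth += 1
            elif text[pos] == rq:
                depth -= 1
            pos += 1
        if depth != 0:
            return None                # unterminated string
        return (text[start:pos - 1], pos)

    # scan_nested_string("`a`b'c'", 0) yields ("a`b'c", 7)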

Three special types of strings must also be recognized for comment conventions: "letters only," any string not containing ";", and any string not containing end, ";", or else.

6.2.3 Syntax and Analysis

A complete set of graphs defining an ALGOL 60 program is given in Appendix G. As before, we will concentrate on those aspects of the language that are not found in earlier material. The subjects of interest include Block Structure, Control Statements, Procedures, and Switches.

Block Structure. The most striking aspect of ALGOL which distinguishes it from static languages is its block structure. As control enters and leaves blocks, the "environment," which may be viewed as the current symbol table, changes. ALGOL uses dynamic space allocation and declaration of variables, and the same name may have different meanings at different points in execution. For this reason, interpretation of declarations must be deferred at least partially to the execution section.

Fortunately, ALGOL does not possess the context dependencies of FORTRAN, and statements can be analyzed unambiguously without consideration of earlier input.

An effective model of block-structured processes, based on "contours," has been given by Johnston.

Figure 6.4 reproduces his illustration of an ALGOL program and its (static) contour model. The example was chosen to illustrate the subtleties of environment changes in block-structured languages, rather than as a practical algorithm.

Figure 6.4 Johnston Contour Model

Figure 6.5 Execution Tree for Johnston Algorithm

In the figure, each contour represents a block or procedure and has associated with it a partial symbol table. Our analog of this model is the execution tree given in Figure 6.5. In this example we have introduced a new function designator, BLOCK, for block entry. This is similar to a "sequence" but includes as its first operand a pointer to a partial symbol table. The analysis section is charged with constructing the table and identifying it with a unique index. However, no storage is allocated.

We will need to recognize BLOCKs in our Operand Evaluators. We also introduce a function ENDBLOCK to deactivate a particular symbol table when the block is exited. The effect of processing these new functions will be considered in the next section.

Control Statements. The ALGOL for statement can be implemented in a manner analogous to the FORTRAN DO. The if...then...else is constructed using the COND function.

ALGOL has the interesting ability to imbed conditionals in an expression, e.g.,

X = Z + (if A > B then W else Y).

Since the execution subtrees in our system always produce a value, this construction can be handled without difficulty.

Procedures. Procedures in ALGOL are similar in many ways to FORTRAN subprograms, and may occur as value-returning functions or as subroutines. In ALGOL, procedures consist of a block, which introduces a symbol table. Parameters to procedures are passed both by value and by name. We handle this in a natural way by placing either the value or the actual name index or unevaluated expression subtree in the parameter block. The data items involved are self-describing. If a parameter lookup function encounters a name or expression, it will evaluate it whenever it is accessed.
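The effect of the self-describing parameter block can be pictured with value cells and "thunks," as in this sketch (the class names are inventions for illustration):

    class ValueParam:                            # call by value: a fixed copy
        def __init__(self, v): self.v = v
        def get(self): return self.v

    class NameParam:                             # call by name: re-evaluate
        def __init__(self, subtree): self.subtree = subtree
        def get(self): return self.subtree()

    a = {"n": 1}
    by_value = ValueParam(a["n"] + 1)            # evaluated once, at the call
    by_name = NameParam(lambda: a["n"] + 1)      # evaluated at each access
    a["n"] = 10
    assert by_value.get() == 2
    assert by_name.get() == 11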

Switches. Switches in ALGOL are in effect one-dimensional arrays of label values. They may be constructed and accessed in a straightforward manner including range checking.

6.2.4 Execution

The Johnston contour model represents execution of an ALGOL program by a snapshot for a given point in time. This snapshot resembles a static algorithm model but includes a processor with an instruction pointer and environment pointer. We represent the environment by a "currently active" symbol table.

The Johnston model preserves the algorithm from change and introduces copies of contours for each instance of execution. This allows recursion in a natural manner.

We would introduce copies of at least those components of our tree which occur within procedures and must be accessed recursively.

We may identify, with Johnston, the "height" at any point in execution, i.e., the number of procedures or blocks which have been entered and not yet exited.

The task of executing a BLOCK entry is to generate a copy of the associated symbol table marked with an "active" bit and a field containing the height. The "current height" is maintained and incremented on block entry. The BLOCK function is also responsible for allocating space for the necessary variables.

The ENDBLOCK function normally removes the copy of these symbols from storage, and decrements the height.

If block execution is stopped with a branch out of the block, the associated symbols must be deleted as if an ENDBLOCK was encountered.

Upon procedure entry, we must consider the height of the block containing the procedure definition, and deactivate temporarily those symbols that arose at a greater height. On procedure exit these symbols must be reinstated.

Finally, symbol lookup is now conducted by seeking the name wanted among entries in the "currently active" symbol table. If more than one match is found, a further test is made for maximum height, yielding the unique symbol definition.
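These rules condense into a few operations on height-tagged symbol copies, sketched here with invented names:

    symbols = []       # simulated cells: [name, height, active, value]
    height = 0

    def enter_block(names):
        global height
        height += 1
        for n in names:
            symbols.append([n, height, True, None])

    def exit_block():
        global height
        symbols[:] = [s for s in symbols if s[1] != height]
        height -= 1

    def lookup(name):
        # Among active matches, maximum height gives the unique definition.
        live = [s for s in symbols if s[0] == name and s[2]]
        return max(live, key=lambda s: s[1])

    enter_block(["N"]); enter_block(["N", "C"])
    assert lookup("N")[1] == 2      # the inner N shadows the outer one
    exit_block()
    assert lookup("N")[1] == 1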

An additional concern is the own attribute in declarations. This attribute provides for retention of local variable values between successive block entries. It is widely considered to be poorly designed and is used infrequently; it would most conveniently be omitted. To provide this feature we must modify our process so that a copy of a block symbol table, marked inactive and complete with value entries for (at least) some data items, is retained when the block is exited. No harm occurs in treating other variables as "own," except for increased storage demands. Only one such symbol table copy need be retained for each static block, even if the block is called many times recursively. On block entry, if a dormant (or active) table exists for the block, it is used (or copied).

6.2.5 Summary

The major new capabilities indicated for ALGOL among the various functional units of our system are listed in Table 6.2. Only functions not required for FORTRAN are listed.

Table 6.2 Extended Primitive Functions for ALGOL

CLASS            FUNCTION           USAGE
OE enhancements  Allocate symbols   Start Block
                 Copy Trees         Procedure entry
                 Release Symbols    Go to
EP Functions     Implication        Logic expressions
                 Equivalence        Logic expressions
                 Release symbols    End block

6.3 Other Languages

This section highlights some issues which will occur in the implementation of other significant languages.

Complex Data Types. Many languages provide an assortment of data aggregates such as multidimensional arrays, PL/I-type structures, etc. As described in Appendix B, these entities can be easily represented in our associative memory for access either by index or by field association. There is no problem in having a divergence of data types for the entries, or in variable array bounds which cause difficulty in conventional computers. It is also an easy matter to represent sets.

Extensible Types. Some languages such as SNOBOL4 and ALGOL 68 allow new types or modes to be defined as built up from other modes. Our system can process these modes by creating new Execution Processor prototypes in the Language Definition Tables. These EP's can invoke existing EP's for operators, mode conversion, etc.

Extensible Operators. SNOBOL4 and ALGOL 68 further allow the definition or redefinition of operator symbols (+, -, etc.) and reassignment of priorities. In the Language Definition Tables, the symbol definitions (EP associations) and the priority table are open for possible modification. New EP's may be generated as above.

Name and Code Strings. SNOBOL4 provides an intimate connection between the character representation of a program and its execution. Variable names may be accessed as character strings. Strings may be created dynamically and later interpreted as program text. Name access poses no difficulty since a symbol table is always kept and the string as well as its index is available. Code generation will require that the Execution Section be able to pass text back to the Analysis Section for interpretation. In principle, this should be a reasonable facility to provide.

Procedures and Statements as Values. ALGOL 68 treats procedures, statement groups, etc., as entities which may be assigned, may return values, etc. We have not made the classic distinction in our execution trees between expressions and statements. All such units are subtrees. They may at least conceptually be used in any context, and if evaluated (executed), they produce a value. The unified treatment of ALGOL 68 follows naturally.

Interrupts. PL/I provides facilities for branching or interruption on various exceptional conditions detected by individual EP's (e.g. arithmetic overflow) or conditions detected and signalled explicitly in program segments. An implementation approach would be to set up a vector of branch points. The appropriate EP or subtree would evaluate under the specified conditions to a special index to the proper branch, and the OE may perform the transfer.

Explicit Parallelism. Our system is structured to do many operations in parallel while processing a basically sequential language. Where language constructs provide explicit parallel operations it will be natural to accept these and handle them concurrently. Newer languages are including features such as collateral statement execution, simultaneous array operations, task spawning, coroutines, etc. With LIST and similar execution functions we may provide for true parallel execution in most cases. Where task synchronization is required, this can be handled by access to agreed-upon variables.

CHAPTER VII

CONCLUSIONS AND FUTURE DIRECTIONS

7.1 Summary and Evaluation

In this dissertation we have presented a working design for a language processor capable of direct hardware interpretation of high-level languages. The key features of the design have been:

1. Ability to modify and extend the language definition at any time;

2. Use of a parallel processing philosophy through­ out the design.

A theoretical model for the language processor was developed in Chapter IV. We began by establishing the class of languages to be processed as a subset of the context-free languages. A method of representing a grammar as a set of directed grammar graphs was introduced. We discussed ways of categorizing the nodes and arcs of these graphs, and defined paths through a graph as attempts to recognize a valid sentence.

The Token Processor mechanism was introduced to explore individual paths in parallel. A Token Processor which reaches the exit node of the goal symbol graph has recognized a sentence (program) in the grammar.


In order to interpret the meaning of programs, we augmented recognition in the TP with semantic primitives which gradually built an Execution Tree. When recognition is complete, this tree represents the semantics of the program in a form suitable for execution.

We then defined the Execution Section which is responsible for executing the program. This section consists of Operand Evaluators which process input operands for a function, and Execution Processors which carry out significant operations on these operands. The Execution Processors include functions for input and output. All of these activities occur in parallel as much as possible.

We then discussed some extensions which expanded the concept of a "token" and made possible perturbations to the context-free character of our languages.

In Chapter V we put the model to work in a specific implementation. We first defined a simple, complete language called SLANG. A complete representation for the syntax and semantics of SLANG was formulated as a set of Language Definition Tables for the processor. We defined the possible forms for execution tree elements, and the execution functions required.

We then considered the hardware design of a language processor based on our model, which could interpret SLANG. The Analysis Control Unit was developed successively in a flowchart, a signal diagram, and a set of register transfer equations. The Token Processors were divided into sections and designed to a similar level. We next presented a design for the components of the Execution Section, including flowcharts.

After developing the complete system design, we studied the detailed operation of the analyzer in recognizing an example statement in the context of a program.

A full software simulation of the system for SLANG was implemented and described. The simulator included both analysis and execution, and was capable of complete interpretation of SLANG programs. Using the simulator, we processed sample programs and developed measures on the system performance.

Chapter VI then validated the usefulness of the system design by showing its ability to implement practical languages, especially FORTRAN and ALGOL. For each of these languages we developed a complete Graph System and discussed the semantic primitives and execution functions required. Special aspects of each language were seen to lead to enhancements in the processor design which would have wide applicability when included.

In Appendix A we discussed the problem of finding an upper bound for the number of TP's required for a given grammar. Informally, the bound appears to be on the order of 10 for reasonable languages, if certain enhancements are made to the TP capabilities.

Finally, Appendix B discusses forms of data representation, and presents a model for the Storage Control System.

Our goal in the project was to design a processor which could interpret practical languages, change the language definition at will, and make effective use of parallel processing. These basic goals have been met.

Only a few previous high-level language machines have considered a variable language definition, and none have exploited parallel processors. We have also considered the implementation of complete popular languages in their current de facto definitions, a goal contemplated by few other systems.

We produced a complete design for a system for the simple language SLANG. The design was validated in terms of the function and interaction of the various processors by a software simulation. Hardware designs for the processors were proposed. These designs no doubt have bugs, but the building of working breadboard models should be straightforward.

Our simulation indicated that the language processor should give much better processing speed than the compile-and-execute cycle of conventional systems, though not as good as execution alone. The processor is probably not useful for production software systems which are compiled but once and run constantly. For one-shot programs aimed at problem solving, though, the advantages are apparent.

The design required some intrinsic modification in proceeding from SLANG to various practical languages. This cycle should taper off after representative languages are implemented, so that further new languages will be less likely to require basic design changes.

The amount of storage required during analysis was encouragingly small. Space consumed by unsuccessful path traces and partial trees was minimal. The number of TP's and other processing units required appears to be lower than we first expected.

It has not proven greatly difficult to construct grammar graphs for various languages, or to design the other language definition information required. The graphs are intuitively clear. They are not fully optimal for processing efficiency, but the inefficiencies are not severe.

Much has been omitted that is needed in a practical system, such as good error recovery and diagnostic capabilities. These features can be considered as modifications to the basic design and developed after actual experience.

The use of associative storage throughout has provided many benefits in such areas as speed of data access and efficiency of storage use. The partial use of conventional memory was rejected as much less attractive. The only significant disadvantage to associative storage lies in its cost.

7.2 Suggestions for Future Work

We developed a complete design for the simple language SLANG, discussing only general processor requirements for other languages. A complete design and simulation encompassing block structure and full languages would be valuable. Many programs can then be run in such a simulator, and the performance can be compared with the same programs in compiler systems.

Interaction between the analysis and execution sections was mentioned only briefly. It should be possible to tie these sections together in a more effective fashion, and overlap their activities, making the system more nearly a "single-phase" processor.

There is a need to investigate how the system could function with a limited number of physical processors in the various sections. Task queuing for virtual processors could be considered.

Procedures need to be developed for automatically generating efficient Language Definition Tables from grammar graphs and other descriptive information.

Algorithms are needed to check a grammar for non-ambiguity and suitability for the system. It would be especially important to find an algorithm to determine the maximum TP count required for a given grammar, as discussed in Appendix A. A further refinement would be to manipulate the grammar until this count is at a minimum.

Reasonable error handling and response should be investigated. All possible errors should be detected as early as possible, and related in some way to the original text input for diagnostic purposes.

The design of a suitable Storage Control System, briefly outlined in Appendix B, can be the subject of more extended research.

Finally, the components of the system should be constructed and validated in actual hardware. Where possible, this should always be done in such a way that software simulation of components not yet built may be included to form a complete working processor.

APPENDIX A

THE UPPER BOUND PROBLEM

In development of the language processing system, we observed that for a practical implementation it would be necessary to establish a provable upper bound on the number of Token Processors, i.e., the number of partial paths, which may be simultaneously active for a given grammar. This bound would depend only on the grammar, in contrast to the Operand Evaluators whose number is determined by the complexity of the program. This appendix discusses approaches to finding this bound.

We must start by observing that not all grammars which can be theoretically defined have a finite bound.

For example, if G is defined by the productions:

S → Az,  A → aa,  A → bb,  A → aAa,  A → bAb

we have a grammar which recognizes any (arbitrarily long) string of a's and b's, followed by the same string reversed, followed by an end symbol. If the partial input string has consisted of some odd number, n, of a's, there will be (n+1)/2 strings in the completion set. Each of these represents a separate partial path requiring a Token Processor, so there can be arbitrarily many TP's active.

Fortunately, this is not representative of programming languages in general. Most programming languages are deterministic. Intuitively, this means that a limited amount of knowledge about the state of the input is enough to decide what must be done next. The formal definition of a deterministic language is beyond the scope of this presentation. It is related to a mechanism known as a pushdown automaton for recognizing sentences in a grammar. If such an automaton never has a choice of actions at any point in processing, the language is deterministic.

A related and more intuitive concept is the following:

Definition A.1. A context-free grammar G is LR(k) if for any sentential form α there is a unique way to write α = βγw such that there is a rightmost derivation S ⇒* βAw ⇒ βγw, where A was replaced by γ at the last step, and where A and γ can be determined uniquely by scanning βγ and at most k symbols of w.

In other words, by "looking ahead" at most k input symbols, we can be certain of the only possible parse for the input string so far.

It is known that any LR(k) grammar is deterministic.

Moreover, for any deterministic language there exists a grammar which is LR(1). This would seem to make parsing quite simple, since a one-character lookahead would be enough to determine the single valid path. However, actually finding such grammars is difficult, and there is no guarantee that the semantic content of the language can be properly preserved. In fact, there is no algorithm to show that a given language is deterministic. Nonetheless, we postulate that the number of processors needed for most

Real Programming Languages is small, and that a bound on this number can be determined for a given grammar.

When initially processing any program, only one TP is active. For this number to increase, a TP must have occasion to generate more than one successor. Discussion will be simpler if we now assume that the TP has the capability to precheck the next input token, or at least the next character, and start new paths only if they are compatible with this character. Although this feature was not included in our system implementation, it was discussed as an extension, and it can be added with no new concepts.

There are two situations which can lead to splitting of partial paths: an internal node with multiple branches, or a complex exit node. These are illustrated in Figure A.1.

The number of active paths increases if two or more successor arcs are compatible with the next input character.

In the case of the complex exit node, some of the successors are found in the higher-level graph which invoked the current one.

Figure A.1 Splitting of Partial Paths

A path which traverses a terminal arc will of course remain a single path. If a path enters a non-terminal arc, we must examine the graph of the associated GV to see what paths may appear at the exit. It is clear that two or more paths may not exit from a particular GV at the same time, else the two paths would have identical prospects for complete success and the grammar would be ambiguous.

However, one path may exit while another remains active within the GV; this other path could then exit at a later time. This situation is illustrated in Figure A.2. We make the following definitions:

Figure A.2 Non-closure of Graphs

Definition A.2. A non-terminal arc in a graph system is closed if it is not possible for a path to reach the terminal node of the arc while another path remains active in a lower level graph.

Definition A.3. A graph is intrinsically closed if any arc containing the associated GV is necessarily closed regardless of its successors.

Definition A.4. A graph is effectively closed in a particular graph system if the associated GV is not contained by any arc in that system which is not closed.

Definition A.5. A grammar represented by a graph system in which every graph is effectively closed is a closed grammar.

If a graph Φ is closed and contains only terminal arcs, we may set a bound on the number of paths that can be simultaneously active in Φ. Every path through Φ has a length (Def. 4.36), or a range of possible lengths, based on the number of characters (not tokens) which may be encountered. For a given length there is a finite count of possible paths which may have that length. Viewing the length as the number of time increments (i.e., input characters) since the graph was entered, we see that the maximum of this count over all possible lengths is a bound on the number of simultaneous paths. In fact by considering compatibility of the tokens on the various paths, we could find an actual bound which is much smaller.

Now consider a closed grammar G. Suppose that we can order the graphs of G: Φ1 (=goal), Φ2, ..., such that Φi may appear on a non-terminal arc of Φj only if i > j. In this case we have a regular grammar. It is straightforward to calculate the bound for each graph starting with the highest-numbered until we determine a bound for S, and this is a bound for the TP's required in the grammar.

If the graphs are all closed but the grammar is only context-free, there is some recursion and no such ordering is possible. This is shown, for example, by graphs EXP and TERM in the SLANG grammar (Fig. 5.1). To find a starting point here we must observe that the mutual nesting of expressions and terms has to end with a term that follows the alternate arc and does not contain an expression.

We can determine a count for TERM using only the alternate paths, use this count to find a bound on EXP, and recheck TERM as necessary to be sure the count does not increase.

We are now able to make an informal analysis of the SLANG grammar. This is a fully closed grammar. To show this, we see that graphs STM, IFS, and ASG are clearly closed. PROG can be seen to be closed since 'END'/; is not compatible with any path entering the STM loop. TERM is also closed since no two of the paths leaving node 1 could be started simultaneously.

Graphs DCL, ADCL, READ, and WRITE could lead to non-closed arcs if a successor arc contained a comma, but these are called only in STM and the only valid successor is a semicolon. They are effectively closed.

VAR has a variety of successor arcs, but they can be enumerated and none contains a semicolon. Finally, EXP can also be seen to occur only on closed arcs.

The maximum counts for VAR, DCL, and ADCL are clearly one. Using VAR we find a count of one also for READ, WRITE, and IFS. The count for TERM is one if the EXP branch is not taken. Using this the count of EXP is one, and rechecking TERM confirms one as a consistent value. Then the count of ASG is also one.

STM is a bit more interesting, giving a count of three. Certain input characters (letters) may start 3 paths: 1-2, 3-4(ASG), and 3-4 for at most one of the five keywords. Only one of these paths will survive to node 4.

In PROG, our goal symbol, node 2 branches to STM and also to 'END.' The first letter of 'END' does not match that of any other keywords. The count stands at three.

We can thus conclude that only three TP's are needed to analyze a SLANG program, regardless of the complexity of the program, if the TP's precheck the next character before starting other processors. In our design and simu­ lation this prechecking was omitted, and in fact we used up to 8 TP's. This worst case occurred at node 2 of STM.

We were required to start TP's to look for a label name, variable name, and all six possible keywords.

Analysis of the full grammar for FORTRAN or ALGOL is more complicated but no different in principle. In each case we find that the grammars are not completely closed, but can be made so by transformations which will not change the language recognized.

Consider the FORTRAN grammar. Most of the graphs can be seen by inspection to be closed, at least effectively.

For example, GOTOST, CONST, and ADCL are all closed, and further have a count of one. VREF also has a count of one. It is not intrinsically closed since there is a complex exit node, but it does not appear anywhere with a left parenthesis as a valid successor. DVAL has a count of two, since an initial integer can launch two paths (1-2 and 4-5). Although there is no successor node, it is not intrinsically closed; the sequence "i/*" could lead to two paths at 1-2-3 and 4-5-exit. However, the asterisk appears nowhere as a valid successor, so closure is preserved.

is in fact not closed. When it calls itself, a comma at node 1 2 can cause a loop back via 1 2 - 1 or an exit through

3-4. If we seek a count here we run in circles. However, without compromising the grammar we may IOLIST into the two graphs shown in Figure A. 3. These graphs are effectively closed, since IOLIST does not appear with a comma as successor. In this system we may observe that

a sequence which begins "(/VREF/;/VREFU may cause three

paths to be active, and that no possible sequence will

generate more.

Additional difficulty arises when we consider expressions. The graphs of interest are APRIM, LPRIM, AEXPR, LEXPR, and FREF. All of these are closed, but there is syntactic ambiguity. To find a reasonable bound we must consider the semantic primitives involved. They will ensure that:

Figure A.3 Modification to IOLIST

1. No variable name can start both a VREF and an FREF;

2. Logical and Arithmetic Expressions must be composed of elements of the proper type.

Even considering these rules which will, for example, prevent a variable declared LOGICAL from being accepted as an APRIM, we have major inefficiencies. When seeking an EXPR, an entire AEXPR will be scanned twice, both for itself and as the first part of a relational expression.

Although it would still be possible to estimate a limit, here is a case where it would be much simpler to make our grammar more forgiving, allowing a more general expression form to replace all references to AEXPR and LEXPR. Invalid forms can be rejected by semantics or by EP's. This modification is shown in Figure A.4. Here EXPR and PRIM are effectively closed and have a count of 8; the high count is due to the accident that the two logical operators and six relational operators all begin with a period.

We will make one additional modification by combining the first part of ARIFST and LGIFST. The count for all of the FORTRAN graphs can then be found. The results are listed in Table A.1. Graphs marked with an asterisk are intrinsically closed; others are effectively closed. The result of interest is the count for PROG, a full FORTRAN program, which is 8.

Figure A.4 Modified Expression Graphs

Table A.1 Count for FORTRAN Graphs

GRAPH     COUNT     GRAPH      COUNT
*LABEL    1         *ASGST     8
*CONST    2         EXTST      1
DVAL      2         EQVST      1
CVAR      1         COMST      1
SUBEX     1         DIMST      1
VREF      1         *WRST      3
*FREF     8         *RDST      3
*ADCL     1         *DOST      1
IOLIST    3         *CALLST    8
*IOEL     3         *SUBRST    1
*RELOP    6         *FUNCST    1
*PRIM     8         *SFNCST    8
EXPR      8         DATST      2
*ENDST    1         TYPST      2
*BDATST   1         *EXECST    8
*AUXST    1         *SPECST    2
TERMST    1         *PBODY     8
*CONTST   1         *BDATA     4
*RTRNST   1         *FUNC      8
*FORMST   1         *SUBR      8
*STMA     8         *SUBP      3
*IFST     8         *MAINP     8
*GOTOST   1         *PROG      8
*LASGST   1

It is instructive to consider intuitively the effect of variations in the TP complexity on scanning, for example, for a FORTRAN executable statement (EXECST). Such a statement may begin with a variable name or any of 14 keywords; at most three of these keywords start with the same letter. For TP's with no lookahead, 16 paths must be started here. With one-character lookahead, only four are needed.

Another possible improvement is to allow a single TP to scan for a set of mutually exclusive keywords. In this case, even with no lookahead, only two TP's are needed, and the number for the entire grammar can likewise be reduced.

A similar analysis can be made for ALGOL.

APPENDIX B

MEMORY SYSTEM STRUCTURE

The design for the parallel language processor presented in this dissertation relies on an unconventional memory system. This appendix discusses a possible structure for that system.

A block diagram for the language processor memory system is shown in Figure B.1. The heart of the storage system is a content-addressable or associative memory (AM), of which many types now exist. We envision a memory organized into cells on the order of 100 bits; this bit width should be adequate to hold any simple data item with suitable descriptors and classifiers. The form of individual data items will be discussed later.

We assume that cost of storage is not an obstacle and that the number of cells will be "adequate," providing storage for analysis, symbol tables, and other processing overhead, plus as much data storage as a reasonable conventional system. The present size limit for a single direct-access associative memory is about 4-8K cells.

This limit is increasing, and systems are being developed to organize AM modules in hierarchies for large data-base applications. Our diagram shows such an array of modules, divided arbitrarily at 4K cells. This division would not be visible outside the memory system.

Figure B.1 Memory System Block Diagram

The intrinsic capabilities required for the AM itself are common ones. It must do a masked search and read back the matched cell. In the case of multiple responders it must be able to read them one at a time in some manner, as by selecting one each time and unmarking it. Since many processors may be using the storage system independently, the mark field must be several bits wide and use a unique identifier supplied by the calling process.

The memory must be able to perform a full or masked write to selected cells. A multi-write capability is less important. It will also be useful in some applications to search for maximum or minimum on a given field; however, this is not essential.

The AM is accessed by the Storage Control Unit. This is the only unit to communicate with cells at the level of operation described above. All other processors view the SCU as providing access to virtual data structures such as lists, queues, and tables.

To discuss the role and operation of the SCU we must define a philosophy of data representation. Our approach is to make all storage entities self-describing. This approach forms a natural complement with associative access to provide advantages in efficiency and in validity checking without overhead that are lacking in conventional systems. It is simple, for example, to reject an out-of-bounds array reference or detect a dangling pointer.

Each cell in the AM contains one or more data entities together with type codes, information on the higher-level structure to which the data belongs, and a unique identity code. In most cases only one data entity occurs per cell and packing is avoided. Where a number of entities are required we will generally use pointers to the actual data.

A data entity is an indivisible unit of information in a particular format, together with a type code. Examples include integers, real numbers, pointers, etc. "Undefined" is a default type for vacant data entities. Representation of some basic data is sketched in Figure B.2. This form of type coding has been developed by McMahan and Feustel, Johnston, and others.

The type code may be used or ignored by processing units, depending on their function. Type codes would be widely ignored in fetches by the SCU, except that it may automatically dereference some types of pointers. An Execution Processor trying to add two integers must be sure the data are integers. An entity not intended as a pointer could not accidentally be used as one.

Figure B.2 Data Entities

    TYPE
    INTEGER:    int   length  value
    REAL:       real  length  exponent  mantissa
    CHARACTER:  char
    POINTER:    pntr  ID
    UNDEFINED:  0     -  -  -

Cells containing data entities are grouped into data aggregates of various forms. Each cell contains an ID (a pseudo-address) which identifies it uniquely, a descriptor code for the group to which it belongs, and an identifier for the group. There are several possible ways of grouping cells depending on their logical relationship and methods of access. Figure B.3 shows four possible types of organization. A double pointer can implement general linked lists or trees. A single pointer can implement linear sequences, queues, etc. An index can be used for indexed sequences, vectors, and arrays. Unordered tables and sets can simply be associated by the group identifier. The data elements in these cells may be actual values or pointers.

Figure B.3 Structure Cells

    LIST:      ID  1  group  data  ptr  ptr
    SEQUENCE:  ID  2  group  data  ptr
    VECTOR:    ID  3  group  data  index
    TABLE:     ID  4  group  data  data  data
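A software rendering of the four structure-cell forms of Figure B.3 might look as follows; the field names and descriptor codes are assumptions made for the sketch.

    # The four grouping forms of Figure B.3, modeled as one record type.
    from dataclasses import dataclass

    LIST, SEQUENCE, VECTOR, TABLE = 1, 2, 3, 4    # descriptor codes

    @dataclass
    class StructCell:
        cell_id: int      # unique ID (pseudo-address)
        desc: int         # descriptor code for the grouping form
        group: int        # identifier of the group the cell belongs to
        data: object      # an actual value or a pointer (another cell_id)
        link1: int = 0    # ptr (LIST, SEQUENCE) or index (VECTOR)
        link2: int = 0    # second ptr (LIST cells only)

    cells = [
        StructCell(10, LIST,     1, 'a', link1=11, link2=12),  # tree node
        StructCell(20, SEQUENCE, 2, 'b', link1=21),            # queue link
        StructCell(30, VECTOR,   3, 'c', link1=5),             # element 5
        StructCell(40, TABLE,    4, ('key', 'val')),           # table entry
    ]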

The task of the SCU is then to present virtual stacks, tables, etc. to the calling Processors by building on these concepts. The caller will view operations such as "attach entry to queue" as primitive, even if several accesses to the AM are involved. Some examples of SCU tasks follow:

Stacks. Organize stacks as indexed sets. Create a stack, with a name supplied by the caller, by writing one cell with index 0. Access the top element by doing a maximum search on the index field. To add an element, access the top and write a new cell with the next highest index. There is no need to predict how large a given stack might grow. Entries can be any element or structure, including other stacks.
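The stack discipline just described can be sketched on top of the AssociativeMemory model given earlier in this appendix. The packing of (stack name, index, value) into the cell bits, and the all-zero convention for free cells, are assumptions of the sketch.

    # Stacks as indexed sets: top = maximum-index search,
    # push = write one new cell at the next highest index.
    NAME_SHIFT, IDX_SHIFT = 48, 32               # assumed field positions
    NAME_MASK = 0xFFFF << NAME_SHIFT
    IDX_MASK = 0xFFFF << IDX_SHIFT
    VAL_MASK = 0xFFFFFFFF

    def pack(name, idx, val):
        return (name << NAME_SHIFT) | (idx << IDX_SHIFT) | (val & VAL_MASK)

    def am_free(am):
        # Allocation helper assumed here: an all-zero cell is free,
        # so stack names must be nonzero.
        return next(i for i, c in enumerate(am.cells) if c.bits == 0)

    def stack_create(am, name):
        am.cells[am_free(am)].bits = pack(name, 0, 0)   # cell with index 0

    def stack_push(am, name, val):
        top = am.max_on_field(name << NAME_SHIFT, NAME_MASK, IDX_SHIFT, 16)
        am.cells[am_free(am)].bits = pack(name, top + 1, val)

    def stack_top(am, name, pid):
        top = am.max_on_field(name << NAME_SHIFT, NAME_MASK, IDX_SHIFT, 16)
        am.masked_search(pack(name, top, 0), NAME_MASK | IDX_MASK, pid)
        return am.read_next(pid) & VAL_MASK

No size need be declared anywhere: the stack occupies exactly one cell per element plus its index-0 cell.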

Vectors. Vectors can also be implemented as indexed sets. A special cell in the group can be used to identify the index bounds imposed, if any. Storage is used only for components actually occupied. Reading automatically fails if the index does not represent a cell previously written. Writing can check the bounds descriptor to ensure validity. Data entries can be vectors again; in this way multidimensional arrays can be constructed.
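The following sketch captures the vector discipline; a Python dictionary stands in for the associative search on (group, index), and the class name and bounds-cell representation are assumptions.

    # Vectors as indexed sets with a bounds-descriptor cell.
    class AMVector:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi   # the special bounds cell
            self.cells = {}             # storage only for occupied components

        def write(self, i, value):
            if not (self.lo <= i <= self.hi):
                raise IndexError("write rejected by bounds descriptor")
            self.cells[i] = value

        def read(self, i):
            if i not in self.cells:
                raise KeyError("read fails: no cell written at this index")
            return self.cells[i]

    # Entries may be vectors again, giving multidimensional arrays:
    row = AMVector(0, 3)
    matrix = AMVector(0, 2)
    matrix.write(0, row)
    row.write(1, 42)
    print(matrix.read(0).read(1))        # prints 42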

Tables. Tables are among the simplest structures to implement, as a natural manifestation of associative storage. A table is a group of entries, each containing two or more data items in a fixed format. Items to be stored can simply be written, after perhaps checking for duplication. Lookup is a simple match on the appropriate fields. If all the data for an entry cannot be contained in a single cell, cells may be linked. The organization is then more complex but not different in principle.
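A sketch of the table discipline follows; a list of fixed-format entries stands in for one AM cell per entry, and the function names are assumptions.

    # Tables: insertion is a write (optionally preceded by a duplicate
    # check); lookup is a match on the appropriate fields.
    def table_lookup(table, match_fields):
        for entry in table:                     # one associative search
            if all(entry.get(f) == v for f, v in match_fields.items()):
                return entry
        return None

    def table_insert(table, entry, key_fields):
        if table_lookup(table, {f: entry[f] for f in key_fields}) is None:
            table.append(entry)

    symtab = []
    table_insert(symtab, {'name': 'X', 'type': 'int', 'value': 7}, ('name',))
    print(table_lookup(symtab, {'name': 'X'}))  # finds the entry for X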

Besides implementing basic operators on data structures, the SCU can perform some specialized functions. For example, since most data is reached only through pointers, it is possible to check whether any pointers remain to a given structure. If not, the structure can be deleted.
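As a sketch of that pointer sweep, using the tagged entities assumed earlier in this appendix: one associative search over all cells for a pointer naming the structure decides whether it may be reclaimed. The SCU helper names are hypothetical.

    # One masked search for pointer-tagged entities naming the group
    # decides whether the structure is still referenced.
    def structure_unreferenced(all_entities, group_id):
        return not any(e.type == T_PNTR and e.value == group_id
                       for e in all_entities)

    def delete_if_garbage(scu, group_id):
        if structure_unreferenced(scu.all_entities(), group_id):
            scu.release_group(group_id)    # hypothetical SCU operation

APPENDIX C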

LANGUAGE DEFINITION TABLES

This Appendix presents listings of the Language Definition Tables for the language SLANG, as used in the software simulation.

TITLE "SLANG LANGUAGE DEFINITION TABLES"

; LANGUAGE DEFINITION TABLES
; FOR THE DEMONSTRATION LANGUAGE "SLANG"
;
; JIM MOONEY

; INTERNAL GLOBALS

ENTRY $TTBL,CLAST,KWSTR,KWCOD,PRTAB

; CONSTANT DEFINITIONS

LBIT=1000	;LETTER FLAG
DBIT=2000	;DIGIT FLAG
CST=0		;STATE COUNT

; ASCII CODES

BLANK=240
TAB=211
EOF=234		;END OF FILE

; KEYWORDS

LPAR=0		;(
RPAR=1		;)
EQ=2		;=
COMMA=3		;,
PLUS=4		;+
MINUS=5		;-
STAR=6		;*
SLASH=7		;/
COLON=10	;:
SEMI=11		;;
%READ=12	;"READ"
%WRITE=13	;"WRITE"
%DCL=14		;"DCL"
%END=15		;"END"
%IF=16		;"IF"
GTHAN=17	;>
LTHAN=20	;<
%ARRAY=21	;"ARRAY"

EJECT

; SEMANTIC DESIGNATORS

%ASG=40		;ASSIGN
%SEQ=41		;SEQUENCE
%LIST=43	;LIST
%EVAR=44	;EVALUATE ARRAY ELEMENT
%RSET=50	;SETUP READ CHANNEL
%RD=51		;READ
%WSET=52	;SETUP WRITE CHANNEL
%WR=53		;WRITE
%DCLI=54	;DECLARE INTEGER
%ADCL=55	;DECLARE ARRAY
%ADD=14		;ADD
%SUB=15		;SUBTRACT
%MPY=16		;MULTIPLY
%DIV=17		;INTEGER DIVIDE
%LT=20		;TEST LESS THAN
%EQ=21		;TEST EQUAL
%GT=22		;TEST GREATER THAN
%COND=60	;CONDITIONAL
%GOTO=61	;BRANCH

ENDS

; SYNTAX SECTION

; MACRO DEFINITIONS

; STANDARD TRANSITION STATEWORD

DEFM ST,A,B,C,D,E,F
 WRD A+CST
 WRD B
 WRD D-1+E
 WRD F
 WRD C
 WRD 0,0
 WRD .+1
ENDM

; NEW STATE DEFINITION

DEFM STATE,A
A=CST+1
CST=A
ENDM

ENDS

; MAIN STATE TRANSITION TABLE

$TTBL=.

; PROGRAM

STATE PROG
 ST 1,NS1,@PRG3,STM,2,$SP1
 ST 1,NS1,@PRG1,VAR,2,$SP1
 ST 1,#%READ,@PRG2,RD,2,$SP4
 ST 1,#%WRITE,@PRG2,WR,2,$SP4
 ST 1,#%DCL,@PRG2,DCL,2,$SP3
 ST 1,#%ARRAY,@PRG2,ADCL,2,$SP3
 ST 1,#%IF,@PRG2,IFS,2,0
 ST 2,NS1,@PRG6,STM,2,$SP1
 ST 2,NS1,@PRG4,VAR,2,$SP1
 ST 2,#%READ,@PRG5,RD,2,$SP4
 ST 2,#%WRITE,@PRG5,WR,2,$SP4
 ST 2,#%DCL,@PRG5,DCL,2,$SP3
 ST 2,#%ARRAY,@PRG5,ADCL,2,$SP3
 ST 2,#%IF,@PRG5,IFS,2,0
 ST 2,#%END,0,PROG,3,0
 ST 3,#SEMI,0,PROG,4,0
 ST 4,0,0,0,0,0

; STATEMENT

STATE STM
 ST 2,#COLON,0,STM,3,$LAB1
 ST 3,#%DCL,@STM2,DCL,2,$SP3
 ST 3,#%ARRAY,@STM2,ADCL,2,$SP3
 ST 3,#%READ,@STM2,RD,2,$SP4
 ST 3,#%WRITE,@STM2,WR,2,$SP4
 ST 3,#%IF,@STM2,IFS,2,0
 ST 3,NS1,@STM1,VAR,2,$SP1
 ST 4,#SEMI,0,STM,5,0
 ST 5,0,0,0,0,0

; READ STATEMENT

STATE RD
 ST 2,#LPAR,0,RD,3,0
 ST 3,IS1,@RD1,RSEL,2,$RSEL1
 ST 4,#RPAR,0,RD,5,0
 ST 5,NS1,@RD2,VAR,2,$RDEL1
 ST 6,#COMMA,0,RD,5,0
 ST 6,0,0,0,0,0

; WRITE STATEMENT

STATE WR
 ST 2,#LPAR,0,WR,3,0
 ST 3,IS1,@WR1,WSEL,2,$WSEL1
 ST 4,#RPAR,0,WR,5,0
 ST 5,NS1,@WR2,VAR,2,$WREL1
 ST 6,#COMMA,0,WR,5,0
 ST 6,0,0,0,0,0

; DECLARE STATEMENT

STATE DCL
 ST 2,NS1,@DCL1,DCEL,2,$DCEL1
 ST 3,#COMMA,0,DCL,2,0
 ST 3,0,0,0,0,0

; ARRAY DECLARATION

STATE ADCL
 ST 2,NS1,@ADC1,ADCEL,2,$ADCL1
 ST 3,#COMMA,0,ADCL,2,0
 ST 3,0,0,0,0,0

; IF STATEMENT

STATE IFS
 ST 2,#LPAR,0,IFS,3,$IFS1
 ST 3,NS1,@IFS1,VAR,2,$SP1
 ST 4,#RPAR,0,IFS,5,0
 ST 5,NS1,0,IFS,6,$IFS2
 ST 6,0,0,0,0,0

; ASSIGNMENT STATEMENT

STATE ASG
 ST 2,#EQ,0,ASG,3,$ASG1
 ST 3,#LPAR,@ASG2,TERM,2,0
 ST 3,IS1,@ASG1,TERM,4,$SP1
 ST 3,NS1,@ASG1,VAR,2,$SP1
 ST 4,0,0,0,0,0

ENDS

; LOGICAL EXPRESSION

STATE LEXP
 ST 2,#GTHAN,0,LEXP,3,$LEX1
 ST 2,#EQ,0,LEXP,3,$LEX2
 ST 2,#LTHAN,0,LEXP,3,$LEX3
 ST 3,NS1,@LEX1,VAR,2,$SP1
 ST 4,0,0,0,0,0

; EXPRESSION

STATE EXP
 ST 1,#LPAR,@EXP2,TERM,2,0
 ST 1,IS1,@EXP2,TERM,4,$SP1
 ST 1,NS1,@EXP1,VAR,2,$SP1
 ST 2,#PLUS,0,EXP,1,$EXP1
 ST 2,#MINUS,0,EXP,1,$EXP1
 ST 2,#STAR,0,EXP,1,$EXP1
 ST 2,#SLASH,0,EXP,1,$EXP1
 ST 2,0,0,0,0,0

; TERM

STATE TERM
 ST 2,#LPAR,@PRM2,TERM,2,0
 ST 2,IS1,@PRM2,TERM,4,$SP1
 ST 2,NS1,@PRM1,VAR,2,$SP1
 ST 3,#RPAR,0,TERM,4,0
 ST 4,0,0,0,0,0

; VARIABLE

STATE VAR
 ST 2,#LPAR,0,VAR,3,$VAR1
 ST 2,0,0,0,0,0
 ST 3,IS1,0,VAR,4,$SP2
 ST 3,NS1,0,VAR,4,$SP2
 ST 4,#RPAR,0,VAR,5,0
 ST 5,0,0,0,0,0

ENDS

; DECLARE ELEMENT

STATE DCEL
 ST 2,0,0,0,0,0

; ARRAY DECLARE ELEMENT

STATE ADCEL
 ST 2,#LPAR,0,ADCEL,3,0
 ST 3,IS1,0,ADCEL,4,$SP1
 ST 4,#RPAR,0,ADCEL,5,0
 ST 5,0,0,0,0,0

; READ SETUP ELEMENT

STATE RSEL
 ST 2,0,0,0,0,0

; WRITE SETUP ELEMENT

STATE WSEL
 ST 2,0,0,0,0,0

; END OF TABLE

 WRD -1

ENDS

; STACK MODIFIERS

@PRG1: WRD ASG+1,$SP1,@PRG2
@PRG2: WRD STM+3,$SP1,@PRG3
@PRG3: WRD PROG+1,$PRG1,-1
@PRG4: WRD ASG+1,$SP1,@PRG5
@PRG5: WRD STM+3,$SP1,@PRG6
@PRG6: WRD PROG+1,$SP2,-1

@STM1: WRD ASG+1,$SP1,@STM2
@STM2: WRD STM+3,$SP1,-1

@DCL1: WRD DCL+2,$SP2,-1

@ADC1: WRD ADCL+2,$SP2,-1

@RD1: WRD RD+3,$SP2,-1
@RD2: WRD RD+5,$SP2,-1

@WR1: WRD WR+3,$SP2,-1
@WR2: WRD WR+5,$SP2,-1

@IFS1: WRD LEXP+1,$SP1,@IFS2
@IFS2: WRD IFS+3,$SP2,-1

@ASG1: WRD TERM+3,$SP1,@ASG2
@ASG2: WRD EXP+1,$SP1,@ASG3
@ASG3: WRD ASG+3,$SP1,-1

@LEX1: WRD LEXP+3,$SP1,-1

@EXP1: WRD TERM+3,$SP1,@EXP2
@EXP2: WRD EXP+1,$SP1,-1

@PRM1: WRD TERM+3,$SP1,@PRM2
@PRM2: WRD EXP+1,$SP1,@PRM3
@PRM3: WRD TERM+2,$SP1,-1

ENDS

; TOKEN SYNTAX TABLES

; INTEGER

IS1: WRD 0,377,BLANK,0,0,IS1,0,.+1
 WRD 0,377,TAB,0,0,IS1,0,.+1
 WRD 0,DBIT,0,0,0,-1,0,.+1
 WRD 1,DBIT,0,0,TSEM1,0,0,.+1
 WRD 0,0,0,0,TSEM1,IS2,0,.+1

IS2: WRD 2,377,15,0,0,-1,0,.+1
 WRD 1,DBIT,0,0,TSEM1,0,0,.+1
 WRD 0,0,0,0,TSEM1,IS2,0,.+1

; IDENTIFIER NAME

NS1: WRD 0,377,BLANK,0,0,NS1,0,.+1
 WRD 0,377,TAB,0,0,NS1,0,.+1
 WRD 0,LBIT,0,0,0,-1,0,.+1
 WRD 1,LBIT+DBIT,0,0,TSEM1,0,0,.+1
 WRD 0,0,0,0,TSEM1,NS2,0,.+1

NS2: WRD 2,377,15,0,0,-1,0,.+1
 WRD 1,LBIT+DBIT,0,0,TSEM1,0,0,.+1
 WRD 0,0,0,0,TSEM1,NS2,0,.+1

ENDS

; ASCII CODE CLASS TABLE

CLAST: WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD DBIT,DBIT,DBIT,DBIT
 WRD DBIT,DBIT,DBIT,DBIT
 WRD DBIT,DBIT,0,0
 WRD 0,0,0,0
 WRD 0,LBIT,LBIT,LBIT
 WRD LBIT,LBIT,LBIT,LBIT
 WRD LBIT,LBIT,LBIT,LBIT
 WRD LBIT,LBIT,LBIT,LBIT
 WRD LBIT,LBIT,LBIT,LBIT
 WRD LBIT,LBIT,LBIT,LBIT
 WRD LBIT,LBIT,LBIT,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
 WRD 0,0,0,0
ENDS

; KEYWORD STRINGS

KWSTR: ASC /(/
 WRD 0,0,0,0,0,0,0
 ASC /)/
 WRD 0,0,0,0,0,0,0
 ASC /=/
 WRD 0,0,0,0,0,0,0
 ASC /,/
 WRD 0,0,0,0,0,0,0
 ASC /+/
 WRD 0,0,0,0,0,0,0
 ASC /-/
 WRD 0,0,0,0,0,0,0
 ASC /*/
 WRD 0,0,0,0,0,0,0
 ASC "/"
 WRD 0,0,0,0,0,0,0
 ASC /:/
 WRD 0,0,0,0,0,0,0
 ASC /;/
 WRD 0,0,0,0,0,0,0
 ASC /READ/
 WRD 0,0,0,0,0,0
 ASC /WRITE/
 WRD 0,0,0,0,0
 ASC /DCL /
 WRD 0,0,0,0,0,0
 ASC /END /
 WRD 0,0,0,0,0,0
 ASC /IF/
 WRD 0,0,0,0,0,0,0
 ASC />/
 WRD 0,0,0,0,0,0,0
 ASC /</
 WRD 0,0,0,0,0,0,0
 ASC /ARRAY/
 WRD 0,0,0,0,0

; END OF SYNTAX SECTION

ENDS

; START OF SEMANTIC SECTION

RCELL=1		;ROOT CELL
LCELL=2		;LEAF CELL
STOLP=3		;STORE AT LP
INCLP=4		;INCREMENT LP
GPRIO=5		;TEST PRIORITY
STLAB=6		;STORE LABEL
NEXT=7		;BRANCH

$SP1: WRD STOLP,#LR
 WRD 0

$SP2: WRD STOLP,#LR
 WRD INCLP
 WRD 0

$SP3: WRD LCELL,%LIST
 WRD 0

$SP4: WRD LCELL,%SEQ
 WRD 0

$LAB1: WRD STLAB
 WRD 0

$PRG1: WRD LCELL,%SEQ
 WRD STOLP,#LR
 WRD INCLP
 WRD 0

$IFS1: WRD LCELL,%COND
 WRD 0

$IFS2: WRD LCELL,%GOTO
 WRD STOLP,#LR
 WRD 0

$ASG1: WRD RCELL,%ASG
 WRD INCLP
 WRD 0

EJECT

$LEX1: WRD RCELL,%GT
 WRD INCLP
 WRD 0

$LEX2: WRD RCELL,%EQ
 WRD INCLP
 WRD 0

$LEX3: WRD RCELL,%LT
 WRD INCLP
 WRD 0

$EXP1: WRD GPRIO,$EXPA
 WRD RCELL,#LR
 WRD NEXT,$EXPB
$EXPA: WRD LCELL,#LR
$EXPB: WRD INCLP
 WRD 0

$VAR1: WRD LCELL,%EVAR
 WRD INCLP
 WRD 0

$DCEL1: WRD LCELL,%DCLI
 WRD STOLP,#LR
 WRD 0

$ADCL1: WRD LCELL,%ADCL
 WRD STOLP,#LR
 WRD INCLP
 WRD 0

$RSEL1: WRD LCELL,%RSET
 WRD STOLP,#LR
 WRD 0

$RDEL1: WRD LCELL,%RD
 WRD STOLP,#LR
 WRD 0

$WSEL1: WRD LCELL,%WSET
 WRD STOLP,#LR
 WRD 0

$WREL1: WRD LCELL,%WR
 WRD STOLP,#LR
 WRD 0

ENDS

; PRIORITY TABLE

PRTAB: WRD PLUS,2
 WRD MINUS,2
 WRD STAR,1
 WRD SLASH,1
 WRD -1

; KEYWORD CODES

KWCOD: WRD 0,0,0,0
 WRD %ADD,%SUB,%MPY,%DIV
 WRD 0,0,0,0,0,0,0,0
 WRD 0,0,0,0,0,0,0,0

; TOKEN SEMANTICS

TSEM1: WRD 1	;ADD CC TO STRING
 WRD 0

$$END: END

APPENDIX D

ANALYSIS MODULE PROGRAM LISTING

This Appendix presents the listing of the Analysis Module in the simulation software system. The program is written in the SALP [67] structured assembly language.

TITLE "ANALYSIS MODULE"

; HIGH-LEVEL LANGUAGE PROCESSOR
; SYSTEM SIMULATOR
;
; THIS MODULE CONTAINS THE TOKEN PROCESSORS
; FOR ANALYSIS AND TREE BUILDING
;
; JIM MOONEY

; ANALYSIS SECTION

; INTERNAL GLOBALS

ENTRY ANLYZ
ENTRY PBLKS,PRCMX

; EXTERNAL GLOBALS

; MAIN STORAGE ACCESS

EXTRN ENTAB,RLCEL,RLCHN,LKUPS,PTSTR,GTWRD
EXTRN GTCEL,PTCEL,ALCEL,STCEL,LKCHN,PTWRD

; LANGUAGE DEFINITION ACCESS

EXTRN GTLDW,GTLDS,GSTAT,GTSMA,GLDPR
EXTRN CLAST,GTKWC

; MISCELLANEOUS

EXTRN %NCHAR,%CCHAR,%RSULT,%SMTAB
EXTRN GTCHR,PTCHR,MVBLK,CLBLK,PSCAN
EXTRN MSKXH,MSKXL,MSKX,ABORT

; CONSTANT DEFS

EOF=234

PBLTH=20	;PROC BLOCK LENGTH
PBTOT=20*PBLTH	;TOTAL TP BLOCK AREA
CLGTH=10	;CELL LENGTH

ENDS

; ANALYSIS MODULE

ANLYZ: SUBROUTINE
 CALL INITA		;INITIALIZE
 LOOP
  CALL SETTP		;SETUP NEW TP TASKS
  CALL TPCNT		;COUNT ACTIVE TP'S
  EXITIF TPAC ETZ	;NONE
  EXITIF %SUCCF GTZ	;SUCCESS
  CALL PSCAN		;GET NEXT INPUT CHAR
  MOVE =PBLKS,PRCIX	;RUN ACTIVE TP'S
  MR PRCIX,XR
  DOWHILE #0 GEZ
   CALLIF .AX GTZ,TPROC
   ADD2 PRCIX,=PBLTH,PRCIX
   MR PRCIX,XR
  ENDWHILE
 ENDLOOP
 IF %SUCCF ETZ
  ZRC AX		;INVALID
 ELSEIF %SUCCF GT 1 ORIF TPAC GTZ
  ZR P1,AX		;AMBIGUOUS
 ELSE
  ZR AX			;OK
 ENDIF
 RETURN ANLYZ

ENDS

; ABORT TRANSFERS

ABRT4: MRI 4,GR6
 SKP
ABRT5: MRI 5,GR6	;TP'S EXHAUSTED
 SKP
ABRT6: MRI 6,GR6	;BAD CELL NAME
 SKP
ABRT7: MRI 7,GR6	;STORAGE EXHAUSTED
 SKP
ABRT8: MRI 10,GR6	;STRING AREA EXHAUSTED
 JU ABORT

ENDS

; INITIALIZE ANALYZER

INITA: SUBROUTINE
 CALL CLBLK		;CLEAR P-BLOCKS
 ZM $FAIL
 ZMDEF PRCMX		;MAX PROC COUNT
 ZM %SUCCF
 ZM %RSULT
 ZM %SMTAB		;SYMBOL TAB PTR
 ZMDEF TQPR
 ZM P1,$CNODE		;SETUP FIRST TASKS
 CALL TASKS
 RETURN INITA

ENDS

; SETUP NEW TP TASKS

SETTP: SUBROUTINE
 MOVE =PBLKS,PRCIX
 DOWHILE TQPR NEZ
  MR PRCIX,XR
  DOWHILE #0 NEZ
   JC AX,LTZ,ABRT5
   ADD2 PRCIX,=PBLTH,PRCIX
   MR PRCIX,XR
  ENDWHILE
  CALL CLBLK(TPBLK,PBLTH)
  MR TQPR,XR
  MOVE #7,TQPR
  MOVE #0,$TSTAT
  MOVE #2,$STAKP
  ZM P1,$AFLG
  RR XR,AX
  CALL RLCEL		;RELEASE LINK CELL
  JC AX,LTZ,ABRT6
  CALL RSTPB		;STORE THE TP BLOCK
 ENDWHILE
 RETURN SETTP

ENDS

; COUNT ACTIVE TP'S

TPCNT: SUBROUTINE
 ZMDEF TPAC
 MOVE =PBLKS,PRCIX
 MR PRCIX,XR
 DOWHILE #0 GEZ
  IF .AX NEZ
   INCR TPAC
  ENDIF
  ADD2 PRCIX,=PBLTH,PRCIX
  MR PRCIX,XR
 ENDWHILE
 IF TPAC GT PRCMX
  MOVE TPAC,PRCMX	;RECORD MAX
 ENDIF
 RETURN TPCNT

ENDS

; TOKEN PROCESSOR

TPROC: SUBROUTINE
 CALL GETPB		;GET P-BLOCK DATA
 MOVE $STAKP,$TPSV	;SAVE FOR RELEASE
 CALL TKREC		;TRY TO RECOGNIZE
 IF .AX LTZ		;IF FAILURE
  ZM $AFLG
 ELSEIF .AX GTZ		;IF SUCCESS
  LOOP
   CALL GTTOP		;GET TOP OF STACK
   ZM $FAIL
   IF $SSEQ GTZ		;IF SSEQ NOT NULL
    LOOP		;DO SEMANTICS
     MR $SSEQ,XR
     CALL GTLDW
     EXITIF .AX LEZ
     CALL INDEX(.AX,XTABL,=XTBSZ)
    ENDLOOP
   ENDIF
   EXITIF $FAIL NEZ	;IF FAILURE
   CALL TASKS		;SETUP TASKS
   EXITIF .AX NEZ	;NOT AN EXIT NODE
   IF $LINK ETZ		;IF STACK EMPTY
    INCR %SUCCF		;SET SUCCESS
    MOVE $CR,%RSULT
    EXITLOOP
   ELSE
    MOVE $LINK,$STAKP
   ENDIF
   MOVE $CR,$LR		;NEW LR
  ENDLOOP
  ZM $AFLG		;RELEASE THE TP
 ENDIF
 CALL RSTPB		;RESTORE FLAGS
 IF $AFLG ETZ
  MR $TPSV,AX		;RELEASE GARBAGE
  CALL RLCHN
 ENDIF
 RETURN TPROC

ENDS

; TRY TO RECOGNIZE A TOKEN

TKREC: SUBROUTINE
 IF $TSTAT LTZ
  CALL KWREC		;KEYWORD
 ELSEIF .AX GTZ
  CALL CLREC		;TOKEN CLASS
 ELSE
  ZRC AX		;ESCAPE
 ENDIF
 RETURN TKREC

ENDS

; KEYWORD RECOGNITION

KWREC: SUBROUTINE
 IF $TPTR ETZ		;IF FIRST TIME
  MR %CCHAR,AX
  CALL MSKXL
  IF .AX EQ =BLANK ORIF .AX EQ =TAB
   ZR AX
  ELSE
   MR $TSTAT,AX		;GET KEYWORD
   CALL GTLDS($STRNG)
   ZR P1,AX
  ENDIF
 ENDIF
 IF .AX NEZ		;IF NOT SKIPPING BLANK
  MRI $STRNG,XR		;CHECK NEXT CHAR
  MR $TPTR,AX
  CALL GTCHR
  MR %CCHAR,AX
  CALL MSKXL
  IF .AX NE .GR1
   ZRC AX		;NO MATCH
  ELSE
   INCR $TPTR
   MRI $STRNG,XR
   MR $TPTR,AX
   CALL GTCHR
   IF .GR1 ETZ
    MR $TSTAT,AX
    CALL GTKWC
    RM AX,$LR
    ZR P1,AX
   ELSE
    ZR AX		;PARTIAL
   ENDIF
  ENDIF
 ENDIF
 RETURN KWREC

ENDS

; TOKEN CLASS RECOGNITION

CLREC: SUBROUTINE
 MR $TSTAT,AX
 LOOP
  CALL GSTAT($TSCEL)
  MR $TSCEL,XR
  MR #TCHRS,XR
  MR #0,GR1
  MR $TSCEL+3,GR2	;EQ/NE FLAG
  RR GR1,AX
  MR $TSCEL+1,AY
  CALL MSKX		;MASK
  IF .AX EQ $TSCEL+2	;TEST & ADJ. FLAG
   RSC GR2
  ENDIF
  EXITIF .GR2 NEZ	;IF TEST SATISFIED
  MR $TSCEL+7,AX
 ENDLOOP		;TRY NEXT CELL
 MOVE $TSCEL+5,$TSTAT	;NEXT STATE
 MOVE $TSCEL+4,RCSIX	;SEMANTIC INDEX
 IF RCSIX GTZ		;IF NOT NULL
  LOOP			;EXECUTE
   MR RCSIX,XR
   INCR RCSIX
   CALL GTLDW
   EXITIF .AX ETZ
   CALL INDEX(.AX,TSTBL,=TSTSZ)
  ENDLOOP
 ENDIF
 MR $TSTAT,AX
 IF .AX GTZ		;IF NOT FINISHED
  ZR AX			;SET FLAG
 ELSEIF .AX ETZ		;IF FULL MATCH
  CALL LKUPS($STRNG)	;LOOKUP
  IF .AX LTZ		;IF NOT PRESENT
   CALL PTSTR($STRNG)
   JC AX,LTZ,ABRT8
  ENDIF
  RS AX,L1
  STL
  RM AX,R1,$LR		;SET LR
  ZR P1,AX
 ENDIF
 RETURN CLREC

ENDS

; CHARACTER LIST

TCHRS: WRD %CCHAR
 WRD %NCHAR
 WRD $TPTR
 WRD $T1
 WRD $T2

; GET TP DATA

GETPB: SUBROUTINE
 MOVE PRCIX,GPB1
 CALL MVBLK(+GPB1,TPBLK,PBLTH)
 RETURN GETPB

; RESTORE TP DATA

RSTPB: SUBROUTINE
 MOVE PRCIX,RPB1
 CALL MVBLK(TPBLK,+RPB1,PBLTH)
 RETURN RSTPB

; GET TOP OF STACK

GTTOP: SUBROUTINE
 MR $STAKP,AX
 CALL GTCEL($TCELL)
 MR $TCELL,BSW
 RR BSW,AX
 CALL MSKXL
 RM AX,$CNODE		;CURRENT NODE
 MOVE $TCELL+1,$CR	;CR
 MOVE $TCELL+2,$SSEQ	;SEMANTIC SEQ
 MOVE $TCELL+3,$LP	;LP
 MOVE $TCELL+4,$LP+1
 MOVE $TCELL+5,$D1	;OTHER DATA
 MOVE $TCELL+6,$D2
 MOVE $TCELL+7,$LINK	;LINK TO NEXT
 RETURN GTTOP

ENDS

; SETUP RESULTING NEXT TASKS

TASKS: SUBROUTINE
 MR $CNODE,AX
 CALL GTSMA		;GET FIRST STWD
 JC AX,LTZ,ABRT4	;NONE
 CALL GSTAT(STWRD)	;FETCH ENTRY
 LOOP
  MR STWRD+1,AX
  EXITIF .AX ETZ	;EXIT NODE FLAG
  RM AX,$TOKEN
  CALL MKTSK		;MAKE A TASK
  MR STWRD+7,AX
  CALL GSTAT(STWRD)	;FETCH ENTRY
  MR STWRD,AX
 UNTIL .AX NE $CNODE	;NO MORE
 RETURN TASKS

ENDS

; MAKE A TASK ENTRY

MKTSK: SUBROUTINE
 CALL ALCEL		;ALLOCATE CELLS
 JC AX,LTZ,ABRT7
 CALL ALCEL
 MOVE $TOKEN,$TCELL
 MOVE CELL1,$TCELL+2
 MR CELL0,AX
 CALL PTCEL($TCELL)
 LOOP
  CALL CLBLK($TCEL2,CLGTH)
  MR STWRD+2,BPK	;NEXT NODE
  ZR P1,BPK		;REF. COUNT
  RM BPK,$TCEL2
  MOVE STWRD+3,$TCEL2+2
  EXITIF STWRD+4 LEZ	;END OF NEW ENTRIES
  MOVE CELL1,+CELLH
  CALL ALCEL
  JC AX,LTZ,ABRT7
  MOVE CELL1,$TCEL2+7
  MR CELLH,AX
  CALL PTCEL($TCEL2)
  MR STWRD+4,XR		;GET NEXT STACK ENTRY
  MOVE #0,STWRD+2
  MOVE #1,STWRD+3
  MOVE #2,STWRD+4
 ENDLOOP
 MOVE $CR,$TCEL2+1
 MOVE $LP,$TCEL2+3
 MOVE $LP+1,$TCEL2+4
 MOVE $D1,$TCEL2+5
 MOVE $D2,$TCEL2+6
 MOVE $LINK,$TCEL2+7
 MR CELL1,AX
 CALL PTCEL($TCEL2)
 MR $LINK,AX
 CALL LKCHN		;LINK TO CHAIN
 MR TQPR,XR		;ENTER IN TQ
 IF .XR ETZ
  MOVE CELL0,TQPR
 ELSE
  DOWHILE #7 NEZ
   RR AX,XR
  ENDWHILE
  MOVE CELL0,#7
 ENDIF
 RETURN MKTSK

ENDS

; MAJOR SEMANTICS

; SEMANTIC CODE TABLE

XTABL=.
 WRD SNULL	;NULL
 WRD RCELL	;CELL AT ROOT
 WRD LCELL	;CELL AT LEAF
 WRD STOLP	;STORE AT LP
 WRD INCLP	;INCREMENT LP
 WRD GPRIO	;GET PRIORITY
 WRD STLAB	;STORE LABEL
 WRD NEXT	;BRANCH
 WRD SNULL
XTBSZ=.-XTABL-1

ENDS

; NULL SEMANTICS

SNULL: INCR $SSEQ
 RR TRP,SC

; BRANCH

NEXT: SUBROUTINE
 MR $SSEQ,P1,XR
 CALL GTLDW
 RM AX,$SSEQ
 RETURN NEXT

; STORE AT LP

STOLP: SUBROUTINE
 MR $SSEQ,P1,XR
 CALL GTLDW
 CALL GVAL
 CALL STLP
 INCR $SSEQ
 INCR $SSEQ
 RETURN STOLP

; INCREMENT LP

INCLP: SUBROUTINE
 IF $LR GTZ
  INCR $LP+1
 ENDIF
 INCR $SSEQ
 RETURN INCLP

ENDS

; NEW CELL AT ROOT

RCELL: SUBROUTINE
 CALL CLBLK($SMCEL,CLGTH)
 MR $SSEQ,P1,XR
 CALL GTLDW
 CALL GVAL
 RM AX,$SMCEL+1
 MOVE $CR,$SMCEL+2
 CALL STCEL($SMCEL)
 RM AX,$CR
 RM AX,$LP
 MOVE 2,$LP+1
 INCR $SSEQ
 INCR $SSEQ
 RETURN RCELL

; NEW CELL AT LEAF

LCELL: SUBROUTINE
 CALL CLBLK($SMCEL,CLGTH)
 MR $SSEQ,P1,XR
 CALL GTLDW
 CALL GVAL
 RM AX,$SMCEL+1
 CALL GETLP
 RM AX,$SMCEL+2
 CALL STCEL($SMCEL)
 CALL STLP
 RM AX,$LP
 MOVE 2,$LP+1
 INCR $SSEQ
 INCR $SSEQ
 RETURN LCELL

ENDS

; GET VALUE FROM LEAF POINTER

GETLP: SUBROUTINE
 IF $LP ETZ
  MR $CR,AX
 ELSE
  ADD2 $LP,$LP+1,.XR
  CALL GTWRD
  RR GR1,AX
 ENDIF
 RETURN GETLP

; STORE VALUE AT LEAF POINTER

STLP: SUBROUTINE
 STDEF .AX,STLPV
 IF $LP ETZ
  MOVE STLPV,$CR
 ELSE
  IF $LP+1 GE 7
   CALL ALCEL(STLPT)
   ADD2 $LP,7,.XR
   MR STLPT,GR1
   CALL PTWRD
   MOVE STLPT,$LP
   ZM $LP+1
  ENDIF
  ADD2 $LP,$LP+1,.XR
  MR STLPV,GR1
  CALL PTWRD
  RR GR1,AX
 ENDIF
 RETURN STLP

STLPT: WRD 0

ENDS

; GET PRIORITY

GPRIO: SUBROUTINE
 MR $LR,AX
 CALL GLDPR		;FETCH PRIORITY
 SUB2 .AX,$D1
 RM AX,$D1		;UPDATE
 IF .AO LTZ
  MR $SSEQ,P1,XR
  CALL GTLDW
  RM AX,$SSEQ		;TAKE BRANCH
 ELSE
  INCR $SSEQ
  INCR $SSEQ
 ENDIF
 RETURN GPRIO

; CONVERT VALUE IN AX

GVAL: SUBROUTINE
 IF .AX LTZ
  MR $LR,AX		;GET LR VALUE
 ENDIF
 RETURN GVAL

ENDS

; STORE A LABEL

STLAB: SUBROUTINE
 CALL CLBLK($SMCEL,CLGTH)
 MOVE $CR,$SMCEL	;LABEL NAME
 ZM P1,$SMCEL+1
 MR $STAKP,AX		;LOOK BACK IN STK
 CALL GTCEL($TCEL2)
 MR $TCEL2+CLGTH-1,AX
 CALL GTCEL($TCEL2)
 MOVE $TCEL2+1,$SMCEL+2	;RP
 MOVE $TCEL2+3,$SMCEL+3	;LP
 MOVE $TCEL2+4,$SMCEL+4	;LP(2)
 MR %SMTAB,AX
 CALL ENTAB($SMCEL)	;ENTER IN SYMTAB
 JC AX,LTZ,ABRT7	;NO ROOM
 RM AX,%SMTAB
 INCR $SSEQ
 RETURN STLAB

ENDS

; TOKEN SEMANTICS

; CODE TABLE

TSTBL: WRD TSNUL
 WRD ADDCC
 WRD TSNUL
 WRD TSNUL
 WRD TSNUL
 WRD TSNUL
 WRD TSNUL
 WRD TSNUL
 WRD TSNUL
TSTSZ=.-TSTBL-1

; NULL

TSNUL: RR TRP,SC

; ADD CCHAR TO STRING

ADDCC: SUBROUTINE
 MR %CCHAR,AX
 CALL MSKXL
 RR AX,GR1
 MRI $STRNG,XR
 MR $TPTR,AX
 CALL PTCHR
 INCR $TPTR
 RETURN ADDCC

; WORKING CELL BUFFERS

CELL0: WRD 0
CELL1: WRD 0
SMCLP: WRD 0
$SMCEL: LOC .+10
STWRD: LOC .+10		;MAJOR STATEWORD
$TSCEL: LOC .+10	;TOKEN STATEWORD
$TCELL: LOC .+10
$TCEL2: LOC .+10

; GENERAL PSEUDO-REGISTERS

%SUCCF: WRD 0		;SUCCESS FLAG

; TP TRANSIENT PSEUDO-REGISTERS

$CNODE: WRD 0		;CURRENT NODE
$LINK: WRD 0		;LINK
$TOKEN: WRD 0		;TOKEN
$SSEQ: WRD 0		;SEMANTIC SEQUENCE
$LR: WRD 0		;LOCAL RESULT
$CR: WRD 0		;CUMULATIVE RESULT
$LP: WRD 0,0		;AUX. DATA
$D1: WRD 0
$D2: WRD 0
$FAIL: WRD 0		;FAILURE FLAG

; MODEL TP BLOCK

TPBLK:
$AFLG: WRD 0		;ACTIVITY FLAG
$STAKP: WRD 0		;STACK POINTER
$TSTAT: WRD 0		;TOKEN STATE
$TPTR: WRD 0		;STRING POINTER
$T1: WRD 0		;TOKEN DATA
$T2: WRD 0
$T3: WRD 0
$T4: WRD 0
$STRNG: WRD 0,0,0,0	;STRING BUFFER
 WRD 0,0,0,0

; PROCESSOR BLOCKS

PBLKS: LOC .+PBTOT
 WRD -1

$$END: END

APPENDIX E

EXECUTION MODULE PROGRAM LISTING

This Appendix presents the listing of the Execution Module in the simulation software system. The program is written in the SALP [67] structured assembly language.

TITLE "EXECUTION MODULE  JUNE 23, 1977"

; HIGH-LEVEL LANGUAGE PROCESSOR
; SYSTEM SIMULATOR
;
; THIS MODULE CONTAINS THE OPERAND
; EVALUATORS AND EXECUTION PROCESSORS
;
; JIM MOONEY

ENTRY EXEC,OEVMX

EXTRN GTCEL,GTWRD,PTWRD,MVBLK
EXTRN %RSULT,%SMTAB,ABORT,RPLST
EXTRN ENTAB,CLBLK,LKTAB
EXTRN GTSTR,GTCHR,MOUT,CMLIN,CMPTR
EXTRN CRARR,GTAEL

OELIM=20	;MAX OE'S
CLGTH=10	;CELL LENGTH
CR=215
ESC=233

; EXECUTION CODES

%ASG=40
%SEQ=41
%LIST=43
%EVAR=44
%RSET=50
%RD=51
%WSET=52
%WR=53
%DCL=54
%ADCL=55
%ADD=14
%SUB=15
%MPY=16
%DIV=17
%LT=20
%EQ=21
%GT=22
%COND=60
%GOTO=61

ENDS

; MAIN EXECUTION SEQUENCE

EXEC: SUBROUTINE
 MOVE =STACK,STAKP
 P1DEF OEVCT
 P1DEF OEVMX
 MR %RSULT,AX
 IF .AX GTZ
  CALL OPEVAL
 ENDIF
 RETURN EXEC

ENDS

; EVALUATE AN OPERAND

OPEVAL: RM TRP,$OETRP	;SAVE TRAP
 RM AX,$ECELP		;SAVE CELL PTR
 CALL GTCEL($ECELL)	;CURRENT CELL
 MOVE $ECELL+1,$ECODE	;EXEC. CODE
 MR $ECELP,XR
 ZR P1,GR1
 CALL PTWRD		;FLAG CELL
 MOVE $ECELP,$OPIX	;WORKING PTR
 MOVE 2,$OPIX+1
 IF $ECODE EQ =%COND	;IF CONDITIONAL
  CALL EVCON		;EVAL CONDITIONAL
 ELSE
  CALL EVNOR		;NORMAL EVALUATION
 ENDIF			;VALUE IN AX
 MR $ECELP,XR
 RR AX,GR1
 CALL PTWRD		;INSERT VALUE
 RR GR1,AX
 JUD $OETRP		;RETURN

ENDS

; CONDITIONAL EVALUATION

EVCON: RM TRP,$O2TRP
 MR $ECELL+2,AX
 CALL EVALW		;CHECK & EVAL
 MRI 37777,AY
 AND
 RR AO,AX
 IF .AX NEZ		;IF "TRUE"
  MR $ECELL+3,AX	;GET TRUE OP
 ELSE			;IF "FALSE"
  MR $ECELL+4,AX	;GET FALSE OP
 ENDIF
 IF .AX ETZ		;IF NULL OP
  MRI 40000,AX
 ELSE
  CALL EVALW		;CHECK & EVAL
  IF .AX ETZ
   CALL EVCAN		;PROCESS "CANCEL"
  ENDIF			;RETURN RESULT
 ENDIF
 JUD $O2TRP

ENDS

; NORMAL EVALUATION

EVNOR: RM TRP,$O2TRP
 LOOP
  CALL GTOP		;GET OPERAND
  ZR P1,GR1
  EXITIF .AX ETZ	;IF NULL
  CALL EVALW		;CHECK & EVALUATE
  STDEF .AX,EVNT1
  IF .AX ETZ
   CALL EVCAN		;HANDLE CANCEL
  ELSE
   CALL NXOP
   ZR P1,AX
  ENDIF
  ZR GR1
  EXITIF .AX ETZ
 ENDLOOP
 IF .GR1 NEZ		;IF NOT CANCELLED
  MR EVNT1,GR1		;SAVE RESULT
  IF $ECODE EQ =%SEQ ORIF $ECODE EQ =%LIST
   RR GR1,AX		;RETURN LAST RESULT
  ELSE
   MR $ECELP,AX
   CALL GTCEL($EXCEL)	;FOR EP'S
   MR $ECODE,AX
   CALL RPLST(EXLIS)	;LOOKUP
   CALL INDEX(.AX,EXTAB,=EXTSZ)
  ENDIF
 ENDIF
 JUD $O2TRP

ENDS

; GET AN OPERAND

GTOP: SUBROUTINE
 IF $OPIX+1 GE =CLGTH
  ZM $OPIX+1
 ENDIF
 MR $OPIX+1,XR
 MR #$ECELL,AX
 RETURN GTOP

; INDEX TO NEXT OPERAND

NXOP: SUBROUTINE
 INCR $OPIX+1
 RETURN NXOP

ENDS

; CHECK, EVALUATE, WAIT

EVALW: RM TRP,$EWTRP
 IF .AX GTZ		;IF POINTER
  STDEF .AX,EWT1
  CALL OPUSH		;PUSH OE DATA
  MR EWT1,AX
  CALL OPEVAL		;RECURSIVE CALL
  RM AX,EWT1
  CALL OPOP
  MR EWT1,AX
 ENDIF
 JUD $EWTRP

; PROCESS A CANCELLATION

EVCAN: SUBROUTINE
 IF $ECELP EQ %BRNCH
  MOVE %BRNCH+1,$OPIX
  MOVE %BRNCH+2,$OPIX+1
  MR $OPIX,AX
  CALL GTCEL($ECELL)
  ZM %BRNCH
  ZM %BRNCH+1
  ZM %BRNCH+2
  ZR P1,AX
 ELSE
  ZR AX
 ENDIF
 RETURN EVCAN

ENDS

; PUSH OE DATA

OPUSH: SUBROUTINE
 IF OEVCT GE =OELIM
  MRI 11,GR6
  CALL ABORT		;TOO MANY
 ELSE
  MOVE STAKP,@PSHX
  CALL MVBLK(ODATA,+@PSHX,ODATL)
  ADD2 STAKP,=ODATL,STAKP
  INCR OEVCT
  IF OEVCT GT OEVMX
   MOVE OEVCT,OEVMX
  ENDIF
 ENDIF
 RETURN OPUSH

; POP OE DATA

OPOP: SUBROUTINE
 IF STAKP LE =STACK
  MRI 12,GR6
  JU ABORT
 ELSE
  SUB2 STAKP,=ODATL,STAKP
  MOVE STAKP,@POPX
  CALL MVBLK(+@POPX,ODATA,ODATL)
  DECR OEVCT
 ENDIF
 RETURN OPOP

ENDS

; EXECUTION TRANSFER LIST

EXLIS: PKB 1,%ASG
 PKB 2,%EVAR
 PKB 3,%RSET
 PKB 4,%RD
 PKB 5,%WSET
 PKB 6,%WR
 PKB 7,%DCL
 PKB 10,%ADD
 PKB 11,%SUB
 PKB 12,%MPY
 PKB 13,%DIV
 PKB 14,%LT
 PKB 15,%EQ
 PKB 16,%GT
 PKB 17,%GOTO
 PKB 20,%ADCL
 WRD 0

; EXECUTION PROCESSOR TABLE

EXTAB: WRD $NULL
 WRD $ASG
 WRD $EVAR
 WRD $NULL
 WRD $RD
 WRD $NULL
 WRD $WR
 WRD $DCL
 WRD $ADD
 WRD $SUB
 WRD $MPY
 WRD $DIV
 WRD $LT
 WRD $EQ
 WRD $GT
 WRD $GOTO
 WRD $ADCL
EXTSZ=.-EXTAB-1

ENDS

; EXECUTION PROCESSORS

$ASG: SUBROUTINE
 MR $EXCEL+3,AX
 CALL GTVAL
 RM GR1,OP2
 MR $EXCEL+2,AX
 CALL GTVAL
 IF .AX EQ 2		;IF VARIABLE
  MR OP2,GR1
  CALL PTWRD
 ENDIF
 MR OP2,AX
 CALL STVAL
 RETURN $ASG

; DECLARE

$DCL: SUBROUTINE
 MR $EXCEL+2,AX
 RM AX,OP1
 IF .AX LTZ		;IF STRING
  CALL CLBLK($VCELL,CLGTH)
  MOVE OP1,$VCELL
  MOVE 2,$VCELL+1
  MR %SMTAB,AX		;ADD TO SYMTAB
  CALL ENTAB($VCELL)
  RM AX,%SMTAB
 ENDIF
 MR OP1,AX
 CALL STVAL
 RETURN $DCL

ENDS

; ARRAY DECLARE

$ADCL: SUBROUTINE
 MR $EXCEL+3,AX		;GET LENGTH
 CALL GTVAL
 RM GR1,OP2
 MR $EXCEL+2,AX		;GET NAME
 RM AX,OP1
 IF .AX LTZ		;IF STRING
  CALL CLBLK($VCELL,CLGTH)	;BUILD ST ENTRY
  MOVE OP1,$VCELL
  MOVE 4,$VCELL+1	;ARRAY CODE
  MR OP2,AX
  CALL CRARR		;CREATE AN ARRAY
  RM XR,$VCELL+2
  MR %SMTAB,AX
  CALL ENTAB($VCELL)
  RM AX,%SMTAB
 ENDIF
 MR OP1,AX
 CALL STVAL
 RETURN $ADCL

; EVALUATE AN ARRAY ELEMENT

$EVAR: SUBROUTINE
 MR $EXCEL+3,AX
 CALL GTVAL		;GET INDEX
 RM GR1,OP2
 MR $EXCEL+2,AX
 CALL GTVAL
 IF .AX EQ 4
  RR GR1,XR
  MR OP2,AX
  CALL GTAEL		;GET ELEMENT
  IF .XR GTZ		;IF VALID
   RR XR,AX
   CALL SET15		;POINTER
  ELSE
   MRI 40000,AX
  ENDIF
 ELSE
  MRI 40000,AX
 ENDIF
 RETURN $EVAR

; READ

$RD: SUBROUTINE
 CALL INDEC		;GET A DECIMAL INPUT
 RM AX,OP2
 MR $EXCEL+2,AX
 CALL GTVAL
 IF .AX EQ 2
  MR OP2,GR1
  CALL PTWRD
 ENDIF
 MR OP2,AX
 CALL STVAL
 RETURN $RD

; WRITE

$WR: SUBROUTINE
 MR $EXCEL+2,AX
 CALL GTVAL
 RM GR1,OP1
 IF .AX GE 2
  RR GR1,AX
  CALL OUDEC
 ENDIF
 MR OP1,AX
 CALL STVAL
 RETURN $WR

ENDS

; GO TO

$GOTO: SUBROUTINE
 MR $EXCEL+2,AX
 RM AX,OP1
 RR AX,GR1
 MR %SMTAB,AX		;LOOKUP LABEL
 CALL LKTAB
 IF .AX GTZ
  CALL GTCEL($VCELL)
  IF $VCELL+1 EQ 1
   MOVE $VCELL+2,%BRNCH
   MOVE $VCELL+3,%BRNCH+1
   MOVE $VCELL+4,%BRNCH+2
  ENDIF
 ENDIF
 ZR AX			;CANCEL
 RETURN $GOTO

; NULL

$NULL: SUBROUTINE
 MRI 40000,AX
 RETURN $NULL

ENDS

; ADD

$ADD: SUBROUTINE
 CALL GET2
 ADD2 OP1,OP2,.AX
 CALL STVAL
 RETURN $ADD

; SUBTRACT

$SUB: SUBROUTINE
 CALL GET2
 SUB2 OP1,OP2,.AX
 CALL STVAL
 RETURN $SUB

; MULTIPLY

$MPY: SUBROUTINE
 CALL GET2
 MR OP1,AX
 MR OP2,XR
 CALL MPY
 CALL STVAL
 RETURN $MPY

; DIVIDE

$DIV: SUBROUTINE
 CALL GET2
 MR OP1,AX
 MR OP2,XR
 ZR AY
 CALL DIV
 CALL STVAL
 RETURN $DIV

ENDS

; LESS THAN

$LT: SUBROUTINE
 CALL GET2
 IF OP1 LT OP2
  ZR P1,AX
 ELSE
  ZR AX
 ENDIF
 CALL STVAL
 RETURN $LT

; EQUAL TO

$EQ: SUBROUTINE
 CALL GET2
 IF OP1 EQ OP2
  ZR P1,AX
 ELSE
  ZR AX
 ENDIF
 CALL STVAL
 RETURN $EQ

; GREATER THAN

$GT: SUBROUTINE
 CALL GET2
 IF OP1 GT OP2
  ZR P1,AX
 ELSE
  ZR AX
 ENDIF
 CALL STVAL
 RETURN $GT

ENDS

; GET 2 OPERANDS

GET2: SUBROUTINE
 MR $EXCEL+2,AX
 CALL GTVAL
 RM GR1,OP1
 MR $EXCEL+3,AX
 CALL GTVAL
 RM GR1,OP2
 RETURN GET2

; SET BIT 15

SET15: RS AX,L1
 STL
 RS AX,R1
 RR TRP,SC

; SET VALUE FLAGS

STVAL: RS AX,L1
 RS AX,L1
 STL
 RS AX,R1
 CLL
 RS AX,R1
 RR TRP,SC

ENDS

; GET A VALUE

GTVAL: SUBROUTINE
 STDEF .AX,GVLX1
 IF .AX GTZ		;IF POINTER
  RR AX,XR
  CALL GTWRD		;GET VALUE
  RR GR1,GR2		;SAVE
  ZR GR1		;CANCEL
  CALL PTWRD
  RS GR2,L1
  IF < LNK
   MOVE GVLX1,$VCELL	;SEE IF CONSTANT
   CALL CONST
   IF .AX GEZ		;IF OK
    RM AX,$VCELL+2	;ENTER IN SYMTAB
    MOVE 3,$VCELL+1
    MR %SMTAB,AX
    CALL ENTAB($VCELL)
    RM AX,%SMTAB
    MRI 3,AX
    MR $VCELL+2,GR1
    ZRC XR
   ENDIF
  ENDIF
 ENDIF
 RETURN GTVAL

ENDS

; EVALUATE A CONSTANT

CONST: SUBROUTINE
 RS AX,L1
 CLL
 RS AX,R1
 CALL GTSTR($ESTRG)
 ZMDEF CVAL
 ZMDEF CINDX
 ZR GR5
 LOOP			;BUILD DECIMAL NUMBER
  MRI $ESTRG,XR
  MR CINDX,AX
  CALL GTCHR
  RR GR1,AX
  IF .AX LE '9 ANDIF .AX GE '0
   RR AO,GR1
   MR CVAL,AX
   CLL
   RR AX,L1,AY
   RS AY,L1
   RR AO,L1,AX
   RR GR1,AY
   RM AO,CVAL
   RS GR5,P1
  ELSEIF .AX ETZ
   EXITLOOP		;END OF STRING
  ELSE
   ZR GR5		;INVALID CHAR
   EXITLOOP
  ENDIF
  INCR CINDX
 ENDLOOP
 IF .GR5 GTZ
  MR CVAL,AX
 ELSE
  ZRC AX
 ENDIF
 RETURN CONST

ENDS

; INPUT A DECIMAL NUMBER

INDEC: SUBROUTINE
 CALL MOUT

ENDS

; OUTPUT A DECIMAL NUMBER

OUDEC: SUBROUTINE
 RR AX,GR1
 MOVE =DNUM-1,+ODCP
 MRI DECTB,XR
 RR GR1,AY
 DOWHILE #0 NEZ
  RS XR,P1
  MRI '0,GR1
  DOWHILE .AO GEZ
   RS GR1,P1
   RR AO,AY
  ENDWHILE
  RMD GR1,ODCP
 ENDWHILE
 CALL MOUT(OUMSG)
 RETURN OUDEC

OUMSG: ASC /OUTPUT: /
DNUM: WRD 0,0,0,0,0
 WRD CR

DECTB: WRD -"D10000
 WRD -"D1000
 WRD -"D100
 WRD -"D10
 WRD -1
 WRD 0

ENDS

; SIGNED MULTIPLY

MPY: SUBROUTINE
 ZR GR2
 IF .AX LTZ
  RSC AX,P1
  RS GR2,P1
 ENDIF
 IF .XR LTZ
  RSC XR,P1
  RS GR2,P1
 ENDIF
 ZMS
 MRI 100000,GR1
 ZR AY
 LOOP
  RS XR,R1
  SFM NOT LNK
  RR MSR,R1,0,AOV TO LNK
  RR AO,AY
  RS AY,R1
  RS GR1,R1
 UNTIL

; SIGNED DIVIDE

DIV: SUBROUTINE
 ZMS
 IF .AY LTZ
  RS GR2,P1
  RSC AX,P1
  RSC AY
  IF < AOV
   RS AY,P1
  ENDIF
 ENDIF
 IF .XR LTZ
  RS GR2,P1
  RSC XR,P1
 ENDIF
 ZMS
 RR AX,GR1
 RRC XR,P1,AX
 ZR P1,XR
 LOOP
  RS GR1,L1
  RS AY,L1
  SFA NOT AOV
  RR AO,AY
  STL
  RS XR,L1
 UNTIL < LNK
 RRC AX,P1,GR1
 RS GR2,R1
 IF < LNK
  RRC XR,P1,AX
 ELSE
  RR XR,AX
 ENDIF
 RR TRP,XR
 RETURN DIV

ENDS

; REGISTERS AND CELLS

$VCELL: LOC .+10
$EXCEL: LOC .+10
$ESTRG: LOC .+10
OP1: WRD 0
OP2: WRD 0
%BRNCH: WRD 0,0,0

; OE WORKING REGISTERS

ODATA=.
$OETRP: WRD 0		;TRANSFER SAVES
$O2TRP: WRD 0
$EWTRP: WRD 0
$ECELP: WRD 0		;BASE CELL PTR
$OPIX: WRD 0,0		;OPERAND INDEX
$ECODE: WRD 0		;EXECUTION CODE
$ECELL: LOC .+10	;WORKING CELL
ODATL=.-ODATA

; STACK

STKSZ=ODATL*OELIM

STACK: LOC .+STKSZ

$$END: END

APPENDIX F

FORTRAN IV SYNTAX GRAPHS

(The syntax graphs of this appendix are drawings in the original and do not reproduce in this text; only their labels can be recovered. Graphs are given for the nonterminals PROG, MAINP, SUBP, SUBR, FUNC, BDATA, BDCLST, PBODY, SPECST, EXECST, TYPST, DATST, SFNCST, FUNCST, SUBRST, CALLST, DOST, RDST, DIMST, COMST, EQVST, EXTST, ASGST, LASGST, GOTOST, ARIFST, LGIFST, STMA, FORMST, RTRNST, CONTST, TERMST, AUXST, BDATST, ENDST, EXPR, LEXPR, AEXPR, LPRIM, RELOP, APRIM, IOLIST, ADCL, VREF, SUBEX, CVAR, DVAL, FREF, CONST, and LABEL.)

The purely textual alternations read:

BDCLST: TYPST/DIMST/COMST/EQVST
SPECST: TYPST/DIMST/COMST/EQVST/EXTST
EXECST: AUXST/TERMST/CONTST/RTRNST/WRITST/READST/DOST/CALLST/LGIFST/ARIFST/GOTOST/LASGST/ASGST
STMA: ASGST/LASGST/GOTOST/ARIFST/CALLST/RDST/WRST/RTRNST/CONTST/TERMST/AUXST
EXPR: AEXPR/LEXPR
RELOP: '.LT.'/'.LE.'/'.EQ.'/'.NE.'/'.GT.'/'.GE.'

TYPST begins with one of 'INTEGER', 'REAL', 'DOUBLEPRECISION', 'LOGICAL', 'COMPLEX'.

Terminal symbols:

n - variable name
i - decimal integer
r - real or double-precision constant
h - Hollerith string
o - octal integer
f - format string
eos - end-of-statement mark

APPENDIX G

ALGOL 60 SYNTAX GRAPHS

(As in Appendix F, the syntax graphs are drawings in the original and only their labels can be recovered. Graphs are given for the nonterminals PROG, BLOCK, STM, BASIC, CONDST, FORST, FLE, ASG, GOTO, DESEX, SDESEX, FDESG, PRCST, DECL, TDCL, ADCL, SDCL, PDCL, PHEAD, SPEC, EXPR, AEXP, BEXP, SAEXP, APRIM, SBEXP, BPRIM, RELOP, VAR, and LABEL.)

The purely textual alternations read:

BASIC: ASG/GOTO/PRCST
DECL: TDCL/ADCL/SDCL/PDCL
EXPR: AEXP/BEXP/DESEX

Terminal symbols:

n - variable name
i - integer
r - real constant
s - character string
l - letter string
c1 - comment string (no ';')
c2 - comment string (no ';', 'end', 'else')

REFERENCES

1. J. P. Anderson, "A Computer for Direct Execution of Algorithmic Languages," Proceedings of the Eastern Joint Computer Conference, Vol. 20, p. 184, 1961

2. W. Lonergan and P. King, "Design of the B5000 System," Datamation, Vol. 7, p. 28, 1961

3. A. P. Mullery, R. F. Schauer, and R. Rice, "ADAM - A problem oriented symbol processor," Proceedings of the Spring Joint Computer Conference, Vol. 22, p. 367, 1963

4. D. Hodges, "IPL-VC, a Computer System having the IPL-V Instruction Set," ANL-6888, Argonne Natl. Lab., Argonne, Ill., 1964

5. A. J. Melbourne and J. M. Pugmire, "A Small Computer for the Direct Processing of FORTRAN Statements," Computer Journal, Vol. 8, No. 1, p. 24, 1965

6. T. R. Bashkow, A. Sasson, and A. Kronfeld, "System Design of a FORTRAN Machine," IEEE Trans. on Elec. Computers, Vol. EC-16, p. 485, 1967

7. H. Weber, "A Microprogrammed Implementation of EULER on the IBM System/360 Model 30," Communications of the ACM, Vol. 10, No. 9, p. 549, 1967

8. N. Wirth and H. Weber, "EULER: a generalization of ALGOL, and its Formal Definition," Communications of the ACM, Vol. 9, No. 1, p. 13, 1966

9. J. K. Iliffe, Basic Machine Principles, American Elsevier, New York, 1968

10. L. N. McMahan and E. A. Feustel, "Implementation of a Tagged Architecture for Block Structured Languages," Proceedings of a Symposium on High-Level Language Computer Architecture, Univ. of Maryland, p. 91, 1973

11. M. Sugimoto, "PL/I Reducer and Direct Processor," Proceedings of the ACM National Conf., p. 519, 1969

12. C. McFarland, "A language-oriented computer design," Proceedings of the Fall Joint Computer Conference, Vol. 37, p. 629, 1970

13. K. J. Thurber and J. W. Myrna, "System Design of a Cellular APL Computer," IEEE Trans. on Computers, Vol. C-19, p. 291, 1970

14. R. Rice and W. R. Smith, "SYMBOL - A major departure from classic software dominated von Neumann computing systems," Proceedings of the Spring Joint Computer Conference, Vol. 38, p. 575, 1971

15. Assoc. for Computing Machinery and IEEE, Proceedings of a Symposium on High-Level Language Computer Architecture, Univ. of Maryland, 1973

16. Y. Chu, High Level Language Computer Architecture, Academic Press, New York, 1975

17. W. T. Wilner, "Design of the Burroughs B1700," Proceedings of the Fall Joint Computer Conference, Vol. 41, p. 489, 1972

18. M. D. Shapiro, "A SNOBOL Machine: A Higher Level Language Processor in a conventional hardware framework," Proceedings of IEEE COMPCON, p. 41, 1972

19. J. W. Anderberg and C. L. Smith, "High-level language translation in SYMBOL-2R," Proceedings of a Symposium on High-Level Language Computer Architecture, Univ. of Maryland, p. 11, 1973

20. R. G. Herriot, "GLOSS: A High Level Language Machine," Proceedings of a Symposium on High-Level Language Computer Architecture, Univ. of Maryland, p. 81, 1973

21. H. M. Bloom, "The Direct High-Level Language Processor," Proceedings of a Symposium on High-Level Language Computer Architecture, Univ. of Maryland, 1973

22. J. S. Miller and W. H. Vandever, "Instruction Architecture of an Aerospace Multiprocessor," Proceedings of a Symposium on High-Level Language Computer Architecture, Univ. of Maryland, p. 52, 1973

23. W. C. Nielsen, "Design of an Aerospace Computer," Proceedings of a Symposium on High-Level Language Computer Architecture, Univ. of Maryland, p. 34, 1973

24. J. B. Johnston, "The Contour Model of Block Structured Processes," Proceedings of a Symposium on Data Structures in Programming Languages, ACM SIGPLAN Notices, Vol. 6, p. 55, 1971

25. B. W. Wade and V. B. Schneider, "A General Purpose High-level Language Machine for Minicomputers," Proceedings of the ACM Interface meeting on Programming Languages and Microprogramming, p. 169, 1973

26. R. J. Chevance, "A COBOL machine," Proceedings of the ACM Interface meeting on Programming Languages and Microprogramming, p. 139, 1973

27. A. Hassitt, J. W. Lageschulte, and L. E. Lyon, "Implementation of a high-level language machine," Communications of the ACM, Vol. 16, p. 199, 1973

28. P. Sylvain and M. Vineberg, "The Design and Evaluation of the Array Machine: a High level language processor," Proceedings of the 2nd Annual Symposium on Computer Architecture, Univ. of Houston, p. 119, 1975

29. S. Fournier, The Architecture of a Grammar-Programmable High-Level Language Machine, Ph.D. Dissertation, The Ohio State University, Columbus, Ohio, 1975

30. C. R. Carlson, "A Survey of High-Level Language Computer Architecture," in High Level Language Computer Architecture, Y. Chu (ed), Academic Press, New York, 1975

31. T. A. Laliotis, "Architecture of the SYMBOL Computer System," in High Level Language Computer Architecture, Y. Chu (ed), Academic Press, New York, 1975

32. W. M. McKeeman, "Language Directed Computer Design," Proceedings of the Fall Joint Computer Conference, Vol. 31, p. 413, 1967

33. H. M. Bloom, "Conceptual Design of a Direct High-Level Language Processor," in High Level Language Computer Architecture, Y. Chu (ed), Academic Press, New York, 1975

34. E. C. Yowell, "A Mechanical Approach to Automatic Coding," Journal of the Franklin Institute, Monograph No. 3, p. 103, Philadelphia, Pa., 1957

35. J. E. Sammet, Programming Languages: History and Fundamentals, Prentice Hall, Englewood Cliffs, N.J., 1969

36. P. Naur (ed), "Revised Report on the Algorithmic Language ALGOL 60," Communications of the ACM, Vol. 6, No. 1, p. 1, 1963

37. U.S. Dept. of Defense, "COBOL-1961 Extended: Extended Specifications for a Common Business Oriented Language," U.S. Govt. Printing Office, Washington, D.C., 1961

38. IBM Corp., "IBM System/360 Operating System; PL/I Language Specifications," C28-6571-0, Data Processing Division, White Plains, N.Y., 1965

39. J. McCarthy et al., LISP 1.5 Programmer's Manual, 2nd ed., MIT Press, Cambridge, Mass., 1965

40. R. E. Griswold, J. F. Poage, and I. P. Polonsky, The SNOBOL4 Programming Language, 2nd ed., Prentice Hall, Englewood Cliffs, N.J., 1972

41. A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, and C. H. A. Koster, "Report on the Algorithmic Language ALGOL 68," Numerische Mathematik, Vol. 14, p. 79, Springer Verlag, Berlin, 1969

42. R. E. Griswold, The Macro Implementation of SNOBOL4, W. H. Freeman, San Francisco, 1972

43. T. W. Pratt, Programming Languages: Design and Implementation, Prentice Hall, Englewood Cliffs, N.J., 1975

44. N. Chomsky, "On Certain Formal Properties of Grammars," Information and Control, Vol. 2, p. 137, 1959

45. J. E. Hopcroft and J. D. Ullman, Formal Languages and their Relation to Automata, Addison Wesley, Reading, Mass., 1969

46. P. Lucas and K. Walk, "On the Formal Description of PL/I," Annual Review in Automatic Programming, Vol. 6, Part 3, p. 105, Pergamon Press, 1969

47. J. A. N. Lee, Computer Semantics, Van Nostrand Reinhold Co., New York, 1972

48. R. Rustin (ed), Formal Semantics of Programming Languages, Prentice Hall, Englewood Cliffs, N.J., 1972

49. J. T. Tou and P. Wegner (eds), Proceedings of a Symposium on Data Structures in Programming Languages, ACM SIGPLAN Notices, Vol. 6, No. 2, 1971

50. J. von Neumann, Collected Works, Vol. V, A. H. Taub (ed), Pergamon Press, Oxford, England, 1963

51. M. V. Wilkes, "The Best way to design an automatic Calculating Machine," Computer Inaugural Conference Proceedings, Manchester Univ., p. 16, 1951

52. G. H. Barnes, R. M. Brown, M. Kato, D. J. Kuck, D. L. Slotnick, and R. A. Stokes, "The ILLIAC IV Computer," IEEE Trans. on Computers, Vol. C-17, p. 746, 1968

53. A. Slade and H. O. McMahon, "A Cryotron Catalog Memory System," Proceedings of the Eastern Joint Computer Conference, Vol. 10, p. 115, 1956

54. K. J. Thurber and L. D. Wald, "Associative and Parallel Processors," Computing Surveys, Vol. 7, p. 215, 1975

55. T. Feng (ed), Special Issue: Parallel Processors and Processing, Computing Surveys, Vol. 9, 1977

56. G. A. Anderson and R. Y. Kain, "A Content-Addressed Memory Design for Data Base Applications," Proceedings of the International Conference on Parallel Processing, IEEE, p. 191, 1976

57. J. A. Rudolph, "A Production Implementation of an Associative Array Processor: STARAN," Proceedings of the Fall Joint Computer Conference, Vol. 41, Part 1, p. 229, 1972

58. B. A. Crane, M. J. Gilmartin, J. H. Huttenhoff, P. T. Rux, and R. R. Shively, "PEPE Computer Architecture," Proceedings of IEEE COMPCON, p. 57, 1972

59. C. V. Ramamoorthy and H. F. Li, "Pipeline Architecture," Computing Surveys, Vol. 9, p. 61, 1977

60. G. S. Tjaden and M. J. Flynn, "Detection and Parallel Execution of Independent Instructions," IEEE Trans. on Computers, Vol. C-19, p. 889, 1970

61. D. J. Kuck, Y. Muraoka, and S.-C. Chen, "On the Number of Operations Simultaneously Executable in FORTRAN-Like Programs and Their Resulting Speedup," IEEE Trans. on Computers, Vol. C-21, p. 1293, 1972

62. L. Lamport, "The Parallel Execution of DO Loops," Communications of the ACM, Vol. 17, p. 83, 1974

63. H. S. Stone, "One-Pass Compilation of Arithmetic Expressions for a Parallel Processor," Communications of the ACM, Vol. 10, p. 220, 1967

64. H. H. Love and D. A. Savitt, "An Iterative-Cell Processor for the ASP Language," in Associative Information Techniques, E. L. Jacks (ed), p. 147, American Elsevier, New York, 1971

65. D. J. Kuck, "A Survey of Parallel Machine Organization and Programming," Computing Surveys, Vol. 9, p. 29, 1977

66. GRI Computer Corp., GRI-99 System Reference Manual, Newton, Mass., 1973

67. J. Mooney, "SALP Reference Manual," unpublished, Dymo Graphic Systems, Wilmington, Mass., 1976

68. American National Standards Institute, "FORTRAN," X3.9-1966, New York, 1966

69. S. A. Greibach, "A new normal form theorem for context-free phrase structure grammars," Journal of the ACM, Vol. 12, No. 1, p. 42, 1965

70. L. S. Haynes, "The Architecture of an ALGOL 60 Computer Implemented with distributed processors," Proceedings of the 4th Annual Symposium on Computer Architecture, p. 95, 1977